Hadoop for beginners

I just completed my Hadoop fundamentals course on Udemy.com. The videos were well organized, so you get a glimpse of the world of big data and of how the Hadoop framework can play a major role in processing it. The course insisted on downloading the Hortonworks Hadoop development sandbox and working with it. Hortonworks provides a ready-made Hadoop environment as a download that can be loaded into a virtual machine. I downloaded the VirtualBox version of the sandbox.

The course gave a strong insight into the Hadoop architecture and the buzzwords around it. It also gave an in-depth idea of the Hive and Pig tools and of how they play a key role in storing and processing data in the framework.

Popular posts from this blog

UNIX : How to get record count from zipped file

Sometimes we need the record count of a file. For that we can use the wc -l command with the file name. In some situations the file will be compressed, and wc -l will not work directly on zipped files. In that case we can zcat the file and pipe the output to wc -l.

Example: let's say we have a file cricketData.dat.gz. To get the record count from the file, use:

    zcat cricketData.dat.gz | wc -l

This prints the number of lines, which is the record count when each record sits on its own line.
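To try this end to end, here is a small self-contained sketch. The sample rows and the temporary directory are made up for illustration; only the file name comes from the example above. It uses gzip -dc, which does the same job as zcat and may be the safer choice on systems where zcat expects a .Z suffix.

```shell
# Create a throwaway directory and a three-record sample file.
# The rows below are illustrative assumptions, not real data.
workdir=$(mktemp -d)
printf 'sachin,100\ndravid,85\nganguly,72\n' > "$workdir/cricketData.dat"

# Compress it; gzip replaces the file with cricketData.dat.gz.
gzip "$workdir/cricketData.dat"

# Decompress to stdout and count lines without unpacking to disk.
# tr strips the padding that some wc implementations add.
record_count=$(gzip -dc "$workdir/cricketData.dat.gz" | wc -l | tr -d ' ')
echo "$record_count"

# Clean up the temporary files.
rm -rf "$workdir"
```

Because the data is streamed through the pipe, the compressed file on disk is left untouched.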

UNIX : How to ignore lines with certain names

Sometimes we need to skip the lines that contain certain words and list the rest of the file. Usually the file is a log we want to read. The grep command below can be used to leave out lines containing any of several words. Let's say the file contains:

    $ cat list.txt
    apple
    orange
    apple
    banana
    papaya

Now we need to ignore the lines with orange, banana and papaya. So we can use the grep command below:

    $ cat list.txt | grep -Ev "orange|banana|papaya"
    apple
    apple

The -E option lets | separate the alternative words, and -v inverts the match, so grep prints only the lines that contain none of them.
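The same filter can be run as a self-contained sketch that builds list.txt first. The temporary directory is an assumption for illustration. grep reads the file directly here, so the cat is not needed.

```shell
# Recreate the sample file from the example in a throwaway directory.
workdir=$(mktemp -d)
printf 'apple\norange\napple\nbanana\npapaya\n' > "$workdir/list.txt"

# -E: extended regular expressions, so | means "or".
# -v: invert the match and print only the lines that match none of the words.
kept=$(grep -Ev 'orange|banana|papaya' "$workdir/list.txt")
echo "$kept"

# Clean up the temporary files.
rm -rf "$workdir"
```

Note that the pattern matches substrings, so a line such as "bananas" would also be dropped. Adding -w (grep -Evw) restricts the match to whole words if that is not wanted.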

Scala

Scala is an object-oriented and functional programming language. Every value in Scala is an object.