
PIG : Reading data from file


To read data from a file, we can use the LOAD operator. Assume there is a file named player.csv (a public dataset of English Premier League players, downloaded from an open data source).

Sample Data from player.csv file

Player id,Player,Position,Number,Club,Club (country),D.O.B,Age,Height (cm),Country,Caps,International goals,Plays in home country
336722,Alan PULIDO,Forward,11,Tigres UANL,Mexico,08.03.1991,23,176,Mexico,5,4,TRUE
368902,Adam TAGGART,Forward,9,Newcastle United Jets FC,Australia,02.06.1993,21,172,Australia,4,3,TRUE
362641,Reza GHOOCHANNEJAD,Forward,16,Charlton Athletic FC,England,20.09.1987,26,181,Iran,13,9,FALSE

The Pig script below loads the data. PigStorage(',') splits each line on commas, and the AS clause declares the record structure of the file: the name and type of each field.

grunt> player_data = LOAD 'player.csv'
       USING PigStorage(',')
       AS
       (player_id:int,
       player:chararray,
       position:chararray,
       number:int,
       club:chararray,
       club_country:chararray,
       d_o_b:chararray,
       age:int,
       height_cm:int,
       country:chararray,
       caps:int,
       international_goals:int,
       plays_home_country:chararray);

grunt> DUMP player_data;

Sample Output

(380000,Marcelo BROZOVIC,Midfielder,14,GNK Dinamo Zagreb,Croatia,16.11.1992,21,180,Croatia,0,0,TRUE)
(380009,Luis LOPEZ,Goalkeeper,1,Real Espana,Honduras,13.09.1993,20,182,Honduras,0,0,TRUE)
(379910,Adnan JANUZAJ,Midfielder,20,Manchester United FC,England,05.02.1995,19,180,Belgium,0,0,FALSE)
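
PigStorage treats the header line of the CSV file as an ordinary record, so with the schema above its numeric fields (such as player_id) cannot be converted to int and load as null. The following is a minimal sketch of checking the schema and dropping that row, assuming the player_data relation defined above; players_only and few_players are illustrative aliases that are not part of the original script.

```pig
-- Print the schema that the AS clause attached to the relation
grunt> DESCRIBE player_data;

-- Drop the header row: "Player id" is not a valid int, so player_id is null there
grunt> players_only = FILTER player_data BY player_id IS NOT NULL;

-- Look at a few records instead of dumping the whole file
grunt> few_players = LIMIT players_only 3;
grunt> DUMP few_players;
```

Note that this filter also drops any data row whose player id is missing or non-numeric, which may or may not be what you want for your dataset.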


