
Pig: Reading data from a file


To read data from a file in Pig, we can use the LOAD operator. Assume there is a file named player.csv (a public dataset of English Premier League players, downloaded from an open data source).

Sample data from player.csv

Player id,Player,Position,Number,Club,Club (country),D.O.B,Age,Height (cm),Country,Caps,International goals,Plays in home country
336722,Alan PULIDO,Forward,11,Tigres UANL,Mexico,08.03.1991,23,176,Mexico,5,4,TRUE
368902,Adam TAGGART,Forward,9,Newcastle United Jets FC,Australia,02.06.1993,21,172,Australia,4,3,TRUE
362641,Reza GHOOCHANNEJAD,Forward,16,Charlton Athletic FC,England,20.09.1987,26,181,Iran,13,9,FALSE

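Pig script to load the data. PigStorage(',') splits each line on commas, and the AS clause declares the record structure (schema) of the file, giving each field a name and a type. Without a schema, fields can only be referenced by position ($0, $1, ...) and default to bytearray.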

grunt> player_data = LOAD 'player.csv'
       USING PigStorage(',')
       AS
       (player_id:int,
       player:chararray,
       position:chararray,
       number:int,
       club:chararray,
       club_country:chararray,
       d_o_b:chararray,
       age:int,
       height_cm:int,
       country:chararray,
       caps:chararray,
       international_goals:chararray,
       plays_home_country:chararray);

grunt> DUMP player_data;

Sample Output

(380000,Marcelo BROZOVIC,Midfielder,14,GNK Dinamo Zagreb,Croatia,16.11.1992,21,180,Croatia,0,0,TRUE)
(380009,Luis LOPEZ,Goalkeeper,1,Real Espana,Honduras,13.09.1993,20,182,Honduras,0,0,TRUE)
(379910,Adnan JANUZAJ,Midfielder,20,Manchester United FC,England,05.02.1995,19,180,Belgium,0,0,FALSE)
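The sample file starts with a header line, and DUMP prints every record in the relation. The following is a minimal sketch, not part of the original script, of how to check the schema, drop the header row, and look at only a few records. It assumes that player_data was loaded with the schema shown above and that the file really begins with the header line from the sample. The relation names players_no_header and first_rows are made up for illustration.

```pig
-- Assumes player_data was loaded with the schema shown above.
-- Print the schema Pig has associated with the relation.
DESCRIBE player_data;

-- The header text "Player id" cannot be converted to int, so player_id
-- is null for that row. Filtering on it drops the header, along with
-- any other row whose id is not numeric.
players_no_header = FILTER player_data BY player_id IS NOT NULL;

-- Look at a few records instead of dumping the whole file.
first_rows = LIMIT players_no_header 3;
DUMP first_rows;
```

Filtering on a null player_id relies on Pig turning values it cannot convert into null rather than failing the job. If a real record could legitimately have a missing id, a different header test would be needed.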
