Pig: Reading data from a file


To read data from a file, we can use the LOAD operator. Assume there is a file named player.csv (a public dataset of English Premier League players, downloaded from an open data source).

Sample data from the player.csv file

Player id,Player,Position,Number,Club,Club (country),D.O.B,Age,Height (cm),Country,Caps,International goals,Plays in home country
336722,Alan PULIDO,Forward,11,Tigres UANL,Mexico,08.03.1991,23,176,Mexico,5,4,TRUE
368902,Adam TAGGART,Forward,9,Newcastle United Jets FC,Australia,02.06.1993,21,172,Australia,4,3,TRUE
362641,Reza GHOOCHANNEJAD,Forward,16,Charlton Athletic FC,England,20.09.1987,26,181,Iran,13,9,FALSE

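The Pig script below loads the data. The AS clause specifies the record structure (schema) of the file. Without it, the fields are untyped and have to be referenced by position ($0, $1, ...).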

grunt> player_data = LOAD 'player.csv'
       USING PigStorage(',')
       AS
       (player_id:int,
       player:chararray,
       position:chararray,
       number:int,
       club:chararray,
       club_country:chararray,
       d_o_b:chararray,
       age:int,
       height_cm:int,
       country:chararray,
       caps:chararray,
       international_goals:chararray,
       plays_home_country:chararray);
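
Before dumping the relation, it can help to confirm that Pig picked up the schema as intended. The following is a minimal sketch, assuming the player_data relation defined above. DESCRIBE prints the field names and types without launching a job.

```pig
-- Print the schema of the relation loaded above (no job is launched).
DESCRIBE player_data;
```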

grunt> DUMP player_data;

Sample Output

(380000,Marcelo BROZOVIC,Midfielder,14,GNK Dinamo Zagreb,Croatia,16.11.1992,21,180,Croatia,0,0,TRUE)
(380009,Luis LOPEZ,Goalkeeper,1,Real Espana,Honduras,13.09.1993,20,182,Honduras,0,0,TRUE)
(379910,Adnan JANUZAJ,Midfielder,20,Manchester United FC,England,05.02.1995,19,180,Belgium,0,0,FALSE)
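
Note that player.csv starts with a header line, and DUMP prints every record in the file. The sketch below shows one possible way to deal with both. It assumes the player_data relation and schema from the LOAD statement above; the relation names players_no_header and players_preview are made up for illustration. It relies on Pig turning a value that cannot be cast to the declared type into null, so the header's "Player id" text should end up as a null player_id. Any genuine row with a malformed id would be dropped as well.

```pig
-- Assumes player_data was loaded with the schema shown above.
-- The header line cannot be cast to int, so its player_id is null; drop it.
players_no_header = FILTER player_data BY player_id IS NOT NULL;

-- Preview a few records instead of dumping the whole relation.
players_preview = LIMIT players_no_header 3;
DUMP players_preview;
```

LIMIT does not guarantee which records are returned unless the relation is ordered first, so the preview may differ between runs.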


