To read data from a file, we can use the LOAD operator. Assume there is a file named player.csv (a public dataset of English Premier League players, downloaded from an open data source).
Sample Data from player.csv file
Player id,Player,Position,Number,Club,Club (country),D.O.B,Age,Height (cm),Country,Caps,International goals,Plays in home country
336722,Alan PULIDO,Forward,11,Tigres UANL,Mexico,08.03.1991,23,176,Mexico,5,4,TRUE
368902,Adam TAGGART,Forward,9,Newcastle United Jets FC,Australia,02.06.1993,21,172,Australia,4,3,TRUE
362641,Reza GHOOCHANNEJAD,Forward,16,Charlton Athletic FC,England,20.09.1987,26,181,Iran,13,9,FALSE
Pig script to load the data. The AS clause declares the record structure (schema) of the file, giving a name and type to each comma-separated field. Without it, fields can only be referenced by position ($0, $1, ...) and default to the bytearray type.
grunt> player_data = LOAD 'player.csv' USING PigStorage(',') AS (player_id:int, player:chararray, position:chararray, number:int, club:chararray, club_country:chararray, d_o_b:chararray, age:int, height_cm:int, country:chararray, caps:chararray, international_goals:chararray, plays_home_country:chararray);
grunt> DUMP player_data;
Sample Output
(380000,Marcelo BROZOVIC,Midfielder,14,GNK Dinamo Zagreb,Croatia,16.11.1992,21,180,Croatia,0,0,TRUE)
(380009,Luis LOPEZ,Goalkeeper,1,Real Espana,Honduras,13.09.1993,20,182,Honduras,0,0,TRUE)
(379910,Adnan JANUZAJ,Midfielder,20,Manchester United FC,England,05.02.1995,19,180,Belgium,0,0,FALSE)
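One thing to watch for: the sample file starts with a header line, and PigStorage loads it like any other record. The script below is a minimal sketch of one way to drop it, assuming the header is still present in player.csv. It relies on the fact that a value which cannot be cast to the declared type (here the text "Player id" into an int) is loaded as null. The relation name player_rows is only illustrative and not part of the original script.

```pig
-- Sketch only: assumes player.csv still contains its header line.
player_data = LOAD 'player.csv' USING PigStorage(',') AS (
    player_id:int, player:chararray, position:chararray, number:int,
    club:chararray, club_country:chararray, d_o_b:chararray, age:int,
    height_cm:int, country:chararray, caps:chararray,
    international_goals:chararray, plays_home_country:chararray);

-- The header text "Player id" cannot be cast to int, so that tuple
-- carries a null player_id; keep only tuples with a non-null id.
-- player_rows is a hypothetical name chosen for this example.
player_rows = FILTER player_data BY player_id IS NOT NULL;

-- Print the schema Pig inferred from the AS clause, then the records.
DESCRIBE player_rows;
DUMP player_rows;
```

Note that this filter would also discard any data row whose player id is empty or non-numeric, so it is only suitable when every real record has a valid id.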