Tuesday, 11 August 2020

Create a Parquet file using DataFrame.write in Spark with Scala

// Write the contents of the DataFrame to a Parquet file

scala> jsonDF.write.format("parquet").save("/user/data/customer_parquet")


$ hdfs dfs -ls /user/data/customer_parquet
Found 4 items
-rw-r--r--   1 cloudera supergroup          0 2020-08-11 04:12 /user/data/customer_parquet/_SUCCESS
-rw-r--r--   1 cloudera supergroup        597 2020-08-11 04:12 /user/data/customer_parquet/_common_metadata
-rw-r--r--   1 cloudera supergroup       1263 2020-08-11 04:12 /user/data/customer_parquet/_metadata
-rw-r--r--   1 cloudera supergroup       1550 2020-08-11 04:12 /user/data/customer_parquet/part-r-00000-93a2e236-b70a-4b59-bfa4-37d3b2988b9a.gz.parquet
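
// The part file above ends in .gz.parquet, i.e. it was written with gzip compression. As a minimal sketch, assuming the same spark-shell session (Spark 1.x, where the shell provides sqlContext) and the jsonDF from above, the codec and the save mode can be set explicitly. The output path customer_parquet_snappy is a hypothetical name chosen for illustration only.

```scala
// Assumption: running in spark-shell on Spark 1.x, where `sqlContext` is predefined.
// On Spark 2.x and later, use `spark.conf.set(...)` instead of `sqlContext.setConf(...)`.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

// `jsonDF` is the DataFrame written earlier in this post.
// mode("overwrite") replaces the target directory if it already exists;
// without it, save() fails when the path is already present.
// The path below is hypothetical, not one used in the original post.
jsonDF.write
  .mode("overwrite")
  .parquet("/user/data/customer_parquet_snappy")
```

// write.parquet(path) is shorthand for write.format("parquet").save(path).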

// Use parquet-tools to display the first 5 records of the file

$ parquet-tools head -n 5 hdfs://localhost:8020/user/data/customer_parquet/part-r-00000-93a2e236-b70a-4b59-bfa4-37d3b2988b9a.gz.parquet
emailAddress = krish.lee@learningcontainer.com
firstName = Krish
lastName = Lee
phoneNumber = 123456
userId = 1

emailAddress = racks.jacson@learningcontainer.com
firstName = racks
lastName = jacson
phoneNumber = 123456
userId = 2

emailAddress = denial.roast@learningcontainer.com
firstName = denial
lastName = roast
phoneNumber = 33333333
userId = 3

emailAddress = devid.neo@learningcontainer.com
firstName = devid
lastName = neo
phoneNumber = 222222222
userId = 4

emailAddress = jone.mac@learningcontainer.com
firstName = jone
lastName = mac
phoneNumber = 111111111
userId = 5

// Print every record in the file as JSON, one record per line

$ parquet-tools cat -json hdfs://localhost:8020/user/data/customer_parquet/part-r-00000-93a2e236-b70a-4b59-bfa4-37d3b2988b9a.gz.parquet
{"emailAddress":"krish.lee@learningcontainer.com","firstName":"Krish","lastName":"Lee","phoneNumber":"123456","userId":1}
{"emailAddress":"racks.jacson@learningcontainer.com","firstName":"racks","lastName":"jacson","phoneNumber":"123456","userId":2}
{"emailAddress":"denial.roast@learningcontainer.com","firstName":"denial","lastName":"roast","phoneNumber":"33333333","userId":3}
{"emailAddress":"devid.neo@learningcontainer.com","firstName":"devid","lastName":"neo","phoneNumber":"222222222","userId":4}
{"emailAddress":"jone.mac@learningcontainer.com","firstName":"jone","lastName":"mac","phoneNumber":"111111111","userId":5}
