Saturday, 26 January 2019

Load CSV, JSON, and XML Files into a DataFrame

scala> val dfCSV = sqlContext.read.format("csv").load("/home/hadoop/Desktop/emp_data.csv")
scala> val dfJson = sqlContext.read.format("json").load("/home/hadoop/Desktop/olympic.json")
dfJson: org.apache.spark.sql.DataFrame = [age: bigint, athelete: string ... 8 more fields]

scala> val dfXML = sqlContext.read.format("xml").option("rowTag","book").load("/home/hadoop/Desktop/sample.xml")
dfXML: org.apache.spark.sql.DataFrame = [_id: string, author: string ... 5 more fields]
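
With no options set, the CSV reader names the columns _c0, _c1, ... and reads every column as a string. The sketch below shows one way to pick up column names and types instead. It assumes that emp_data.csv starts with a header row, which the post does not state, so drop the "header" option if the file has none. Note also that the "xml" format used above is not built into Spark; it is typically provided by the separate spark-xml package, which has to be on the shell's classpath.

```scala
// Minimal sketch for the same spark-shell session, reusing sqlContext and the path from above.
// Assumption: emp_data.csv has a header row (not confirmed by the post).
val dfCSVTyped = sqlContext.read
  .format("csv")
  .option("header", "true")       // use the first line as column names
  .option("inferSchema", "true")  // make an extra pass over the data to guess column types
  .load("/home/hadoop/Desktop/emp_data.csv")

// Inspect what was loaded.
dfCSVTyped.printSchema()
dfCSVTyped.show(5)
```

Schema inference reads the file an extra time, so for large inputs it may be preferable to pass an explicit schema with .schema(...) instead.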



