Header information not found initially:
scala> val df = spark.read.format("csv").load("hdfs://localhost:9000/SparkFiles/person.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 2 more fields]
scala> df.show() #Headers have _c0, _c1 etc.,
+-----+---+---+----+
| _c0|_c1|_c2| _c3|
+-----+---+---+----+
| Ravi| 23| M|5000|
|Rahul| 24| M|6300|
| Siva| 22| M|3200|
+-----+---+---+----+
#Applied Headers
hadoop@hadoop:~/scratch$ cat person.csv
Name,Age,Gender,Salary
Ravi,23,M,5000
Rahul,24,M,6300
Siva,22,M,3200
#Deleting old file
hadoop@hadoop:~/scratch$ hdfs dfs -rm /SparkFiles/person.csv
Deleted /SparkFiles/person.csv
hadoop@hadoop:~/scratch$ hdfs dfs -copyFromLocal person.csv /SparkFiles
hadoop@hadoop:~/scratch$ hdfs dfs -cat /SparkFiles/person.csv
Name,Age,Gender,Salary
Ravi,23,M,5000
Rahul,24,M,6300
Siva,22,M,3200
scala> val df = spark.read.format("csv").option("header","True").load("hdfs://localhost:9000/SparkFiles/person.csv")
df: org.apache.spark.sql.DataFrame = [Name: string, Age: string ... 2 more fields]
scala> df.show()
+-----+---+------+------+
| Name|Age|Gender|Salary|
+-----+---+------+------+
| Ravi| 23| M| 5000|
|Rahul| 24| M| 6300|
| Siva| 22| M| 3200|
+-----+---+------+------+
#write the dataframe into Hive table in rocks database
scala> df.write.saveAsTable("rocks.employee")
hadoop@hadoop:~$ hive
hive> show databases;
OK
default
rocks
hive> show tables;
OK
employee
person
hive> select * from employee;
OK
Ravi 23 M 5000
Rahul 24 M 6300
Siva 22 M 3200
Earlier we did the below one time activity:
#Downloaded mysql connector and put that jar into spark/jars folder
hadoop@hadoop:~/spark-3.0.0-preview2-bin-hadoop3.2/jars$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.20/mysql-connector-java-8.0.20.jar
Copied hive/conf/hive-site.xml into : spark/conf
cp /home/hadoop/apache-hive-3.1.2-bin/conf/hive-site.xml //home/hadoop/spark-3.0.0-preview2-bin-hadoop3.2/conf
hive-site.xml configuration used :
hive-site.xml:
--------------
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mypassword</value>
<description>password for connecting to mysql server</description>
</property>
</configuration>
No comments:
Post a Comment