Start PyCharm:
hadoop@hadoop:~/pycharm-community-2020.1.1/bin$ ./pycharm.sh
Reading a CSV file and exporting the data into Hive using PyCharm
demo.py:
from pyspark.sql import SparkSession

def createsparkdriver():
    # Build a local SparkSession with Hive support enabled, pointing the
    # SQL warehouse at the Hive warehouse directory on HDFS.
    spark = SparkSession.builder.master("local").appName("demoApp").\
        config("spark.sql.warehouse.dir", "hdfs://localhost:8020/user/hive/warehouse").\
        enableHiveSupport().getOrCreate()
    return spark
programming.py:
from demo import createsparkdriver
import logging

if __name__ == "__main__":
    logging.basicConfig(filename="/home/hadoop/logginghere/mylog.log", level=logging.INFO)
    spark = createsparkdriver()
    logging.info("Spark driver created successfully")

    # Read the CSV file (the HDFS path is hard-coded here)
    logging.info("Reading file")
    df = spark.read.format("csv").option("header", True).load("hdfs://localhost:9000/SparkFiles/person.csv")
    df.show()

    # Write the DataFrame into a Hive table (database: rocks, table: emp)
    df.write.saveAsTable("rocks.emp")

    spark.stop()
    logging.info("Program complete")
Run the program. df.show() prints the contents of person.csv:
+-----+---+------+------+
| Name|Age|Gender|Salary|
+-----+---+------+------+
| Ravi| 23| M| 5000|
|Rahul| 24| M| 6300|
| Siva| 22| M| 3200|
+-----+---+------+------+
hive> use rocks;
hive> show tables;
emp
employee
person
hive> select * from emp;
Ravi 23 M 5000
Rahul 24 M 6300
Siva 22 M 3200