Sunday, 24 May 2020

Reading CSV using PySpark and Exporting into Hive - PyCharm Example

Start PyCharm:
hadoop@hadoop:~/pycharm-community-2020.1.1/bin$ ./pycharm.sh
This example reads a CSV file with PySpark and exports the data into a Hive table, all from PyCharm.

demo.py:
from pyspark.sql import SparkSession


def createsparkdriver():
    # Build a local SparkSession with Hive support enabled, pointing
    # the warehouse directory at the Hive warehouse on HDFS
    spark = SparkSession.builder.master("local").appName("demoApp").\
            config("spark.sql.warehouse.dir", "hdfs://localhost:8020/user/hive/warehouse").\
            enableHiveSupport().getOrCreate()
    return spark

programming.py:

from demo import createsparkdriver
import logging

if __name__ == "__main__":
    logging.basicConfig(filename="/home/hadoop/logginghere/mylog.log", level=logging.INFO)
    spark = createsparkdriver()
    logging.info("Spark Driver Created Successfully")

    logging.info("Reading file")  # the file path is hard-coded here
    df = spark.read.format("csv").option("header", True).load("hdfs://localhost:9000/SparkFiles/person.csv")
    df.show()

    # write the dataframe into a Hive table (db: rocks, table: emp)
    df.write.saveAsTable("rocks.emp")
    spark.stop()
    logging.info("Program complete")


Run the program; df.show() prints:

+-----+---+------+------+
| Name|Age|Gender|Salary|
+-----+---+------+------+
| Ravi| 23|     M|  5000|
|Rahul| 24|     M|  6300|
| Siva| 22|     M|  3200|
+-----+---+------+------+


hive> use rocks;
 

hive> show tables;

emp
employee
person


hive> select * from emp;

Ravi 23 M 5000
Rahul 24 M 6300
Siva 22 M 3200


