Saturday, 23 May 2020

Log INFO messages to a file using PySpark with PyCharm - Log configuration

Log INFO messages to a file on Linux

demo.py:

from pyspark.sql import SparkSession


def createsparkdriver():
    # Build (or reuse) a local SparkSession for the demo app
    spark = SparkSession.builder.master("local").appName("demoApp").getOrCreate()
    return spark

programming.py:

from demo import createsparkdriver
import logging

if __name__ == "__main__":
    # Send INFO-level (and above) messages to the given log file
    logging.basicConfig(filename="/home/hadoop/logginghere/mylog.log", level=logging.INFO)
    spark = createsparkdriver()
    logging.info("Spark Driver Created Successfully")

    logging.info("Reading input parameters")
    file_format = input("Enter the file format\t : ")
    file_path = input("Enter the input file path\t : ")

    logging.info("Reading file")
    df = spark.read.format(file_format).option("multiLine", True).load(file_path)
    df.show()

    # Write the output as a parquet file
    df.write.format("parquet").mode("overwrite").save("/home/hadoop/logginghere/dfout")

    spark.stop()
    logging.info("Program complete")


Enter the file format : csv
Enter the input file path : hdfs://localhost:9000/SparkFiles/person.csv
+-----+---+---+----+
|  _c0|_c1|_c2| _c3|
+-----+---+---+----+
| Ravi| 23|  M|5000|
|Rahul| 24|  M|6300|
| Siva| 22|  M|3200|
+-----+---+---+----+
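
The columns come back as _c0 … _c3 because no header option was set and person.csv has no header row. If the input file did carry a header line, a small tweak like the sketch below (a hypothetical variant, not part of the run above) would give the DataFrame real column names:

# Hypothetical variant: read a CSV that has a header row,
# letting Spark take column names from it and infer column types
df = spark.read.format("csv") \
    .option("header", True) \
    .option("inferSchema", True) \
    .load(file_path)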



hadoop@hadoop:~/logginghere$ cat mylog.log
INFO:root:Spark Driver Created Successfully
INFO:root:Reading input parameters
INFO:root:Reading file
INFO:root:Program complete
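
The default basicConfig output is just LEVEL:logger:message. For a long-running Spark job, timestamps make the log far more useful; here is a minimal sketch using standard logging options (the format string is an assumption, not part of the original program):

# Sketch: add a timestamp and level to every log line
logging.basicConfig(
    filename="/home/hadoop/logginghere/mylog.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
# Each entry then looks like:
# 2020-05-23 10:15:02 INFO root: Spark Driver Created Successfully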
