Saturday, 23 May 2020

Log INFO messages to a file using PySpark with PyCharm - Log configuration

Log INFO messages to a file on Linux

demo.py:

from pyspark.sql import SparkSession


def createsparkdriver():
    # Build (or reuse) a local SparkSession for the demo app
    spark = SparkSession.builder.master("local").appName("demoApp").getOrCreate()
    return spark

programming.py:

from demo import createsparkdriver
import logging

if __name__ == "__main__":
    # Send INFO-level (and above) messages to the given log file
    logging.basicConfig(filename="/home/hadoop/logginghere/mylog.log", level=logging.INFO)
    spark = createsparkdriver()
    logging.info("Spark Driver Created Successfully")

    logging.info("Reading input parameters")
    file_format = input("Enter the file format\t : ")
    file_path = input("Enter the input file path\t : ")

    logging.info("Reading file")
    df = spark.read.format(file_format).option("multiLine", True).load(file_path)
    df.show()

    # Write the output as a parquet file
    df.write.format("parquet").mode("overwrite").save("/home/hadoop/logginghere/dfout")

    spark.stop()
    logging.info("Program complete")


Enter the file format : csv
Enter the input file path : hdfs://localhost:9000/SparkFiles/person.csv
+-----+---+---+----+
|  _c0|_c1|_c2| _c3|
+-----+---+---+----+
| Ravi| 23|  M|5000|
|Rahul| 24|  M|6300|
| Siva| 22|  M|3200|
+-----+---+---+----+
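
The columns come back as _c0 … _c3 because no header option was set and person.csv has no header row. If the input file did carry a header line, a small tweak like the sketch below (a hypothetical variant, not part of the run above) would give the DataFrame real column names:

# Hypothetical variant: read a CSV that has a header row,
# letting Spark take column names from it and infer column types
df = spark.read.format("csv") \
    .option("header", True) \
    .option("inferSchema", True) \
    .load(file_path)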



hadoop@hadoop:~/logginghere$ cat mylog.log
INFO:root:Spark Driver Created Successfully
INFO:root:Reading input parameters
INFO:root:Reading file
INFO:root:Program complete
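
The default basicConfig output is just LEVEL:logger:message. For a long-running Spark job, timestamps make the log far more useful; here is a minimal sketch using standard logging options (the format string is an assumption, not part of the original program):

# Sketch: add a timestamp and level to every log line
logging.basicConfig(
    filename="/home/hadoop/logginghere/mylog.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
# Each entry then looks like:
# 2020-05-23 10:15:02 INFO root: Spark Driver Created Successfully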
