Log INFO messages to a file in Linux
demo.py:

from pyspark.sql import SparkSession

def createsparkdriver():
    # Build (or reuse) a local SparkSession for this demo app
    spark = SparkSession.builder.master("local").appName("demoApp").getOrCreate()
    return spark
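As an aside, Spark's JVM side logs through log4j independently of Python's logging module, so the console can stay noisy even when your own messages go to a file. One optional tweak (not used in the run below) is to lower Spark's log level right after creating the session, using SparkContext.setLogLevel:

def createsparkdriver():
    spark = SparkSession.builder.master("local").appName("demoApp").getOrCreate()
    # Quiet Spark's own log4j console output; Python's logging is unaffected
    spark.sparkContext.setLogLevel("WARN")
    return spark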
programming.py:

from demo import createsparkdriver
import logging

if __name__ == "__main__":
    # Route INFO-level (and above) records to a file on the local filesystem
    logging.basicConfig(filename="/home/hadoop/logginghere/mylog.log", level=logging.INFO)

    spark = createsparkdriver()
    logging.info("Spark Driver Created Successfully")

    logging.info("Reading input parameters")
    file_format = input("Enter the file format\t : ")
    file_path = input("Enter the input file path\t : ")

    logging.info("Reading file")
    df = spark.read.format(file_format).option("multiline", True).load(file_path)
    df.show()

    # Write the output into a parquet file
    df.write.format("parquet").mode("overwrite").save("/home/hadoop/logginghere/dfout")

    spark.stop()
    logging.info("Program complete")
Sample run:

Enter the file format : csv
Enter the input file path : hdfs://localhost:9000/SparkFiles/person.csv
+-----+---+---+----+
| _c0|_c1|_c2| _c3|
+-----+---+---+----+
| Ravi| 23| M|5000|
|Rahul| 24| M|6300|
| Siva| 22| M|3200|
+-----+---+---+----+
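The generic column names _c0 through _c3 appear because the CSV was read without a header option, so Spark assigns default names. If your file's first line were a header row (hypothetical here; the sample person.csv clearly has none), you could ask Spark to use it:

# Hypothetical: only applies if the CSV's first line is a header row
df = spark.read.format("csv").option("header", True).load(file_path)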
hadoop@hadoop:~/logginghere$ cat mylog.log
INFO:root:Spark Driver Created Successfully
INFO:root:Reading input parameters
INFO:root:Reading file
INFO:root:Program complete
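By default, basicConfig formats each record as level:logger:message, which is why the entries above read INFO:root:.... If you want timestamps in the log file (an optional tweak, not part of the run above), basicConfig also accepts a format string:

import logging

# Variant of the same call, adding a timestamp and level to each record
logging.basicConfig(
    filename="/home/hadoop/logginghere/mylog.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.info("Spark Driver Created Successfully")
# Written as e.g.: 2024-01-15 10:30:45,123 INFO Spark Driver Created Successfully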