Saturday, 23 May 2020

PyCharm with PySpark - sample program

SPARK_HOME = /home/hadoop/spark-3.0.0-preview2-bin-hadoop3.2
PYTHONPATH = $SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.8.1-src.zip

Environment variables (set in the PyCharm Run/Debug Configuration for the script):

PYTHONUNBUFFERED=1;SPARK_HOME=/home/hadoop/spark-3.0.0-preview2-bin-hadoop3.2;PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.8.1-src.zip
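Alternatively, the same paths can be set from inside the script before pyspark is imported, which avoids editing the run configuration. A minimal sketch, assuming the Spark install path above and the findspark package (pip install findspark):

import os

# Same Spark installation path as used above.
os.environ["SPARK_HOME"] = "/home/hadoop/spark-3.0.0-preview2-bin-hadoop3.2"

import findspark
findspark.init()  # adds $SPARK_HOME/python and the py4j zip to sys.path

from pyspark.sql import SparkSession  # now importable without a PYTHONPATH entry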


File - New - Project
File - New - Python Package
Go inside the package:
New - Python file (demo.py, programming.py)
demo.py:

from pyspark.sql import SparkSession


def createsparkdriver():
    # Build (or reuse) a SparkSession running in local mode, named "demoApp".
    spark = SparkSession.builder.master("local").appName("demoApp").getOrCreate()
    return spark


programming.py:

from demo import createsparkdriver

if __name__ == "__main__":
    spark = createsparkdriver()
    # Read a multi-line JSON file from HDFS into a DataFrame and display it.
    df = spark.read.format("json").option("multiline", True).load("hdfs://localhost:9000/SparkFiles/orgs.json")
    df.show()
    spark.stop()
Right click - Run
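
The multiline option is needed here because each JSON record in the file spans several lines; without it, Spark expects exactly one JSON object per line. If HDFS is not running, the same read works against the local filesystem. A minimal sketch, using a hypothetical local copy of the file:

# The "multiline" option tells Spark that a single JSON record may span
# several lines (e.g. a pretty-printed array of objects).
df = (spark.read.format("json")
      .option("multiline", True)
      .load("file:///home/hadoop/SparkFiles/orgs.json"))  # hypothetical local path
df.printSchema()           # inspect the inferred schema
df.show(truncate=False)    # show full column values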



Interactive version:

from demo import createsparkdriver

if __name__ == "__main__":
    spark = createsparkdriver()

    # Prompt for the format and path at run time instead of hard-coding them.
    file_format = input("Enter the file format\t : ")
    file_path = input("Enter the input file path\t : ")

    df = spark.read.format(file_format).option("multiline", True).load(file_path)
    df.show()
    spark.stop()

Run it:

Enter the file format : json
Enter the input file path : hdfs://localhost:9000/SparkFiles/orgs.json
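
The same interactive script handles other formats as well; for CSV you would usually also pass header and schema-inference options. A minimal sketch of that variation (standard Spark CSV options, hypothetical file path):

# Hypothetical CSV read using the same SparkSession helper.
# "header" takes column names from the first line; "inferSchema" makes Spark
# infer column types instead of treating every column as a string.
df = (spark.read.format("csv")
      .option("header", True)
      .option("inferSchema", True)
      .load("hdfs://localhost:9000/SparkFiles/orgs.csv"))
df.show()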

