PYTHONPATH = $SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.8.1-src.zip
SPARK_HOME = /home/hadoop/spark-3.0.0-preview2-bin-hadoop3.2
Environment variables (PyCharm run configuration):
PYTHONUNBUFFERED=1;PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.8.1-src.zip;SPARK_HOME=/home/hadoop/spark-3.0.0-preview2-bin-hadoop3.2
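The py4j zip name changes between Spark releases, so a hard-coded PYTHONPATH can go stale after an upgrade. As a minimal sketch, the two PYTHONPATH entries could be derived from SPARK_HOME instead. The helper name spark_python_paths is hypothetical (not part of Spark), and the code assumes the usual python/lib/py4j-*-src.zip layout seen in the path above:

```python
import glob
import os


def spark_python_paths(spark_home):
    """Return the PYTHONPATH entries for a Spark install (hypothetical helper)."""
    python_dir = os.path.join(spark_home, "python")
    # Assumption: the py4j sources ship as python/lib/py4j-<version>-src.zip
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    py4j_zips = sorted(glob.glob(pattern))
    if not py4j_zips:
        raise FileNotFoundError("no py4j-*-src.zip under " + python_dir)
    return [python_dir, py4j_zips[-1]]


if __name__ == "__main__":
    home = os.environ.get("SPARK_HOME")
    if home:
        # Join with the platform separator (":" on Linux) to get a PYTHONPATH value
        print(os.pathsep.join(spark_python_paths(home)))
    else:
        print("SPARK_HOME is not set")
```

The printed value can be pasted into the PYTHONPATH entry of the run configuration.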
File - New - Project
File - New - Python Package
go inside the package:
New - Python file (demo.py, programming.py)
demo.py:
from pyspark.sql import SparkSession

def createsparkdriver():
    spark = SparkSession.builder.master("local").appName("demoApp").getOrCreate()
    return spark
programming.py:
from demo import createsparkdriver

if __name__ == "__main__":
    spark = createsparkdriver()
    df = spark.read.format("json").option("multiline", True).load("hdfs://localhost:9000/SparkFiles/orgs.json")
    df.show()
    spark.stop()
Right click - Run
Interactive version (file format and path are read from the console):
from demo import createsparkdriver

if __name__ == "__main__":
    spark = createsparkdriver()
    file_format = input("Enter the file format\t : ")
    file_path = input("Enter the input file path\t : ")
    df = spark.read.format(file_format).option("multiline", True).load(file_path)
    df.show()
    spark.stop()
Run it:
Enter the file format : json
Enter the input file path : hdfs://localhost:9000/SparkFiles/orgs.json
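The interactive version passes whatever is typed straight to spark.read.format, so a typo only shows up as a Spark error at load time. A minimal sketch of checking the input first is shown here. The names SUPPORTED_FORMATS, normalize_format and reader_options are hypothetical, the format list is an assumed subset of Spark's built-in sources, and the header option for CSV is only an illustrative choice:

```python
# Assumption: a small subset of Spark's built-in data sources
SUPPORTED_FORMATS = {"json", "csv", "parquet", "orc", "text"}


def normalize_format(raw):
    """Trim and lowercase the typed format; reject anything outside the list."""
    fmt = raw.strip().lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported file format: " + repr(raw))
    return fmt


def reader_options(fmt):
    """Pick reader options per format (illustrative choices, not Spark defaults)."""
    if fmt == "json":
        # Same option the original script sets for multi-line JSON records
        return {"multiline": True}
    if fmt == "csv":
        # Assumption: the CSV files carry a header row
        return {"header": True}
    return {}
```

In programming.py the result could then be used as spark.read.format(fmt).options(**reader_options(fmt)).load(file_path), where fmt = normalize_format(file_format).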