Data Storage Objects in Spark:
Spark Core : RDD (Unstructured data)
Spark SQL : DataFrame, Dataset (Semi-structured and structured data)
Spark Streaming : DStream (Streaming applications)
Spark MLlib : Vectors
Spark GraphX : Graph Objects
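
The list above can be sketched in one small Scala program that builds one object of each kind. This is only an illustration, not a definitive setup: the Employee case class, the sample names, the "reportsTo" edge label, and the local[2] master are all made up for this example, and it assumes the spark-sql, spark-streaming, spark-mllib, and spark-graphx modules are on the classpath. The vector comes from the DataFrame-based org.apache.spark.ml.linalg package; the RDD-based org.apache.spark.mllib.linalg package offers an equivalent Vectors factory.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

import scala.collection.mutable

// Hypothetical record type, used only for this illustration.
case class Employee(id: Long, name: String)

object StorageObjectsSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimenting; the master setting is an assumption.
    val spark = SparkSession.builder()
      .appName("StorageObjectsSketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext

    // Spark Core: an RDD holding raw, unstructured lines of text.
    val lines: RDD[String] = sc.parallelize(Seq("first line", "second line"))

    // Spark SQL: a DataFrame (rows with named columns) and a typed Dataset.
    val df: DataFrame = Seq((1L, "Asha"), (2L, "Ravi")).toDF("id", "name")
    val ds: Dataset[Employee] = df.as[Employee]

    // Spark Streaming: a DStream fed from an in-memory queue of RDDs.
    // The streaming context is never started here; this only shows the type.
    val ssc = new StreamingContext(sc, Seconds(1))
    val queue = mutable.Queue[RDD[String]](lines)
    val stream: DStream[String] = ssc.queueStream(queue)

    // Spark MLlib: a dense feature vector.
    val features: Vector = Vectors.dense(1.0, 0.0, 3.0)

    // Spark GraphX: a property graph built from vertex and edge RDDs.
    val vertices = sc.parallelize(Seq((1L, "Asha"), (2L, "Ravi")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "reportsTo")))
    val graph: Graph[String, String] = Graph(vertices, edges)

    // Touch each batch object once so the lazy definitions are evaluated.
    println(s"RDD lines: ${lines.count()}")
    println(s"Dataset rows: ${ds.count()}")
    println(s"Vector size: ${features.size}")
    println(s"Graph edges: ${graph.edges.count()}")

    spark.stop()
  }
}
```

In spark-shell the session already exists as spark, so the builder call and spark.stop() can be dropped and the remaining statements pasted directly.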