Tuesday, 26 May 2020

Call Log data program using PySpark

from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.master("local").appName("demoApp").getOrCreate()

# Read the call log file as an RDD of lines
r1 = spark.sparkContext.textFile("E:\\vow\\calllogdata.txt")

# Cache the RDD in memory: it is reused by two separate count() actions,
# so persisting avoids re-reading the file from disk the second time
r1.persist(StorageLevel.MEMORY_ONLY)

# Count the records for each call status
r3 = r1.filter(lambda x: 'SUCCESS' in x)
print(r3.count())
r4 = r1.filter(lambda x: 'FAILED' in x)
print(r4.count())

Output:

21
6
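The two filter() calls above simply keep the lines whose text contains a given status word, then count them. The same logic can be sketched in plain Python (no Spark needed) to make the predicates concrete; the sample records below are hypothetical, since the actual layout of calllogdata.txt is not shown in the post:

```python
# Hypothetical call log lines; the real file format may differ.
sample_lines = [
    "9876543210,9123456780,60,SUCCESS",
    "9876543210,9123456781,0,FAILED",
    "9876543211,9123456782,120,SUCCESS",
]

# Same predicates as the PySpark lambdas: keep lines containing the status word
success = [line for line in sample_lines if "SUCCESS" in line]
failed = [line for line in sample_lines if "FAILED" in line]

print(len(success))  # count of successful calls -> 2
print(len(failed))   # count of failed calls -> 1
```

In the PySpark version, each count() is an action that would normally re-read the source file, which is why the RDD is persisted before the two filters run.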
