Saturday, 22 August 2020

Lead, Lag Examples in Spark SQL with Scala

// Lead Example 
// Lead means Next row's salary value 
spark.sql("SELECT id, fname,lname, designation, technology,salary, LEAD(salary) OVER (PARTITION BY technology ORDER BY salary desc) as Lead FROM teams").show(21)

// Lead will fetch next row's salary value 
+---+------------+-----------+-----------+----------+------+-----+
| id|       fname|      lname|designation|technology|salary| Lead|
+---+------------+-----------+-----------+----------+------+-----+
|105|      Kamala|     Kamesh|  Developer|       iOS| 98000|50000|  
|107|     Praveen|       Mani|    Analyst|       iOS| 50000|40000|
|110|    Veeraiah| Dhanukkodi|    Analyst|       iOS| 40000|20000|
|104|    Pavithra|     Lokesh|  Developer|       iOS| 20000|12000|
|108|        Anbu|      Sudha|     Tester|       iOS| 12000| null|
|119|   Aishwarya|    Dhanush|  Developer|   Node JS| 69000|50000|
|120|       Vijay|      Kumar|  Developer|   Node JS| 50000|45000|
|112|       Muthu|      Kumar|    Analyst|   Node JS| 45000|45000|
|118|       Latha|Rajinikanth|     Tester|   Node JS| 45000|20000|
|109|Nagarethinam| Dhanukkodi|     Tester|   Node JS| 20000| null|
|115|      Donald|      Trump|    Analyst|   Android| 45000|45000|
|116|      Anitha|  Kuppusamy|  Developer|   Android| 45000|40000|
|101|      Ramesh|      Kumar|     Tester|   Android| 40000|38000|
|117| Pushpavanam|  Kuppusamy|    Analyst|   Android| 38000|25000|
|102|   Ganapathy|   Govindan|     Tester|   Android| 25000| null|
|114|      Salman|       Khan|    Analyst|     Spark| 67000|50000|
|100|     Sankara|  Narayanan|  Developer|     Spark| 50000|50000|
|103|        Raja|       Mani|  Developer|     Spark| 50000|50000|
|111|       Arivu|      Madhi|  Developer|     Spark| 50000|40000|
|113|   Sikkandar|       Bhai|  Developer|     Spark| 40000|25000|
|106|        Arun|    Pandian|     Tester|     Spark| 25000| null|
+---+------------+-----------+-----------+----------+------+-----+


// Lag Example
spark.sql("SELECT id, fname,lname, designation, technology,salary, LAG(salary) OVER (PARTITION BY technology ORDER BY salary desc) as Lag FROM teams").show(21)

// Lag will fetch previous row's salary value 

+---+------------+-----------+-----------+----------+------+-----+
| id|       fname|      lname|designation|technology|salary|  Lag|
+---+------------+-----------+-----------+----------+------+-----+
|105|      Kamala|     Kamesh|  Developer|       iOS| 98000| null|
|107|     Praveen|       Mani|    Analyst|       iOS| 50000|98000|
|110|    Veeraiah| Dhanukkodi|    Analyst|       iOS| 40000|50000|
|104|    Pavithra|     Lokesh|  Developer|       iOS| 20000|40000|
|108|        Anbu|      Sudha|     Tester|       iOS| 12000|20000|
|119|   Aishwarya|    Dhanush|  Developer|   Node JS| 69000| null|
|120|       Vijay|      Kumar|  Developer|   Node JS| 50000|69000|
|112|       Muthu|      Kumar|    Analyst|   Node JS| 45000|50000|
|118|       Latha|Rajinikanth|     Tester|   Node JS| 45000|45000|
|109|Nagarethinam| Dhanukkodi|     Tester|   Node JS| 20000|45000|
|115|      Donald|      Trump|    Analyst|   Android| 45000| null|
|116|      Anitha|  Kuppusamy|  Developer|   Android| 45000|45000|
|101|      Ramesh|      Kumar|     Tester|   Android| 40000|45000|
|117| Pushpavanam|  Kuppusamy|    Analyst|   Android| 38000|40000|
|102|   Ganapathy|   Govindan|     Tester|   Android| 25000|38000|
|114|      Salman|       Khan|    Analyst|     Spark| 67000| null|
|100|     Sankara|  Narayanan|  Developer|     Spark| 50000|67000|
|103|        Raja|       Mani|  Developer|     Spark| 50000|50000|
|111|       Arivu|      Madhi|  Developer|     Spark| 50000|50000|
|113|   Sikkandar|       Bhai|  Developer|     Spark| 40000|50000|
|106|        Arun|    Pandian|     Tester|     Spark| 25000|40000|
+---+------------+-----------+-----------+----------+------+-----+

// fetching salary which is lagging for 2 records
spark.sql("SELECT id, fname,lname, designation, technology,salary, LAG(salary,2) OVER (PARTITION BY technology ORDER BY salary desc) as Lag FROM teams").show(21)


+---+------------+-----------+-----------+----------+------+-----+
| id|       fname|      lname|designation|technology|salary|  Lag|
+---+------------+-----------+-----------+----------+------+-----+
|105|      Kamala|     Kamesh|  Developer|       iOS| 98000| null|
|107|     Praveen|       Mani|    Analyst|       iOS| 50000| null|
|110|    Veeraiah| Dhanukkodi|    Analyst|       iOS| 40000|98000|
|104|    Pavithra|     Lokesh|  Developer|       iOS| 20000|50000|
|108|        Anbu|      Sudha|     Tester|       iOS| 12000|40000|
|119|   Aishwarya|    Dhanush|  Developer|   Node JS| 69000| null|
|120|       Vijay|      Kumar|  Developer|   Node JS| 50000| null|
|112|       Muthu|      Kumar|    Analyst|   Node JS| 45000|69000|
|118|       Latha|Rajinikanth|     Tester|   Node JS| 45000|50000|
|109|Nagarethinam| Dhanukkodi|     Tester|   Node JS| 20000|45000|
|115|      Donald|      Trump|    Analyst|   Android| 45000| null|
|116|      Anitha|  Kuppusamy|  Developer|   Android| 45000| null|
|101|      Ramesh|      Kumar|     Tester|   Android| 40000|45000|
|117| Pushpavanam|  Kuppusamy|    Analyst|   Android| 38000|45000|
|102|   Ganapathy|   Govindan|     Tester|   Android| 25000|40000|
|114|      Salman|       Khan|    Analyst|     Spark| 67000| null|
|100|     Sankara|  Narayanan|  Developer|     Spark| 50000| null|
|103|        Raja|       Mani|  Developer|     Spark| 50000|67000|
|111|       Arivu|      Madhi|  Developer|     Spark| 50000|50000|
|113|   Sikkandar|       Bhai|  Developer|     Spark| 40000|50000|
|106|        Arun|    Pandian|     Tester|     Spark| 25000|50000|
+---+------------+-----------+-----------+----------+------+-----+

//fetching salary which is leading 3 records
spark.sql("SELECT id, fname,lname, designation, technology,salary, LEAD(salary,3) OVER (PARTITION BY technology ORDER BY salary desc) as Lead FROM teams").show(21)



+---+------------+-----------+-----------+----------+------+-----+
| id|       fname|      lname|designation|technology|salary| Lead|
+---+------------+-----------+-----------+----------+------+-----+
|105|      Kamala|     Kamesh|  Developer|       iOS| 98000|20000|
|107|     Praveen|       Mani|    Analyst|       iOS| 50000|12000|
|110|    Veeraiah| Dhanukkodi|    Analyst|       iOS| 40000| null|
|104|    Pavithra|     Lokesh|  Developer|       iOS| 20000| null|
|108|        Anbu|      Sudha|     Tester|       iOS| 12000| null|
|119|   Aishwarya|    Dhanush|  Developer|   Node JS| 69000|45000|
|120|       Vijay|      Kumar|  Developer|   Node JS| 50000|20000|
|112|       Muthu|      Kumar|    Analyst|   Node JS| 45000| null|
|118|       Latha|Rajinikanth|     Tester|   Node JS| 45000| null|
|109|Nagarethinam| Dhanukkodi|     Tester|   Node JS| 20000| null|
|115|      Donald|      Trump|    Analyst|   Android| 45000|38000|
|116|      Anitha|  Kuppusamy|  Developer|   Android| 45000|25000|
|101|      Ramesh|      Kumar|     Tester|   Android| 40000| null|
|117| Pushpavanam|  Kuppusamy|    Analyst|   Android| 38000| null|
|102|   Ganapathy|   Govindan|     Tester|   Android| 25000| null|
|114|      Salman|       Khan|    Analyst|     Spark| 67000|50000|
|100|     Sankara|  Narayanan|  Developer|     Spark| 50000|40000|
|103|        Raja|       Mani|  Developer|     Spark| 50000|25000|
|111|       Arivu|      Madhi|  Developer|     Spark| 50000| null|
|113|   Sikkandar|       Bhai|  Developer|     Spark| 40000| null|
|106|        Arun|    Pandian|     Tester|     Spark| 25000| null|
+---+------------+-----------+-----------+----------+------+-----+


// Find the difference amount of salary between LAG value and current salary value
spark.sql("SELECT id, fname,lname, designation, technology,salary, LAG(salary) OVER (PARTITION BY technology ORDER BY salary desc) as Lag, (salary - LAG(salary) OVER (PARTITION BY technology ORDER BY salary desc)) as diff FROM teams").show(21)

+---+------------+-----------+-----------+----------+------+-----+------+
| id|       fname|      lname|designation|technology|salary|  Lag|  diff|
+---+------------+-----------+-----------+----------+------+-----+------+
|105|      Kamala|     Kamesh|  Developer|       iOS| 98000| null|  null|
|107|     Praveen|       Mani|    Analyst|       iOS| 50000|98000|-48000|
|110|    Veeraiah| Dhanukkodi|    Analyst|       iOS| 40000|50000|-10000|
|104|    Pavithra|     Lokesh|  Developer|       iOS| 20000|40000|-20000|
|108|        Anbu|      Sudha|     Tester|       iOS| 12000|20000| -8000|
|119|   Aishwarya|    Dhanush|  Developer|   Node JS| 69000| null|  null|
|120|       Vijay|      Kumar|  Developer|   Node JS| 50000|69000|-19000|
|112|       Muthu|      Kumar|    Analyst|   Node JS| 45000|50000| -5000|
|118|       Latha|Rajinikanth|     Tester|   Node JS| 45000|45000|     0|
|109|Nagarethinam| Dhanukkodi|     Tester|   Node JS| 20000|45000|-25000|
|115|      Donald|      Trump|    Analyst|   Android| 45000| null|  null|
|116|      Anitha|  Kuppusamy|  Developer|   Android| 45000|45000|     0|
|101|      Ramesh|      Kumar|     Tester|   Android| 40000|45000| -5000|
|117| Pushpavanam|  Kuppusamy|    Analyst|   Android| 38000|40000| -2000|
|102|   Ganapathy|   Govindan|     Tester|   Android| 25000|38000|-13000|
|114|      Salman|       Khan|    Analyst|     Spark| 67000| null|  null|
|100|     Sankara|  Narayanan|  Developer|     Spark| 50000|67000|-17000|
|103|        Raja|       Mani|  Developer|     Spark| 50000|50000|     0|
|111|       Arivu|      Madhi|  Developer|     Spark| 50000|50000|     0|
|113|   Sikkandar|       Bhai|  Developer|     Spark| 40000|50000|-10000|
|106|        Arun|    Pandian|     Tester|     Spark| 25000|40000|-15000|
+---+------------+-----------+-----------+----------+------+-----+------+

// Find the difference amount of salary between LEAD value and current salary value
spark.sql("SELECT id, fname,lname, designation, technology,salary, LEAD(salary) OVER (PARTITION BY technology ORDER BY salary desc) as Lead, (salary - LEAD(salary) OVER (PARTITION BY technology ORDER BY salary desc) ) as diff FROM teams").show(21)

+---+------------+-----------+-----------+----------+------+-----+-----+
| id|       fname|      lname|designation|technology|salary| Lead| diff|
+---+------------+-----------+-----------+----------+------+-----+-----+
|105|      Kamala|     Kamesh|  Developer|       iOS| 98000|50000|48000|
|107|     Praveen|       Mani|    Analyst|       iOS| 50000|40000|10000|
|110|    Veeraiah| Dhanukkodi|    Analyst|       iOS| 40000|20000|20000|
|104|    Pavithra|     Lokesh|  Developer|       iOS| 20000|12000| 8000|
|108|        Anbu|      Sudha|     Tester|       iOS| 12000| null| null|
|119|   Aishwarya|    Dhanush|  Developer|   Node JS| 69000|50000|19000|
|120|       Vijay|      Kumar|  Developer|   Node JS| 50000|45000| 5000|
|112|       Muthu|      Kumar|    Analyst|   Node JS| 45000|45000|    0|
|118|       Latha|Rajinikanth|     Tester|   Node JS| 45000|20000|25000|
|109|Nagarethinam| Dhanukkodi|     Tester|   Node JS| 20000| null| null|
|115|      Donald|      Trump|    Analyst|   Android| 45000|45000|    0|
|116|      Anitha|  Kuppusamy|  Developer|   Android| 45000|40000| 5000|
|101|      Ramesh|      Kumar|     Tester|   Android| 40000|38000| 2000|
|117| Pushpavanam|  Kuppusamy|    Analyst|   Android| 38000|25000|13000|
|102|   Ganapathy|   Govindan|     Tester|   Android| 25000| null| null|
|114|      Salman|       Khan|    Analyst|     Spark| 67000|50000|17000|
|100|     Sankara|  Narayanan|  Developer|     Spark| 50000|50000|    0|
|103|        Raja|       Mani|  Developer|     Spark| 50000|50000|    0|
|111|       Arivu|      Madhi|  Developer|     Spark| 50000|40000|10000|
|113|   Sikkandar|       Bhai|  Developer|     Spark| 40000|25000|15000|
|106|        Arun|    Pandian|     Tester|     Spark| 25000| null| null|
+---+------------+-----------+-----------+----------+------+-----+-----+ 



No comments:

Post a Comment

Flume - Simple Demo

// create a folder in hdfs : $ hdfs dfs -mkdir /user/flumeExa // Create a shell script which generates : Hadoop in real world <n>...