Tuesday, 19 May 2020

Read Json and Write into MySQL table using PySpark

//Read from json

df_local = sqlContext.read.format("json").load("E:\\DataSets\\olympic.json")

df_local.printSchema()

root
 |-- age: long (nullable = true)
 |-- athelete: string (nullable = true)
 |-- bronze: long (nullable = true)
 |-- closing: string (nullable = true)
 |-- country: string (nullable = true)
 |-- gold: long (nullable = true)
 |-- silver: long (nullable = true)
 |-- sport: string (nullable = true)
 |-- total: long (nullable = true)
 |-- year: string (nullable = true)

df_local.show()

+---+--------------------+------+--------+-------------+----+------+--------------------+-----+----+
|age|            athelete|bronze| closing|      country|gold|silver|               sport|total|year|
+---+--------------------+------+--------+-------------+----+------+--------------------+-----+----+
| 30|      Inge de Bruijn|     2|08-29-04|  Netherlands|   1|     1|            Swimming|    4|2004|
| 24|         Ryan Lochte|     2|08-24-08|United States|   2|     0|            Swimming|    4|2008|
| 23|Libby Lenton-Tric...|     1|08-24-08|    Australia|   2|     1|            Swimming|    4|2008|
| 24|     Kirsty Coventry|     0|08-24-08|     Zimbabwe|   1|     3|            Swimming|    4|2008|
| 20|            Sun Yang|     1|08-12-12|        China|   2|     1|            Swimming|    4|2012|
| 29|       Marit Bjørgen|     1|02-28-10|       Norway|   3|     1|Cross Country Skiing|    5|2010|
| 18|       Nastia Liukin|     1|08-24-08|United States|   1|     3|          Gymnastics|    5|2008|
| 26|       Cindy Klassen|     2|02-26-06|       Canada|   1|     2|       Speed Skating|    5|2006|
| 33|         Dara Torres|     3|10-01-00|United States|   2|     0|            Swimming|    5|2000|
| 17|          Ian Thorpe|     0|10-01-00|    Australia|   3|     2|            Swimming|    5|2000|
| 21|    Natalie Coughlin|     1|08-29-04|United States|   2|     2|            Swimming|    5|2004|
| 22|     Allison Schmitt|     1|08-12-12|United States|   3|     1|            Swimming|    5|2012|
| 27|         Ryan Lochte|     1|08-12-12|United States|   2|     2|            Swimming|    5|2012|
| 17|      Missy Franklin|     1|08-12-12|United States|   4|     0|            Swimming|    5|2012|
| 24|       Alicia Coutts|     1|08-12-12|    Australia|   1|     3|            Swimming|    5|2012|
| 24|       Aleksey Nemov|     3|10-01-00|       Russia|   2|     1|          Gymnastics|    6|2000|
| 25|    Natalie Coughlin|     3|08-24-08|United States|   1|     2|            Swimming|    6|2008|
| 27|      Michael Phelps|     0|08-12-12|United States|   4|     2|            Swimming|    6|2012|
| 19|      Michael Phelps|     2|08-29-04|United States|   6|     0|            Swimming|    8|2004|
| 23|      Michael Phelps|     0|08-24-08|United States|   8|     0|            Swimming|    8|2008|
+---+--------------------+------+--------+-------------+----+------+--------------------+-----+----+

//Save dataframe into MySQL table
df_local.write.format("jdbc").\
option("url", "jdbc:mysql://localhost:3306/school").\
option("driver", "com.mysql.jdbc.Driver").\
option("dbtable", "olympic").\
option("user", "root").\
option("password", "Studi0Plus").save()
//Create Dataframe (Read) from MySQL
df_Olympic = sqlContext.read.format("jdbc").\
option("url", "jdbc:mysql://localhost:3306/school").\
option("driver", "com.mysql.jdbc.Driver").\
option("dbtable", "olympic").\
option("user", "root").\
option("password", "Studi0Plus").load()
df_Olympic.show(5)

+---+--------------------+------+--------+-------------+----+------+--------+-----+----+
|age|            athelete|bronze| closing|      country|gold|silver|   sport|total|year|
+---+--------------------+------+--------+-------------+----+------+--------+-----+----+
| 30|      Inge de Bruijn|     2|08-29-04|  Netherlands|   1|     1|Swimming|    4|2004|
| 24|         Ryan Lochte|     2|08-24-08|United States|   2|     0|Swimming|    4|2008|
| 23|Libby Lenton-Tric...|     1|08-24-08|    Australia|   2|     1|Swimming|    4|2008|
| 24|     Kirsty Coventry|     0|08-24-08|     Zimbabwe|   1|     3|Swimming|    4|2008|
| 20|            Sun Yang|     1|08-12-12|        China|   2|     1|Swimming|    4|2012|
+---+--------------------+------+--------+-------------+----+------+--------+-----+----+
only showing top 5 rows


df_Olympic.filter("country = 'United States' and sport='Swimming' and year=2008").show()


+---+----------------+------+--------+-------------+----+------+--------+-----+----+
|age|        athelete|bronze| closing|      country|gold|silver|   sport|total|year|
+---+----------------+------+--------+-------------+----+------+--------+-----+----+
| 24|     Ryan Lochte|     2|08-24-08|United States|   2|     0|Swimming|    4|2008|
| 25|Natalie Coughlin|     3|08-24-08|United States|   1|     2|Swimming|    6|2008|
| 23|  Michael Phelps|     0|08-24-08|United States|   8|     0|Swimming|    8|2008|
+---+----------------+------+--------+-------------+----+------+--------+-----+----+

No comments:

Post a Comment

Flume - Simple Demo

// create a folder in hdfs : $ hdfs dfs -mkdir /user/flumeExa // Create a shell script which generates : Hadoop in real world <n>...