Monday, 3 August 2020

SQOOP Import problem and solution #1

SQOOP problem : #1
Connect to MySQL database using SQOOP, import all orders
from orders table whose order_status is COMPLETE.

Data Description:
A MySQL instance is running on the localhost.In that instance, you will find orders table that 
contains order's data.

Installation : LocalHost
Database Name : retail_db
Table Name : Orders
User Name : root
Password : cloudera

Output requirement:
Place the order's files in HDFS directory:
/user/cloudera/problem1/orders/parquetdata
Use parquet format with tab delimiter and snappy compression/

NULL values are represented as -1 for numbers and "NA" for strings.


Solution:
$ sqoop import \
-connect "jdbc:mysql://localhost/retail_db" \
-username root \
-password cloudera \
-table orders \
-target-dir /user/cloudera/problem1/orders/parquetdata \
-delete-target-dir \
-fields-terminated-by "\t" \
-where "order_status='COMPLETE'" \
-null-string "NA" \
-null-non-string -1 \
-compress \
-compression-codec snappy \
-as-parquetfile

$ hdfs dfs -ls /user/cloudera/problem1/orders/parquetdataFound 6 items
drwxr-xr-x   - cloudera cloudera          0 2020-08-03 08:09 /user/cloudera/problem1/orders/parquetdata/.metadata
drwxr-xr-x   - cloudera cloudera          0 2020-08-03 08:11 /user/cloudera/problem1/orders/parquetdata/.signals
-rw-r--r--   1 cloudera cloudera      50233 2020-08-03 08:11 /user/cloudera/problem1/orders/parquetdata/1fa590f9-281d-4d62-9650-ddfe04d7eae2.parquet
-rw-r--r--   1 cloudera cloudera      47508 2020-08-03 08:11 /user/cloudera/problem1/orders/parquetdata/217e22ee-25b2-41a6-8bf9-f6fee4c7d0c4.parquet
-rw-r--r--   1 cloudera cloudera      46231 2020-08-03 08:11 /user/cloudera/problem1/orders/parquetdata/826a8254-9700-4edd-afdb-b8fb58b2fd49.parquet
-rw-r--r--   1 cloudera cloudera      46406 2020-08-03 08:11 /user/cloudera/problem1/orders/parquetdata/84355f27-d1a4-4c1b-ac9d-84fd5139a912.parquet



// View the parquet file content using parquet-tools:

[cloudera@quickstart ~]$ parquet-tools head -n5 hdfs://localhost/user/cloudera/problem1/orders/parquetdata/1fa590f9-281d-4d62-9650-ddfe04d7eae2.parquet
order_id = 51665
order_date = 1402729200000
order_customer_id = 8645
order_status = COMPLETE

order_id = 51670
order_date = 1402729200000
order_customer_id = 9322
order_status = COMPLETE

order_id = 51671
order_date = 1402729200000
order_customer_id = 9000
order_status = COMPLETE

order_id = 51673
order_date = 1402729200000
order_customer_id = 7538
order_status = COMPLETE

order_id = 51677
order_date = 1402729200000
order_customer_id = 6321
order_status = COMPLETE

No comments:

Post a Comment

Flume - Simple Demo

// create a folder in hdfs : $ hdfs dfs -mkdir /user/flumeExa // Create a shell script which generates : Hadoop in real world <n>...