SQOOP problem : #1
Connect to MySQL database using SQOOP, import all orders
from orders table whose order_status is COMPLETE.
Data Description:
A MySQL instance is running on the localhost.In that instance, you will find orders table that
contains order's data.
Installation : LocalHost
Database Name : retail_db
Table Name : Orders
User Name : root
Password : cloudera
Output requirement:
Place the order's files in HDFS directory:
/user/cloudera/problem1/orders/parquetdata
Use parquet format with tab delimiter and snappy compression/
NULL values are represented as -1 for numbers and "NA" for strings.
Solution:
$ sqoop import \
-connect "jdbc:mysql://localhost/retail_db" \
-username root \
-password cloudera \
-table orders \
-target-dir /user/cloudera/problem1/orders/parquetdata \
-delete-target-dir \
-fields-terminated-by "\t" \
-where "order_status='COMPLETE'" \
-null-string "NA" \
-null-non-string -1 \
-compress \
-compression-codec snappy \
-as-parquetfile
$ hdfs dfs -ls /user/cloudera/problem1/orders/parquetdataFound 6 items
drwxr-xr-x - cloudera cloudera 0 2020-08-03 08:09 /user/cloudera/problem1/orders/parquetdata/.metadata
drwxr-xr-x - cloudera cloudera 0 2020-08-03 08:11 /user/cloudera/problem1/orders/parquetdata/.signals
-rw-r--r-- 1 cloudera cloudera 50233 2020-08-03 08:11 /user/cloudera/problem1/orders/parquetdata/1fa590f9-281d-4d62-9650-ddfe04d7eae2.parquet
-rw-r--r-- 1 cloudera cloudera 47508 2020-08-03 08:11 /user/cloudera/problem1/orders/parquetdata/217e22ee-25b2-41a6-8bf9-f6fee4c7d0c4.parquet
-rw-r--r-- 1 cloudera cloudera 46231 2020-08-03 08:11 /user/cloudera/problem1/orders/parquetdata/826a8254-9700-4edd-afdb-b8fb58b2fd49.parquet
-rw-r--r-- 1 cloudera cloudera 46406 2020-08-03 08:11 /user/cloudera/problem1/orders/parquetdata/84355f27-d1a4-4c1b-ac9d-84fd5139a912.parquet
// View the parquet file content using parquet-tools:
[cloudera@quickstart ~]$ parquet-tools head -n5 hdfs://localhost/user/cloudera/problem1/orders/parquetdata/1fa590f9-281d-4d62-9650-ddfe04d7eae2.parquet
order_id = 51665
order_date = 1402729200000
order_customer_id = 8645
order_status = COMPLETE
order_id = 51670
order_date = 1402729200000
order_customer_id = 9322
order_status = COMPLETE
order_id = 51671
order_date = 1402729200000
order_customer_id = 9000
order_status = COMPLETE
order_id = 51673
order_date = 1402729200000
order_customer_id = 7538
order_status = COMPLETE
order_id = 51677
order_date = 1402729200000
order_customer_id = 6321
order_status = COMPLETE
No comments:
Post a Comment