Friday, 7 August 2020

Import all the tables from MySQL into Hive with Snappy compression and Parquet files using Sqoop

mysql> use retail_db;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------------+
| Tables_in_retail_db |
+---------------------+
| categories          |
| customers           |
| departments         |
| order_items         |
| orders              |
| products            |
+---------------------+
6 rows in set (0.00 sec)


Here we are going to import all the tables of the retail_db MySQL database into Hive.

Additional options used: Snappy compression and saving the data as Parquet files.
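Before running the import, it can be worth confirming that native Snappy support is available on the cluster. This check is optional; hadoop checknative is a standard Hadoop command:

$ hadoop checknative -a
// The output lists each native library; the snappy line should read "true"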


// Delete the database in Hive

hive> drop database ohm cascade;
OK
Time taken: 1.187 seconds

hive> create database ohm;
OK


sqoop import-all-tables \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password cloudera \
--warehouse-dir /user/hive/warehouse \
-m 1 \
--hive-database ohm \
--hive-import \
--hive-overwrite \
--create-hive-table \
--compress \
--compression-codec snappy \
--as-parquetfile \
--outdir java_out
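To verify that the data landed as Snappy-compressed Parquet, inspect one of the table directories. The parquet-tools utility is assumed here; it is bundled with some CDH releases but may need to be installed separately, and the actual part-file name will differ:

$ hdfs dfs -ls /user/hive/warehouse/ohm.db/orders
// Data files should have a .parquet extension

// The footer metadata should report SNAPPY for the column chunks
$ parquet-tools meta /user/hive/warehouse/ohm.db/orders/<part-file>.parquet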





hive> use ohm;
OK
Time taken: 0.01 seconds

hive> show tables;
OK
categories
customers
departments
order_items
orders
products
Time taken: 0.029 seconds, Fetched: 6 row(s)



hive> describe database ohm;
OK
ohm hdfs://quickstart.cloudera:8020/user/hive/warehouse/ohm.db cloudera USER
Time taken: 0.011 seconds, Fetched: 1 row(s)
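As a quick sanity check, row counts in Hive can be compared against the source tables in MySQL, and describe formatted confirms the storage format. The counts depend on your copy of retail_db, so no output is shown here:

hive> select count(*) from ohm.orders;
hive> describe formatted ohm.orders;
// InputFormat should be a Parquet input format for the imported tables

mysql> select count(*) from retail_db.orders;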




Alternatively, the tables can be imported one by one. Note that each import needs its own --target-dir; reusing one directory makes later imports fail because the directory already exists:

sqoop import \
--connect "jdbc:mysql://localhost:3306/retail_db" \
--username root \
--password cloudera \
--table categories \
--target-dir /user/cloudera/retail_db/categories \
--hive-import \
--create-hive-table \
--hive-database ohm \
--hive-table categories;


sqoop import \
--connect "jdbc:mysql://localhost:3306/retail_db" \
--username root \
--password cloudera \
--table customers \
--target-dir /user/cloudera/retail_db/customers \
--hive-import \
--create-hive-table \
--hive-database ohm \
--hive-table customers;



sqoop import \
--connect "jdbc:mysql://localhost:3306/retail_db" \
--username root \
--password cloudera \
--table departments \
--target-dir /user/cloudera/retail_db/dept \
--hive-import \
--create-hive-table \
--hive-database ohm \
--hive-table departments;



sqoop import \
--connect "jdbc:mysql://localhost:3306/retail_db" \
--username root \
--password cloudera \
--table order_items \
--target-dir /user/cloudera/retail_db/order_items \
--hive-import \
--create-hive-table \
--hive-database ohm \
--hive-table order_items;



sqoop import \
--connect "jdbc:mysql://localhost:3306/retail_db" \
--username root \
--password cloudera \
--table orders \
--target-dir /user/cloudera/retail_db/orders \
--hive-import \
--create-hive-table \
--hive-database ohm \
--hive-table orders;


sqoop import \
--connect "jdbc:mysql://localhost:3306/retail_db" \
--username root \
--password cloudera \
--table products \
--target-dir /user/cloudera/retail_db/products \
--hive-import \
--create-hive-table \
--hive-database ohm \
--hive-table products;
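Repeating the connection details in every command is error-prone. Sqoop supports an options file for the shared arguments; the file name connection.txt below is just an example, and in the file each option and its value go on separate lines:

// Contents of connection.txt (hypothetical file name):
import
--connect
jdbc:mysql://localhost:3306/retail_db
--username
root
--password
cloudera

// Each import then only needs the table-specific arguments:
sqoop --options-file connection.txt \
--table categories \
--target-dir /user/cloudera/retail_db/categories \
--hive-import \
--create-hive-table \
--hive-database ohm \
--hive-table categories;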





// import-all-tables does not accept --target-dir or --hive-table;
// use --warehouse-dir instead, and the Hive table names are taken from MySQL

sqoop import-all-tables \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password cloudera \
--warehouse-dir /user/cloudera/retail_db \
--hive-import \
--create-hive-table \
--hive-database ohm;
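If some tables should be skipped, import-all-tables also takes an --exclude-tables option with a comma-separated list; the tables excluded below are only an illustration:

sqoop import-all-tables \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password cloudera \
--warehouse-dir /user/cloudera/retail_db \
--exclude-tables categories,departments \
--hive-import \
--hive-database ohm;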


hive> show databases;
OK
default
ohm


hive> use ohm;
OK



hive> show tables;
OK
categories
cust_parquet
customer_parquet_snappy
customers
departments
order_items
person_fixed
products

(cust_parquet, customer_parquet_snappy and person_fixed are tables left over from earlier exercises in the same database.)

 
