Wednesday, 5 August 2020

Import all tables from MySQL to HDFS (Exclude Specific tables while importing)

//Import all the tables from MySQL to HDFS using SQOOP and exclude specific tables while importing

mysql> use retail_db;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

show Database changed
mysql> show tables;
+---------------------+
| Tables_in_retail_db |
+---------------------+
| categories          |
| customers           |
| departments         |
| order_items         |
| orders              |
| products            |
+---------------------+
6 rows in set (0.00 sec)


// Import all tables from retail_db 

sqoop import-all-tables \
-connect jdbc:mysql://localhost:3306/retail_db \
-username root \
-password cloudera   


// imported into /user/cloudera/categories,
/user/cloudera/customers,
/user/cloudera/departments,
/user/cloudera/order_items,
/user/cloudera/orders,
/user/cloudera/products

// Testing purpose - excluded all the tables 
$ sqoop import-all-tables \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password cloudera   \
--exclude-tables "categories,customers,departments,order_items,orders,products"

Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/08/05 09:05:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
20/08/05 09:05:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/08/05 09:05:05 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Skipping table: categories
Skipping table: customers
Skipping table: departments
Skipping table: order_items
Skipping table: orders
Skipping table: products



// only categories table will be imported - except categories all other tables names are mentioned in exclude-tables list.
sqoop import-all-tables \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password cloudera   \
--exclude-tables "customers,departments,order_items,orders,products"


Skipping table: customers
Skipping table: departments
Skipping table: order_items
Skipping table: orders
Skipping table: products


Use --exclude-tables "table1,table2" option to ignore the table1 and table2.

Do NOT add white-space between the table names (aka. "table1, table2")


$ hdfs dfs -ls /user/cloudera/
Found 1 items
drwxr-xr-x   - cloudera supergroup          0 2020-08-05 09:12 /user/cloudera/categories

$ hdfs dfs -ls /user/cloudera/categories/*
[cloudera@quickstart ~]$ hdfs dfs -ls /user/cloudera/categories/*
-rw-r--r--   1 cloudera supergroup          0 2020-08-05 09:12 /user/cloudera/categories/_SUCCESS
-rw-r--r--   1 cloudera supergroup        271 2020-08-05 09:12 /user/cloudera/categories/part-m-00000
-rw-r--r--   1 cloudera supergroup        263 2020-08-05 09:12 /user/cloudera/categories/part-m-00001
-rw-r--r--   1 cloudera supergroup        266 2020-08-05 09:12 /user/cloudera/categories/part-m-00002
-rw-r--r--   1 cloudera supergroup        229 2020-08-05 09:12 /user/cloudera/categories/part-m-00003

No comments:

Post a Comment

Flume - Simple Demo

// create a folder in hdfs : $ hdfs dfs -mkdir /user/flumeExa // Create a shell script which generates : Hadoop in real world <n>...