mysql> use retail_db;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+---------------------+
| Tables_in_retail_db |
+---------------------+
| categories          |
| customers           |
| departments         |
| order_items         |
| orders              |
| products            |
+---------------------+
6 rows in set (0.00 sec)
mysql> describe orders;
+-------------------+-------------+------+-----+---------+----------------+
| Field             | Type        | Null | Key | Default | Extra          |
+-------------------+-------------+------+-----+---------+----------------+
| order_id          | int(11)     | NO   | PRI | NULL    | auto_increment |
| order_date        | datetime    | NO   |     | NULL    |                |
| order_customer_id | int(11)     | NO   |     | NULL    |                |
| order_status      | varchar(45) | NO   |     | NULL    |                |
+-------------------+-------------+------+-----+---------+----------------+
// create a staging table with the same structure as the orders table
mysql> create table retail_db.orders_staging as select * from retail_db.orders where 1 = 0;
Query OK, 0 rows affected (0.10 sec)
Records: 0 Duplicates: 0 Warnings: 0
// create a new table, orders_exported, with the same structure as retail_db.orders
mysql> create table retail_db.orders_exported as select * from retail_db.orders where 1 = 0;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
// verify the structure of the newly created orders_staging table
mysql> describe orders_staging;
+-------------------+-------------+------+-----+---------+-------+
| Field             | Type        | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------+-------+
| order_id          | int(11)     | NO   |     | 0       |       |
| order_date        | datetime    | NO   |     | NULL    |       |
| order_customer_id | int(11)     | NO   |     | NULL    |       |
| order_status      | varchar(45) | NO   |     | NULL    |       |
+-------------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
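Note two things about `CREATE TABLE ... AS SELECT * FROM orders WHERE 1 = 0`: the predicate `1 = 0` matches no rows, so only the column layout is copied, and the DESCRIBE output above shows that the PRI key and auto_increment attributes are not carried over. The same behavior can be sketched with Python's built-in sqlite3 module (an illustration of the `WHERE 1 = 0` trick only, not of MySQL or Sqoop themselves):

```python
import sqlite3

# In-memory database standing in for retail_db.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE orders (
        order_id          INTEGER PRIMARY KEY,
        order_date        TEXT NOT NULL,
        order_customer_id INTEGER NOT NULL,
        order_status      TEXT NOT NULL
    )""")
cur.execute("INSERT INTO orders VALUES (1, '2013-07-25 00:00:00', 11599, 'CLOSED')")

# WHERE 1 = 0 matches no rows: the new table gets the columns but no data.
cur.execute("CREATE TABLE orders_staging AS SELECT * FROM orders WHERE 1 = 0")

cols = [row[1] for row in cur.execute("PRAGMA table_info(orders_staging)")]
print(cols)  # same four columns as orders
print(cur.execute("SELECT COUNT(*) FROM orders_staging").fetchone()[0])  # 0
```

As in MySQL, the copy created this way has the columns but not the constraints of the source table, which is acceptable for a throwaway staging table.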
// I already have the orders data in HDFS
$ hdfs dfs -ls /user/cloudera/orders
Found 5 items
-rw-r--r-- 1 cloudera cloudera 0 2020-08-06 05:52 /user/cloudera/orders/_SUCCESS
-rw-r--r-- 1 cloudera cloudera 741614 2020-08-06 05:52 /user/cloudera/orders/part-m-00000
-rw-r--r-- 1 cloudera cloudera 753022 2020-08-06 05:52 /user/cloudera/orders/part-m-00001
-rw-r--r-- 1 cloudera cloudera 752368 2020-08-06 05:52 /user/cloudera/orders/part-m-00002
-rw-r--r-- 1 cloudera cloudera 752940 2020-08-06 05:52 /user/cloudera/orders/part-m-00003
// Export HDFS data into the MySQL table using the staging table approach
sqoop export \
--connect "jdbc:mysql://localhost/retail_db" \
--username root \
--password cloudera \
--table orders_exported \
--export-dir hdfs://localhost:8020/user/cloudera/orders/ \
--staging-table orders_staging \
--clear-staging-table \
--input-fields-terminated-by ','
20/08/08 00:24:45 INFO mapreduce.ExportJobBase: Exported 68883 records.
20/08/08 00:24:45 INFO mapreduce.ExportJobBase: Starting to migrate data from staging table to destination.
20/08/08 00:24:45 INFO manager.SqlManager: Migrated 68883 records from `orders_staging` to `orders_exported`
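The log lines show why the staging table matters: the export tasks write every record into orders_staging first, and only after the whole job succeeds does Sqoop migrate the rows to orders_exported, so a partially failed export never leaves a half-populated destination table. The two-phase pattern can be sketched with sqlite3 (a minimal stand-in for what Sqoop and MySQL do, not Sqoop's actual implementation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders_staging (order_id INTEGER, order_status TEXT)")
cur.execute("CREATE TABLE orders_exported (order_id INTEGER, order_status TEXT)")

# Phase 1: export tasks write into the staging table (in parallel, in Sqoop's case).
rows = [(1, "CLOSED"), (2, "PENDING_PAYMENT"), (3, "COMPLETE")]
cur.executemany("INSERT INTO orders_staging VALUES (?, ?)", rows)

# Phase 2: only after every task succeeded, migrate staging -> destination
# in one transaction; a mid-export failure never touches orders_exported.
with conn:
    cur.execute("INSERT INTO orders_exported SELECT * FROM orders_staging")
    cur.execute("DELETE FROM orders_staging")  # --clear-staging-table analogue

print(cur.execute("SELECT COUNT(*) FROM orders_exported").fetchone()[0])  # 3
```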
mysql> select * from retail_db.orders_exported limit 5;
+----------+---------------------+-------------------+-----------------+
| order_id | order_date          | order_customer_id | order_status    |
+----------+---------------------+-------------------+-----------------+
|        1 | 2013-07-25 00:00:00 |             11599 | CLOSED          |
|        2 | 2013-07-25 00:00:00 |               256 | PENDING_PAYMENT |
|        3 | 2013-07-25 00:00:00 |             12111 | COMPLETE        |
|        4 | 2013-07-25 00:00:00 |              8827 | CLOSED          |
|        5 | 2013-07-25 00:00:00 |             11318 | COMPLETE        |
+----------+---------------------+-------------------+-----------------+
5 rows in set (0.00 sec)
mysql> select count(1) from retail_db.orders_exported limit 5;
+----------+
| count(1) |
+----------+
|    68883 |
+----------+
1 row in set (0.01 sec)
Line count in HDFS:
$ hdfs dfs -cat hdfs://localhost:8020/user/cloudera/orders/* | wc -l
68883
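The `hdfs dfs -cat ... | wc -l` check works because each part-m-* file holds one order per line and _SUCCESS is empty, so the combined line count equals the record count. The same cross-check can be sketched in Python over local stand-in files (the two tiny part files below are hypothetical examples, not the real export data):

```python
from pathlib import Path
import tempfile

# Hypothetical stand-ins for the part-m-* files: plain CSV, one order per line.
tmp = Path(tempfile.mkdtemp())
(tmp / "part-m-00000").write_text("1,2013-07-25 00:00:00,11599,CLOSED\n")
(tmp / "part-m-00001").write_text("2,2013-07-25 00:00:00,256,PENDING_PAYMENT\n")
(tmp / "_SUCCESS").write_text("")  # marker file, contributes no lines

# Equivalent of `hdfs dfs -cat .../orders/* | wc -l` restricted to data files.
total = sum(len(f.read_text().splitlines()) for f in sorted(tmp.glob("part-m-*")))
print(total)  # 2 here; 68883 for the real export
```

If this total matches the MySQL count(1), no records were lost or duplicated during the export.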