
Monday, 10 December 2018

MySQL to HDFS: Sqoop Import Examples

MYSQL

To log on to MySQL:
mysql -pcloudera

create database my1st;
use my1st;
create table emp (id int, name varchar(50), deg varchar(50), salary int, dept varchar(50));

describe emp;

insert into emp values (1201,'Gopal','Manager',50000,'TP');
insert into emp values
  (1202, 'Manisha', 'Proof Reader', 50000, 'TP'),
  (1203, 'Khalil', 'PHP dev', 30000, 'AC'),
  (1204, 'Prasanth', 'PHP dev', 30000, 'AC'),
  (1205, 'Raja', 'Admin', 20000, 'TP');

select * from emp;

create table emp_add (id int, hno varchar(10), street varchar(50), city varchar(50));
describe emp_add;

insert into emp_add values (1201,'288A','RajaRajeswariNagar','Bangalore');


insert into emp_add values
  (1202, '108I', 'Kaggadasapura', 'Bangalore'),
  (1203, '144Z', 'CV Raman Nagar', 'Bangalore'),
  (1204, '78B', 'Udhayanagar', 'Bangalore'),
  (1205, '720X', 'Dooravani Nagar', 'Bangalore');

create table emp_contact (id int, phno varchar(50), email varchar(50));

describe emp_contact;

insert into emp_contact values
  (1201, '11111111', 'gopal@tp.com'),
  (1202, '222222222', 'manisha@tp.com'),
  (1203, '3333333', 'khalil@ac.com'),
  (1204, '44444444', 'prasanth@ac.com'),
  (1205, '55555555', 'raja@tp.com');


Connect to the my1st database and list all of its tables:

sqoop list-tables --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera
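
The expected output is simply the table names, one per line (illustrative, based on the tables created above):
emp
emp_add
emp_contact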

Import from MySQL:
emp, emp_add, emp_contact

Import a complete table with its data into the default hdfs directory:
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp -m 1
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp_add -m 1
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp_contact -m 1
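
Note: -m 1 forces a single mapper, so each table lands in a single part file. To import in parallel, raise the mapper count and tell Sqoop which column to split on; the emp table here has no primary key, so a split column must be given explicitly (a sketch, with four mappers as an arbitrary choice):
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp -m 4 --split-by id
This writes four part files (part-m-00000 through part-m-00003), each covering a range of id values.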

All imported data is placed under /user/root/ and can be listed with:
hdfs dfs -ls /user/root/
/user/root/emp
/user/root/emp_add
/user/root/emp_contact

To view the imported data in the hdfs folders (the p* glob matches the part-m-* files Sqoop writes):
hdfs dfs -cat /user/root/emp/p*
hdfs dfs -cat /user/root/emp_add/p*
hdfs dfs -cat /user/root/emp_contact/p*


Set a target directory:
The imported data will be placed in the /qr directory:
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp_contact -m 1 --target-dir /qr
hdfs dfs -ls /qr
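
Note: re-running an import fails if the target directory already exists. Sqoop's --delete-target-dir option removes it first, which makes the job repeatable (a sketch of the same import):
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp_contact -m 1 --target-dir /qr --delete-target-dir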

Import a subset of table data (partial data, filtered with a where condition):
--where "email='gopal@tp.com'" --target-dir /wherequery
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp_contact -m 1 --where "email='gopal@tp.com'" --target-dir /wherequery

hdfs dfs -cat /wherequery/p*
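
A free-form SQL query can also drive the import. Sqoop requires the literal token $CONDITIONS in the WHERE clause so it can inject split predicates, and --target-dir is mandatory with --query (a sketch; the /joinquery directory and the join itself are illustrative, not from the original walkthrough):
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --query 'select e.id, e.name, c.email from emp e join emp_contact c on e.id = c.id where $CONDITIONS' -m 1 --target-dir /joinquery
The query is single-quoted so the shell does not expand $CONDITIONS; with -m 1 no --split-by is needed.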


Incremental import:

mysql -pcloudera
use my1st;
mysql> select * from emp;
+------+----------+--------------+--------+------+
| id   | name     | deg          | salary | dept |
+------+----------+--------------+--------+------+
| 1201 | Gopal    | Manager      |  50000 | TP   |
| 1202 | Manisha  | Proof Reader |  50000 | TP   |
| 1203 | Khalil   | PHP dev      |  30000 | AC   |
| 1204 | Prasanth | PHP dev      |  30000 | AC   |
| 1205 | Raja     | Admin        |  20000 | TP   |
+------+----------+--------------+--------+------+

insert into emp values(1206,'Satish','team leader',20000,'GR');

mysql> select * from emp;
+------+----------+--------------+--------+------+
| id   | name     | deg          | salary | dept |
+------+----------+--------------+--------+------+
| 1201 | Gopal    | Manager      |  50000 | TP   |
| 1202 | Manisha  | Proof Reader |  50000 | TP   |
| 1203 | Khalil   | PHP dev      |  30000 | AC   |
| 1204 | Prasanth | PHP dev      |  30000 | AC   |
| 1205 | Raja     | Admin        |  20000 | TP   |
| 1206 | Satish   | team leader  |  20000 | GR   |
+------+----------+--------------+--------+------+

Row 1206 is the newly inserted record in the emp table of the my1st database.

Incremental import options:
--incremental <mode>
--check-column <column name>
--last-value <last check-column value>

Rows with id greater than 1205 will be appended:
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp -m 1 --incremental append --check-column id --last-value 1205
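
For a recurring incremental load it is easier to save the job, since the Sqoop metastore then tracks the last-value between runs automatically (a sketch using the standard sqoop job tool; the job name emp_incr is made up):
sqoop job --create emp_incr -- import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp -m 1 --incremental append --check-column id --last-value 1205
sqoop job --exec emp_incr
Each --exec appends only the rows whose id exceeds the stored last-value, then updates the stored value.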

Display the contents imported from the emp table:
hdfs dfs -cat /user/root/emp/p*

Delete a folder / directory that is not empty:
hdfs dfs -rm -r /user/root/emp
hdfs dfs -rm -r /user/root/emp_add
hdfs dfs -rm -r /user/root/emp_contact
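
If the hdfs trash feature is enabled, -rm only moves files into the user's .Trash directory; the standard -skipTrash flag deletes them permanently in one step, for example:
hdfs dfs -rm -r -skipTrash /user/root/emp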

HDFS Commands - Notes

HDFS Commands - hands on


create a directory in hdfs:
hdfs dfs -mkdir /user/dir1

hdfs dfs -ls /user


create an empty file in local file system:
touch test_file11.txt

copy the local file into hdfs: (upload a file to hdfs)
hdfs dfs -put /root/test_file11.txt /user/dir1

hdfs dfs -ls /user/dir1


create a directory in hdfs:
hdfs dfs -mkdir /user/dir2

hdfs dfs -ls /user

copy a file within hdfs: dir1 to dir2:

hdfs dfs -cp /user/dir1/test_file11.txt /user/dir2

hdfs dfs -ls /user/dir2

fsck: displays a summary of file system health

hdfs fsck /user/dir1/sample_data/salary/salary.csv
The report states whether the file is HEALTHY and whether any blocks are missing.
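
For block-level detail, fsck accepts the standard -files, -blocks and -locations flags, which list every file, its blocks, and the datanodes holding each replica:
hdfs fsck /user/dir1 -files -blocks -locations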

To check the size of a given file:
hdfs dfs -du /user/sample_data/salary/salary.csv

Remove file(s) in hdfs:
hdfs dfs -rm /user/dir2/*

Download a file from hdfs to local directory:
hdfs dfs -get /user/sample_data/salary/salary.csv /root/

rm /root/salary.csv   --> remove the local copy

hdfs dfs -get /user/sample_data/salary/salary.csv /root/   --> download it again



Delete existing contents:
hdfs dfs -rm -r  /user/*

Create Directory:
hdfs dfs -mkdir /user

Create Directories:
hdfs dfs -mkdir /user/dir1 /user/dir2

List directory:
hdfs dfs -ls /
hdfs dfs -ls /user

[cloudera@quickstart ~]$ hdfs dfs -ls /user
Found 2 items
drwxr-xr-x   - cloudera cloudera          0 2018-07-13 22:48 /user/dir1
drwxr-xr-x   - cloudera cloudera          0 2018-07-13 22:48 /user/dir2

Create an empty file inside /user/dir1:
hdfs dfs -touchz /user/dir1/my1st.txt

List it again:
hdfs dfs -ls /user/dir1
Found 1 items
-rw-r--r--   1 cloudera cloudera          0 2018-07-13 22:50 /user/dir1/my1st.txt

Remove a file:
hdfs dfs -rm /user/dir1/my1st.txt
Deleted /user/dir1/my1st.txt

Remove Directory:
hdfs dfs -rmdir /user/dir1 /user/dir2


Create a file in local linux:
cat > salary.csv
Name,Age,Sex,Salary
Arun,25,M,2000
Balu,30,M,2500
Sara,42,M,2050
Lara,40,F,1980
^c  (Ctrl + c)
 
Display the content of file in local linux:
cat salary.csv
Name,Age,Sex,Salary
Arun,25,M,2000
Balu,30,M,2500
Sara,42,M,2050
Lara,40,F,1980

Display the present working directory in local linux:
pwd
/home/cloudera

Display the files and folders in local Linux:
ls -l
total 192
-rwxrwxr-x 1 cloudera cloudera  5387 Jul 19  2017 cloudera-manager
-rwxrwxr-x 1 cloudera cloudera  9964 Jul 19  2017 cm_api.py
drwxrwxr-x 3 cloudera cloudera  4096 Jul 13 22:43 Desktop
drwxrwxr-x 4 cloudera cloudera  4096 Jul 19  2017 Documents
drwxr-xr-x 2 cloudera cloudera  4096 Sep  5  2017 Downloads
drwxrwsr-x 9 cloudera cloudera  4096 Feb 19  2015 eclipse
-rw-rw-r-- 1 cloudera cloudera 53655 Jul 19  2017 enterprise-deployment.json
-rw-rw-r-- 1 cloudera cloudera 50515 Jul 19  2017 express-deployment.json
-rwxrwxr-x 1 cloudera cloudera  5007 Jul 19  2017 kerberos
drwxrwxr-x 2 cloudera cloudera  4096 Jul 19  2017 lib
drwxr-xr-x 2 cloudera cloudera  4096 Sep  5  2017 Music
-rwxrwxr-x 1 cloudera cloudera  4228 Jul 19  2017 parcels
drwxr-xr-x 2 cloudera cloudera  4096 Sep  5  2017 Pictures
drwxr-xr-x 2 cloudera cloudera  4096 Sep  5  2017 Public
-rw-rw-r-- 1 cloudera cloudera    80 Jul 13 22:57 salary.csv   <-- the file we just created
drwxr-xr-x 2 cloudera cloudera  4096 Sep  5  2017 Templates
drwxr-xr-x 2 cloudera cloudera  4096 Sep  5  2017 Videos
drwxrwxr-x 4 cloudera cloudera  4096 Jul 19  2017 workspace



Create a folder in hdfs:
hdfs dfs -mkdir /user/dir1
[cloudera@quickstart ~]$ hdfs dfs -ls /user/
Found 2 items
drwxr-xr-x   - cloudera cloudera          0 2018-07-13 23:11 /user/dir1

Display the content of a file in local Linux:
cat /home/cloudera/salary.csv
Name,Age,Sex,Salary
Arun,25,M,2000
Balu,30,M,2500
Sara,42,M,2050
Lara,40,F,1980


Upload file:
hdfs dfs -put <local file> <hdfs dest>
hdfs dfs -put /home/cloudera/salary.csv /user/dir1/salary.csv
hdfs dfs -put /home/cloudera/apple_daily.csv /user/dir1/apple_daily.csv

Display the content of a file which is in hdfs:
hdfs dfs -cat /user/dir1/salary.csv
Name,Age,Sex,Salary
Arun,25,M,2000
Balu,30,M,2500
Sara,42,M,2050
Lara,40,F,1980

Download file:
hdfs dfs -get <hdfs source> <local dest>
hdfs dfs -get /user/dir1/salary.csv /home/cloudera/mysal.csv

Display the content of downloaded file:
cat /home/cloudera/mysal.csv
Name,Age,Sex,Salary
Arun,25,M,2000
Balu,30,M,2500
Sara,42,M,2050
Lara,40,F,1980


File size:
hdfs dfs -du <filename>
hdfs dfs -du /user/dir1/salary.csv
80  80  /user/dir1/salary.csv
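
The standard -h flag prints sizes in human-readable units instead of raw bytes:
hdfs dfs -du -h /user/dir1/salary.csv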


Delete file:
hdfs dfs -rm /user/dir1/salary.csv
Deleted /user/dir1/salary.csv


Get help:
hdfs dfs -help

Create zero size file:
hdfs dfs -touchz /user/dir1/sa.txt

Delete it:
hdfs dfs -rm /user/dir1/sa.txt

copy file within hdfs folders:
hdfs dfs -cp /user/dir1/my.txt /user/dir2/my2.txt
hdfs dfs -ls /user/dir2   :: verify the copy here

hdfs fsck /user/dir2

hadoop fs and hdfs dfs are interchangeable for HDFS paths:
hadoop fs -du /user/dir1/apple.csv
hdfs dfs -du /user/dir1/apple.csv

hdfs dfs -ls
hdfs dfs -ls /user/

hdfs dfs -cat /user/hue/my.csv
hadoop fs -cat /user/hue/my.csv

hdfs dfs -help

hdfs dfs -ls

hdfs dfs -touchz <file> :: creates a file of zero length.

hdfs dfs -cat <file> :: displays the file content on the screen

hdfs dfs -mkdir /newfolder :: creates a directory / folder

hdfs dfs -cp <source file> <target file> :: copies within hdfs

hdfs dfs -mv <old file> <new file> :: moves/renames; the target can be in a different directory

hdfs dfs -mv /user/dir1/apple_daily.csv /user/dir1/apple.csv

hdfs dfs -help mkdir

hdfs dfs -ls

hdfs dfs -ls /user/

hdfs dfs -ls /user/hue

hdfs dfs -cat /user/hue/my.csv

hdfs dfs -cp /user/hue/my.csv /user/hue/ours.csv

hdfs dfs -mkdir /user/hue/hdfs-course

hdfs dfs -put apple.csv  /user/hue/hdfs-course/test.csv

hdfs dfs -mv /user/hue/hdfs-course/test.csv awe.csv


hdfs dfs -put /tmp/file /user/hue/file

hdfs dfs -get /user/hdfs/file /tmp/file



hdfs dfs -mkdir /user/thehenson
hdfs dfs -moveFromLocal apple.csv /user/thehenson/appledaily.csv   :: unlike -put, this removes the local copy
hdfs dfs -get /user/thehenson/appledaily.csv /tmp/apple.csv

Empty the recycle bin : hdfs dfs -expunge
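
Note: the trash is only in effect when fs.trash.interval is set to a non-zero number of minutes in core-site.xml (a sketch of the relevant property; 1440 minutes = 1 day is an arbitrary choice):
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>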



hdfs file location :
/user/sample_data/salary/


hdfs fsck /
hdfs dfs -ls /
hdfs dfs -mkdir /new_edureka
hdfs dfs -touchz /new_edureka/sample
hdfs dfs -du -s /new_edureka/sample
hdfs dfs -cat /new_edureka/test
hdfs dfs -copyFromLocal /home/edureka/test /new_edureka
hdfs dfs -copyToLocal /new_edureka/test /home/edureka
hdfs dfs -put /home/edureka/test /user
hdfs dfs -get /user/test /home/edureka
hdfs dfs -count /user
hdfs dfs -rm /new_edureka/test
hdfs dfs -rm -r /new_edureka   (removes the folder and everything in it)

hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir
hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2

hdfs dfs -expunge   --> empties the trash

hdfs dfs -rmdir /user/hadoop
hdfs dfs -usage mkdir
hdfs dfs -help
 
hadoop namenode -format
hadoop secondarynamenode
hadoop namenode

Flume - Simple Demo

// create a folder in hdfs : $ hdfs dfs -mkdir /user/flumeExa
// Create a shell script which generates : Hadoop in real world <n>...