>> hbase shell
hbase(main):001:0> status
>> whoami
>> list
to fix errors while running HBase, restart the HBase services:
service hbase-master restart # On Master host
service hbase-regionserver restart # On all RS hosts
create table:
create 'census', 'personal','professional'
list all tables:
list
display the structure:
describe 'census'
Row count / record count:
count 'census';
Insert / update data using the HBase shell.
Add rows; each row holds the data for one person.
Columns are created on the fly:
put 'census',1,'personal:name','Mike Jones'
put 'census',1,'personal:marital_status','unmarried'
put 'census',1,'personal:qualification','Mphil'
put 'census',2,'personal:name','Raja'
put 'census',2,'personal:age','22'
put 'census',2,'personal:city','Bengaluru'
put 'census',1,'personal:gender','male'
put 'census',2,'professional:employed','yes'
put 'census', 2,'professional:education_level','high school'
put 'census', 2,'professional:field','construction'
put 'census', 3, 'personal:name','Jill Tang'
put 'census',3,'personal:marital_status','married'
put 'census',3,'personal:spouse','Jim Tang'
1, 2 and 3 are the row keys
only one cell (row key + column) can be written per put; columns are added dynamically, not declared up front as in a traditional SQL insert
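For example, a brand-new column can be added to an existing row at any time; 'personal:hobby' below is a hypothetical column used only to illustrate this:
put 'census',3,'personal:hobby','chess'
scan 'census', {COLUMNS => ['personal:hobby']} ==> only row 3 has this cell; the other rows simply have no value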
display the records:
scan 'census'
the put command can also be used to update values in existing cells:
put 'census',2,'personal:education_level','PG'
put 'census',2,'professional:education_level','post graduate'
put 'census',4,'personal:name','ben'
put 'census',4,'personal:marital_status','divorced'
scan 'census' ==> retrieve all records from the table
get all data related to one row:
get 'census',1
get 'census',2
get 'census',3,'personal:name' ==> get the data for row 3, column family 'personal', column 'name'
get 'census',3,'personal:marital_status' ==> displays a single column value
get 'census',3,'professional:education_level'
display only names:
scan 'census', {COLUMNS => ['personal:name']}
the second argument is a dictionary specifying which columns and options you are interested in
to limit the number of rows retrieved, add LIMIT:
scan 'census', {COLUMNS => ['personal:name'],LIMIT =>2}
scan 'census', {COLUMNS => ['personal:name'],LIMIT =>2, STARTROW => '2'}
scan 'census', {COLUMNS => ['personal:name'],LIMIT =>3, STARTROW => '3'}
from which row to which row: STARTROW and STOPROW (STOPROW is exclusive)
scan 'census', {COLUMNS => ['personal:name'],STARTROW =>'2', STOPROW => '4'}
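Other standard options can go in the same dictionary, e.g. REVERSED and FILTER (illustrative only; exact availability depends on the HBase version):
scan 'census', {COLUMNS => ['personal:name'], REVERSED => true} ==> rows in descending row-key order
scan 'census', {FILTER => "PrefixFilter('2')"} ==> only rows whose key starts with '2'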
Delete individual cells:
get 'census', 1
delete 'census',1,'personal:marital_status'
delete 'census',1,'personal:gender'
delete 'census',1,'personal:name'
delete 'census',1,'personal:qualification'
Disable and Drop tables:
--------------------------
disable 'census'
disable takes the table offline: its regions are unloaded from memory and recent changes are flushed to disk
enable 'census'
exists 'census' ==> check whether the table exists
drop 'census' without disabling it first gives an error; a table must be disabled before it can be dropped:
disable 'census'
drop 'census'
General:
>> status
version
table_help
whoami
DDL:
create
list
disable
is_disabled
enable
is_enabled
describe
alter
exists
drop
drop_all
(the Java client API for these operations is in org.apache.hadoop.hbase.client)
DML:
put
get
delete
deleteall
scan
count
truncate
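Usage for any of the commands above can be printed from inside the shell with help, for example:
help 'scan' ==> shows the syntax and options of the scan command
help 'alter' ==> shows the syntax and options of the alter command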
create '<table name>','<column family>'
create 'emp', 'personal data', 'professional data'
list
disable 'emp'
scan 'emp' ==> ERROR: emp is disabled.
is_disabled 'table name'
is_disabled 'emp' ==> true
disable_all 'r.*' ==> disable all tables whose names start with 'r'
enable 'emp'
scan 'emp'
is_enabled 'table name' ==> is_enabled 'emp' ==> true / false
describe 'table name' ==> describe 'emp'
Delete a column family:
alter '<table name>', 'delete' => '<column family>'
alter 'emp', 'delete' => 'professional data'
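alter can also add a column family to an existing table ('contact data' below is a hypothetical family name; depending on the HBase version the table may need to be disabled first):
alter 'emp', NAME => 'contact data'
describe 'emp' ==> now lists the new column family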
exists 'emp'
disable 'emp'
drop 'emp'
drop_all 't.*'
disable_all 'raj.*'
insert rows:
put '<table name>','<row key>','<column family:column name>','<value>'
put 'emp','1','personal data:name','raju'
update:
put '<table name>','<row key>','<column family:column name>','<new value>'
put 'emp','1','personal data:city','Delhi'
Retrieve:
get '<table name>','<row key>'
get 'emp', '1'
Retrieve only a specific column:
get '<table name>', '<row key>', {COLUMN => '<column family:column name>'}
get 'emp', '1', {COLUMN => 'personal data:name'}
Delete specific cell:
delete '<table name>', '<row key>', '<column family:column name>' [, '<time stamp>']
delete 'emp', '1', 'personal data:city'
Delete all cells in a row:
deleteall '<table name>', '<row>'
deleteall 'emp','1'
Count:
count '<table name>'
count 'emp'
truncate '<table name>' ==> disables, drops and recreates the table (all data is removed)
truncate 'emp'
scan 'emp'
RWXCA => Read, Write, eXecute, Create, Admin (the HBase access-control permission flags)
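These flags are used by the access-control commands; a minimal sketch, assuming HBase security (the AccessController coprocessor) is enabled, with 'raju' as a hypothetical user:
grant 'raju', 'RW', 'emp' ==> give user raju read/write access on table emp
user_permission 'emp' ==> list the permissions granted on emp
revoke 'raju', 'emp' ==> remove raju's access to emp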
hbase shell --> command used to enter command line interface
hbase>
Create Table:
create 'tableName', 'columnFamilyName'
insert data in row and column:
put 'tableName','rowKey','columnFamilyName:columnName','value'
Query table and Row:
get 'tableName','rowKey'
Display table columns and rows:
scan 'tableName'
Drop table and records:
disable 'tableName'
drop 'tableName'
show the currently running services (the Linux equivalent of services.msc):
service --status-all
to fix errors while running HBase, restart the HBase services:
service hbase-master restart # On Master host
service hbase-regionserver restart # On all RS hosts
create 'test','city'
insert new rows:
put 'test','r1','city','pallathur'
put 'test','r2','city','Bangalore'
scan 'test'
get 'test','r1'
get 'test','r2'
disable 'test'
update existing rows by adding new columns (the table must be enabled, and the column must belong to an existing column family):
enable 'test'
put 'test','r1','city:firstname','aravinda'
put 'test','r2','city:firstname','desilva'
scan 'test' - to display all rows
to drop a table, both operations have to be done together (disable first, then drop):
disable 'test'
drop 'test'
scan 'test' ==> fails: the table no longer exists
apple.csv:
create "stock","date","open","high","low","close","volume","adj_close"
put "stock","...","..","..."...........................................
Use Pig to copy apple.csv contents into HBASE
---------------------------------------------
hdfs dfs -ls /user/thehenson/stock_values/apple.csv
hdfs dfs -cp /user/thehenson/apple.csv /user/thehenson/stock_values/apple.csv
hdfs dfs -mv /user/thehenson/appledaily.csv /user/thehenson/apple.csv
$ hbase shell
> create "app_stock","info"
> scan "app_stock"
exit
clear
create a Pig script and execute it from the command prompt:
vim hbase_pigloader.pig
a = LOAD '/user/thehenson/stock_values/apple.csv' USING PigStorage (',') as (date:chararray, open:chararray, high:chararray, low:chararray, close:chararray, volume:chararray, adj_close:chararray);
STORE a INTO 'hbase://app_stock' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:open info:high info:low info:close info:volume info:adj_close');
the first field of the relation (date) becomes the HBase row key; the remaining fields map, in order, to the listed columns
execute the Pig script file from the command prompt:
pig -f hbase_pigloader.pig
the Pig job summary reports the total number of records written (the row count)
hbase shell
scan 'app_stock', {LIMIT => 10}
MYSQL
to log on to MySQL:
mysql -pcloudera
create database my1st;
create table emp (id int, name varchar(50), deg varchar(50), salary int, dept varchar(50));
describe emp;
insert into emp values (1201,'Gopal','Manager',50000,'TP');
insert into emp values (1202, 'Manisha', 'Proof Reader', 50000,'TP'),(1203,'Khalil', 'PHP dev', 30000,'AC'), (1204, 'Prasanth', 'PHP dev', 30000,'AC'), (1205,'Raja','Admin',20000,'TP');
select * from emp;
create table emp_add (id int, hno varchar(10), street varchar(50), city varchar(50));
describe emp_add;
insert into emp_add values (1201,'288A','RajaRajeswariNagar','Bangalore');
insert into emp_add values(1202, '108I','Kaggadasapura','Bangalore'), (1203, '144Z','CV Raman Nagar','Bangalore'),(1204,'78B','Udhayanagar','Bangalore'),(1205,'720X','Dooravani Nagar','Bangalore');
create table emp_contact(id int, phno varchar(50), email varchar(50));
describe emp_contact;
insert into emp_contact values(1201,'11111111','gopal@tp.com'), (1202, '222222222','manisha@tp.com'),(1203,'3333333','khalil@ac.com'),(1204,'44444444','prasanth@ac.com'),(1205,'55555555','raja@tp.com');
connect to the my1st database and list all its tables:
sqoop list-tables --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera
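list-databases similarly lists every database on the MySQL server (same connection options, no database name in the JDBC URL):
sqoop list-databases --connect jdbc:mysql://localhost --username=root --password=cloudera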
Import from MySQL:
tables to import: emp, emp_add, emp_contact
import a complete table (with its data) into the default HDFS directory:
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp -m 1
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp_add -m 1
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp_contact -m 1
all imported data is placed under /user/root/ ; list it with:
hdfs dfs -ls /user/root/
/user/root/emp/
/user/root/emp_add
/user/root/emp_contact
To view the imported data in hdfs folders:
hdfs dfs -cat /user/root/emp/p*
hdfs dfs -cat /user/root/emp_add/p*
hdfs dfs -cat /user/root/emp_contact/p*
Set a target directory:
the imported data will be placed in the /qr directory instead of the default location
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp_contact -m 1 --target-dir /qr
hdfs dfs -ls /qr
import a subset of the table (only rows matching a WHERE condition) using
--where "email='gopal@tp.com'" --target-dir /wherequery
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --table emp_contact -m 1 --where "email='gopal@tp.com'" --target-dir /wherequery
hdfs dfs -cat /wherequery/p*
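An alternative way to import a subset is a free-form query with --query; Sqoop requires the literal $CONDITIONS token in the WHERE clause and an explicit --target-dir (a sketch with a hypothetical target directory /querydir, not part of the session above):
sqoop import --connect jdbc:mysql://localhost/my1st --username=root --password=cloudera --query "select id, email from emp_contact where email='gopal@tp.com' and \$CONDITIONS" -m 1 --target-dir /querydir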