Monday, 10 December 2018

HBase Notes - Part 1

>> hbase shell
hbase(main):001:0> status

>> whoami

>> list

To fix errors while running HBase (e.g. the shell cannot connect), restart the HBase services:
service hbase-master restart # On Master host
service hbase-regionserver restart # On all RS hosts

create a table named 'census' with two column families, 'personal' and 'professional':
create 'census', 'personal','professional'
list all tables:
list

display the structure:
describe 'census'

Row count / record count:
count 'census';


Insert / update data using the HBase shell.
Add rows; each row represents data for one person.

columns are created on the fly:
put 'census',1,'personal:name','Mike Jones'

put 'census',1,'personal:marital_status','unmarried'

put 'census',1,'personal:qualification','Mphil'

put 'census',2,'personal:name','Raja'
put 'census',2,'personal:age','22'
put 'census',2,'personal:city','Bengaluru'


put 'census',1,'personal:gender','male'
put 'census',2,'professional:employed','yes'
put 'census', 2,'professional:education_level','high school'
put 'census', 2,'professional:field','construction'


put 'census', 3, 'personal:name','Jill Tang'
put 'census',3,'personal:marital_status','married'
put 'census',3,'personal:spouse','Jim Tang'

1, 2 and 3 are the row keys.

Each put writes a single column (cell) at a time, and columns can be added dynamically; there is no fixed schema as with a traditional INSERT.

display the records:
scan 'census'

The put command can also be used to update values in existing cells:

put 'census',2,'personal:education_level','PG'
put 'census',2,'professional:education_level','post graduate'


put 'census',4,'personal:name','ben'
put 'census',4,'personal:marital_status','divorced'




scan 'census' ---> retrieves all records from the table

get all data related to one row

>>> get 'census',1
get 'census',2

get 'census',3,'personal:name'  :: gets the value for row 3, column family 'personal', column 'name'

get 'census',3,'personal:marital_status' ==> displays a single column value

get 'census',3,'professional:education_level'


display only names:
scan 'census', {COLUMNS => ['personal:name']}

The second argument to scan is a dictionary specifying what information you are interested in.

To control the number of rows retrieved, use LIMIT:

scan 'census', {COLUMNS => ['personal:name'],LIMIT =>2}

scan 'census', {COLUMNS => ['personal:name'],LIMIT =>2, STARTROW => '2'}

scan 'census', {COLUMNS => ['personal:name'],LIMIT =>3, STARTROW => '3'}


To scan from one row to another, use STARTROW and STOPROW (STOPROW is exclusive):

scan 'census', {COLUMNS => ['personal:name'],STARTROW =>'2', STOPROW => '4'}



Delete individual cells:

get 'census', 1

delete 'census',1,'personal:marital_status'
delete 'census',1,'personal:gender'
delete 'census',1,'personal:name'
delete 'census',1,'personal:qualification'


Disable and Drop tables:
--------------------------

disable 'census'
disable takes the table offline: recent in-memory changes are flushed to disk and the table's regions are unloaded from memory
enable 'census'

exists 'census'  ==> checks whether the table exists

drop 'census' without disabling it first fails; the table must be disabled before it can be dropped


disable 'census'
drop 'census'
General:
>> status
version

table_help

whoami

DDL:
create
list
disable
is_disabled

enable
is_enabled

describe
alter
exists

drop
drop_all

(these shell DDL/DML commands are built on the HBase Java client API in org.apache.hadoop.hbase.client)

DML:
put
get
delete
deleteall
scan
count
truncate

create '<table name>', '<column family>'
create 'emp', 'personal data', 'professional data'

list

disable 'emp'
scan 'emp'  ==> ERROR: emp is disabled.

is_disabled 'table name'
is_disabled 'emp'  ==> true

disable_all 'r.*'  ==> disables all tables whose names start with 'r'

enable 'emp'
scan 'emp'

is_enabled 'table name' ==> is_enabled 'emp' ==> true / false

 describe 'table name' ==> describe 'emp'


Delete a column family:
alter '<table name>', 'delete' => '<column family>'
alter 'emp', 'delete' => 'professional data'
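
alter can also add or modify a column family; a minimal sketch (the 'extra' family name here is just an example, not part of the original exercise):
alter 'emp', NAME => 'extra', VERSIONS => 1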

 exists 'emp'

 disable 'emp'
  drop 'emp'

drop_all 't.*'

  disable_all 'raj.*'

  insert rows:
put '<table name>','row1','<colfamily:colname>','<value>'
put 'emp','1','personal data:name','raju'

update:
put '<table name>','<row>','<column family:column name>','<new value>'
put 'emp','1','personal data:city','Delhi'


Retrieve:
get '<table name>','row1'
get 'emp', '1'

Retrieving only a specific column:
get '<table name>', '<row id>', {COLUMN => '<column family:column name>'}

get 'emp', '1', {COLUMN => 'personal data:name'}

Delete a specific cell:
delete '<table name>', '<row>', '<column name>', '<time stamp>'   (the timestamp is optional)
delete 'emp', '1', 'personal data:city'

Delete all cells in a row:
deleteall '<table name>', '<row>'
deleteall 'emp','1'

Count:
count '<table name>'
count 'emp'

truncate '<table name>'   (disables, drops and recreates the table, removing all its rows)
truncate 'emp'

scan 'emp'

RWXCA => Read, Write, eXecute, Create, Admin (the HBase access permissions)
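
These permissions are managed from the shell with grant / revoke / user_permission; a minimal sketch, assuming security (the AccessController coprocessor) is enabled and a user named 'raju' (the name is just an example):

grant 'raju', 'RWXCA', 'emp'
user_permission 'emp'
revoke 'raju', 'emp'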


hbase shell  --> command used to enter the HBase command-line interface

hbase>
Create Table:
create 'tableName', 'columnFamilyName'

insert data into a row and column:
put 'tableName','rowNumber','columnFamily:columnName','value'

Query a table by row:
get 'tableName','rowNumber'

Display table columns and rows:
scan 'tableName'

Drop table and records:
disable 'tableName'
drop 'tableName'


show the currently running services (the Linux equivalent of services.msc on Windows):
service --status-all




create 'test','city'

insert new rows:
put 'test','r1','city','pallathur'
put 'test','r2','city','Bangalore'

scan 'test'
get 'test','r1'
get 'test','r2'

disable 'test'
enable 'test'   (re-enable the table; puts and scans fail while it is disabled)

add new columns to existing rows (the column family must already exist, so the new 'firstname' qualifier goes under the 'city' family):
put 'test','r1','city:firstname','aravinda'
put 'test','r2','city:firstname','desilva'

scan 'test' - to display all rows

both operations have to be done together to remove the table:
disable 'test'
drop 'test'

scan 'test'  ==> fails, since the table has been dropped


apple.csv (loading by hand):
create "stock","date","open","high","low","close","volume","adj_close"

put "stock","...","..","..."   (and so on, one put per cell; impractical for a whole CSV file)




Use Pig to copy apple.csv contents into HBASE
---------------------------------------------

hdfs dfs -ls /user/thehenson/stock_values/apple.csv
hdfs dfs -cp /user/thehenson/apple.csv /user/thehenson/stock_values/apple.csv



hdfs dfs -mv /user/thehenson/appledaily.csv /user/thehenson/apple.csv



$ hbase shell
> create "app_stock","info"
> scan "app_stock"

exit
clear

create a pig script and execute it from the command prompt:
vim hbase_pigloader.pig

a = LOAD '/user/thehenson/stock_values/apple.csv' USING PigStorage (',') as (date:chararray, open:chararray, high:chararray, low:chararray, close:chararray, volume:chararray, adj_close:chararray);
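-- note for the STORE below: HBaseStorage takes the first field of the relation (date) as the HBase row key; the remaining fields map to the listed info: columns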

STORE a INTO 'hbase://app_stock' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:open info:high info:low info:close info:volume info:adj_close');

execute the pig script file from the command prompt:
pig -f hbase_pigloader.pig

Pig reports the total number of records written (the row count) at the end of the job.

hbase shell

scan 'app_stock', {LIMIT => 10}
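
To cross-check the number of records Pig reported, the rows can also be counted from the shell (this scans the whole table, so it can be slow on large data):

count 'app_stock'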

MySQL

To log on to MySQL:
mysql -pcloudera

create database my1st;
create table emp (id int, name varchar(50), deg varchar(50), salary int, dept varchar(50));

describe emp;

insert into emp values (1201,'Gopal','Manager',50000,'TP');
insert into emp values (1202, 'Manisha', 'Proof Reader', 50000,'TP'),(1203,'Khalil', 'PHP dev', 30000,'AC'), (1204, 'Prasanth', 'PHP dev', 30000,'AC'), (1205,'Raja','Admin',20000,'TP');

select * from emp;

create table emp_add (id int, hno varchar(10), street varchar(50), city varchar(50));
describe emp_add;

insert into emp_add values (1201,'288A','RajaRajeswariNagar','Bangalore');


insert into emp_add values(1202, '108I','Kaggadasapura','Bangalore'), (1203, '144Z','CV Raman Nagar','Bangalore'),(1204,'78B','Udhayanagar','Bangalore'),(1205,'720X','Dooravani Nagar','Bangalore');

    create table emp_contact(id int, phno varchar(50), email varchar(50));

describe emp_contact;

insert into emp_contact values(1201,'11111111','gopal@tp.com'), (1202, '222222222','manisha@tp.com'),(1203,'3333333','khalil@ac.com'),(1204,'44444444','prasanth@ac.com'),(1205,'55555555','raja@tp.com');
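
Optionally, the other two tables can be verified the same way as emp:

select * from emp_add;
select * from emp_contact;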


Connect to the my1st database and display all its tables:

sqoop list-tables --connect jdbc:mysql://localhost/my1st --username=root    --password=cloudera

Import from MySQL:
emp, emp_add, emp_contact

import a complete table (with its data) into the default HDFS directory:
sqoop import --connect jdbc:mysql://localhost/my1st --username=root    --password=cloudera --table emp -m 1
sqoop import --connect jdbc:mysql://localhost/my1st --username=root    --password=cloudera --table emp_add -m 1
sqoop import --connect jdbc:mysql://localhost/my1st --username=root    --password=cloudera --table emp_contact -m 1

All imported data is placed under /user/root/ (verify with: hdfs dfs -ls /user/root/):
/user/root/emp/
/user/root/emp_add
/user/root/emp_contact

To view the imported data in hdfs folders:
hdfs dfs -cat /user/root/emp/p*
hdfs dfs -cat /user/root/emp_add/p*
hdfs dfs -cat /user/root/emp_contact/p*


Set Target Directory:
imported data will be placed in the /qr folder
sqoop import --connect jdbc:mysql://localhost/my1st --username=root    --password=cloudera --table emp_contact -m 1 --target-dir /qr
hdfs dfs -ls /qr

import a subset of the table data (partial data, selected with a WHERE condition):
--where "email='gopal@tp.com'" --target-dir /wherequery
sqoop import --connect jdbc:mysql://localhost/my1st --username=root    --password=cloudera --table emp_contact -m 1 --where "email='gopal@tp.com'" --target-dir /wherequery

hdfs dfs -cat /wherequery/p*
