Sankara's Big Data Notes: Hive Notes

create a .csv file now.
gedit products.csv :: save the below info

iphone7, iphone 7, 950
camera_canon, canon 570x, 1000
Washingmachine_samsung, Samsung Swift, 400
tv_vu, Vu 56 Inch, 600

copy this products.csv to hdfs's /data/ folder

hdfs dfs -mkdir /data
hdfs dfs -copyFromLocal prodcuts.csv /data/
hdfs dfs -ls /data/

create external table if not exists products
(
id string, title string, cost float
)
comment "blah blah blah!"
location '/data/';

(external and location are mandatory for external tables)

select * from products; // data taken from products.csv

drop table products; ----> files will be there in /data/ folder.

create external table if not exists products (brand string, title string, cost float) comment "blah blah blah!" row format delimited fields terminated by ',' stored as textfile location '/user/sample_data/products/';

gedit products.csv
hdfs dfs -mkdir /user/sample_data/products
hdfs dfs -copyFromLocal products.csv /user/sample_data/products/
hdfs dfs -cat /user/sample_data/products/products.csv

start HiveServer2 :
beeline -u jdbc:hive2://

show databases;

use sara;
select * from products;

make a copy of existing table:
create table if not exists fresh_products like products;

describe products;
describe fresh_products;

alter table fresh_products rename to freshproducts;

alter table freshproducts add columns ( expiry_date date comment "Expiry Date of fresh produce" );

describe freshproducts;

Move the column position: (not working *******)
alter table freshproducts (change column brand brand string after title);

Temporary Tables:
Tables created within a Hive session, deleted when the session ends.
store temporary data;
Tables of the same name can be created by different users
It doesn't support partitions, indexes.

can have the same name as permanent table.
Temp table hides the permanent table if same name used.

create temporary table test_customers like customers;

show tables;
describe test_customers;
insert into test_customers values(1000,'Harish','MN'),(1001,'Mani','LA');

Quitting / Exiting Hive : !q

create temporary table customers like orders; (customers (temp table) will hide the original customers table)

show tables;
describe customers; ==> it will hide customers (permanent tables) and shows temp table structure
describe customers; ==> table structure copied from orders table;

!q : quit hive

relogin to hive and show the structure of customers again:

use sara;
describe customers : ---> now this is original structure of permanent table

Hive works on huge datasets / bulk load

standalone - previously describled.

Loading data from files / other existing tables.

hive > gedit freshproducts.csv
broccoli, Broccoli, 5
Spinach, Spinach, 7
Carrot, Local Carrots, 4
Potato, Idaho Potatoes, 4

Bulk load data into table from local file system:

load data local inpath 'freshproducts.csv' into table freshproducts;
local inpath - file is located in local file system not on hdfs
contents of .csv file will be copied into hive's warehouse (internal) folder.

select * from freshproducts; //// with null values;

hdfs dfs -cat /user/hive/warehouse/sara.db/freshproducts/freshproducts.csv

hdfs dfs -copyFromLocal freshproducts.csv /user/sample_data/products
hdfs dfs -ls /user/sample_data/products
hdfs dfs -cat /user/sample_data/products/freshproducts.csv

beeline>
create table freshproducts like products;
describe freshproducts;

> use sara;
load data inpath '/user/sample_data/products/freshproducts.csv' into table freshproducts;
select * from freshproducts;

!q
hdfs dfs -ls /user/hive/warehouse/sara.db/freshproducts/
hdfs dfs -cat /user/hive/warehouse/sara.db/freshproducts/freshproducts.csv

load data inpath '/user/sample_data/products/freshproducts.csv' into table freshproducts;
select * from freshproducts; /// here data got appended... duplicate rows.

overwrite option while bulk loading:

drop table freshproducts;
create table freshproducts like products;

load data local inpath 'freshproducts.csv' overwrite into table freshproducts;
select * from freshproducts;
select * from products;

fetch data from different table and put it into one more table
insert into select

merge data

insert into products select * from freshproducts; /// append model
insert overwrite table products select * from freshproducts; // overwrite existing data

Load multiple tables from a single table
delete data from tables;

create new tables:
create table product_name (brand string, name string);
create table product_cost (brand string, cost float);

splitting Products ===> product_name and product_cost
Take data from single table and put them into multiple tables:

from products
insert into table product_name
select brand,title
insert into table product_cost
select brand,cost;

Truncate command: delete all the records in single shot.
truncate table freshproducts;

Hive tables do not support row level delete and updates by default

Sankara's Big Data Notes

Monday, 10 December 2018

Hive Notes - Part 2

No comments:

Post a Comment

Flume - Simple Demo