Array, Map, Struct
insert into and query from tables with complex data types.
Table Generating functions
Array - List
collection of entity / data types
No fixed sizes
u can specify as many entities
Entities of the same types
Student Names
only primitive types allowed (int,varchar)
Create a column with an array data type
Insert list data into this table using
-select
-load
=> create table mobilephones(id string, title string, cost float, colors array<string>, screensize array<float>);
=> show tables;
insert into table mobilephones select "Redminote7","Redmi Note 7", 300, array("white","silver","black"), array(float(4.5)) UNION ALL select "motoGplus","Moto G Plus", 200, array("black","gold"), array(float(4.5),float(5.5));
select colors[0],colors[1],screensize[0],screensize[1] from mobilephones;
select * from mobilephones;
select id,colors from mobilephones;
select id,colors[0] from mobilephones;
drop table mobilephones;
create table mobilephones (id string, title string, cost float, colors array<string>, screensize array<float>) row format delimited fields terminated by ',' collection items terminated by '#';
===>>>> Fields separated by commas (,); collection items separated by Hash (#)
===>>>>> If only one element there will be no HASH
gedit mobilephones.csv
samsungj7, Samsung J7, 250, red#blue#black,5.5
OnePlusThree,One Plus Three, 450,gold#silver,4.5#5.5
load data local inpath 'mobilephones.csv' into table mobilephones;
select * from mobilephones;
select id, colors, screen from mobilephones;
select id, colors[0],screensize[0] from mobilephones;
Map - Key Value Pair
Unordered collection of pairs
No fixed size
every entity is a key,value pair
Keys and values have their own data type
Key (int), value (string) - valid
Key (string) value (int) - valid
drop table mobilphones;
===> create table mobilephones (id string, title string, cost float,colors array<string>,screensize array<float>,features map<string,boolean>) row format delimited fields terminated by ',' collection items terminated by '#' map keys terminated by ':';
==> !q
gedit mobilephones.csv
samsungj7, Samsung J7, 250, red#blue#black,5.5,camera:true#dualsim:false
OnePlusThree,One Plus Three, 450,gold#silver,4.5#5.5,autofocus:true
load data local inpath 'mobilephones.csv' into table mobilephones;
select * from mobilephones;
select features['camera'] from mobilephones;
Struct - Logical grouping of data
equivalent in class
can have different data types
whole struct is a single entity
can hold any number of values
create a column with a struct data type
insert data into this table from a CSV file
Query struct values from the Hive table.
> drop table mobilephones;
create table mobilephones
(
id string,
title string,
cost float,
colors array<string>,
screensize array<float>,
features map<string,boolean>,
information struct<battery:string,camera:string>
) row format delimited fields terminated by ','
collection items terminated by '#'
map keys terminated by ':';
describe mobilephones;
!q
gedit mobilephones.csv
samsungj7, Samsung J7, 250, red#blue#black,5.5,camera:true#dualsim:false,24 hours#2MP
OnePlusThree,One Plus Three, 450,gold#silver,4.5#5.5,autofocus:true,12 hours#8MP
==> use sara;
==> load data local inpath 'mobilephones.csv' into table mobilephones;
select id,title,cost,colors[0] as colorone,screensize[1] as screen1, information from mobilephones;
select information.battery from mobilephones;
select information.battery as battery, information.camera as camera from mobilephones;
select id, screensize[0] as screen1, colors[0] as color1,information.battery as battery, information.camera as camera,features['camera'] as camera, features['dualsim'] as dualsim from mobilephones;
hive-site.xml => change hive.cbo.enable ===> false; ::: to disable warning
Table Generating functions
Builtin functions UDF - User Defined functions
UDAF - user defined Aggregate functions
UDTF - User Defined Table Generating functions
UDF - User Defined Functions
works on a single row and
outputs a single row
trim(),concat(),length(),round(),floor()
UDAF : User Defined Aggregate Functions
- works on multiple rows and
outputs a single row.
sum(),avg(),count()
UDTF : User Defined Table Generating Function
works on a single row,
outputs multiple rows
explode(), posexplode()
Explode : flatten the data in arrays and maps
Manager : SubordinateList
-------------------------
Larry : [Sundar,Eric,John]
Sergey : [Ruth,Urs]
Sundar : [Susan,Alan,Lazlo]
Explode: Table Generating function
--------
Sundar
Eric
John
Ruth
Urs
Susan
Alan
Lazlo
Employee | Details
Larry : {"office":"271B","numReports":8,"salary":1}
Sergey : {"office":"271B","numReports":5,"salary":1}
Sundar : {"office":"285","numReports":12}
Key : value
office 217B
numReports 8
salary 1
office 271B
numReports 5
salary 1
select explode(colors) as variants from mobilephones;
select explode(features) from mobilephones;
select explode(features) as (index,variants) from mobilephones; // flatten key value pairs
select explode(features) as (feature,present) from mobilephones where id='samsung7';
select posexplode(colors) as (index,variants) from mobilephones;
UDTF - specify only that column. do not include anything else.
Lateral view:
virtual table formed by the exploded view
Array, Map column can be exploded and then using Lateral view - we can make virtual table, we can join it
which can be joined with the original table to allow complex queries.
Manager : SubordinateList
-------------------------------
Larry : [Sundar, Eric, John]
Exploded view based on SubordinateList column: flattened results
Sundar
Eric
John
virtual table can be joined with the original table
Manager : SubordinateList : Subordinate
Larry : [Sundar, Eric, John] : Sundar
Larry : [Sundar, Eric, John] : Eric
Larry : [Sundar, Eric, John] : John
a) Create a table with Array, Map column
b) Load data into the table
c) create lateral views with exploded arrays, maps
d) join lateral views with the original table for more complex queries
describe mobilephones;
id string
title string
cost float
colors array<string>
screensize array<float>
features map<string,boolean>
information struct<battery:string,camera:string>
select id, variants from mobilephones lateral view explode(colors) colorsTable as variants;
+---------------+-----------+--+
| id | variants |
+---------------+-----------+--+
| samsungj7 | red |
| samsungj7 | blue |
| samsungj7 | black |
| OnePlusThree | gold |
| OnePlusThree | silver |
+---------------+-----------+--+
>> beeline -u jdbc:hive2://
>> use sara;
>> select id, feature, present from mobilephones lateral view explode(features) featuresTable as feature, present;
create table ads (page_id string, ad_list array<string>) row format delimited fields terminated by ',' collection items terminated by '#';
load data local inpath 'ads.csv' into table ads;
select explode(ad_list) from ads;
select * from ads;
select page_id, ad_id from ads lateral view explode(ad_list) adListTable as ad_id;
group by : display number of times the ad displayed for each campaign
select ad_id,count(*) from ads lateral view explode(ad_list) adListTable as ad_id group by ad_id;
Multiple Lateral View:
----------------------
complex combined lateral views
use multiple lateral views in a query
use details from one lateral view in another lateral view
create table student_subjects (names array<string>,subjects array<string>) row format delimited fields terminated by ',' collection items terminated by '#';
describe student_subjects;
!q
gedit student_subjects.csv
beeline -u jdbc:hive2://
load data local inpath 'student_subjects.csv' into table student_subjects;
select n,subjects from student_subjects lateral view explode(names) nameTable as n;
select n,s from student_subjects lateral view explode(names) nameTable as n lateral view explode(subjects) subjectTable as s;
===> single query with multiple lateral view and join the output with the original table.
create table example (double_nested array<array<int>>);
insert into table example select array(array(1,2,3), array(4,5,6),array(7,8,9));
select * from example lateral view explode(double_nested) nestedTable as single_nested lateral view explode(single_nested) flatTable as num;
No comments:
Post a Comment