Sankara's Big Data Notes: Create Parquet file in Hive

Thursday, 6 August 2020

Create Parquet file in Hive

// Create a Parquet table from the output of a SELECT query (CREATE TABLE ... AS SELECT)

// The query output is written as a Parquet file under the table's LOCATION in HDFS

hive> CREATE TABLE cust_parquet
    > STORED AS PARQUET
    > LOCATION '/user/cloudera/cust_parquet'
    > AS SELECT * FROM customers;

hive> show tables;

cust_parquet

customers
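A quick sanity check after a CREATE TABLE ... AS SELECT is to compare row counts between the source table and the new Parquet table. The sketch below assumes the `hive` CLI is on the PATH and that both tables live in the `ohm` database reported by `describe formatted`; `count_rows` and `compare_counts` are hypothetical helper names, not part of Hive.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of Hive.

# Print the row count of a table. -S (silent mode) suppresses progress
# logs so that only the query result reaches stdout.
count_rows() {
  hive -S -e "SELECT COUNT(*) FROM $1;"
}

# Compare the row counts of two tables; return non-zero on a mismatch.
compare_counts() {
  local src dst
  src=$(count_rows "$1") || return 1
  dst=$(count_rows "$2") || return 1
  if [ "$src" = "$dst" ]; then
    echo "OK: $src rows in both $1 and $2"
  else
    echo "MISMATCH: $1 has $src rows, $2 has $dst rows" >&2
    return 1
  fi
}
```

A call such as `compare_counts ohm.customers ohm.cust_parquet` runs two count queries, so on a large table it may be cheaper to rely on the `numRows` statistic instead.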

hive> describe formatted cust_parquet;

# col_name              data_type               comment

id                      int
fname                   string
lname                   string
email                   string
password                string
street                  string
city                    string
state                   string
zipcode                 string

# Detailed Table Information
Database:               ohm
Owner:                  cloudera
CreateTime:             Thu Aug 06 19:04:16 PDT 2020
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://quickstart.cloudera:8020/user/cloudera/cust_parquet
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   true
        numFiles                1
        numRows                 12435
        rawDataSize             111915
        totalSize               334655
        transient_lastDdlTime   1596765856

# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:            org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1

Time taken: 0.091 seconds, Fetched: 39 row(s)
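The table parameters reported by `describe formatted` can be cross-checked with a little arithmetic. This is a minimal sketch that assumes GNU `date` (for `-d @epoch`) and `awk` are available; the function names are made up for illustration. It converts `transient_lastDdlTime` to a UTC timestamp, computes the average on-disk bytes per row from `totalSize` and `numRows`, and divides `rawDataSize` by `numRows`.

```shell
# Convert an epoch-seconds value such as transient_lastDdlTime to UTC.
epoch_to_utc() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Average bytes per row: totalSize / numRows, rounded to two decimals.
bytes_per_row() {
  awk -v total="$1" -v rows="$2" 'BEGIN { printf "%.2f\n", total / rows }'
}

epoch_to_utc 1596765856        # 2020-08-07 02:04:16
bytes_per_row 334655 12435     # 26.91
echo $(( 111915 / 12435 ))     # 9
```

02:04:16 UTC on 7 August is 19:04:16 PDT (UTC-7) on 6 August, which agrees with `CreateTime`. The last result, 9, equals the number of columns in the table, so for this table `rawDataSize` appears to count values rather than bytes; treat that as an observation about this output, not a documented guarantee.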

// List the Parquet file in HDFS:

[cloudera@quickstart ~]$ hdfs dfs -ls hdfs://localhost:8020/user/cloudera/cust_parquet

Found 1 items

-rwxr-xr-x 1 cloudera cloudera 334655 2020-08-06 19:04 hdfs://localhost:8020/user/cloudera/cust_parquet/000000_0
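The file size in the listing (334655 bytes) matches the `totalSize` table parameter. When a table is backed by several files, the sizes have to be added up before comparing. The helper below is a sketch with a hypothetical name, `sum_listing_bytes`; it only assumes the usual `hdfs dfs -ls` layout, where the fifth field is the size in bytes.

```shell
# Sum the size column (field 5) of `hdfs dfs -ls` output read from stdin.
# Only lines starting with "-" (regular files) are counted, so the
# "Found N items" header and any directory entries are skipped.
sum_listing_bytes() {
  awk '$1 ~ /^-/ { total += $5 } END { print total + 0 }'
}
```

Usage: `hdfs dfs -ls /user/cloudera/cust_parquet | sum_listing_bytes`, which for the listing shown here should print 334655.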

// Display the first three records of the Parquet file using parquet-tools

[cloudera@quickstart ~]$ parquet-tools head -n3 hdfs://localhost:8020/user/cloudera/cust_parquet/000000_0

id = 1
fname = Richard
lname = Hernandez
email = XXXXXXXXX
password = XXXXXXXXX
street = 6303 Heather Plaza
city = Brownsville
state = TX
zipcode = 78521

id = 2
fname = Mary
lname = Barrett
email = XXXXXXXXX
password = XXXXXXXXX
street = 9526 Noble Embers Ridge
city = Littleton
state = CO
zipcode = 80126

id = 3
fname = Ann
lname = Smith
email = XXXXXXXXX
password = XXXXXXXXX
street = 3422 Blue Pioneer Bend
city = Caguas
state = PR
zipcode = 00725
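Besides `head`, the legacy Java parquet-tools CLI also has `schema` and `meta` subcommands, which are handy for confirming that the column names and types in the file match the Hive table. The wrapper below is only a sketch: `inspect_parquet` is a hypothetical name, and it assumes the same `parquet-tools` binary used above is on the PATH.

```shell
# Hypothetical wrapper around the parquet-tools CLI; not part of the tool itself.
inspect_parquet() {
  local file="$1"
  parquet-tools schema "$file"   # column names and Parquet types
  parquet-tools meta "$file"     # footer metadata such as row groups and sizes
}
```

Usage: `inspect_parquet hdfs://localhost:8020/user/cloudera/cust_parquet/000000_0`.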
