Partition in Hive
What is Partition?
Partition is physical and
logical separation of data in Hive
Why do
we need partition?
to increase the performance of analysis
What are the different types of partition?
There are two types of partition in Hive
Static partition
Dynamic Partition
How to create partition table
Below is the query to create partition table
$ create EXTERNAL table stu_name_par (id int , name String)
partitioned by (age int) row format delimited fields terminated by ‘,’ lines
terminated by ‘\n’ stored as textfile;
Static Partition
How to load data
Below command used to load the data (Note: stu_name.txt
contains only 2 fileds id & name)
$ load data local inpath
‘/root/TEST_DATA/Hive/Join_data/stu_name.txt’ into table stu_name_par partition
(age = 25);
Dynamic partition
set the following property to enable dynamic partition (By default dynamic partition are disabled to
avoid multiple creation of partition , many partition means many number of
files which leads to many number of IO operation which is not recommended in
Hadoop environment )
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=1000;
You cannot load the flat file data in to the dynamic
partition table.
You should have a table which has data
Following command used to create to create the table
$create EXTERNAL table stu_name_no_par (id int , name
String,age int) row format delimited fields terminated by ‘,’ lines terminated
by ‘\n’ stored as textfile;
$load data local inpath
‘/root/TEST_DATA/Hive/Join_data/age_grp_rand.txt’ into table stu_name_no_par;
We have loaded different age group student data.
Now, Create table which has partition
$ create EXTERNAL table stu_name_dynamic_par (id int , name
String) partitioned by (age int) row format delimited fields terminated by ‘,’
lines terminated by ‘\n’ stored as textfile;
(Note: stu_name_dynamic_par & stu_name_par both tables
are same , creating another table to clarity )
Loading data to the dynamic partition table
$ INSERT INTO TABLE stu_name_dynamic_par PARTITION (age)
SELECT id,name,age FROM stu_name_no_par;
We can use the similar command for static partition also
(Note : age column missing in projection)
INSERT overwrite TABLE stu_name_dynamic_par PARTITION (age =
26 ) SELECT id,name FROM stu_name_no_par
where age = 26;
How to check my partitions
$ show partitions stu_name_dynamic_par
$ show partitions stu_name_par
Above command will give you the partitions in the table