Wednesday, March 2, 2016

Hive partitions (Static and Dynamic)

Partition in Hive

What is Partition?

     Partition is physical and logical separation of data in Hive

Why do we need partition?

   to increase the performance of analysis

What are the different types of partition?

There are two types of partition in Hive

    Static partition

    Dynamic Partition


How to create partition table

Below is the query to create partition table

$ create EXTERNAL table stu_name_par (id int , name String) partitioned by (age int) row format delimited fields terminated by ‘,’ lines terminated by ‘\n’ stored as textfile;

Static Partition

How to load data

Below command used to load the data (Note: stu_name.txt contains only 2 fileds  id & name)

$ load data local inpath ‘/root/TEST_DATA/Hive/Join_data/stu_name.txt’ into table stu_name_par partition (age = 25);

  
Dynamic partition

set the following property to enable dynamic partition  (By default dynamic partition are disabled to avoid multiple creation of partition , many partition means many number of files which leads to many number of IO operation which is not recommended in Hadoop environment )

set hive.exec.dynamic.partition=true;

set hive.exec.dynamic.partition.mode=nonstrict;

set hive.exec.max.dynamic.partitions=1000;

set hive.exec.max.dynamic.partitions.pernode=1000;

You cannot load the flat file data in to the dynamic partition table.

You should have a table which has data



Following command used to create to create the table

$create EXTERNAL table stu_name_no_par (id int , name String,age int) row format delimited fields terminated by ‘,’ lines terminated by ‘\n’ stored as textfile;

$load data local inpath ‘/root/TEST_DATA/Hive/Join_data/age_grp_rand.txt’ into table stu_name_no_par;

We have loaded different age group student data.

Now, Create table which has partition

$ create EXTERNAL table stu_name_dynamic_par (id int , name String) partitioned by (age int) row format delimited fields terminated by ‘,’ lines terminated by ‘\n’ stored as textfile;

(Note: stu_name_dynamic_par & stu_name_par both tables are same , creating another table to clarity )



Loading data to the dynamic partition table

$ INSERT INTO TABLE stu_name_dynamic_par PARTITION (age) SELECT  id,name,age FROM stu_name_no_par;



We can use the similar command for static partition also (Note : age column missing in projection)

INSERT overwrite TABLE stu_name_dynamic_par PARTITION (age = 26 ) SELECT  id,name FROM stu_name_no_par where age = 26;



How to check my partitions

$ show partitions stu_name_dynamic_par

$ show partitions stu_name_par

 Above command will give you the partitions in the table

No comments:

Post a Comment