Hive union all 与 multi insert

Hive union all 与 multi insert执行测试

 

A  insert overwrite table temp_test  select accountbin,accounttype from s_tbl_cardbininfo where accounttype = 'C' union all select accountbin,accounttype from s_tbl_cardbininfo where accounttype = 'D';

 

执行计划:

Hive union all 与 multi insert

 

B from s_tbl_cardbininfo  insert overwrite table test_lxw partition (flag = 'C')   select accountbin,accounttype  where accounttype = 'C' insert overwrite table test_lxw partition (flag = 'D')   select accountbin,accounttype  where accounttype = 'D'

 

执行计划:

Hive union all 与 multi insert

 

Hive union all 和 multi insert 对于同表都会只加载一次,hive对union all做了很多优化性能要优于multi insert,multi insert用到分区操作复杂了些。测试结果也是union all 要优于multi insert。

 

 

C  insert overwrite table temp_test  select 'countaccountbin', accounttype from s_tbl_cardbininfo where accounttype = 'C' group by accounttype  union all select 'countaccountbin',accounttype from s_tbl_cardbininfo where accounttype = 'D' group by accounttype;

执行计划:

Hive union all 与 multi insert

Hive union all 与 multi insert

 

D explain from s_tbl_cardbininfo  insert overwrite table test_lxw partition (flag = 'C')   select 'accountbin',accounttype  where accounttype = 'C' group by accounttype insert overwrite table test_lxw partition (flag = 'D')   select 'accountbin',accounttype  where accounttype = 'D' group by accounttype

执行计划:

Hive union all 与 multi insert

Hive union all 与 multi insert

Hive count,distinct,group by会导致union all 同表关联扫描多次表,multi insert 不会,可以通过调整count,distinct,group by 位置,先执行union all后进行count,distinct,group by来避免这种情况发生。