Hive union all 与 multi insert
Hive union all 与 multi insert执行测试
A insert overwrite table temp_test select accountbin,accounttype from s_tbl_cardbininfo where accounttype = 'C' union all select accountbin,accounttype from s_tbl_cardbininfo where accounttype = 'D';
执行计划:
B from s_tbl_cardbininfo insert overwrite table test_lxw partition (flag = 'C') select accountbin,accounttype where accounttype = 'C' insert overwrite table test_lxw partition (flag = 'D') select accountbin,accounttype where accounttype = 'D'
执行计划:
Hive union all 和 multi insert 对于同表都会只加载一次,hive对union all做了很多优化性能要优于multi insert,multi insert用到分区操作复杂了些。测试结果也是union all 要优于multi insert。
C insert overwrite table temp_test select 'countaccountbin', accounttype from s_tbl_cardbininfo where accounttype = 'C' group by accounttype union all select 'countaccountbin',accounttype from s_tbl_cardbininfo where accounttype = 'D' group by accounttype;
执行计划:
D explain from s_tbl_cardbininfo insert overwrite table test_lxw partition (flag = 'C') select 'accountbin',accounttype where accounttype = 'C' group by accounttype insert overwrite table test_lxw partition (flag = 'D') select 'accountbin',accounttype where accounttype = 'D' group by accounttype
执行计划:
Hive count,distinct,group by会导致union all 同表关联扫描多次表,multi insert 不会,可以通过调整count,distinct,group by 位置,先执行union all后进行count,distinct,group by来避免这种情况发生。