SAP HANA grouping sets

原文链接:https://my.oschina.net/corleone/blog/131576
2019独角兽企业重金招聘Python工程师标准>>> hot3.png

本文只是对SAP HANA SQL Reference Manual 中关于group by的部分做一些个人解释。

数据准备:
create column table t1 ( id int primary key, customer varchar(5), year int, product
varchar(5), sales int );
insert into t1 values(1, ‘C1’, 2009, ‘P1’, 100);
insert into t1 values(2, ‘C1’, 2009, ‘P2’, 200);
insert into t1 values(3, ‘C1’, 2010, ‘P1’, 50);
insert into t1 values(4, ‘C1’, 2010, ‘P2’, 150);
insert into t1 values(5, ‘C2’, 2009, ‘P1’, 200);
insert into t1 values(6, ‘C2’, 2009, ‘P2’, 300);
insert into t1 values(7, ‘C2’, 2010, ‘P1’, 100);
insert into t1 values(8, ‘C2’, 2010, ‘P2’, 150);

GROUPING SETS
select customer, year, product, sum(sales) from t1
group by GROUPING SETS ((customer, year),(customer, product));
SAP HANA grouping sets

select customer, year, NULL, sum(sales) from t1 group by customer, year

union all
select customer, NULL, product, sum(sales) from t1 group by customer, product;
SAP HANA grouping sets

两结果集是一致的。

ROLLUP
select customer, year, sum(sales) from t1 group by ROLLUP(customer, year)
SAP HANA grouping sets

类似于:select customer, year, sum(sales)

from t1 group by grouping sets ( (customer, year), (customer))
union all
select NULL, NULL, sum(sales) from t1;

CUBE
SAP HANA grouping sets

select customer, year, sum(sales) from t1 group by CUBE(customer, year);
按顺序组合group by并union结果集,非group by 字段且不被聚合的列,默认为null.

类似于 select customer, year, sum(sales)
from t1
group by grouping sets
(
(customer, year),
(customer),
(year)
)
union all
select NULL, NULL, sum(sales)
from t1;

BEST
文档中给出的例子不是特别容易说明问题,换一个例子

select customer, year, sum(sales) from t1 group by CUBE best 1 (customer, year);

select customer, year, sum(sales) from t1 group by CUBE best 2 (customer, year);
SAP HANA grouping setsSAP HANA grouping sets

由上面的图,对照前面的 cube 结果集,可知,best N是取 前N个 group by 组合的结果并union。

同时也支持N 为负数,如果负数的话,即为逆序的前N个group by 组合的结果并union。

limit
select customer, year, product, sum(sales) from t1 group by grouping sets LIMIT 2
( (customer, year), (product) );
只是简单的,每种分组取N条结果集并合并成一个大结果集。

SAP HANA grouping sets

至此,还是比较简单的。下面开始稍微有点意思了。

with subtotal
select customer, year, product, sum(sales)
from t1 group by grouping sets LIMIT 2 WITH SUBTOTAL
( (customer, year), (product) );

SAP HANA grouping sets
由此结果集可以看出,在每种group by 下都有一行记录,且记录是此group by 返回结果集的total.在分组中,但是因为limit 而没有返回的结果集,不算在total中,因此顾名思义,为subtotal。那有朋友就能想象出来,with total就是包含该分组的所有结果集的total。下面会提到。

with balance
文档给出的例子也不好,再改:

select customer, year, sum(sales) from t1 group by CUBE limit 1 with balance (customer, year)
SAP HANA grouping sets
和上面的很类似,也在每中分组下多了一条对该分组总结性的记录。不过这条记录是平衡的,或者是补差值的记录。

由前面结果集可见,sum(sales)=1250.由于每种分组的只取一条。因此每条都会有一个非null的差值记录来平衡。

官方的解释是这条记录来表示没有返回的结果集的subtotal.说的更好。

with total
select customer, year, product, sum(sales)
from t1 group by grouping sets LIMIT 2 WITH TOTAL
( (customer, year), (product) )
对照前面的with subtotal , 顾名思义,这会就不是subtotal。不多解释。
SAP HANA grouping sets

text_filter
select customer, year, product, sum(sales), text_filter(customer),
text_filter(product) from t1 group by grouping sets TEXT_FILTER ‘*2’
( (customer, year), (product) )
官方定义为按每中分组的第一个分组列来过滤,返回符合的结果集
SAP HANA grouping sets

fill up
select customer, year, product, sum(sales),
text_filter(customer), text_filter(product)
from t1 group by grouping sets TEXT_FILTER ‘*2’ FILL UP
( (customer, year), (product) )
与text_filter对照看,这里返回的结果集里没有过滤掉不符合条件的。
SAP HANA grouping sets

sort matchs to top
结合fill up 看 ,顾名思义,就是按分组把匹配的记录置顶
SAP HANA grouping sets

转载于:https://my.oschina.net/corleone/blog/131576