Hive: find the start and end of each group, or the point where it changes
Problem description:
Here is the table:
+------+------+
| Name | Time |
+------+------+
| A    |    1 |
| A    |    2 |
| A    |    3 |
| A    |    4 |
| B    |    5 |
| B    |    6 |
| A    |    7 |
| B    |    8 |
| B    |    9 |
| B    |   10 |
+------+------+
I want to write a query that returns:
+------+-------+-----+
| Name | Start | End |
+------+-------+-----+
| A    |     1 |   4 |
| B    |     5 |   6 |
| A    |     7 |   7 |
| B    |     8 |  10 |
+------+-------+-----+
Does anyone know how to do this?
Answers
This is not the most efficient approach, but it works.
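-- Within each name, time minus its rank stays constant across a run of
-- consecutive times, so (name, diff) identifies one group.
-- Note: this relies on time increasing by exactly 1 from row to row.
SELECT name,
       MIN(time) AS `start`,   -- backticks: END is a reserved word in HiveQL
       MAX(time) AS `end`
FROM (
    SELECT name,
           time,
           time - DENSE_RANK() OVER (PARTITION BY name ORDER BY time) AS diff
    FROM foo
) t
GROUP BY name, diff;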
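To see why grouping by diff works, it may help to compute that column by hand. The following is only a sketch in Python that mimics the subquery on the sample rows from the question; the function name `islands_by_rank_diff` is made up for illustration and is not part of Hive.

```python
from collections import defaultdict

# Sample rows from the question: (name, time)
rows = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("B", 5), ("B", 6),
        ("A", 7), ("B", 8), ("B", 9), ("B", 10)]

def islands_by_rank_diff(rows):
    """Mimic time - DENSE_RANK() OVER (PARTITION BY name ORDER BY time)."""
    # Collect the distinct times per name (the PARTITION BY step).
    times_by_name = defaultdict(set)
    for name, time in rows:
        times_by_name[name].add(time)

    # Ranking sorted distinct values from 1 is the dense rank.
    groups = defaultdict(list)
    for name, times in times_by_name.items():
        for rank, time in enumerate(sorted(times), start=1):
            groups[(name, time - rank)].append(time)  # key = (name, diff)

    # One output row per (name, diff), ordered by start time for readability.
    result = [(name, min(ts), max(ts)) for (name, _diff), ts in groups.items()]
    return sorted(result, key=lambda r: r[1])

result = islands_by_rank_diff(rows)
for name, start, end in result:
    print(name, start, end)
```

For name A the diff values come out as 0, 0, 0, 0, 2 and for B as 4, 4, 5, 5, 5, which is what separates the four groups. If the times were not consecutive integers, a jump inside a run would change diff and split the group.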
I suggest trying the following query and then writing a GenericUDF to find the gaps, which is easier :)
SELECT name, sort_array(collect_list(time)) FROM foo GROUP BY name;
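The gap-finding step is left to the UDF. A real GenericUDF would be written in Java against the Hive API; the sketch below only shows the logic in Python, under the assumption that a new group starts whenever two neighbouring times differ by more than 1. The name `split_runs` and the `step` parameter are hypothetical.

```python
def split_runs(sorted_times, step=1):
    """Split an ascending list of times into (start, end) runs.

    A run is extended while the next time is at most `step` away from
    the current end; otherwise a new run begins.
    """
    runs = []
    for t in sorted_times:
        if runs and t - runs[-1][1] <= step:
            runs[-1][1] = t          # extend the current run
        else:
            runs.append([t, t])      # gap found: start a new run
    return [tuple(run) for run in runs]

# Arrays as the query above would return them for the sample data
# (one sorted array per name).
arrays = {"A": [1, 2, 3, 4, 7], "B": [5, 6, 8, 9, 10]}
runs = {name: split_runs(times) for name, times in arrays.items()}
print(runs)
```

Because each name's array is processed on its own, this only sees gaps in the time values, so like the DENSE_RANK query it depends on the times being consecutive.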
This is known as the islands problem. I don't know Hive, but in SQL Server we solve it with window functions. Here is a **[DEMO](http://www.sqlfiddle.com/#!3/9eecb7db59d16c80417c72d1/6243)**
Thank you very much! – GoGoGo
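For reference, one common window-function formulation of the islands problem takes the difference of two row numbers. I can't confirm that the linked demo uses exactly this form. The sketch below runs it through Python's built-in sqlite3 module as a stand-in for Hive or SQL Server (window functions need SQLite 3.25 or later); the aliases `grp`, `start_time` and `end_time` are my own.

```python
import sqlite3

# Sample rows from the question: (name, time)
rows = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("B", 5), ("B", 6),
        ("A", 7), ("B", 8), ("B", 9), ("B", 10)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name TEXT, time INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?, ?)", rows)

# The overall row number minus the per-name row number stays constant
# while the same name repeats on consecutive rows, and changes when a
# different name interrupts the run.
query = """
SELECT name, MIN(time) AS start_time, MAX(time) AS end_time
FROM (
    SELECT name, time,
           ROW_NUMBER() OVER (ORDER BY time)
         - ROW_NUMBER() OVER (PARTITION BY name ORDER BY time) AS grp
    FROM foo
) AS t
GROUP BY name, grp
ORDER BY start_time
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

Since both row numbers are positions rather than time values, this variant does not require the times to be consecutive integers, only that they define the row order.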