Hive: find the start and end of a group, or the change point

Problem description:

Here is the table:

+------+------+ 
| Name | Time | 
+------+------+ 
| A | 1 | 
| A | 2 | 
| A | 3 | 
| A | 4 | 
| B | 5 | 
| B | 6 | 
| A | 7 | 
| B | 8 | 
| B | 9 | 
| B | 10 | 
+------+------+ 

I want to write a query that returns:

+------+-------+-----+
| Name | Start | End |
+------+-------+-----+
| A    | 1     | 4   |
| B    | 5     | 6   |
| A    | 7     | 7   |
| B    | 8     | 10  |
+------+-------+-----+

Does anyone know how to do this?

This is known as the islands problem (often called "gaps and islands"). I don't know Hive, but in SQL Server we solve it with window functions. Here is a **[DEMO](http://www.sqlfiddle.com/#!3/9eecb7db59d16c80417c72d1/6243)**

Thanks a lot! – GoGoGo

This is not the most efficient way, but it works.

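-- END is a reserved keyword in Hive, so both aliases are backtick-quoted to be safe.
SELECT name, MIN(time) AS `start`, MAX(time) AS `end`
FROM (
    -- time minus its per-name dense rank stays constant within a run of consecutive times
    SELECT name, time,
           time - DENSE_RANK() OVER (PARTITION BY name ORDER BY time) AS diff
    FROM foo
) t
GROUP BY name, diff;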

I'd suggest trying the following query and writing a GenericUDF to find the gaps in each sorted array; that is easier :)

SELECT name, sort_array(collect_list(time)) FROM foo GROUP BY name;
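
The gap-finding step that such a UDF would perform could look roughly like the Python sketch below. It is only an illustration of the logic, not Hive UDF code; `split_runs` is a hypothetical name, and the input lists are what the query above would produce for the sample table, assuming integer times.

```python
def split_runs(sorted_times):
    """Split an ascending list of integers into (start, end) runs
    of consecutive values."""
    runs = []
    for t in sorted_times:
        if runs and t - runs[-1][1] <= 1:
            # Same or next value: extend the current run
            runs[-1] = (runs[-1][0], t)
        else:
            # First value, or a gap: start a new run
            runs.append((t, t))
    return runs


# Arrays the sort_array(collect_list(time)) query would return per name
per_name = {"A": [1, 2, 3, 4, 7], "B": [5, 6, 8, 9, 10]}
for name, times in per_name.items():
    print(name, split_runs(times))
```

Each name then yields one (start, end) pair per run, which can be expanded back into rows to get the table asked for in the question.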