Grouping data in SQL Server
The data consists of values recorded at 15-minute intervals:
Time              Value
2010-01-01 00:15  3
2010-01-01 00:30  2
2010-01-01 00:45  4
2010-01-01 01:00  5
2010-01-01 01:15  1
2010-01-01 01:30  3
2010-01-01 01:45  4
2010-01-01 02:00  12
2010-01-01 02:15  13
2010-01-01 02:30  12
2010-01-01 02:45  14
2010-01-01 03:00  15
2010-01-01 03:15  3
2010-01-01 03:30  2
2010-01-01 03:45  3
2010-01-01 04:00  5
..........
..........
..........
2010-01-02 00:00
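Normally there will be 96 points.
Looking at these values, we can see that the values from 00:15 to 01:45 are close to each other, those from 02:00 to 03:00 are close to each other, and those from 03:15 to 04:00 are close to each other.
Based on this "close to each other" rule, I want to group the data into 3 parts:
- 00:15 to 01:45
- 02:00 to 03:00
- 03:15 to 04:00
Note that the data can be random and, under the rule above, may split into more than 3 parts, but never more than 10. The grouping must also respect time order: for example, you cannot put 00:15, 02:30, and 04:45 into one group, because those 3 points are not consecutive.
Please explain how to implement this in T-SQL.
Update: the values could also be: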
Time              Value
2010-01-01 00:15  3
2010-01-01 00:30  2
2010-01-01 00:45  4
2010-01-01 01:00  5
2010-01-01 01:15  1
2010-01-01 01:30  3
2010-01-01 01:45  4
2010-01-01 02:00  12
2010-01-01 02:15  13
2010-01-01 02:30  4   --suddenly decreased
2010-01-01 02:45  14
2010-01-01 03:00  15
2010-01-01 03:15  3
2010-01-01 03:30  2
2010-01-01 03:45  3
2010-01-01 04:00  5
..........
..........
..........
2010-01-02 00:00
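In a case like this we should not put 02:30 in a group of its own, because every group must contain at least 3 points. Instead, this point (02:30) goes into the previous group (02:00 to 03:00).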
Since your question changed so much, here is a new answer for the new question. I have only included the code.
declare @t table(time datetime, value int)
declare @variation float
set @variation = 2
set nocount on
insert @t values('2010-01-01 00:15',3)
insert @t values('2010-01-01 00:30',2)
insert @t values('2010-01-01 00:45',4)
insert @t values('2010-01-01 01:00',5)
insert @t values('2010-01-01 01:15',1)
insert @t values('2010-01-01 01:30',3)
insert @t values('2010-01-01 01:45',4)
insert @t values('2010-01-01 02:00',52)
insert @t values('2010-01-01 02:15',5)
insert @t values('2010-01-01 02:30',52)
insert @t values('2010-01-01 02:45',54)
insert @t values('2010-01-01 03:00',55)
insert @t values('2010-01-01 03:15',3)
insert @t values('2010-01-01 03:30',2)
insert @t values('2010-01-01 03:45',3)
insert @t values('2010-01-01 04:00',5)
declare @result table(mintime datetime, maxtime datetime)
-- label used to restart the grouping with a higher tolerance
a:
delete @result
;with t as
(
    -- number the rows by time; log() compresses large values (requires value > 0)
    select *, rn = row_number() over(order by time), log(value) lv
    from @t
    where datediff(day, time, '2010-01-01') = 0
), a as
(
    select time, lv, rn, 0 grp from t where rn = 1
    union all
    -- stay in the current group if any of the 3 preceding rows is within @variation
    select t1.time, a.lv, t1.rn,
        case when exists (select 1 from t t2
                          where t1.rn between rn + 1 and rn + 3
                            and lv between t1.lv - @variation and t1.lv + @variation)
             then grp else grp + 1 end
    from t t1
    join a on t1.rn = a.rn + 1
)
insert @result
select min(time), max(time) from a group by grp
if @@rowcount > 10
begin
    -- more than 10 groups: loosen the tolerance and start over
    set @variation = @variation + .5
    goto a
end
select * from @result
Result:
mintime maxtime
2010-01-01 00:15:00.000 2010-01-01 01:45:00.000
2010-01-01 02:00:00.000 2010-01-01 03:00:00.000
2010-01-01 03:15:00.000 2010-01-01 04:00:00.000
Declare and populate test data:
set nocount on
declare @result table(mintime datetime, maxtime datetime)
declare @t table(time datetime, value int)
-- variation is how much difference will be allowed from one row to the next
declare @variation int
set @variation = 5
insert @t values('2010-01-01 00:15',3)
insert @t values('2010-01-01 00:30',2)
insert @t values('2010-01-01 00:45',4)
insert @t values('2010-01-01 01:00',5)
insert @t values('2010-01-01 01:15',1)
insert @t values('2010-01-01 01:30',3)
insert @t values('2010-01-01 01:45',4)
insert @t values('2010-01-01 02:00',12)
insert @t values('2010-01-01 02:15',13)
insert @t values('2010-01-01 02:30',12)
insert @t values('2010-01-01 02:45',14)
insert @t values('2010-01-01 03:00',15)
insert @t values('2010-01-01 03:15',3)
insert @t values('2010-01-01 03:30',2)
insert @t values('2010-01-01 03:45',3)
insert @t values('2010-01-01 04:00',5)
Code:
a:
;with t as
(
    -- add a row number
    select *, rn = row_number() over(order by time) from @t
), a as
(
    -- increase group if current row's value varies more than @variation from last row's value
    select time, value, rn, 0 grp from t where rn = 1
    union all
    select t.time, t.value, t.rn,
        case when t.value between a.value - @variation and a.value + @variation
             then grp else grp + 1 end
    from t
    join a on t.rn = a.rn + 1
)
insert @result
select min(time), max(time) from a group by grp
if @@rowcount > 10
begin
    -- this will activate if more than 10 groups of numbers are found
    -- start over with higher tolerance for variation
    set @variation = @variation + 1
    delete @result
    goto a
end
select convert(char(5), mintime, 114) + ' to ' + convert(char(5), maxtime, 114)
from @result
The result is here: http://data.stackexchange.com/stackoverflow/q/110891/declare-and-populate-testdata
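A fixed @variation only works when the values share one scale. As a rough sketch (not part of the answer above), the same recursive CTE can compare each row to the previous one with a relative tolerance instead. The parameter @pct and the sample values are assumptions chosen for illustration.

```sql
-- Sketch only: @pct is a hypothetical relative tolerance (0.5 = 50%).
declare @t table(time datetime, value float)
declare @pct float
set @pct = 0.5

insert @t values('2010-01-01 00:15', 0.005)
insert @t values('2010-01-01 00:30', 0.004)
insert @t values('2010-01-01 00:45', 0.005)
insert @t values('2010-01-01 01:00', 5222)
insert @t values('2010-01-01 01:15', 4522)
insert @t values('2010-01-01 01:30', 4221)

;with t as
(
    select *, rn = row_number() over(order by time) from @t
), a as
(
    select time, value, rn, 0 grp from t where rn = 1
    union all
    -- same group if the change is at most @pct of the previous row's value
    -- (a previous value of 0 gives a tolerance of 0)
    select t.time, t.value, t.rn,
        case when abs(t.value - a.value) <= @pct * abs(a.value)
             then a.grp else a.grp + 1 end
    from t
    join a on t.rn = a.rn + 1
)
select min(time) mintime, max(time) maxtime, count(*) points
from a
group by grp
order by mintime
```

With this sample data the three small values and the three large values should land in separate groups. The minimum group size and the 10-group limit are not handled here; the retry loop from the code above could be wrapped around it in the same way.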
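This is exactly what I wanted! You are a champion! I do need to modify it to handle different "variations", though, because the real data comes on very different scales. For example, the values might look like 0.005, 0.004, 0.006, 0.003, 0.007, etc., or 5222, 3122, 4522, 4221, 5521, 1100, 998, 4221, etc. – unruledboy
Sorry, one last thing: I updated the main post, please refer to it. The main idea is that the group size must be at least 3, meaning every group must contain at least 3 points. – unruledboy
Wow, fantastic result! Thank you very much! – unruledboy
It would help if you were clearer about the definition of "close to each other". What is the largest difference in value that you would still consider "close"? –
Also define "grouping". Does grouping mean a report like a bulleted list? Is there a minimum number of groups? – Paparazzi
What if I have a sequence such as 1, 2, 3, 4, 5, 6, 7, 8, 9? Each value is "close" to the previous one, but 9 would probably not be considered close to 1. The hardest part of programming is often figuring out what problem you are trying to solve. –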
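One possible way to handle a slowly drifting sequence like 1 to 9 is to compare each row with the first value of its current group rather than with the previous row. The following is a minimal sketch under that assumption. The column grp_start and the tolerance of 2 are hypothetical and do not come from the answers above.

```sql
-- Sketch only: grp_start carries the first value of the current group.
declare @t table(time datetime, value int)
declare @variation int
set @variation = 2

insert @t values('2010-01-01 00:15', 1)
insert @t values('2010-01-01 00:30', 2)
insert @t values('2010-01-01 00:45', 3)
insert @t values('2010-01-01 01:00', 4)
insert @t values('2010-01-01 01:15', 5)
insert @t values('2010-01-01 01:30', 6)
insert @t values('2010-01-01 01:45', 7)
insert @t values('2010-01-01 02:00', 8)
insert @t values('2010-01-01 02:15', 9)

;with t as
(
    select *, rn = row_number() over(order by time) from @t
), a as
(
    select time, value, rn, value grp_start, 0 grp from t where rn = 1
    union all
    -- a row too far from the group's first value starts a new group
    -- and becomes that group's reference value
    select t.time, t.value, t.rn,
        case when abs(t.value - a.grp_start) <= @variation
             then a.grp_start else t.value end,
        case when abs(t.value - a.grp_start) <= @variation
             then a.grp else a.grp + 1 end
    from t
    join a on t.rn = a.rn + 1
)
select min(time) mintime, max(time) maxtime, min(value) minvalue, max(value) maxvalue
from a
group by grp
order by mintime
```

With a tolerance of 2 this sample should split into the value ranges 1 to 3, 4 to 6, and 7 to 9. Whether that matches the intended meaning of "close" still depends on how the question defines it.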