合并重叠日期间隔

问题描述:

合并重叠日期间隔是否有更好的方法?
我提出的解决方案非常简单,现在我想知道是否有其他人更好地了解如何做到这一点。合并重叠日期间隔

/***** DATA EXAMPLE *****/ 
DECLARE @T TABLE (d1 DATETIME, d2 DATETIME) 
INSERT INTO @T (d1, d2) 
     SELECT '2010-01-01','2010-03-31' UNION SELECT '2010-04-01','2010-05-31' 
    UNION SELECT '2010-06-15','2010-06-25' UNION SELECT '2010-06-26','2010-07-10' 
    UNION SELECT '2010-08-01','2010-08-05' UNION SELECT '2010-08-01','2010-08-09' 
    UNION SELECT '2010-08-02','2010-08-07' UNION SELECT '2010-08-08','2010-08-08' 
    UNION SELECT '2010-08-09','2010-08-12' UNION SELECT '2010-07-04','2010-08-16' 
    UNION SELECT '2010-11-01','2010-12-31' UNION SELECT '2010-03-01','2010-06-13' 

/***** INTERVAL ANALYSIS *****/ 
WHILE (1=1) BEGIN 
    UPDATE t1 SET t1.d2 = t2.d2 
    FROM @T AS t1 INNER JOIN @T AS t2 ON 
      DATEADD(day, 1, t1.d2) BETWEEN t2.d1 AND t2.d2 
    IF @@ROWCOUNT = 0 BREAK 
END 

/***** RESULT *****/ 
SELECT StartDate = MIN(d1) , EndDate = d2 
FROM @T 
GROUP BY d2 
ORDER BY StartDate, EndDate 

/***** OUTPUT *****/ 
/***** 
StartDate EndDate 
2010-01-01 2010-06-13 
2010-06-15 2010-08-16 
2010-11-01 2010-12-31 
*****/ 
+1

是间隔开开,闭闭,开闭或闭开?这很重要,因为最终条件略有不同。出于多种目的,开放式关闭(包括第一次约会,不包括第二次约会)是最好的表现形式;公开(包括两端)往往是人们想到的。 – 2010-04-01 14:56:33

+0

乔纳森,我在考虑当(开始日期和结束日期)天都是期间的一部分的情况。 – leoinfo 2010-04-01 15:09:52

+0

可以单程执行,但它是一个游标实现,因此它取决于数据集的大小。 – 2010-05-02 11:22:06

在此解决方案中,我创建了一个临时日历表,该表存储一个范围内每天的值。这种类型的表可以是静态的。另外,从2009-12-31开始,我只能存储400个奇怪的日期。很明显,如果你的日期跨度较大,你需要更多的价值。

此外,该解决方案将只使用SQL Server 2005+在我使用的是CTE。

With Calendar As 
    (
    Select DateAdd(d, ROW_NUMBER() OVER (ORDER BY s1.object_id), '1900-01-01') As [Date] 
    From sys.columns as s1 
     Cross Join sys.columns as s2 
    ) 
    , StopDates As 
    (
    Select C.[Date] 
    From Calendar As C 
     Left Join @T As T 
      On C.[Date] Between T.d1 And T.d2 
    Where C.[Date] >= (Select Min(T2.d1) From @T As T2) 
     And C.[Date] <= (Select Max(T2.d2) From @T As T2) 
     And T.d1 Is Null 
    ) 
    , StopDatesInUse As 
    (
    Select D1.[Date] 
    From StopDates As D1 
     Left Join StopDates As D2 
      On D1.[Date] = DateAdd(d,1,D2.Date) 
    Where D2.[Date] Is Null 
    ) 
    , DataWithEariestStopDate As 
    (
    Select * 
    , (Select Min(SD2.[Date]) 
     From StopDatesInUse As SD2 
     Where T.d2 < SD2.[Date]) As StopDate 
    From @T As T 
    ) 
Select Min(d1), Max(d2) 
From DataWithEariestStopDate 
Group By StopDate 
Order By Min(d1) 

编辑在2009年使用日期的问题无关,与最终的查询。问题是日历表不够大。我在2009-12-31开始了日程表。我已经在1900-01-01开始修改它。

+0

您的代码正在合并不应该合并的时间间隔。使用此初始间隔/ **/SELECT'2009-01-01','2009-01-01'UNION SELECT'2009-01-03','2009-01-03'/ ** /代码将返回单个期间:2009-01-01至2009-01-03。在这种情况下,2009-01-02不应包含在结果区间中。 – leoinfo 2010-04-07 19:07:35

+0

首先,您应该添加模式,具体是否D1 = D2。您的示例数据都没有表明这一点。其次,如果您**将** {2010-01-01,2010-01-01}添加到现有示例数据中,则第一个范围仍应为2010-01-01至2010-06-13,因为第一个条目在你的例子中涵盖2010-01-01至2010-03-31。第三,如果您将**示例中的第一个条目替换为{2010-01-01,2010-01-01},{2010-03-01,2010-03-01},则我的查询结果仍然是正确的。做出这一改变后,前两项出现为{2010-01-01,2010-01-01},{2010-03-01,2010-06-13}。 – Thomas 2010-04-07 20:40:08

+0

还有一种情况,如果您仅替换{2010-01-01,2010-01-01},{2010-03-01,2010-03-01}中的所有条目,则会得到相同的两个条目。 – Thomas 2010-04-07 20:42:28

试试这个

;WITH T1 AS 
(
    SELECT d1, d2, ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS R 
    FROM @T 
), NUMS AS 
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS R 
    FROM T1 A 
    CROSS JOIN T1 B 
    CROSS JOIN T1 C 
), ONERANGE AS 
(
    SELECT DISTINCT DATEADD(DAY, ROW_NUMBER() OVER(PARTITION BY T1.R ORDER BY (SELECT 0)) - 1, T1.D1) AS ELEMENT 
    FROM T1 
    CROSS JOIN NUMS 
    WHERE NUMS.R <= DATEDIFF(DAY, d1, d2) + 1 
), SEQUENCE AS 
(
    SELECT ELEMENT, DATEDIFF(DAY, '19000101', ELEMENT) - ROW_NUMBER() OVER(ORDER BY ELEMENT) AS rownum 
    FROM ONERANGE 
) 
SELECT MIN(ELEMENT) AS StartDate, MAX(ELEMENT) as EndDate 
FROM SEQUENCE 
GROUP BY rownum 

的基本思路是,首先展开的现有数据,所以你得到一个单独的行的每一天。这在ONERANGE

完成之后,确定之间的关系如何日期增量和行号做的方式。 现有范围/岛内的差异保持不变。只要你到一个新的数据孤岛,它们之间的差异增加,因为超过1日期增量,同时通过1

行数递增我一直在寻找同样的解决办法和整个这个职位上来到Combine overlapping datetime to return single overlapping range record

上有Packing Date Intervals另一个线程。

我与不同日期范围,包括这里列出的进行了测试,它每次都正常工作。


SELECT 
     s1.StartDate, 
     --t1.EndDate 
     MIN(t1.EndDate) AS EndDate 
FROM @T s1 
INNER JOIN @T t1 ON s1.StartDate <= t1.EndDate 
    AND NOT EXISTS(SELECT * FROM @T t2 
       WHERE t1.EndDate >= t2.StartDate AND t1.EndDate < t2.EndDate) 
WHERE NOT EXISTS(SELECT * FROM @T s2 
       WHERE s1.StartDate > s2.StartDate AND s1.StartDate <= s2.EndDate) 
GROUP BY s1.StartDate 
ORDER BY s1.StartDate 

结果是:

StartDate | EndDate 
2010-01-01 | 2010-06-13 
2010-06-15 | 2010-06-25 
2010-06-26 | 2010-08-16 
2010-11-01 | 2010-12-31 
+0

另外,在这里找到另一个解释如何实现这个的例子:http://www.sqlmag.com/blog/puzzled-by-t-sql-blog-15/tsql/packing-date-intervals-136831 – user1045402 2011-11-23 11:50:29

+0

你可以编辑自己的答案以添加更多信息,只需点击答案底部的“编辑”链接即可。 – ForceMagic 2012-11-06 23:15:41

+0

作品完美,简洁! – ensisNoctis 2016-09-08 12:41:19