Unexpected results when using a CTE to perform a random one-to-many join across all rows of two tables

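Problem description:

I am trying to randomly join the rows of two tables (TableA and TableB) so that each row in TableA is joined to exactly one row in TableB, and every row in TableB is joined to at least one row in TableA.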

For example, randomly joining TableA with 5 distinct rows to TableB with 3 distinct rows should produce something like this:

TableA TableB 
1  3 
2  1 
3  1 
4  2 
5  1 

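However, sometimes not all of the rows from TableB are included in the final result; in the example above, row 2 from TableB might be missing, with row 1 or row 3 joined to row 4 of TableA in its place. You can see this happen by executing the script several times and checking the results. It seems to be necessary to use a temporary table (@Q) to ensure that the result contains all rows from both TableA and TableB.

Can someone explain why this happens?

Also, can someone suggest a better way to get the desired result?

I am aware that sometimes no results are returned at all, because of some failure in the cross apply and the ordering that I have not yet identified, and I am sure there is a better way to do this. I hope this makes sense. Thanks in advance!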

    declare @TableA table (
     ID int 
     ); 
    declare @TableB table (
     ID int 
     ); 
    declare @Q table (
     RN int, 
     TableAID int, 
     TableBID int 
     ); 

    with cte as (
     select 
      1 as ID 
     union all 
     select 
      ID + 1 
     from cte 
     where ID < 5 
     ) 
    insert @TableA (ID) 
    select ID from cte; 

    with cte as (
     select 
      1 as ID 
     union all 
     select 
      ID + 1 
     from cte 
     where ID < 3 
     ) 
    insert @TableB (ID) 
    select ID from cte; 

    select * from @TableA; 
    select * from @TableB; 

    with cte as (
     select 
      row_number() over (partition by TableAID order by newid()) as RN, 
      TableAID, 
      TableBID 
     from (
      select 
       a.ID as TableAID, 
       b.ID as TableBID 
      from @TableA as a 
      cross apply @TableB as b 
      ) as M 
     ) 
    select --All rows from TableB not always included 
     TableAID, 
     TableBID 
    from cte 
    where RN in (
     select 
      top 1 
       iCTE.RN 
     from cte as iCTE 
     group by iCTE.RN 
     having count(distinct iCTE.TableBID) = (
      select count(1) from @TableB 
      ) 
     ) 
    order by TableAID; 

    with cte as (
     select 
      row_number() over (partition by TableAID order by newid()) as RN, 
      TableAID, 
      TableBID 
     from (
      select 
       a.ID as TableAID, 
       b.ID as TableBID 
      from @TableA as a 
      cross apply @TableB as b 
      ) as M 
     ) 
    insert @Q 
    select 
     RN, 
     TableAID, 
     TableBID 
    from cte; 

    select * from @Q; 

    select --All rows from both TableA and TableB included 
     TableAID, 
     TableBID 
    from @Q 
    where RN in (
     select 
      top 1 
       iQ.RN 
     from @Q as iQ 
     group by iQ.RN 
     having count(distinct iQ.TableBID) = (
      select count(1) from @TableB 
      ) 
     ) 
    order by TableAID; 
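
One likely explanation, offered as an assumption rather than something confirmed in this thread: SQL Server does not materialize a CTE, so each reference to cte is expanded and evaluated separately. The outer query and the RN in (...) subquery above can therefore see different newid() orderings, which would make the RN chosen by the subquery meaningless for the outer rows. A minimal sketch for observing this (the column names are hypothetical and not part of the original script):

```sql
-- Hypothetical demo: one CTE, referenced twice in the same statement.
-- If the CTE were materialized once, both columns would always match.
with cte as (
    select newid() as G
)
select
    a.G as FirstReference,
    b.G as SecondReference
from cte as a
cross join cte as b;
```

If the two columns show different GUIDs on your instance, each reference to the CTE is being evaluated on its own, which would be consistent with the table variable @Q fixing the problem: once the rows are inserted, the random RN values no longer change.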

See if this gives you what you're looking for.

DECLARE 
    @CountA INT = (SELECT COUNT(*) FROM @TableA ta), 
    @CountB INT = (SELECT COUNT(*) FROM @TableB tb), 
    @MinCount INT; 

SELECT @MinCount = CASE WHEN @CountA < @CountB THEN @CountA ELSE @CountB END; 

WITH 
    cte_A1 AS (
     SELECT 
      *, 
      rn = ROW_NUMBER() OVER (ORDER BY NEWID()) 
     FROM 
      @TableA ta 
     ), 
    cte_B1 AS (
     SELECT 
      *, 
      rn = ROW_NUMBER() OVER (ORDER BY NEWID()) 
     FROM 
      @TableB tb 
     ), 
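    cte_A2 AS (
     SELECT
      a1.ID,
      -- Wrap rn into 1..@MinCount. The modulo also covers a table more than
      -- twice the size of the smaller one; NULLIF avoids division by zero.
      rn = (a1.rn - 1) % NULLIF(@MinCount, 0) + 1
     FROM
      cte_A1 a1
     ),
    cte_B2 AS (
     SELECT
      b1.ID,
      -- Same wrapping for TableB.
      rn = (b1.rn - 1) % NULLIF(@MinCount, 0) + 1
     FROM
      cte_B1 b1
     )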
SELECT 
    A = a.ID, 
    B = b.ID 
FROM 
    cte_A2 a 
    JOIN cte_B2 b 
     ON a.rn = b.rn; 

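This looks like a good solution for preventing the case where no results are returned because a full set of TableBID values was not produced; however, on my machine (I am using MSSMS 2008 R2 version 10.50.4000.0) a temp table is still needed to ensure that the full set from each table is returned (otherwise TableAID is missing a random value in 8 out of 10 executions): insert @Q select rn, ID, NULL from cte_A2 union select rn, NULL, ID from cte_B2 select * from @Q select q1.TableAID, q2.TableBID from @Q as q1 join @Q as q2 on q1.RN = q2.RN where q1.TableAID is not NULL and q2.TableBID is not NULL; – Erg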


I don't have access to a 2008 R2 instance to verify this, but I don't see anything in the code that is incompatible with 2008 R2. –


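OK; this looks like the best answer for me, so I will accept it. I just want to reconfirm that, for whatever reason, a temp table is needed to ensure that the full set from each table is returned, as indicated in my previous comment. – Erg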
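
For anyone landing here later, here is a minimal sketch of the materialize-first approach described in these comments. It rests on several assumptions: @ShuffledA and @ShuffledB are hypothetical staging tables that do not appear in the original script, TableA has at least as many rows as TableB, and a modulo is used to wrap the row numbers. Each newid() ordering is written to a table variable before the join, so the join reads fixed row numbers rather than a CTE that may be evaluated more than once.

```sql
-- Sample data matching the question (5 rows in TableA, 3 in TableB).
declare @TableA table (ID int);
declare @TableB table (ID int);
insert @TableA (ID) values (1), (2), (3), (4), (5);
insert @TableB (ID) values (1), (2), (3);

-- Hypothetical staging tables: the random order is fixed once stored.
declare @ShuffledA table (RN int primary key, ID int);
declare @ShuffledB table (RN int primary key, ID int);

insert @ShuffledA (RN, ID)
select row_number() over (order by newid()), ID
from @TableA;

insert @ShuffledB (RN, ID)
select row_number() over (order by newid()), ID
from @TableB;

declare @CountB int = (select count(*) from @ShuffledB);

-- Each TableA row gets exactly one TableB row. Because TableA is assumed
-- to have at least @CountB rows, RN values 1..@CountB are all reached,
-- so every TableB row is used at least once.
select
    a.ID as TableAID,
    b.ID as TableBID
from @ShuffledA as a
join @ShuffledB as b
    on b.RN = (a.RN - 1) % nullif(@CountB, 0) + 1
order by a.ID;
```

With the sample data this should return five rows that cover all three TableB IDs. Note that the assignment is balanced (each TableB row is used about equally often) rather than uniformly random over every valid pairing, so it is not a drop-in match for the distribution in the question's example.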