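SQL - Most efficient way to find if a pair of rows does NOT exist

Problem description:

I can't seem to find a situation similar to mine online. I have an 'orders' table called Order and a table that details those orders, called 'order detail'. A certain type of order is defined by whether it has one of two pairs of order details (value-unit pairs). So my order detail table might look like this: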

order_id | detail 
---------|------- 
1  | X 
1  | Y 
1  | Z 
2  | X 
2  | Z 
2  | B 
3  | A 
3  | Z 
3  | B

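The two pairs that go together are (X & Y) and (A & B). What is an efficient way to retrieve only those order_ids that do not contain either of these pairs? E.g. for the table above, I need to receive only order_id 2.

The only solution I could come up with is basically to use two sub-queries, each with a self-join: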

select distinct o.order_id 
from orders o 
where o.order_id not in (
    select distinct order_id 
    from order_detail od1 where od1.detail=X 
    join order_detail od2 on od2.order_id = od1.order_id and od2.detail=Y 
) 
and o.order_id not in (
    select distinct order_id 
    from order_detail od1 where od1.detail=A 
    join order_detail od2 on od2.order_id = od1.order_id and od2.detail=B 
)

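The problem is that performance is an issue: my order_detail table is huge, and I am quite inexperienced with query languages. Is there a faster way to do this, with lower cardinality? I also have zero control over the schema of the tables, so I can't change anything there.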

The DISTINCT inside the sub-queries is useless, and your DBMS may not optimize it away. –

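Answer

First, I want to emphasize that finding the most efficient query requires a combination of a good query and a good index. I often see questions here where people look for magic in only one or the other.

E.g. of the various solutions, yours is the slowest when there is no index (after fixing the syntax errors), but does rather better with an index on (detail, order_id).

Please note that you are the one with the actual data and table structures. You will need to try various combinations of queries and indexes to find what works best, not least because you haven't indicated which platform you are using, and results are likely to vary between platforms.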

[/rant]

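Queries

Without further ado: Gordon Linoff has already provided some good suggestions. There is another option that is likely to offer similar performance. You said you can't control the schema, but you can use a sub-query to transform the data into a "friendly structure".

Specifically, if you:

pivot the data so that you have one row per order_id
and a column for each detail you want to check
and the value at each intersection is the number of rows that order has with that detail...

Then your query is simply: where (x=0 or y=0) and (a=0 or b=0). The following uses a SQL Server table variable to demonstrate with sample data. The queries below work regardless of duplicate id, val pairs.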

/*Set up sample data*/ 
declare @t table (
    id int, 
    val char(1) 
) 
insert @t(id, val) 
values (1, 'x'), (1, 'y'), (1, 'z'), 
     (2, 'x'), (2, 'z'), (2, 'b'), 
     (3, 'a'), (3, 'z'), (3, 'b') 

/*Option 1 manual pivoting*/ 
select t.id 
from (
     select o.id, 
       sum(case when o.val = 'a' then 1 else 0 end) as a, 
       sum(case when o.val = 'b' then 1 else 0 end) as b, 
       sum(case when o.val = 'x' then 1 else 0 end) as x, 
       sum(case when o.val = 'y' then 1 else 0 end) as y 
     from @t o 
     group by o.id 
     ) t 
where (x = 0 or y = 0) and (a = 0 or b = 0) 

/*Option 2 using Sql Server PIVOT feature*/ 
select t.id 
from (
     select id ,[a],[b],[x],[y] 
     from (select id, val from @t) src 
       pivot (count(val) for val in ([a],[b],[x],[y])) pvt 
     ) t 
where (x = 0 or y = 0) and (a = 0 or b = 0)

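Interestingly, the query plans for options 1 and 2 above are slightly different. This suggests that they may have different performance characteristics on large data sets.

Indexes

Note that the queries above will probably process the whole table, so there is little to be gained from indexes. However, if the table has "long rows", then an index covering only the 2 columns you are working with means that less data needs to be read from disk.

The query structure you provided is likely to benefit from an index such as (detail, order_id). This is because the server can check the NOT IN sub-query conditions more efficiently. How much it helps depends on the distribution of the data in your table.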
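To make the index suggestion a bit more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite is purely an assumption so the example is self-contained; the question never names a platform and other engines may plan this differently. The table and column names follow the question, while the index name ix_detail_order and the helper plan_with_index are hypothetical. It builds the (detail, order_id) index and asks SQLite how it would run the X/Y self-join that sits inside the NOT IN.

```python
import sqlite3

# Sample rows from the question: (order_id, detail)
ROWS = [(1, "X"), (1, "Y"), (1, "Z"),
        (2, "X"), (2, "Z"), (2, "B"),
        (3, "A"), (3, "Z"), (3, "B")]

# The self-join used inside the NOT IN: orders that have both X and Y.
PAIR_SUBQUERY = """
    select od1.order_id
    from order_detail od1
    join order_detail od2
      on od2.order_id = od1.order_id and od2.detail = 'Y'
    where od1.detail = 'X'
"""

def plan_with_index():
    """Return (query plan lines, matching order_ids) for the X/Y sub-query."""
    con = sqlite3.connect(":memory:")
    con.execute("create table order_detail (order_id int, detail text)")
    con.executemany("insert into order_detail values (?, ?)", ROWS)
    # Hypothetical index name; the column order follows the answer's suggestion.
    con.execute("create index ix_detail_order on order_detail (detail, order_id)")
    # The fourth column of EXPLAIN QUERY PLAN output is the readable detail text.
    plan = [row[3] for row in con.execute("explain query plan " + PAIR_SUBQUERY)]
    ids = [row[0] for row in con.execute(PAIR_SUBQUERY)]
    con.close()
    return plan, ids

if __name__ == "__main__":
    plan, ids = plan_with_index()
    for line in plan:
        print(line)
    print(ids)
```

On a real table you would compare the plan (and timings) with and without the index; a plan on nine rows only shows that the index is usable, not how much it helps.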

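As a side note, I tested various query options, including fixed versions of yours and Gordon's (albeit with only a very small data size):

Without the above index, your query was the slowest in the batch.
With the above index, Gordon's second query was the slowest.

Alternative queries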

Your query (fixed):

select distinct o.id
from @t o
where o.id not in (
    select od1.id
    from @t od1
      inner join @t od2 on
       od2.id = od1.id
      and od2.val='Y'
    where od1.val= 'X'
)
and o.id not in (
    select od1.id
    from @t od1
      inner join @t od2 on
       od2.id = od1.id
      and od2.val='a'
    where od1.val= 'b'
)

A hybrid between Gordon's first and second queries. It fixes the duplicates issue in the first and the performance issue in the second:

select id 
from @t od 
group by id 
having ( sum(case when val in ('X') then 1 else 0 end) = 0 
     or sum(case when val in ('Y') then 1 else 0 end) = 0 
     ) 
    and( sum(case when val in ('A') then 1 else 0 end) = 0 
     or sum(case when val in ('B') then 1 else 0 end) = 0 
     )

Using INTERSECT and EXCEPT:

select id 
from @t 
except 
(
    select id 
    from @t 
    where val = 'a' 
    intersect 
    select id 
    from @t 
    where val = 'b' 
) 
except 
(
    select id 
    from @t 
    where val = 'x' 
    intersect 
    select id 
    from @t 
    where val = 'y' 
)
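As a sanity check that these alternatives agree, here is a small sketch that runs three of them against the question's sample data. It uses Python's sqlite3 module, which is an assumption made only to keep the example self-contained; the queries above are written for SQL Server. Two adaptations follow from that: the table and column names come from the question rather than @t, and the INTERSECT branches are wrapped in sub-queries because SQLite does not accept parenthesized compound selects and evaluates EXCEPT/INTERSECT strictly left to right. The names QUERIES and run_all are hypothetical.

```python
import sqlite3

# Sample rows from the question: (order_id, detail)
ROWS = [(1, "X"), (1, "Y"), (1, "Z"),
        (2, "X"), (2, "Z"), (2, "B"),
        (3, "A"), (3, "Z"), (3, "B")]

QUERIES = {
    # The asker's approach with the syntax fixed.
    "not_in": """
        select distinct o.order_id
        from order_detail o
        where o.order_id not in (
            select od1.order_id
            from order_detail od1
            join order_detail od2
              on od2.order_id = od1.order_id and od2.detail = 'Y'
            where od1.detail = 'X')
        and o.order_id not in (
            select od1.order_id
            from order_detail od1
            join order_detail od2
              on od2.order_id = od1.order_id and od2.detail = 'B'
            where od1.detail = 'A')
    """,
    # The hybrid: one pass, conditional sums compared to zero.
    "group_having": """
        select order_id
        from order_detail
        group by order_id
        having (sum(case when detail = 'X' then 1 else 0 end) = 0
             or sum(case when detail = 'Y' then 1 else 0 end) = 0)
           and (sum(case when detail = 'A' then 1 else 0 end) = 0
             or sum(case when detail = 'B' then 1 else 0 end) = 0)
    """,
    # EXCEPT / INTERSECT, with each INTERSECT wrapped for SQLite.
    "except_intersect": """
        select order_id from order_detail
        except
        select order_id from (
            select order_id from order_detail where detail = 'A'
            intersect
            select order_id from order_detail where detail = 'B')
        except
        select order_id from (
            select order_id from order_detail where detail = 'X'
            intersect
            select order_id from order_detail where detail = 'Y')
    """,
}

def run_all():
    """Run every query on the sample data; return {name: sorted order_ids}."""
    con = sqlite3.connect(":memory:")
    con.execute("create table order_detail (order_id int, detail text)")
    con.executemany("insert into order_detail values (?, ?)", ROWS)
    results = {name: sorted(row[0] for row in con.execute(sql))
               for name, sql in QUERIES.items()}
    con.close()
    return results

if __name__ == "__main__":
    for name, ids in run_all().items():
        print(name, ids)
```

Agreement on nine rows says nothing about relative speed; it only confirms the three formulations pick out the same order for this data.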

Answer

I would use aggregation and having:

select order_id 
from order_detail od 
group by order_id 
having sum(case when detail in ('X', 'Y') then 1 else 0 end) < 2 and 
     sum(case when detail in ('A', 'B') then 1 else 0 end) < 2;

This assumes that an order does not have duplicate rows with the same detail. If that is possible:

select order_id 
from order_detail od 
group by order_id 
having count(distinct case when detail in ('X', 'Y') then detail end) < 2 and 
     count(distinct case when detail in ('A', 'B') then detail end) < 2;

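Your second query will be slower than the first. Try the following: having (sum(case when val in ('X') then 1 else 0 end) = 0 or sum(case when val in ('Y') then 1 else 0 end) = 0) and (sum(case when val in ('A') then 1 else 0 end) = 0 or sum(case when val in ('B') then 1 else 0 end) = 0) –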

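@CraigYoung . . . The sample data has no duplicates, so it is quite possible that the first one works for the OP. –

Agreed. But you raised the possibility of duplicates and offered the second query as a way to handle it. I am suggesting a change to the having clause of the second query that achieves the same effect without the performance impact of 'distinct'. –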
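To illustrate the point about duplicates discussed in these comments, here is a minimal sketch, again assuming SQLite via Python's sqlite3 module only so that it runs anywhere. The data is the question's sample plus one hypothetical duplicate row (2, 'X'), and the names HAVING_VARIANTS and compare_having are made up for the example. It compares the three having clauses: the sum(...) < 2 version, the count(distinct ...) version, and the suggested version that tests each conditional sum against zero.

```python
import sqlite3

# Question's sample data plus one hypothetical duplicate: order 2 has X twice.
ROWS = [(1, "X"), (1, "Y"), (1, "Z"),
        (2, "X"), (2, "X"), (2, "Z"), (2, "B"),
        (3, "A"), (3, "Z"), (3, "B")]

HAVING_VARIANTS = {
    # Counts rows, so two X rows look like a complete X/Y pair.
    "sum_lt_2": """
        sum(case when detail in ('X', 'Y') then 1 else 0 end) < 2 and
        sum(case when detail in ('A', 'B') then 1 else 0 end) < 2
    """,
    # Counts distinct details, so duplicates are harmless.
    "count_distinct": """
        count(distinct case when detail in ('X', 'Y') then detail end) < 2 and
        count(distinct case when detail in ('A', 'B') then detail end) < 2
    """,
    # Checks each detail separately against zero; no DISTINCT needed.
    "zero_sums": """
        (sum(case when detail = 'X' then 1 else 0 end) = 0
         or sum(case when detail = 'Y' then 1 else 0 end) = 0)
        and
        (sum(case when detail = 'A' then 1 else 0 end) = 0
         or sum(case when detail = 'B' then 1 else 0 end) = 0)
    """,
}

def compare_having():
    """Return {variant name: sorted order_ids} for the duplicated data."""
    con = sqlite3.connect(":memory:")
    con.execute("create table order_detail (order_id int, detail text)")
    con.executemany("insert into order_detail values (?, ?)", ROWS)
    results = {}
    for name, having in HAVING_VARIANTS.items():
        sql = ("select order_id from order_detail "
               "group by order_id having " + having)
        results[name] = sorted(row[0] for row in con.execute(sql))
    con.close()
    return results

if __name__ == "__main__":
    for name, ids in compare_having().items():
        print(name, ids)
```

With the duplicate present, the first variant drops order 2 while the other two still return it, which is the behaviour the comments describe. Whether the zero-sum form is actually faster than count(distinct ...) depends on the platform and would need to be measured on real data.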
