in和exists

先创建测试数据 :

create table student
(
id int primary key not null ,
sid int
)

          create table test
(
    id int primary key not null ,
    sid int
)

create procedure insertDate()
BEGIN
    DECLARE v_id int;
    set v_id = 0;
    while v_id < 100000
    DO
       insert into student values(v_id,v_id );
       set v_id = v_id + 1;
    end while;
END ;

-- 插入数据

call insertDate ()

insert into test select * from student where sid < 100

-- 更新统计信息

runstats on table nbadv . student with distribution and detailed indexes all

runstats on table nbadv . test with distribution and detailed indexes all

1 、主查询小表，子查询大表

测试 in 和 exists

select * from test a where a . sid in ( select sid from student )

和

select * from test a where exists ( select b . sid from student b where b . sid = a . sid )

执行计划相同，如下：

可以看到 in 和 exists 都是将两个表进行连接，然后选择满足条件的记录。而且可以看到，这里使用的是嵌套循环连接（ Nest Loop Join ，简写 NLJOIN ）

测试 not in not exists

select * from test a where a . sid not in ( select sid from student )

in和exists
select

* from test a where not exists ( select b . sid from student b where b . sid = a . sid )

可以看到 not in 的左边比右边多一步，而 not exists 的右边比左边多一步。 NOT IN 是自内向外的操作，即先得到子查询的结果，然后执行最外层的查询，而 NOT EXISTS 恰好相反，是自外向内的操作 .

2 、主查询大表，子查询小表

select * from student a where a . sid in ( select sid from test )

和

select * from student a where exists ( select b . sid from test b where b . sid = a . sid )

执行计划相同 , 如下：

select * from student a where a . sid not in ( select sid from test )

select * from student a where not exists ( select b . sid from test b where b . sid = a . sid )

可以看到 not in 会取出主表的每一条记录，然后扫描从表，找出结果。而 not exists 是连接两个表找出记录，这里采用的是嵌套循环连接 (NLJOIN)

3 、主查询大表，子查询大表

select * from student a where a . sid in ( select sid from student )

和

select * from student a where exists ( select b . sid from student b where b . sid = a . sid )

执行计划一样，如下图：

可以看到 in 和 exists 都是将两个表进行连接，然后选择满足条件的记录。而且可以看到，这里使用的是哈希连接（ Hash Join ，简写 HSJOIN ）

select * from student a where a . sid not in ( select sid from student )

select * from student a where not exists ( select b . sid from student b where b . sid = a . sid )

4 、主查询小表，子查询小表

select * from test a where a . sid in ( select sid from test )

和

select * from test a where exists ( select b . sid from test b where b . sid = a . sid )

执行计划一样，如下图：

select * from test a where a . sid not in ( select sid from test )

select * from test a where not exists ( select b . sid from test b where b . sid = a . sid )

in和exists

in 和 exists 效率分析 :

当主表大表，从表是小表时， 103.54, 且都是嵌套循环连接 (NLJOIN)

当主表小表，从表是大表时， 109.96, 且都是嵌套循环连接 (NLJOIN)

当主表大表，从表是大表时， 1821.93, 且都是哈希连接（ Hash Join ，简写 HSJOIN ）

当主表小表，从表是小表时， 128.74, 且都是哈希连接（ Hash Join ，简写 HSJOIN ）

可以看出 in 和 exists 速度一样，且都是得到两边查询结果后（主查询结果和子查询结果）进行表连接来得到结果，至于采用哪种连接方式系统会自动选择。总之， in 和 exists 效率是一样的。

not in 效率分析 :

当主表大表，从表是小表时， 1332.85

当主表小表，从表是大表时， 4090.92

当主表大表，从表是大表时， 270913.84

当主表小表，从表是小表时， 131.87

可以看出主查询的结果是大表时，子查询的记录由小变大时，效率降低了 270913.84/1332.85 =203 ，也可以看出主查询的结果是小表时，子查询的记录由小变大时，效率几乎没变。

所以影响 not in 的主要是主查询表的大小。

not exists 效率分析 :

当主表大表，从表是小表时， 103.54

当主表小表，从表是大表时， 973.35

当主表大表，从表是大表时， 1822.34

当主表小表，从表是小表时， 128.74

可以看到表的大小对 not exists 没什么影响。另外，可以发现， not exists 和 exists 效率几乎一样。

注：以上测试都是在比较列有索引的情况下进行的。

总结：

in 和 exists 效率一样，用哪个都可以。但是一般来说， not exists 比 not in 速度快（在 not exists 子查询里的语句比较复杂时，有可能比 not in 慢）， 最好将 not in 转化成 not exists 。另外，要注意的是 ,in 或者 not in 里面都不能有 null 值，否则会得不到结果。

相关推荐