都是标量子查询惹的祸
都是标量子查询惹的祸
系统又报了一个跑的慢的sql语句,看图就知道这个很恐怖的,已经跑了1天了,还需要跑6个月的时间,
把sql语句拿出来瞅瞅:
--201406 XX
select SUM(UNPOSTING_AMT_CYC) 分期未结清余额,
SUM(greatest(nvl(BAL_TOTAL,0),0)) 余额,COUNT(1),DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end 额度区间,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企业主'
else '非小企业主' end 小企业主标识
from riskpubstrategy.lwt_ambs_cc_201406 A
group by DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企业主'
else '非小企业主' end;
内存中的执行计划:SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('a50xqp45uy256',0,'advanced'));
简单分析下,查询的只有一个表,这就很奇怪了,不管这个表多么大,也不可能跑这么久的吧,但是有一个标量子查询,根据经验,如果标量子查询的表数据量很多,或者关联的列没有索引的话,标量子查询的性能将会是非常差的,我们由执行计划也可以看出标量子查询的表RISKPUBSC.MICRO_BUSI_DATABASE的关联列没有索引,至此,我们猜测可能是由于这个标量子查询引起sql语句性能低下,猜测归猜测,那我们实验一下呢?
先去掉标量子查询,然后执行一下,如下:
--201406 XX
select SUM(UNPOSTING_AMT_CYC) 分期未结清余额,
SUM(greatest(nvl(BAL_TOTAL,0),0)) 余额,COUNT(1),DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end 额度区间/*,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企业主'
else '非小企业主' end 小企业主标识*/
from riskpubstrategy.lwt_ambs_cc_201406 A
group by DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end/*,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企业主'
else '非小企业主' end*/
执行一下: ,发现这个sql很快的,大约274秒,也就是5分钟左右,好吧,看来真是标量子查询惹的祸,,,,,,那我们首先建立索引看看性能如何呢?
create index ind_MICRO_BUSI_DATABASE_acct on RISKPUBSC.MICRO_BUSI_DATABASE(acct) NOLOGGING parallel 8;
alter index ind_MICRO_BUSI_DATABASE_acct NOPARALLEL;
然后再重新执行一下之前的语句:
--201406 XX
select SUM(UNPOSTING_AMT_CYC) 分期未结清余额,
SUM(greatest(nvl(BAL_TOTAL,0),0)) 余额,COUNT(1),DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end 额度区间,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企业主'
else '非小企业主' end 小企业主标识
from riskpubstrategy.lwt_ambs_cc_201406 A
group by DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企业主'
else '非小企业主' end;
加了索引后其执行计划:
发现性能比较好,,大约就8分钟搞定,,,,,,至此其实这个sql加个索引就算优化好了,但是根据经验,如果标量子查询的表的数据量超大(一般认为超过100W,不是绝对的)的时候,而且关联的列又不是主键的时候,标量子查询的性能其实不是怎么好,,,,,,好吧,那我们来看看他们的数据量如何呢?????
首先看看数据量,
SELECT v.TABLE_NAME,
v.BLOCKS,
v.table_size2,
v.NUM_ROWS,
v.LAST_ANALYZED
FROM vw_table_lhr v
WHERE v.TABLE_NAME IN ('LWT_AMBS_CC_201406',
'MICRO_BUSI_DATABASE');
发现标量子查询的表是有点大,300W的数据量,如果是标量子查询的话,大约要对标量子查询的表(MICRO_BUSI_DATABASE)扫描1988W次,相当于NL连接了,所以这个性能不怎么好,那么就修改为左外连接呗,修改后的sql:
--201406 XX
select SUM(UNPOSTING_AMT_CYC) 分期未结清余额,
SUM(greatest(nvl(BAL_TOTAL,0),0)) 余额,COUNT(1),DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end 额度区间,
CASE WHEN b.ACCT is not null THEN '小企业主'
else '非小企业主' end 小企业主标识
from riskpubstrategy.lwt_ambs_cc_201406 A
LEFT outer join RISKPUBSC.MICRO_BUSI_DATABASE B
on (a.ACCT=b.ACCT)
group by DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end,
CASE WHEN b.ACCT is not null THEN '小企业主'
else '非小企业主' end
执行计划:
时间呢?????尼玛,,,,,,,这么快,,,,,由6个月转换为44秒,,,,,,又是由自行车的速度到飞船的速度呀,,,,,,飞跃吧,,,,,,,,
这里细心的人会发现,我去掉了b.STATUS_CURR is not null 这个条件,因为我查询过这个列本身就没有为空的数据,,,,,这个又是个小小的优化哟,,,,,
最后简单总结下:
标量子查询的使用是有条件的:
①标量子查询的涉及的表的关联列必须要有索引或者关联列是主键列
②标量子查询的涉及的表的数据量不能太大,数据量太大即使关联列有索引依然很慢的
③对同一个表不能有多个标量子查询,这样性能也不会很高
第三个注意事项是神马意思???????简单举个例子:
Select (select a from taba t1 where t1.id=t.id) a,
(select b from taba t1 where t1.id=t.id) b,
(select c from taba t1 where t1.id=t.id) c
……….
From tabb ;
这种情况大家还是修改为左外连接比较快。。。。。。如果taba稍微大点的话就会导致到大表的多次扫描,这个严重影响性能。。。。。。。。。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/26736162/viewspace-1222431/,如需转载,请注明出处,否则将追究法律责任。