基于SET的查询性能低下COUNT
我有以下存储过程,它评估临时表中的所有帐户并确定它们是否适合导入或不适用。如果是,它会将它们标记为适当的导入=真。如果不是会给出一个理由为什么不。基于SET的查询性能低下COUNT
虽然它的SET基础仍然非常缓慢。我试过移动到EXIST而不是COUNT,但测试似乎并没有表明它会带来很大的改变。
关于可以做什么的任何建议?
CREATE OR REPLACE FUNCTION assessInclusionOfAccountsFromStaging() RETURNS BOOLEAN AS $$ /*Only new accounts are valid, detailed issues checking and budge checking duplicates*/
DECLARE
countOfAccountsInStaging INTEGER;
BEGIN
/*Check that we have data to process*/
countOfAccountsInStaging = COUNT(*) FROM importAccountsStaging;
IF(countOfAccountsInStaging) = 0 THEN
RAISE EXCEPTION 'No accounts available';
END IF;
/*SET SuitableToImport*/
RAISE NOTICE 'Processing accounts...';
UPDATE importAccountsStaging SET suitableToImport = TRUE
WHERE
AND ((SELECT COUNT (*) FROM importAccountsStaging as accountsIterated /*Check for duplicates against staging enviroment at org level*/
WHERE
(accountsIterated.code1 = importAccountsStaging.code1)
OR (accountsIterated.code2 = importAccountsStaging.code2)
)=1)
/*Check for duplicate in masterdb*/
AND ((SELECT COUNT (*) FROM masterAccounts /*Check for any potential duplicate at org level*/
WHERE
(importAccountsStaging.code1 = masterAccounts.code1)
OR (importAccountsStaging.code2 = masterAccounts.code2)
)=0)
;
/*SET COMMENT on why it's not suitable to import*/
UPDATE importAccountsStaging SET reason = CONCAT(reason , 'existing account in staging|')
WHERE
NOT ((SELECT COUNT (*) FROM importAccountsStaging as tempAccounts
WHERE
tempAccounts.code1 = importAccountsStaging.code1
OR tempAccounts.code2 = importAccountsStaging.code2
)=1);
/*SET COMMENT on why it's not suitable to import*/
UPDATE importAccountsStaging SET reason = CONCAT(reason , 'existing account in main|')
WHERE
NOT ((SELECT COUNT (*) FROM masterAccounts
WHERE
importAccountsStaging.code1 = masterAccounts.code1
OR importAccountsStaging.code2 = masterAccounts.code2
)=0)
;
/*Return values*/
RAISE NOTICE 'Assessment completed human! ';
RETURN TRUE;
END; $$
LANGUAGE plpgsql;
非常感谢!
这是优化的查询,我从所有答案中获得了输入,所以非常感谢您的帮助。这些更改使查询在10小时内执行大约7分钟。
CREATE OR REPLACE FUNCTION assessInclusionOfAccountsFromStaging() RETURNS BOOLEAN AS $$ /*Only new accounts are valid, detailed issues checking and budge checking duplicates*/
DECLARE
countOfAccountsInStaging INTEGER;
BEGIN
/*Check that we have data to process*/
countOfAccountsInStaging = COUNT(*) FROM importAccountsStaging;
IF(countOfAccountsInStaging) = 0 THEN
RAISE EXCEPTION 'No accounts available';
END IF;
RAISE NOTICE 'Processing accounts...';
/*Checking value of row against the table the row belongs to for potential duplicates*/
UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code1 in staging|')
WHERE ((SELECT COUNT (*) FROM importAccountsStaging as tempAccounts
WHERE
tempAccounts.code1 = importAccountsStaging.code1
)>1);
UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code2 in staging|')
WHERE ((SELECT COUNT (*) FROM importAccountsStaging as tempAccounts
WHERE
tempAccounts.code2 = importAccountsStaging.code2
)>1);
/*Checking value of row against another table*/
UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code1 in masterDB|')
FROM masterAccounts WHERE importAccountsStaging.code1 = masterAccounts.code1;
UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code2 in masterDB|')
FROM masterAccounts WHERE importAccountsStaging.code2 = masterAccounts.code2;
/*Final flag where no issues were found*/
UPDATE importAccountsStaging SET suitableToImport = TRUE
WHERE reason IS NULL;
/*Return values*/
RAISE NOTICE 'Assessment complete, all done! ';
RETURN TRUE;
END; $$
LANGUAGE plpgsql;
相关子句中的OR
是一个杀手 - 它可能会导致正在更新的表中的每个记录的全表扫描。
假设你只是寻找生存,而不是实际的数量,我建议:
WHERE (EXISTS (SELECT 1
FROM importAccountsStaging ias
WHERE ias.code1 = importAccountsStaging.code1
) OR
EXISTS (SELECT 1
FROM importAccountsStaging ias
WHERE ias.code2 = importAccountsStaging.code2
)
) AND
(NOT EXISTS (SELECT 1
FROM masterAccounts ma
WHERE importAccountsStaging.code1 = ma.code1
) AND
NOT EXISTS (SELECT 1
FROM masterAccounts ma
WHERE importAccountsStaging.code2 = ma.code2
)
)
那么你一定要在importAccountsStaging(code1)
,importAccountsStaging(code2)
,masterAccounts(code1)
,并masterAccounts(code2)
指标。
如果你正在寻找具体的计数,你可以修改它的逻辑(它应该几乎一样快)。
您可能想要考虑完全重新设计以完全摆脱OR。我强烈怀疑,如果将操作分解成更小的块,它将运行得更快。例如,而不是SELECT COUNT(*)上masterAccounts,为什么不这样做:
UPDATE importAccountsStaging SET原因= CONCAT(理由, '现有的帐户在主|') FROM masterAccounts WHERE importAccountsStaging.code1 = masterAccounts.code1 ;
UPDATE importAccountsStaging SET原因= CONCAT(原因,现有的帐户在主|') FROM masterAccounts WHERE importAccountsStaging.code2 = masterAccounts.code2;
与同类您的其他检查......然后简单地结束
UPDATE importAccountsStaging SET suitableToImport = TRUE WHERE原因是空
据了解反模式 - 通常COUNT(*)
可以运行相当缓慢的,因为它必须扫描所有可能的行。基于EXISTS
的测试应该非常快,因为执行在第一行停止。所以更新使用COUNT测试是否存在或不存在!总是使用EXISTS
。
“count(*)”是查询中真正的性能问题的可能性极小。看到我的答案。 –
带有'exists'和'count'的查询应该有不同的计划 - 当然,可能还有其他问题 - 'OR'运算符没有帮助,但主要问题是使用'count' –
这确实有帮助,但没有足够的提高性能。我们仍然在6个多小时。 – MMV
你也可能想尝试对分期检查以下内容: UPDATE importAccountsStaging SET原因= CONCAT(importAccountsStaging.reason, '在现有的分级编码1 |') FROM(SELECT代码1 FROM importAccountsStaging GROUP BY代码1 HAVING(COUNT( *)> 1))AS重复WHERE importAccountsStaging.code1 = duplicates.code1 我想我的语法是正确的!这可能会更快。 无论如何10小时到7分钟是一个值得骄傲的改进!请记住将问题标记为已回答。 – 2016-08-03 14:38:20
谢谢@JonathanWillcock,会考虑一下。再一次感谢你的帮助! – MMV