基于SET的查询性能低下COUNT

问题描述:

我有以下存储过程,它评估临时表中的所有帐户并确定它们是否适合导入或不适用。如果是,它会将它们标记为适当的导入=真。如果不是会给出一个理由为什么不。基于SET的查询性能低下COUNT

虽然它的SET基础仍然非常缓慢。我试过移动到EXIST而不是COUNT,但测试似乎并没有表明它会带来很大的改变。

关于可以做什么的任何建议?

CREATE OR REPLACE FUNCTION assessInclusionOfAccountsFromStaging() RETURNS BOOLEAN AS $$ /*Only new accounts are valid, detailed issues checking and budge checking duplicates*/ 

DECLARE 
    countOfAccountsInStaging INTEGER; 
BEGIN 

    /*Check that we have data to process*/ 
    countOfAccountsInStaging = COUNT(*) FROM importAccountsStaging; 
    IF(countOfAccountsInStaging) = 0 THEN 
     RAISE EXCEPTION 'No accounts available'; 
    END IF; 

    /*SET SuitableToImport*/ 
    RAISE NOTICE 'Processing accounts...'; 
    UPDATE importAccountsStaging SET suitableToImport = TRUE 
    WHERE 
     AND ((SELECT COUNT (*) FROM importAccountsStaging as accountsIterated /*Check for duplicates against staging enviroment at org level*/ 
      WHERE 
       (accountsIterated.code1 = importAccountsStaging.code1) 
       OR (accountsIterated.code2 = importAccountsStaging.code2) 
      )=1) 

     /*Check for duplicate in masterdb*/ 
     AND ((SELECT COUNT (*) FROM masterAccounts /*Check for any potential duplicate at org level*/ 
      WHERE 
       (importAccountsStaging.code1 = masterAccounts.code1) 
       OR (importAccountsStaging.code2 = masterAccounts.code2) 
      )=0) 
     ; 

     /*SET COMMENT on why it's not suitable to import*/ 
     UPDATE importAccountsStaging SET reason = CONCAT(reason , 'existing account in staging|') 
     WHERE 
      NOT ((SELECT COUNT (*) FROM importAccountsStaging as tempAccounts 
      WHERE 
       tempAccounts.code1 = importAccountsStaging.code1 
       OR tempAccounts.code2 = importAccountsStaging.code2 
      )=1); 


      /*SET COMMENT on why it's not suitable to import*/ 
     UPDATE importAccountsStaging SET reason = CONCAT(reason , 'existing account in main|') 
     WHERE 
     NOT ((SELECT COUNT (*) FROM masterAccounts 
     WHERE 
      importAccountsStaging.code1 = masterAccounts.code1 
      OR importAccountsStaging.code2 = masterAccounts.code2 
     )=0) 
     ; 

    /*Return values*/ 
RAISE NOTICE 'Assessment completed human! '; 
RETURN TRUE; 
END; $$ 
LANGUAGE plpgsql; 

非常感谢!

这是优化的查询,我从所有答案中获得了输入,所以非常感谢您的帮助。这些更改使查询在10小时内执行大约7分钟。

CREATE OR REPLACE FUNCTION assessInclusionOfAccountsFromStaging() RETURNS BOOLEAN AS $$ /*Only new accounts are valid, detailed issues checking and budge checking duplicates*/ 

DECLARE 
    countOfAccountsInStaging INTEGER; 
BEGIN 

    /*Check that we have data to process*/ 
    countOfAccountsInStaging = COUNT(*) FROM importAccountsStaging; 
    IF(countOfAccountsInStaging) = 0 THEN 
     RAISE EXCEPTION 'No accounts available'; 
    END IF; 

    RAISE NOTICE 'Processing accounts...'; 
    /*Checking value of row against the table the row belongs to for potential duplicates*/ 
    UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code1 in staging|') 
    WHERE ((SELECT COUNT (*) FROM importAccountsStaging as tempAccounts 
     WHERE 
      tempAccounts.code1 = importAccountsStaging.code1 
     )>1); 
    UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code2 in staging|') 
    WHERE ((SELECT COUNT (*) FROM importAccountsStaging as tempAccounts 
     WHERE 
      tempAccounts.code2 = importAccountsStaging.code2 
     )>1); 

    /*Checking value of row against another table*/ 
    UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code1 in masterDB|') 
    FROM masterAccounts WHERE importAccountsStaging.code1 = masterAccounts.code1; 
    UPDATE importAccountsStaging SET reason = CONCAT(importAccountsStaging.reason , 'existing code2 in masterDB|') 
    FROM masterAccounts WHERE importAccountsStaging.code2 = masterAccounts.code2; 

    /*Final flag where no issues were found*/ 
    UPDATE importAccountsStaging SET suitableToImport = TRUE 
    WHERE reason IS NULL; 

    /*Return values*/ 
RAISE NOTICE 'Assessment complete, all done! '; 
RETURN TRUE; 
END; $$ 
LANGUAGE plpgsql; 
+0

你也可能想尝试对分期检查以下内容: UPDATE importAccountsStaging SET原因= CONCAT(importAccountsStaging.reason, '在现有的分级编码1 |') FROM(SELECT代码1 FROM importAccountsStaging GROUP BY代码1 HAVING(COUNT( *)> 1))AS重复WHERE importAccountsStaging.code1 = duplicates.code1 我想我的语法是正确的!这可能会更快。 无论如何10小时到7分钟是一个值得骄傲的改进!请记住将问题标记为已回答。 – 2016-08-03 14:38:20

+0

谢谢@JonathanWillcock,会考虑一下。再一次感谢你的帮助! – MMV

相关子句中的OR是一个杀手 - 它可能会导致正在更新的表中的每个记录的全表扫描。

假设你只是寻找生存,而不是实际的数量,我建议:

WHERE (EXISTS (SELECT 1 
       FROM importAccountsStaging ias 
       WHERE ias.code1 = importAccountsStaging.code1 
      ) OR 
     EXISTS (SELECT 1 
       FROM importAccountsStaging ias 
       WHERE ias.code2 = importAccountsStaging.code2 
      ) 
    ) AND 
     (NOT EXISTS (SELECT 1 
        FROM masterAccounts ma 
        WHERE importAccountsStaging.code1 = ma.code1 
       ) AND 
     NOT EXISTS (SELECT 1 
        FROM masterAccounts ma 
        WHERE importAccountsStaging.code2 = ma.code2 
       ) 
    ) 

那么你一定要在importAccountsStaging(code1)importAccountsStaging(code2)masterAccounts(code1),并masterAccounts(code2)指标。

如果你正在寻找具体的计数,你可以修改它的逻辑(它应该几乎一样快)。

+0

谢谢!现在看,它会报告回来。 – MMV

+0

虽然这确实提高了性能,但主要问题是某些检查必须验证该行对应的表中的一行数据点。这意味着该行总是至少存在一次,我想知道它是否存在两次或更多次。所以我被迫仍然使用COUNT来处理这些情况。 – MMV

您可能想要考虑完全重新设计以完全摆脱OR。我强烈怀疑,如果将操作分解成更小的块,它将运行得更快。例如,而不是SELECT COUNT(*)上masterAccounts,为什么不这样做:

UPDATE importAccountsStaging SET原因= CONCAT(理由, '现有的帐户在主|') FROM masterAccounts WHERE importAccountsStaging.code1 = masterAccounts.code1 ;

UPDATE importAccountsStaging SET原因= CONCAT(原因,现有的帐户在主|') FROM masterAccounts WHERE importAccountsStaging.code2 = masterAccounts.code2;

与同类您的其他检查......然后简单地结束

UPDATE importAccountsStaging SET suitableToImport = TRUE WHERE原因是空

+0

这是非常有用的,我几乎所有的检查,以及上面提出的索引,它计算方式更快。 – MMV

+0

请注意,上面刚刚提到的组合很有用,当要测试的值是一个整数时,而不是当它的varchar时。方式较慢。 – MMV

+0

@MMV感谢您的反馈。很高兴帮助 – 2016-08-02 07:50:41

据了解反模式 - 通常COUNT(*)可以运行相当缓慢的,因为它必须扫描所有可能的行。基于EXISTS的测试应该非常快,因为执行在第一行停止。所以更新使用COUNT测试是否存在或不存在!总是使用EXISTS

+0

“count(*)”是查询中真正的性能问题的可能性极小。看到我的答案。 –

+0

带有'exists'和'count'的查询应该有不同的计划 - 当然,可能还有其他问题 - 'OR'运算符没有帮助,但主要问题是使用'count' –

+0

这确实有帮助,但没有足够的提高性能。我们仍然在6个多小时。 – MMV