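Running aggregate functions over a table with millions of rows
I'm having serious performance problems with a multi-million-row table that I feel I should be able to get results from fairly quickly. Here is a rundown of what I have, how I'm querying it, and how long it takes:
I'm running SQL Server 2008 Standard Edition, so partitioning isn't currently an option.
I'm trying to aggregate all views over the past 30 days for all inventory belonging to a specific account.
All views are stored in the following table: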
CREATE TABLE [dbo].[LogInvSearches_Daily](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [Inv_ID] [int] NOT NULL,
    [Site_ID] [int] NOT NULL,
    [LogCount] [int] NOT NULL,
    [LogDay] [smalldatetime] NOT NULL,
    CONSTRAINT [PK_LogInvSearches_Daily] PRIMARY KEY CLUSTERED
    (
        [ID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
This table has 132 million records and is over 4 GB in size.
A sample of 10 rows from the table:
ID                   Inv_ID      Site_ID     LogCount    LogDay
-------------------- ----------- ----------- ----------- -----------------------
1                    486752      48          14          2009-07-21 00:00:00
2                    119314      51          16          2009-07-21 00:00:00
3                    313678      48          25          2009-07-21 00:00:00
4                    298863      0           1           2009-07-21 00:00:00
5                    119996      0           2           2009-07-21 00:00:00
6                    463777      534         7           2009-07-21 00:00:00
7                    339976      503         2           2009-07-21 00:00:00
8                    333501      570         4           2009-07-21 00:00:00
9                    453955      0           12          2009-07-21 00:00:00
10                   443291      0           4           2009-07-21 00:00:00

(10 row(s) affected)
- I have the following index on LogInvSearches_Daily:
/****** Object: Index [IX_LogInvSearches_Daily_LogDay] Script Date: 05/12/2010 11:08:22 ******/
CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_LogDay] ON [dbo].[LogInvSearches_Daily]
(
    [LogDay] ASC
)
INCLUDE ([Inv_ID], [LogCount])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
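- I only need to pull inventory that belongs to a specific account ID, which I look up in the Inventory table. I have an index on Inventory as well.
I'm using the following query to aggregate the data and give me the top 5 records. This query currently takes 24 seconds to return 5 rows: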
StmtText
--------------------------------------------------------------------------------
SELECT TOP 5
    Sum(LogCount) AS Views
    , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank
    , Inv_ID
FROM LogInvSearches_Daily D (NOLOCK)
WHERE
    LogDay > DateAdd(d, -30, getdate())
    AND EXISTS(
        SELECT NULL FROM propertyControlCenter.dbo.Inventory (NOLOCK) WHERE Acct_ID = 18731 AND Inv_ID = D.Inv_ID
    )
GROUP BY Inv_ID

(1 row(s) affected)

StmtText
--------------------------------------------------------------------------------
  |--Top(TOP EXPRESSION:((5)))
       |--Sequence Project(DEFINE:([Expr1007]=dense_rank))
            |--Segment
                 |--Segment
                      |--Sort(ORDER BY:([Expr1006] DESC, [D].[Inv_ID] DESC))
                           |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1006]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount])))
                                |--Sort(ORDER BY:([D].[Inv_ID] ASC))
                                     |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID]))
                                          |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1011], [Expr1012], [Expr1010]))
                                          |    |--Compute Scalar(DEFINE:(([Expr1011],[Expr1012],[Expr1010])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6))))
                                          |    |    |--Constant Scan
                                          |    |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1011] AND [D].[LogDay] < [Expr1012]) ORDERED FORWARD)
                                          |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID]), SEEK:([propertyControlCenter].[dbo].[Inventory].[Acct_ID]=(18731) AND [propertyControlCenter].[dbo].[Inventory].[Inv_ID]=[LOA

(13 row(s) affected)
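I tried using a CTE to pick up the rows first and then aggregate them, but that ran no faster and gives me essentially the same execution plan.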
(1 row(s) affected)

StmtText
--------------------------------------------------------------------------------
--SET SHOWPLAN_TEXT ON;
WITH getSearches AS (
    SELECT
        LogCount
        -- , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank
        , D.Inv_ID
    FROM LogInvSearches_Daily D (NOLOCK)
    INNER JOIN propertyControlCenter.dbo.Inventory I (NOLOCK) ON Acct_ID = 18731 AND I.Inv_ID = D.Inv_ID
    WHERE
        LogDay > DateAdd(d, -30, getdate())
    -- GROUP BY Inv_ID
)
SELECT Sum(LogCount) AS Views, Inv_ID
FROM getSearches
GROUP BY Inv_ID

(1 row(s) affected)

StmtText
--------------------------------------------------------------------------------
  |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1004]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount])))
       |--Sort(ORDER BY:([D].[Inv_ID] ASC))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID]))
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1008], [Expr1009], [Expr1007]))
                 |    |--Compute Scalar(DEFINE:(([Expr1008],[Expr1009],[Expr1007])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6))))
                 |    |    |--Constant Scan
                 |    |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1008] AND [D].[LogDay] < [Expr1009]) ORDERED FORWARD)
                 |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID] AS [I]), SEEK:([I].[Acct_ID]=(18731) AND [I].[Inv_ID]=[LOALogs].[dbo].[LogInvSearches_Daily].[Inv_ID] as [D].[Inv_ID]) ORDERED FORWARD)

(8 row(s) affected)

(1 row(s) affected)
So, given that I'm already getting good index seeks in my execution plans, what can I do to get this running faster?
UPDATE:
Here is the same query run without the DENSE_RANK(). It takes exactly the same 24 seconds to run and gives me the same basic query plan:
StmtText
--------------------------------------------------------------------------------
--SET SHOWPLAN_TEXT ON
SELECT TOP 5
    Sum(LogCount) AS Views
    , Inv_ID
FROM LogInvSearches_Daily D (NOLOCK)
WHERE
    LogDay > DateAdd(d, -30, getdate())
    AND EXISTS(
        SELECT NULL FROM propertyControlCenter.dbo.Inventory (NOLOCK) WHERE Acct_ID = 18731 AND Inv_ID = D.Inv_ID
    )
GROUP BY Inv_ID
ORDER BY Views, Inv_ID

(1 row(s) affected)

StmtText
--------------------------------------------------------------------------------
  |--Sort(TOP 5, ORDER BY:([Expr1006] ASC, [D].[Inv_ID] ASC))
       |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1006]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount])))
            |--Sort(ORDER BY:([D].[Inv_ID] ASC))
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID]))
                      |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1010], [Expr1011], [Expr1009]))
                      |    |--Compute Scalar(DEFINE:(([Expr1010],[Expr1011],[Expr1009])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6))))
                      |    |    |--Constant Scan
                      |    |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1010] AND [D].[LogDay] < [Expr1011]) ORDERED FORWARD)
                      |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID]), SEEK:([propertyControlCenter].[dbo].[Inventory].[Acct_ID]=(18731) AND [propertyControlCenter].[dbo].[Inventory].[Inv_ID]=[LOALogs].[dbo].[LogInvS

(9 row(s) affected)
Thanks,
Dan
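I haven't read through your whole question yet (I'll get to that shortly), but in answer to the early comment: you can use partitioning in SQL Server 2008 Standard Edition by using partitioned views. It is partitioned tables (which are admittedly more flexible) that are restricted to Enterprise Edition.
Partitioned view information: http://msdn.microsoft.com/en-us/library/ms190019.aspx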
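To make the partitioned-view idea concrete, here is a minimal sketch of a local partitioned view, assuming the log data were split into one member table per month. The member table names, the view name, and the monthly boundaries are all hypothetical; the columns mirror the LogInvSearches_Daily definition from the question.

```sql
-- Hypothetical monthly member tables. The CHECK constraint on LogDay is what
-- lets the optimizer skip member tables that cannot contain the requested dates.
CREATE TABLE dbo.LogInvSearches_Daily_201004 (
    ID       bigint        NOT NULL,
    Inv_ID   int           NOT NULL,
    Site_ID  int           NOT NULL,
    LogCount int           NOT NULL,
    LogDay   smalldatetime NOT NULL
        CONSTRAINT CK_LogInvSearches_Daily_201004_LogDay
        CHECK (LogDay >= '20100401' AND LogDay < '20100501'),
    CONSTRAINT PK_LogInvSearches_Daily_201004 PRIMARY KEY CLUSTERED (LogDay, ID)
);
GO
CREATE TABLE dbo.LogInvSearches_Daily_201005 (
    ID       bigint        NOT NULL,
    Inv_ID   int           NOT NULL,
    Site_ID  int           NOT NULL,
    LogCount int           NOT NULL,
    LogDay   smalldatetime NOT NULL
        CONSTRAINT CK_LogInvSearches_Daily_201005_LogDay
        CHECK (LogDay >= '20100501' AND LogDay < '20100601'),
    CONSTRAINT PK_LogInvSearches_Daily_201005 PRIMARY KEY CLUSTERED (LogDay, ID)
);
GO
-- Hypothetical view that unions the member tables back into one logical table.
CREATE VIEW dbo.LogInvSearches_Daily_All
AS
SELECT ID, Inv_ID, Site_ID, LogCount, LogDay FROM dbo.LogInvSearches_Daily_201004
UNION ALL
SELECT ID, Inv_ID, Site_ID, LogCount, LogDay FROM dbo.LogInvSearches_Daily_201005;
GO
```

Queries would then filter on LogDay against the view rather than the base table. Whether this actually helps a 30-day query depends on how many member tables the date range touches, so it is worth comparing plans before committing to the design.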
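On the wider question, I wonder whether you really need DENSE_RANK in there. I suspect you may be confusing the ORDER BY inside DENSE_RANK with the ORDER BY of the query itself. As it stands, your TOP 5 returns 5 undefined records, because SQL Server does not guarantee any order on the records unless an ORDER BY clause is specified for the query (which you haven't done). If you move the ORDER BY out of DENSE_RANK and make it the ORDER BY of the whole query, the records should come back the way I think you want, and you remove the need for the expensive DENSE_RANK ranking function.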
SELECT TOP 5
    SUM([LogCount]) AS [Views],
    [Inv_ID]
FROM [LogInvSearches_Daily] D (NOLOCK)
WHERE
    [LogDay] > DateAdd(d, -30, getdate())
    AND EXISTS(
        SELECT *
        FROM Inventory (NOLOCK)
        WHERE Acct_ID = 18731
            AND Inv_ID = D.Inv_ID
    )
GROUP BY
    Inv_ID
ORDER BY
    [Views] DESC,
    [Inv_ID]
UPDATE:
The time is probably being spent here:
|--Sort(ORDER BY:([D].[Inv_ID] ASC))
You could try creating a covering index like this:
CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_Perf] ON [dbo].[LogInvSearches_Daily]
(
[Inv_ID] ASC,
[LogDay] ASC
)
INCLUDE
(
[LogCount]
)
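Note that I have also changed the ORDER BY slightly (Inv_ID is now ASC instead of DESC). I doubt this change affects the results in any problematic way, but it may help performance, since the rows will be returned in the same order in which they are grouped (though this may be irrelevant!).
DENSE_RANK() or not, the result is still just as slow. I've tried it both ways and I still can't get this to load in under 24 seconds. I've updated the question to show the query plan and timing for the same query without DENSE_RANK() – 2010-05-12 18:14:09
I've updated my answer with an index suggestion – 2010-05-12 18:33:13
I think that index will do the trick. Now I just need to figure out how to create the index without taking down the whole server... – 2010-05-13 15:47:29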
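Partitioning aside,
based on our experience with tables larger than yours, we extract the data into a temp table (not a table variable) and aggregate on that. Not for every query, but for the more complex ones.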
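A minimal sketch of that temp-table approach, applied to the query from the question. The temp table name #AcctSearches is hypothetical, and the account ID 18731 and the 30-day window are taken from the question; whether staging the rows first beats the single-statement query would need to be measured.

```sql
-- Step 1: stage only the rows for this account and date range.
-- #AcctSearches is a hypothetical name; a temp table (unlike a table
-- variable) gets column statistics, which can help the aggregation plan.
SELECT D.Inv_ID, D.LogCount
INTO #AcctSearches
FROM dbo.LogInvSearches_Daily AS D
INNER JOIN propertyControlCenter.dbo.Inventory AS I
    ON I.Inv_ID = D.Inv_ID
WHERE I.Acct_ID = 18731
  AND D.LogDay > DATEADD(d, -30, GETDATE());

-- Step 2: aggregate over the much smaller staged set.
SELECT TOP 5
    SUM(LogCount) AS Views,
    Inv_ID
FROM #AcctSearches
GROUP BY Inv_ID
ORDER BY Views DESC, Inv_ID;

-- Step 3: clean up.
DROP TABLE #AcctSearches;
```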
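Beyond that, I agree with Daniel Renshaw's observation about DENSE_RANK.
I would also think about moving [Inv_ID] and [LogCount] into the index key itself (rather than INCLUDE, and perhaps with a descending sort as well).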
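One possible reading of that suggestion, as a sketch only: the index name is hypothetical, and which column should be sorted descending is a guess, since the answer does not say.

```sql
-- Hypothetical index: Inv_ID and LogCount promoted from INCLUDE columns to
-- key columns, with LogDay descending so the most recent days come first.
CREATE NONCLUSTERED INDEX IX_LogInvSearches_Daily_LogDay_Inv_Count
ON dbo.LogInvSearches_Daily
(
    LogDay   DESC,
    Inv_ID   ASC,
    LogCount ASC
);
```

Key columns make the index slightly wider at the non-leaf levels than INCLUDE columns do, so this is a trade-off to test rather than a guaranteed win.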
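Well, this already is the aggregated table... We have a millisecond-by-millisecond table that gets created, and then we roll all of those requests up into days, which is the table I'm now trying to query. I can't break it down any further, because these will be dynamic queries that users run on demand for their own accounts. – 2010-05-12 18:21:53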
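Acct_ID lives on the Inventory table and appears to have its own index (IX_Inventory_Acct_ID). Perhaps you would have more luck with an index on Inventory(Acct_Id, Inv_Id) and with LogInvSearches_Daily clustered (or at least indexed) on (Inv_Id, LogDay).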
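A sketch of what those two changes might look like. The new index names are hypothetical; the database names LOALogs and propertyControlCenter are taken from the execution plans in the question. Re-clustering a 132-million-row table rewrites the whole table, and on Standard Edition the rebuild runs offline, so treat this as something to try on a copy first.

```sql
USE propertyControlCenter;
GO
-- Hypothetical index so the account filter and the join column are both keys.
CREATE NONCLUSTERED INDEX IX_Inventory_Acct_ID_Inv_ID
ON dbo.Inventory (Acct_ID, Inv_ID);
GO

USE LOALogs;
GO
-- Move the clustered index from ID to (Inv_ID, LogDay):
-- drop the clustered primary key, cluster on the new keys,
-- then re-add the primary key on ID as a nonclustered constraint.
ALTER TABLE dbo.LogInvSearches_Daily
    DROP CONSTRAINT PK_LogInvSearches_Daily;
GO
CREATE CLUSTERED INDEX CIX_LogInvSearches_Daily_Inv_ID_LogDay
ON dbo.LogInvSearches_Daily (Inv_ID, LogDay);
GO
ALTER TABLE dbo.LogInvSearches_Daily
    ADD CONSTRAINT PK_LogInvSearches_Daily PRIMARY KEY NONCLUSTERED (ID);
GO
```

The less invasive alternative mentioned in the same sentence is to leave the clustered index alone and only add a nonclustered index on (Inv_ID, LogDay).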
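By the way, I'm not sure what the current clustered index on LogInvSearches_Daily.ID is supposed to buy you. Why would you want records whose IDs were close together at import time to sit next to each other on disk?
Can you provide an example of the output you want to see? It's not clear why you need DENSE_RANK. – 2010-05-12 16:47:00
I just need the top 5 ranked. I've just posted an update showing exactly the same performance with or without DENSE_RANK(). – 2010-05-12 18:18:09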