Paging and grouping
My data is stored in a SQL Server 2005 database.
This query takes less than a second to execute:
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY tblOrders.orderid) AS RowNum,
SUM(tblProducts.Price) as price
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid
) as x
where RowNum >= 21001 and RowNum < 21011
And this query takes 10 seconds to execute:
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY tblOrders.orderid) AS RowNum,
SUM(tblProducts.Price) as price,
OrderDate
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid, tblOrders.OrderDate
) as x
where RowNum >= 21001 and RowNum < 21011
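Why might there be such a difference?
Every table has a column named id that holds the primary key. I don't know why orderid and ProductId also exist, because I didn't design the database.
/Barry
UPDATE
OrderDate is a datetime.
Second update
Keep in mind that each of the three tables has an id column acting as its primary key. The tables reference each other through orderid, productid, and so on, though. I'm not sure why it was implemented this way, but my guess is that it is very wrong.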
tblOrders:
Id; int; no null; PK
OrderId; int; allow null
OrderDate; datetime; allow null
tblOrderDetails:
Id; int; no null; PK
OrderId; int; allow null
ProductId; int; allow null
tblProducts:
Id; int; PK; no null
ProductId; allow null
Price; money; allow null
Here is the execution plan for the heavy query -
Third update
This takes just one second -
SELECT ROW_NUMBER() OVER (ORDER BY tblOrders.orderid) AS RowNum,
SUM(tblProducts.Price) as price,
OrderDate
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid, OrderDate
And this takes only 2 seconds -
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY tblOrders.orderid) AS RowNum,
SUM(tblProducts.Price) as price,
MAX(tblOrders.OrderDate) as OrderDate -- do this instead of grouping
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid ) as x
But this takes 10 seconds -
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY tblOrders.orderid) AS RowNum,
SUM(tblProducts.Price) as price,
MAX(tblOrders.OrderDate) as OrderDate -- do this instead of grouping
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid ) as x
where RowNum >= 21001 and RowNum < 21011
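The where clause is adding 8 seconds. Why?
This can't be answered without the execution plans, but I can guess:
- The extra column may prevent an index from being used
- The cardinality of the slow query is very high
- The statistics for OrderDate are somehow out of date (exec sp_updatestats)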
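The last guess is cheap to test. A minimal sketch, assuming you have permission to update statistics in this database; the table names are the ones from the question, and the choice of WITH FULLSCAN is an illustration rather than a requirement:

```sql
-- Refresh statistics for every table in the current database.
EXEC sp_updatestats;

-- Or refresh only the tables involved in the slow query,
-- reading all rows instead of a sample (slower, but more accurate).
UPDATE STATISTICS tblOrders WITH FULLSCAN;
UPDATE STATISTICS tblOrderDetails WITH FULLSCAN;
UPDATE STATISTICS tblProducts WITH FULLSCAN;
```

If the slow query's timing doesn't change after this, stale statistics are probably not the cause.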
Update: the execution plan you posted is indeed horrible.
Create these indexes:
create unique nonclustered index x0 on tblOrders(orderid) include (OrderDate)
create unique nonclustered index x1 on tblProducts(productid) include (Price)
create nonclustered index x2 on tblOrderDetails(orderid, ProductId)
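I'll bet you dollars to donuts that including "tblOrders.OrderDate" in both the output list and the GROUP BY clause is what slows you down. I suggest you SET STATISTICS IO ON,
run both queries, and see how the scans and seeks on each table differ.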
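A sketch of that comparison, run in one session so the two outputs appear side by side in the Messages tab. SET STATISTICS TIME is not part of the suggestion above; it is added here only as an optional extra to show CPU and elapsed time:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;  -- optional addition: CPU and elapsed time per statement

-- Fast query: groups by orderid only.
SELECT * FROM (
  SELECT ROW_NUMBER() OVER (ORDER BY tblOrders.orderid) AS RowNum,
         SUM(tblProducts.Price) AS price
  FROM tblOrders
  LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
  LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
  GROUP BY tblOrders.orderid
) AS x
WHERE RowNum >= 21001 AND RowNum < 21011;

-- Slow query: also selects and groups by OrderDate.
SELECT * FROM (
  SELECT ROW_NUMBER() OVER (ORDER BY tblOrders.orderid) AS RowNum,
         SUM(tblProducts.Price) AS price,
         OrderDate
  FROM tblOrders
  LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
  LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
  GROUP BY tblOrders.orderid, tblOrders.OrderDate
) AS x
WHERE RowNum >= 21001 AND RowNum < 21011;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

For each table, compare the scan count and logical reads between the two statements; a large jump on one table points at where the plans diverge.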
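Most likely the SQL engine chooses a very different plan for the second query because it has to account for the OrderDate column, which leads to more CPU processing or (more likely) more disk IO.
What is OrderDate? A datetime? Although the queries look very similar, I suspect OrderDate contains time information, so the sorting and grouping cost much more (and produce more rows in the subquery of the second query).
Consider the following variation: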
SELECT RowNum, price, DD = DATEADD(DAY, DD, '19000101') FROM (
SELECT ROW_NUMBER() OVER (ORDER BY tblOrders.orderid) AS RowNum,
SUM(tblProducts.Price) as price,
DATEDIFF(DAY, '19000101', tblOrders.OrderDate) as DD
FROM tblOrders
LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
GROUP BY tblOrders.orderid, DATEDIFF(DAY, '19000101', tblOrders.OrderDate)
) as x
where RowNum >= 21001 and RowNum < 21011
ORDER BY RowNum;
In SQL Server 2008 or later, you can simplify this to CONVERT(DATE, OrderDate)
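As a rough sketch of that simplification, the variation above might look like this on SQL Server 2008 or later. It will not run on SQL Server 2005, which has no DATE type, and the column alias OrderDay is a name chosen here for illustration:

```sql
SELECT RowNum, price, OrderDay FROM (
  SELECT ROW_NUMBER() OVER (ORDER BY tblOrders.orderid) AS RowNum,
         SUM(tblProducts.Price) AS price,
         CONVERT(DATE, tblOrders.OrderDate) AS OrderDay  -- drops the time portion
  FROM tblOrders
  LEFT OUTER JOIN tblOrderDetails ON tblOrders.orderid = tblOrderDetails.OrderId
  LEFT OUTER JOIN tblProducts ON tblOrderDetails.ProductId = tblProducts.ProductId
  GROUP BY tblOrders.orderid, CONVERT(DATE, tblOrders.OrderDate)
) AS x
WHERE RowNum >= 21001 AND RowNum < 21011
ORDER BY RowNum;
```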
...
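Without the actual table structure and execution plans I can't answer precisely, but if orderid is unique in tblOrders, it is better to remove OrderDate from the GROUP BY clause and add it to the select list as min(tblOrders.OrderDate) as OrderDate.
It should give the same result (if tblOrders.orderid is a unique key) but perform better.
This takes 11 seconds to execute. – Baz 2012-04-16 18:53:17
Please post screenshots of both execution plans. – usr 2012-04-15 21:12:40
Or post the actual plans somewhere... they are far more useful than screenshots, which show nothing beyond the operators. Also make sure they are actual plans, not estimated ones. – 2012-04-15 23:12:21
Do you have indexes on your tables? Which ones? – oryol 2012-04-16 19:40:45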