How can I optimize this query? It takes 3 minutes to run

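Problem description:

A calendar table with id and date, which has one row for every date between 2000-01-01 and 2040-01-01, 14610 rows in total
A locations table with id and name, which has one row per location, 12 rows in total
A receipts table with id and datetime, plus several other unrelated fields, about 250,000 rows in total

I am trying to get a count of receipts for every day in a date range, grouped by location, with a zero for days that have no receipts.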

I have a working query, but it takes ~3 minutes to run:

SELECT 
    `locations`.`name` AS `location`, 
    `calendar`.`date` AS `date`, 
    COUNT(`receipts`.`id`) AS `count` 
FROM `locations` 
    CROSS JOIN `calendar` 
    LEFT JOIN `receipts` ON `calendar`.`date` = DATE(`receipts`.`datetime`) 
     AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id 
WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07' 
GROUP BY `locations`.`id`, `calendar`.`id` 
ORDER BY `locations`.`name` ASC, `calendar`.`date` ASC;

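I believe it has something to do with the WHERE clause.

I changed the WHERE to the following and it runs instantly, but it no longer gives me zero counts for days without receipts: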

SELECT 
    `locations`.`name` AS `location`, 
    `calendar`.`date` AS `date`, 
    COUNT(`receipts`.`id`) AS `count` 
FROM `locations` 
    CROSS JOIN `calendar` 
    LEFT JOIN `receipts` ON `calendar`.`date` = DATE(`receipts`.`datetime`) 
     AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id 
WHERE DATE(`receipts`.`datetime`) >= '2017-04-01' AND DATE(`receipts`.`datetime`) <= '2017-04-07' 
GROUP BY `locations`.`id`, `calendar`.`id` 
ORDER BY `locations`.`name` ASC, `calendar`.`date` ASC;

Then I started messing around with subqueries, but without success:

SELECT 
    `locations`.`name` AS `location`, 
    `cal`.`date` AS `date`, 
    COUNT(`receipts`.`id`) AS `count` 
FROM `locations` 
    CROSS JOIN (
     SELECT `calendar`.`id`, `calendar`.`date` 
     FROM `calendar` 
     WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07' 
    ) `cal` 
    LEFT JOIN `receipts` ON `cal`.`date` = DATE(`receipts`.`datetime`) 
     AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id 
WHERE DATE(`receipts`.`datetime`) >= '2017-04-01' AND DATE(`receipts`.`datetime`) <= '2017-04-07' 
GROUP BY `locations`.`id`, `cal`.`id` 
ORDER BY `locations`.`name` ASC, `cal`.`date` ASC;

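Is there any way I can speed up the first query, since that is the one that gives me the output I want?

I think the string comparison here is what causes the slowdown. Could you consider adding a foreign key? –

Do you have proper indexes? Try running EXPLAIN on your SELECT statement and check whether it uses an index. – money

And include the EXPLAIN output in the question so we can see it. – Shadow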
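As a sketch of what the commenters are asking for: prefixing the slow query with EXPLAIN makes MySQL report its plan (join type, chosen index and estimated rows per table) instead of running it. The query below is the first one from the question, unchanged apart from the prefix.

```sql
-- Ask MySQL for the execution plan instead of executing the query.
EXPLAIN
SELECT
    `locations`.`name` AS `location`,
    `calendar`.`date` AS `date`,
    COUNT(`receipts`.`id`) AS `count`
FROM `locations`
    CROSS JOIN `calendar`
    LEFT JOIN `receipts` ON `calendar`.`date` = DATE(`receipts`.`datetime`)
     AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1))
WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07'
GROUP BY `locations`.`id`, `calendar`.`id`
ORDER BY `locations`.`name` ASC, `calendar`.`date` ASC;
```

In the output, a row whose `type` is ALL and whose `key` is NULL means that table is being scanned in full without an index, which is the first thing to look for on `receipts`.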

Answer

Try this:

SELECT l.name location, c.date, COUNT(r.id) count 
FROM calendar c 
    left join calendar n on n.date = c.date + INTERVAL 1 DAY -- one day after c.date 
    left join (locations l join receipts r 
       on r.id like CONCAT(l.id, '%')) -- first character of r.id is the location id 
    on r.datetime >= c.date and r.datetime < n.date -- half-open range, excludes next midnight 
where c.date between '2017-04-01' and '2017-04-07' 
GROUP BY l.id, c.id 
ORDER BY l.name, c.date;

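The reasons for your problem are:
1. You are using a cross join where it is not necessary. A cross join creates the Cartesian product (every row on one side is combined with every row on the other side). For example, cross joining the 26 letters with the ten digits produces 260 rows {A0, A1, A2 ... A9, B0, B1, ... B9 ... etc.}
2. Your SQL query applies several functions to columns (although one would be enough), which forces the query processor to read every row of the table from disk and effectively prevents it from using any index that may exist on the table. Applying a function to a column value in a filter (WHERE clause) or a sort (ORDER BY clause) has this effect because the query processor cannot know the function's value without executing it, and to execute it it must read the row from the main table on disk to get the underlying value. If the condition used just the raw column value, and that column were in an index, the processor would not need to read the main data table at all. It could traverse only the index, which is usually much smaller and requires far fewer disk I/Os.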

A condition that can use an index in this way is called SARGable.
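To make the SARGable point concrete, here is a minimal sketch against the `receipts` table from the question. It assumes `datetime` is a DATETIME or TIMESTAMP column, and the index name `idx_receipts_datetime` is hypothetical.

```sql
-- Hypothetical index name; only needed if `datetime` is not indexed yet.
CREATE INDEX idx_receipts_datetime ON receipts (`datetime`);

-- Not SARGable: DATE() has to be evaluated for every row of receipts,
-- so the index on `datetime` cannot be used to narrow the search.
SELECT COUNT(*)
FROM receipts
WHERE DATE(`datetime`) >= '2017-04-01' AND DATE(`datetime`) <= '2017-04-07';

-- SARGable: the bare column is compared with constants (half-open range),
-- so the index on `datetime` can be range-scanned.
SELECT COUNT(*)
FROM receipts
WHERE `datetime` >= '2017-04-01' AND `datetime` < '2017-04-08';
```

Both SELECT statements count the same rows. The second one uses `< '2017-04-08'` rather than `<= '2017-04-07'` so that receipts from any time of day on April 7 are still included.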

If adding one day to c.date that way does not work in your MySQL, try this instead:

SELECT l.name location, c.date, COUNT(r.id) count 
FROM calendar c 
    left join calendar n on n.date = 
     (select min(date) from calendar -- subquery gets the next day in calendar 
     where date > c.date) 
    left join (locations l join receipts r 
       on r.id like CONCAT(l.id, '%')) -- first character of r.id is the location id 
    on r.datetime >= c.date and r.datetime < n.date -- half-open range, excludes next midnight 
where c.date between '2017-04-01' and '2017-04-07' 
GROUP BY l.id, c.id 
ORDER BY l.name, c.date;

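Thanks for your help. I tried the first query you posted, but it still takes a long time to run. However, I ended up solving this myself (see my answer). I had to move the WHERE conditions into joined subqueries, which fixed my problem. I suspect that is because it no longer tries to join so many rows against the already large calendar table. Limiting the result sets before joining solved it. I am no expert on this, so I am not sure that is the reason, but hey, it works! –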
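To put numbers on that guess, the two counts below compare the cross join with and without the date restriction, using the table sizes stated in the question (12 locations, 14610 calendar rows). This is only an illustration of the row counts involved; whether MySQL actually builds the larger set for the original query depends on the plan it chooses, which EXPLAIN would show.

```sql
-- Unrestricted cross join: every location paired with every calendar date.
-- With the stated sizes this is 12 * 14610 = 175320 rows.
SELECT COUNT(*) FROM locations CROSS JOIN calendar;

-- Calendar restricted first: every location paired with 7 dates.
-- With the stated sizes this is 12 * 7 = 84 rows.
SELECT COUNT(*)
FROM locations
    CROSS JOIN (
        SELECT id, `date`
        FROM calendar
        WHERE `date` >= '2017-04-01' AND `date` <= '2017-04-07'
    ) AS c;
```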

Answer

SELECT 
    `locations`.`name` AS `location`, 
    `calendar`.`date` AS `date`, 
    COUNT(`receipts`.`id`) AS `count` 
FROM `locations` 
    CROSS JOIN `calendar` 
    LEFT JOIN `receipts` ON `calendar`.`date` = DATE(`receipts`.`datetime`) 
     AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id 
WHERE `calendar`.`date` BETWEEN '2017-04-01' AND '2017-04-07' 
GROUP BY `locations`.`id`, `calendar`.`id` 
ORDER BY `locations`.`name` ASC, `calendar`.`date` ASC;

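Try the query above.

Here I used BETWEEN instead of the >= and <= comparisons.

You can also create an index on the calendar.date column.

You could add a FOREIGN KEY constraint on the child table and join on that column. In that case an INDEX will help as well.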
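A minimal sketch of those two suggestions, under stated assumptions: `locations`.`id` is taken to be a CHAR(1) primary key, the new column is named `location_id` after the comment in the question, and the constraint and index names are hypothetical.

```sql
-- Assumption: locations.id is CHAR(1); adjust the type to match the real column.
ALTER TABLE receipts ADD COLUMN location_id CHAR(1) NULL;

-- Backfill from the first character of the receipt id, as the question's join does.
UPDATE receipts SET location_id = UPPER(LEFT(id, 1));

-- Hypothetical constraint name; this fails if any backfilled value
-- has no matching row in locations.
ALTER TABLE receipts
    ADD CONSTRAINT fk_receipts_location
        FOREIGN KEY (location_id) REFERENCES locations (id);

-- Hypothetical index names: one to support joining receipts by location and
-- time, one for the date filter on calendar.
CREATE INDEX idx_receipts_location_datetime ON receipts (location_id, `datetime`);
CREATE INDEX idx_calendar_date ON calendar (`date`);
```

With the column in place, the join condition `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) could become `locations`.`id` = `receipts`.`location_id`, which compares two plain columns and can therefore use the index.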

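`BETWEEN` did not speed it up. However, I ended up solving it myself (see my answer). Thanks anyway. –

`BETWEEN` always performs the same as `>=` and `<=`. (And it is not functionally equivalent to `>` and `<`, because it is inclusive.) –

@RickJames Got it, man!! –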

Answer

Sorry for wasting everyone's time, but I managed to solve this myself.

Here is the query I came up with, which runs instantly:

SELECT 
    `l`.`name` AS `location`, 
    `c`.`date` AS `date`, 
    COUNT(`r`.`id`) AS `count` 
FROM `locations` AS `l` 
    CROSS JOIN (
     SELECT `calendar`.`id`, `calendar`.`date` 
     FROM `calendar` 
     WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07' 
    ) `c` 
    LEFT JOIN (
     SELECT `receipts`.`id`, `receipts`.`datetime` 
     FROM `receipts` 
     WHERE DATE(`receipts`.`datetime`) >= '2017-04-01' AND DATE(`receipts`.`datetime`) <= '2017-04-07' 
    ) `r` ON `c`.`date` = DATE(`r`.`datetime`) AND `l`.`id` = UPPER(LEFT(`r`.`id`, 1)) 
GROUP BY `l`.`id`, `c`.`id` 
ORDER BY `l`.`name` ASC, `c`.`date` ASC;
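One possible further refinement, offered as a sketch and not as part of the original answer: the inner filter on `receipts` can be written as a half-open range on the bare column, so that an index on `receipts`.`datetime` (if one exists) can be used. This assumes `datetime` is a DATETIME or TIMESTAMP column.

```sql
SELECT
    `l`.`name` AS `location`,
    `c`.`date` AS `date`,
    COUNT(`r`.`id`) AS `count`
FROM `locations` AS `l`
    CROSS JOIN (
     SELECT `calendar`.`id`, `calendar`.`date`
     FROM `calendar`
     WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07'
    ) `c`
    LEFT JOIN (
     SELECT `receipts`.`id`, `receipts`.`datetime`
     FROM `receipts`
     -- Half-open range on the bare column; selects the same rows as the DATE() filter.
     WHERE `receipts`.`datetime` >= '2017-04-01' AND `receipts`.`datetime` < '2017-04-08'
    ) `r` ON `c`.`date` = DATE(`r`.`datetime`) AND `l`.`id` = UPPER(LEFT(`r`.`id`, 1))
GROUP BY `l`.`id`, `c`.`id`
ORDER BY `l`.`name` ASC, `c`.`date` ASC;
```

The functions in the ON clause are left as they are, since by that point they only run against the one week of receipts returned by the subquery.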
