使用多连接和大记录集时的SQL查询优化
我正在制作留言板,并试图检索常规主题(即未粘贴主题),并按最后发布的消息日期对其进行排序。但是,当我有大约10,000条消息和1500个主题时,我能够完成此任务,查询时间大于60秒。使用多连接和大记录集时的SQL查询优化
我的问题是,有什么我可以做我的查询来提高性能或是我的设计从根本上有缺陷?
这是我正在使用的查询。
SELECT Messages.topic_id,
Messages.posted,
Topics.title,
Topics.user_id,
Users.username
FROM Messages
LEFT JOIN
Topics USING(topic_id)
LEFT JOIN
Users on Users.user_id = Topics.user_id
WHERE Messages.message_id IN (
SELECT MAX(message_id)
FROM Messages
GROUP BY topic_id)
AND Messages.topic_id
NOT IN (
SELECT topic_id
FROM StickiedTopics)
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages
GROUP BY message_id)
AND Topics.board_id=1
ORDER BY Messages.posted DESC LIMIT 50
编辑这里是解释计划
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
| 1 | PRIMARY | Topics | ref | PRIMARY,board_id | board_id | 4 | const | 641 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | Users | eq_ref | PRIMARY | PRIMARY | 4 | spergs3.Topics.user_id | 1 | |
| 1 | PRIMARY | Messages | ref | topic_id | topic_id | 4 | spergs3.Topics.topic_id | 3 | Using where |
| 4 | DEPENDENT SUBQUERY | Messages | index | NULL | PRIMARY | 8 | NULL | 1 | |
| 3 | DEPENDENT SUBQUERY | StickiedTopics | index_subquery | topic_id | topic_id | 4 | func | 1 | Using index |
| 2 | DEPENDENT SUBQUERY | Messages | index | NULL | topic_id | 4 | NULL | 3 | Using index |
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
指标
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Messages | 0 | PRIMARY | 1 | message_id | A | 9956 | NULL | NULL | | BTREE | |
| Messages | 0 | PRIMARY | 2 | revision_no | A | 9956 | NULL | NULL | | BTREE | |
| Messages | 1 | user_id | 1 | user_id | A | 432 | NULL | NULL | | BTREE | |
| Messages | 1 | topic_id | 1 | topic_id | A | 3318 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Topics | 0 | PRIMARY | 1 | topic_id | A | 1205 | NULL | NULL | | BTREE | |
| Topics | 1 | user_id | 1 | user_id | A | 133 | NULL | NULL | | BTREE | |
| Topics | 1 | board_id | 1 | board_id | A | 1 | NULL | NULL | | BTREE | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Users | 0 | PRIMARY | 1 | user_id | A | 2051 | NULL | NULL | | BTREE | |
| Users | 0 | username_UNIQUE | 1 | username | A | 2051 | NULL | NULL | | BTREE | |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
我会从合格主题的第一个基础开始,获取这些ID,然后再加入。 我的内部第一个查询按照topic_id和max消息进行了预先限定的分组,以获取预先限定的不同ID。我也已将LEFT JOIN应用于stickiesTopics。为什么?通过进行左连接,我可以查找找到的那些(您想要排除的那些)。所以我已经为Stickies应用了WHERE子句主题ID为NULL(即:未找到)。所以通过这样做,我们已经明确地将列表配对了,而不需要做几个嵌套的子查询。从这个结果中,我们可以加入消息,主题(包括board_id = 1的限定符),用户并根据需要获取部分。最后,为您的MIN(发布)限定符应用一个WHERE IN子选择。不了解其基础,但将其作为原始查询的一部分留下。然后按顺序和限制。
SELECT STRAIGHT_JOIN
M.topic_id,
M.posted,
T.title,
T.user_id,
U.username
FROM
(select
M1.Topic_ID,
MAX(M1.Message_id) MaxMsgPerTopic
from
Messages M1
LEFT Join StickiedTopics ST
ON M1.Topic_ID = ST.Topic_ID
where
ST.Topic_ID IS NULL
group by
M1.Topic_ID) PreQuery
JOIN Messages M
ON PreQuery.MaxMsgPerTopic = M.Message_ID
JOIN Topics T
ON M.Topic_ID = T.Topic_ID
AND T.Board_ID = 1
LEFT JOIN Users U
on T.User_ID = U.user_id
WHERE
M.posted IN (SELECT MIN(posted)
FROM Messages
GROUP BY message_id)
ORDER BY
M.posted DESC
LIMIT 50
这很好,非常感谢!它将查询时间缩短到20秒,并将硬件增加到了4点。好得多! – Drew 2012-03-16 04:51:32
我猜你的问题的很大一部分就在于你的子查询。尝试这样的:
SELECT Messages.topic_id,
Messages.posted,
Topics.title,
Topics.user_id,
Users.username
FROM Messages
LEFT JOIN
Topics USING(topic_id)
LEFT JOIN
StickiedTopics ON StickiedTopics.topic_id = Topics.topic_id
AND StickedTopics.topic_id IS NULL
LEFT JOIN
Users on Users.user_id = Topics.user_id
WHERE Messages.message_id IN (
SELECT MAX(message_id)
FROM Messages m1
WHERE m1.topic_id = Messages.topic_id)
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages m2
GROUP BY message_id)
AND Topics.board_id=1
ORDER BY Messages.posted DESC LIMIT 50
我优化了第一个子查询通过删除分组。第二个子查询是不必要的,因为它可以替换为JOIN
。
我不太清楚这是什么第三子查询是应该做的:
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages m2
GROUP BY message_id)
我也许能帮助优化,如果我知道什么是应该做的。 posted
究竟是什么 - 日期,整数等?它代表什么?
Messages.posted是一个unix时间戳。留言板还支持编辑并保留修订历史记录,以便此查询将获取最早修订的日期。 – Drew 2012-03-16 04:53:54
您至少应该发布解释计划。 – EvilTeach 2012-03-14 02:07:32
您使用的是什么RDBMS? – squillman 2012-03-14 02:09:47
没有查询计划=没有优化(尽管可能猜测应该存在的索引,哪些*也需要在问题中显示)。你有没有考虑用适当的'JOIN'替换'IN'? (我不使用MySQL,所以我不知道它是什么: - /) – 2012-03-14 02:14:39