优化加入mysql中的很多表格

问题描述:

我有多个表,我试图加入。我在表格中添加了索引以提高速度,但仍需要很长时间才能加入。我怀疑这是预期的,但我想知道是否有更有效的方式来创建多个表的连接。我还将net_read_timeout =设置为150,因为我收到丢失的连接错误。我的查询看起来像:优化加入mysql中的很多表格

set net_read_timeout = 150; 
ALTER TABLE wspeed2 ADD INDEX (speed,roadtypeID) --For all the tables 

SELECT a.month,a.roadTypeID,a.speed,a.pid, a.or, b.pid, b.or, c.pid, c.or, d.pid, d.or, 
     e.pid, e.or, f.pid, f.or, g.pid, g.or, h.pid, h.or, i.pid, i.or, j.pid, j.or, 
     k.pid, k.or, l.pid, l.or, m.pid, m.or, n.pid, n.or, o.pid, o.or, p.pid, p.or, 
     q.pid, q.or, r.pid, r.or, s.pid, s.or, t.pid, t.or, u.pid, u.or, v.pid, v.or 
FROM wspeed2 a, wspeed3 b, wspeed20 c, wspeed24 d, wspeed25 e, wspeed26 f, wspeed27 g, wspeed63 h, wspeed65 i, wspeed68 j, 
    wspeed69 k, wspeed70 l, wspeed71 m, wspeed72 n, wspeed73 o, wspeed74 p, wspeed75 q, wspeed76 r, wspeed77 s, wspeed78 t, wspeed81 u, wspeed82 v 
WHERE a.speed = b.speed and b.speed = c.speed and c.speed = d.speed and d.speed = e.speed and e.speed = f.speed and f.speed = g.speed and g.speed = h.speed 
    and h.speed = i.speed and i.speed = j.speed and j.speed = k.speed and k.speed = l.speed and l.speed = m.speed and m.speed = n.speed and n.speed = o.speed 
    and o.speed = p.speed and p.speed = q.speed and q.speed = r.speed and r.speed = s.speed and s.speed = t.speed and t.speed = u.speed and u.speed = v.speed 
GROUP BY a.speed; 
+0

我来自Microsoft的后台;但是,一般来说,当我有5个以上的表加入时,我会使用临时表或其他临时表。 – 2014-08-28 18:39:46

+0

这是SQL数据库的主要问题。有没有更好的方式来做到这一点,我知道 – ControlAltDel 2014-08-28 18:40:33

虽然查询本身看起来很简单但很奇怪,这里是连接。注意......因为你有a = bb = cc = d等等......它也意味着a = ra = sa = t等等......所以,不是所有的人都依赖它前面的别名,它可能会帮助你引擎将所有其他速度表直接连接到根“a”级别别名,如下所示。这就是说,如果一个或多个表没有“a”表中相应速度的记录,它们将不会出现在结果集中。如果您希望ALL在另一个表中有匹配,请将它们全部更改为LEFT JOIN。

现在,看着你的“a”表,你基于道路类型和每个月的速度。速度栏是独特的栏目吗?我会认为这是,但不是积极的。如果任何基础表被加入到每个速度值相同的记录超过1条,您将得到一个笛卡儿结果,并且可能会窒息您的查询。

此外,你有一个组,但没有聚合函数列,如SUM(something),count(),avg(),min(),max(),那么这个组的点是什么。有时,您可能希望订购的东西(最好的东西与指数的“一”表

SELECT 
     a.month, a.roadTypeID, a.speed, 
     a.pid, a.or, b.pid, b.or, c.pid, c.or, d.pid, d.or, 
     e.pid, e.or, f.pid, f.or, g.pid, g.or, h.pid, h.or, 
     i.pid, i.or, j.pid, j.or, k.pid, k.or, l.pid, l.or, 
     m.pid, m.or, n.pid, n.or, o.pid, o.or, p.pid, p.or, 
     q.pid, q.or, r.pid, r.or, s.pid, s.or, t.pid, t.or, 
     u.pid, u.or, v.pid, v.or 
    FROM 
     wspeed2 a 
     JOIN wspeed3 b on a.speed = b.speed 
     JOIN wspeed20 c on a.speed = c.speed 
     JOIN wspeed24 d on a.speed = d.speed 
     JOIN wspeed25 e on a.speed = e.speed 
     JOIN wspeed26 f on a.speed = f.speed 
     JOIN wspeed27 g on a.speed = g.speed 
     JOIN wspeed63 h on a.speed = h.speed 
     JOIN wspeed65 i on a.speed = i.speed 
     JOIN wspeed68 j on a.speed = j.speed 
     JOIN wspeed69 k on a.speed = k.speed 
     JOIN wspeed70 l on a.speed = l.speed 
     JOIN wspeed71 m on a.speed = m.speed 
     JOIN wspeed72 n on a.speed = n.speed 
     JOIN wspeed73 o on a.speed = o.speed 
     JOIN wspeed74 p on a.speed = p.speed 
     JOIN wspeed75 q on a.speed = q.speed 
     JOIN wspeed76 r on a.speed = r.speed 
     JOIN wspeed77 s on a.speed = s.speed 
     JOIN wspeed78 t on a.speed = t.speed 
     JOIN wspeed81 u on a.speed = u.speed 
     JOIN wspeed82 v on a.speed = v.speed 

如果仍然没有帮助,也许加入MySQL的关键字“STRAIGHT_JOIN”可能有帮助,如:

选择STRAIGHT_JOIN [查询的休息]

+0

@丹尼尔,很高兴它似乎为你工作。你介意让我知道查询的性能改进。这对其他具有实际数据和类似多个连接的人可能会有好处。了解此前后的查询时间之前/之后是很好的了解。 – DRapp 2014-08-29 02:06:54

+0

是的,没问题。在查看并尝试查询结构之后,我决定也包含所有可匹配所有表的pk。在这种情况下,pk由monthid,roadtype和speed组成。当使用这个结构进行连接时,查询的速度从5分钟提高到了一秒。 – 2014-08-29 15:49:54

使用内部和左/右连接会给你更好的性能。尝试用这种方法重写查询 -

select ... from t1 
innerjoin t2 on t1.pk=t2.fk 
leftjoin t3 on t1.pk=t3.fk 
+1

我没有看到为什么MySQL不会优化隐式和显式连接以产生相同的执行路径。你为什么认为明确的连接会带来更好的表现? – 2014-08-28 19:08:02

+0

如果3个表各有100行。在“FROM t1,t2”方法中,它将首先创建1000000行然后进行过滤。 但在“FROM t1 join t2 ON ..”中它将只创建匹配的行。 – 2014-08-28 19:16:28

+1

@Biswajit,你确定吗?这是一个非常基本的优化。我发现很难相信MySQL不知情。 – 2014-08-28 19:21:55

如果speed列不在这些表中唯一的(而且可能它不是,因为你说你添加一个索引与speed作为国内领先的列...

如果t这里有多个行的值为speed,在这些表中,那么您的查询可能会创建一个中间集合。

让我们来做一些简单的数学。如果每个表中有两行具有相同的速度值,则a和b之间的JOIN操作将为该速度创建4行。当我们将连接添加到c时,还有另外两行,这总共有8行。当我们把所有22个表连接起来,每个表都有两行时,我们在2^22或超过400万行。然后,需要在GROUP BY操作中处理所有具有speed的所有相同值的整行行以消除重复项。

(当然,如果表中的任何一个不具有一排同样speed值,那么查询会产生零行为speed。)

就个人而言,我会沟老用于JOIN操作的-school逗号语法,并改为使用JOIN关键字。我将WHERE子句中的连接谓词移动到适当的ON子句中。

我也想让其中一个表作为所有连接的“驱动程序”,我会在每个连接中使用对同一个表的引用。 (我们知道,如果a=bb=c,然后a=c。但我不知道MySQL优化,无论任何区别,我们是否到位a=b and b=c指定a=b and a=c

如果有不同的值相对较少的数量在每个表中有speed,但是很多具有相同值的行,我会考虑使用内联视图为每​​个表的每个速度获取一行。MySQL可以使用合适的索引来优化GROUP BY在每个单独的表上操作...我会选择覆盖索引...例如

ON wspeed20 (speed, pid, `or`) 
ON wspeed24 (speed, pid, `or`) 

Unfor可调节地,派生表(内联视图查询的结果)未被索引,所以JOIN操作可能很昂贵(对于来自每个内联视图查询的许多行)。

SELECT a.month,a.roadTypeID,a.speed,a.pid,a.or, b.pid, b.or, c.pid, c.or, d.pid, d.or, 
    e.pid, e.or, f.pid, f.or, g.pid, g.or, h.pid, h.or, i.pid, i.or, j.pid, j.or, 
    k.pid, k.or, l.pid, l.or, m.pid, m.or, n.pid, n.or, o.pid, o.or, p.pid, p.or, 
    q.pid, q.or, r.pid, r.or, s.pid, s.or, t.pid, t.or, u.pid, u.or, v.pid, v.or 

    FROM (SELECT speed, pid, `or` FROM wspeed2 GROUP BY speed) a 
    JOIN (SELECT speed, pid, `or` FROM wspeed3 GROUP BY speed) b ON b.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed20 GROUP BY speed) c ON c.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed24 GROUP BY speed) d ON d.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed25 GROUP BY speed) e ON e.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed26 GROUP BY speed) f ON f.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed27 GROUP BY speed) g ON g.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed63 GROUP BY speed) h ON h.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed65 GROUP BY speed) i ON i.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed68 GROUP BY speed) j ON j.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed69 GROUP BY speed) k ON k.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed70 GROUP BY speed) l ON l.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed71 GROUP BY speed) m ON m.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed72 GROUP BY speed) n ON n.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed73 GROUP BY speed) o ON o.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed74 GROUP BY speed) p ON p.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed75 GROUP BY speed) q ON q.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed76 GROUP BY speed) r ON r.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed77 GROUP BY speed) s ON s.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed78 GROUP BY speed) t ON t.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed81 GROUP BY speed) u ON u.speed = a.speed 
    JOIN (SELECT speed, pid, `or` FROM wspeed82 GROUP BY speed) v ON v.speed = a.speed 

这不得不减少对需要连接的行数的潜力(同样,如果有大量重复值的speed,并为speed少数不同的值)。但是同样,派生表之间的JOIN操作不会有任何可用的索引。 (至少,在MySQL版本中不能高达5.6)。