Running SQL queries with JOIN on large datasets

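Question:

I am trying to run an inner join query between 80,000 records (this is table B) and a 40GB data set with approx. 600 million records (this is table A).

Is MySQL suitable for running this sort of query? How long should I expect it to take?

This is the code I tried, below. However, it failed because my DB connection failed at 60000 secs.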

set net_read_timeout = 36000; 


INSERT 
INTO C 
SELECT A.id, A.link_id, link_ref, network, 
date_1, time_per, 
veh_cls, data_source, N, av_jt 
from A 
inner join B 
on A.link_id = B.link_id;

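I started looking at ways of cutting the 40GB table down into temp tables, to try to make the queries more manageable. However, I keep getting:

Error Code: 1206. The total number of locks exceeds the lock table size 646.953 sec

Am I on the right track? Cheers!

My code to split the database is: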

LOCK TABLES TFM_830_car WRITE, tfm READ; 
INSERT 
INTO D 
SELECT A.id, A.link_id, A.time_per, A.av_jt 
from A 
where A.time_per = 34 and A.veh_cls = 1; 
UNLOCK TABLES;

I'm not sure my table indexes are correct: all I have is a simple primary key.

CREATE Table A 
(
id int unsigned Not Null auto_increment, 
link_id varchar(255) not Null, 
link_ref int not Null, 
network int not Null, 
date_1 varchar(255) not Null, 
#date_2 time default Null, 
time_per int not null, 
veh_cls int not null, 
data_source int not null, 
N int not null, 
av_jt int not null, 
sum_squ_jt int not null, 


Primary Key (id) 
); 


Drop table if exists B; 
CREATE Table B 
(
id int unsigned Not Null auto_increment, 
TOID varchar(255) not Null, 
link_id varchar(255) not Null, 
ABnode varchar(255) not Null, 

#date_2 time not Null, 

Primary Key (id) 

);

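In terms of schema, it is just these two tables (A and B) loaded under one database.

"against a 40GB data set". How many records? Are your tables properly indexed? – 2014-11-24 19:35:37

80k records seems rather low for that amount of data. What are you storing in there: XML dumps, image binaries? – 2014-11-24 19:38:38

You can work around the error by selecting from a subquery (derived table), but that will not solve the performance problem. Post your schema and some sample data for further help. – 2014-11-24 19:59:23

Answer

I believe the answer has already been given in this post: The total number of locks exceeds the lock table size

i.e. use a table lock to avoid InnoDB's default row-by-row locking mode.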
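
A minimal sketch of what that could look like for the split query from the question, plus an alternative. The first part follows the pattern the MySQL manual describes for LOCK TABLES with InnoDB tables (autocommit off, commit before unlocking). The second part follows the manual's other suggestion for error 1206, breaking a large INSERT into smaller ones. Both assume that D already exists with four matching columns. The range width of 1,000,000 ids is an arbitrary illustration, and the thread does not confirm whether either variant clears the error on this particular server.

```sql
-- Variant 1: table locks instead of row locks.
-- Every table the statement touches must be named in LOCK TABLES.
SET autocommit = 0;
LOCK TABLES D WRITE, A READ;

INSERT INTO D
SELECT A.id, A.link_id, A.time_per, A.av_jt
FROM A
WHERE A.time_per = 34 AND A.veh_cls = 1;

COMMIT;          -- commit before releasing the table locks
UNLOCK TABLES;
SET autocommit = 1;

-- Variant 2: copy in primary-key ranges so each statement locks fewer rows.
-- Repeat with the next range until the maximum id in A has been covered.
INSERT INTO D
SELECT A.id, A.link_id, A.time_per, A.av_jt
FROM A
WHERE A.id BETWEEN 1 AND 1000000
  AND A.time_per = 34 AND A.veh_cls = 1;

INSERT INTO D
SELECT A.id, A.link_id, A.time_per, A.av_jt
FROM A
WHERE A.id BETWEEN 1000001 AND 2000000
  AND A.time_per = 34 AND A.veh_cls = 1;

-- The InnoDB lock table lives inside the buffer pool, so its size matters too.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```

The manual also lists increasing innodb_buffer_pool_size as the primary way to avoid error 1206. On servers older than MySQL 5.7.5 that means editing the configuration file and restarting.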

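Answer

Thanks for your help.

Indexing seems to have solved the problem. I managed to reduce the query time from 700 secs to approx. 0.2 secs per record by indexing on:

A.link_id

i.e. from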

from A 
inner join B 
on A.link_id = B.link_id;

I found this post really useful. It is very helpful for someone new to indexing, like myself:

http://hackmysql.com/case4

The code, for newbies, is:

CREATE INDEX linkid_index ON A(link_id);
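
To check that the join really uses the new index, EXPLAIN can be run on the query before executing it. The sketch below also indexes the join column on B, which the thread does not mention doing. The index name linkid_index_b is hypothetical, and the column list is shortened for illustration.

```sql
-- Optional: index the join column on the smaller table as well
-- (hypothetical index name, not from the original post).
CREATE INDEX linkid_index_b ON B(link_id);

-- Show the optimizer's plan for the join without running it.
EXPLAIN
SELECT A.id, A.link_id
FROM A
INNER JOIN B ON A.link_id = B.link_id;
-- In the output, the `key` column names the index chosen for each table,
-- and `type` = ALL means that table is being scanned in full.
```

If `key` shows linkid_index for table A, the join is doing index lookups on A.link_id rather than scanning all rows of A for each row of B.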
