Sqlalchemy在查询过程中丢失连接

问题描述:

我正在开发一个项目,该项目结合了基于注册用户的一些数据源。一个特别的查询给了我很多的问题:Sqlalchemy在查询过程中丢失连接

import numpy as np 
import pandas as pd 
from pandas import Series, DataFrame 
from sqlalchemy import create_engine 

# of course, the info here is obscured 
prod_engine = create_engine('mysql+mysqlconnector://[email protected]:3306/database',pool_timeout=3600,pool_recycle=3600) 

query_users = """ 
SELECT users.id, 
CASE 
    WHEN ((users.role = '' OR users.role IS NULL) AND users.plan LIKE 'pro%') OR users.role REGEXP '(pro|agent|manager)' THEN 'professional' ELSE 'consumer' 
END AS 'modified_role', 
users.created_at, 
users.logged_in_at AS 'last_login', 
COUNT(DISTINCT(folders.id)) AS 'folder_count', 
IF(COUNT(DISTINCT(folders.id)) > 1, '2 or more','0 to 1') AS 'folder_group', 
MIN(folders.created_at) AS 'first_folder_created', 
MAX(folders.created_at) AS 'last_folder_created' 
FROM users 
LEFT OUTER JOIN folders 
ON folders.created_by = users.id 
AND folders.discarded = 0 
AND folders.created_at >= '2010-11-30 23:59:59' 
WHERE users.invalid_email IS NULL 
GROUP BY 1""" 

users = pd.read_sql_query(query_users, prod_engine) 

不管是什么我已经试过,我得到这个错误(几乎总是在三秒钟内,有时瞬间)。

InterfaceError: (InterfaceError) 2013: Lost connection to MySQL server during query 

我已经尝试了一些东西,比如添加pool_timeoutpool_recycle选项每文档的create_engine功能这里http://docs.sqlalchemy.org/en/latest/core/engines.html

我也试过,但得到了同样的错误。

有趣的是,只要我在Sequel Pro中运行它,这个查询就可以正常工作;它立即开始返回行,并且只需要约10秒钟即可完成。产量约为550,000行。

还有其他线程/帖子我发现的数量,但没有一个似乎很满足我的需要: https://groups.google.com/forum/#!topic/sqlalchemy/TWL7aWab9ww Handle SQLAlchemy disconnect http://blog.fizyk.net.pl/blog/reminder-set-pool_recycle-for-sqlalchemys-connection-to-mysql.html

阅读文档这里http://dev.mysql.com/doc/refman/5.5/en/error-lost-connection.html,我注意到这条线:

Sometimes the “during query” form happens when millions of rows are being sent as part of one or more queries. If you know that this is happening, you should try increasing net_read_timeout from its default of 30 seconds to 60 seconds or longer, sufficient for the data transfer to complete.

好像我可能需要更改此选项,但是我无法在提到此问题的SQLAlchemy文档中找到任何内容。

有没有人遇到过这个问题?如果是这样,你是如何解决它的?

+0

使用其他方法(例如GUI sql管理器)时,该查询是否有效?你是否尝试过一个不同于'mysqlconnector'的其他DBAPI(例如'pymysql',你可以在这里检查[http://docs.sqlalchemy.org/en/rel_0_9/dialects/mysql.html#dialect-mysql]) ? – ari

+0

是的,可以和Sequel Pro http://www.sequelpro.com一起使用。尽管没有尝试过mymysql。 – measureallthethings

+0

尝试使用另一个DBAPI:对于pymysql,您可以通过在终端中运行'conda install pymysql'来下载它(或者如果您没有Anaconda,则使用pip),然后在'create_engine'字符串中用'pymysql'替换'mysqlconnector' 。 – ari

检查你的MySQL服务器的变量max_allowed_packet_size并增加它。大多数情况下,当MySQL在查询过程中断开连接时,这是因为有效负载太大。