GridSearchCV on an XGBoost model gives an error

Problem description:

I created an XGBoost classifier in Python and am trying to use GridSearch to find the best parameters, like this:

from sklearn.model_selection import GridSearchCV

# model (an XGBClassifier), param_grid, kfold, X and Y are defined earlier (not shown)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, Y)

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

When I run the search, I get this error:

[Errno 28] No space left on device 

I am using a fairly large dataset: X.shape = (38932, 1002) and Y.shape = (38932,).

What is the problem, and how can I fix it?

Is this because the dataset is too large for my machine? If so, what can I do to run GridSearch on this dataset?
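To check whether size alone could explain the error, here is a rough estimate of how much memory X occupies as a dense array, next to the free space in the folders joblib typically uses for memmapped copies of large inputs. This is a sketch that assumes X is a dense NumPy float array (a sparse matrix or a pandas DataFrame would differ) and that the search runs on Linux. About 300 MiB is modest in RAM terms, but /dev/shm is often much smaller than RAM (inside Docker containers it defaults to 64 MB), so a parallel search can exhaust it even on a machine with plenty of memory.

```python
import os
import shutil
import tempfile

import numpy as np

# Shape reported in the question.
n_rows, n_cols = 38932, 1002


def array_nbytes(shape, dtype):
    """Bytes a dense array of this shape and dtype occupies, without allocating it."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


x64 = array_nbytes((n_rows, n_cols), np.float64)
x32 = array_nbytes((n_rows, n_cols), np.float32)
print(f"X as float64: {x64 / 2**20:.1f} MiB")  # 297.6 MiB
print(f"X as float32: {x32 / 2**20:.1f} MiB")  # 148.8 MiB

# Assumption: joblib memmaps large inputs into /dev/shm on Linux when it exists,
# otherwise into the system temp directory; compare the sizes above with what is free.
for folder in ("/dev/shm", tempfile.gettempdir()):
    if os.path.isdir(folder):
        free = shutil.disk_usage(folder).free
        print(f"{folder}: {free / 2**20:.1f} MiB free")
```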


Please either provide a sample of the data along with its shape, or link to the data. – sgDysregulation


I have edited the question to include a description of the dataset and added its shape. –


Here is a question similar to the problem you are experiencing: https://stackoverflow.com/a/6999259/1577947 – Jarad
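A minimal sketch of common mitigations for this kind of "No space left on device" error, assuming it comes from joblib (which GridSearchCV uses whenever n_jobs != 1) memmapping the training data into a small shared-memory or temp folder. It points joblib at a different scratch folder through the JOBLIB_TEMP_FOLDER environment variable, casts X to float32 (XGBoost stores its data as float32 internally anyway), and uses a few workers instead of n_jobs=-1. LogisticRegression on iris is a stand-in so the sketch runs without xgboost; swap in xgb.XGBClassifier(), the real grid and the real X, Y. The scratch path is hypothetical; in practice choose a directory on a disk with enough free space.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical scratch folder; replace with a directory on a disk that has room.
scratch = tempfile.mkdtemp(prefix="joblib_scratch_")
# joblib reads JOBLIB_TEMP_FOLDER when it memmaps large inputs for its workers,
# so set it before the parallel search starts.
os.environ["JOBLIB_TEMP_FOLDER"] = scratch

X, Y = load_iris(return_X_y=True)
X = X.astype(np.float32)  # half the bytes of float64 for every copy or memmap of X

kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),  # stand-in for xgb.XGBClassifier()
    {"C": [0.1, 1.0, 10.0]},
    scoring="neg_log_loss",
    cv=kfold,
    n_jobs=2,  # a few workers instead of n_jobs=-1 (every core)
)
grid.fit(X, Y)
print(grid.best_params_, grid.best_score_)
```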

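The error indicates that shared memory is running out. Changing the number of folds in kfold and/or lowering the number of parallel workers via n_jobs is likely to solve this. Here is a working example using xgboost: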
import xgboost as xgb 
from sklearn.model_selection import GridSearchCV 
from sklearn import datasets 

clf = xgb.XGBClassifier() 
parameters = { 
    'n_estimators': [100, 250, 500], 
    'max_depth': [6, 9, 12], 
    'subsample': [0.9, 1.0], 
    'colsample_bytree': [0.9, 1.0], 
} 
bsn = datasets.load_iris() 
X, Y = bsn.data, bsn.target 
grid = GridSearchCV(clf,
                    parameters,
                    n_jobs=4,  # at most 4 parallel workers; n_jobs=-1 would use every core
                    scoring="neg_log_loss",
                    cv=3)

grid.fit(X, Y) 
print("Best: %f using %s" % (grid.best_score_, grid.best_params_)) 

means = grid.cv_results_['mean_test_score'] 
stds = grid.cv_results_['std_test_score'] 
params = grid.cv_results_['params'] 

for mean, stdev, param in zip(means, stds, params): 
    print("%f (%f) with: %r" % (mean, stdev, param)) 

The output is:

Best: -0.121569 using {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 100, 'subsample': 1.0} 
-0.126334 (0.080193) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 100, 'subsample': 0.9} 
-0.121569 (0.081561) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 100, 'subsample': 1.0} 
-0.139359 (0.075462) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 250, 'subsample': 0.9} 
-0.131887 (0.076174) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 250, 'subsample': 1.0} 
-0.148302 (0.074890) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 500, 'subsample': 0.9} 
-0.135973 (0.076167) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 500, 'subsample': 1.0} 
-0.126334 (0.080193) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 100, 'subsample': 0.9} 
-0.121569 (0.081561) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 100, 'subsample': 1.0} 
-0.139359 (0.075462) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 250, 'subsample': 0.9} 
-0.131887 (0.076174) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 250, 'subsample': 1.0} 
-0.148302 (0.074890) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 500, 'subsample': 0.9} 
-0.135973 (0.076167) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 500, 'subsample': 1.0} 
-0.126334 (0.080193) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 100, 'subsample': 0.9} 
-0.121569 (0.081561) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 100, 'subsample': 1.0} 
-0.139359 (0.075462) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 250, 'subsample': 0.9} 
-0.131887 (0.076174) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 250, 'subsample': 1.0} 
-0.148302 (0.074890) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 500, 'subsample': 0.9} 
-0.135973 (0.076167) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 500, 'subsample': 1.0} 
-0.132745 (0.080433) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 100, 'subsample': 0.9} 
-0.127030 (0.077692) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 100, 'subsample': 1.0} 
-0.146143 (0.077623) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 250, 'subsample': 0.9} 
-0.140400 (0.074645) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 250, 'subsample': 1.0} 
-0.153624 (0.077594) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 500, 'subsample': 0.9} 
-0.143833 (0.073645) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 500, 'subsample': 1.0} 
-0.132745 (0.080433) with: {'colsample_bytree': 1.0, 'max_depth': 9, ... 
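The output has one line per parameter combination. To see how much work a grid implies, and how the number of folds scales it, the fits can be counted up front with scikit-learn's ParameterGrid; a small sketch for the grid in the example above:

```python
from sklearn.model_selection import ParameterGrid

# Grid from the xgboost example above.
parameters = {
    'n_estimators': [100, 250, 500],
    'max_depth': [6, 9, 12],
    'subsample': [0.9, 1.0],
    'colsample_bytree': [0.9, 1.0],
}

n_candidates = len(ParameterGrid(parameters))  # 3 * 3 * 2 * 2 combinations
n_folds = 3                                    # cv=3 in the example
n_fits = n_candidates * n_folds                # plus one refit on all data (refit=True by default)
print(n_candidates, n_fits)
```

With cv=3 this is 36 candidates and 108 cross-validation fits; every extra fold adds one more fit per candidate.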

I have successfully run a grid search (GridSearchCV) on my machine before; I only have this problem with this dataset. –


I will try it without using `kFold` and let you know how it goes. –


You may also want to turn on verbose output in the grid search (i.e. verbose=5) to see whether particular parameter values are causing the problem. – sgDysregulation
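A minimal sketch of that suggestion, shown with scikit-learn's LogisticRegression on iris so it runs without xgboost (swap in xgb.XGBClassifier() and the real grid). Running sequentially with n_jobs=1 keeps the log lines in order and avoids spawning worker processes at all, and error_score="raise" makes the search stop at the first failing parameter combination instead of recording NaN for it.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, Y = load_iris(return_X_y=True)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),  # stand-in for xgb.XGBClassifier()
    {"C": [0.01, 1.0, 100.0]},
    scoring="neg_log_loss",
    cv=3,
    n_jobs=1,             # sequential: no worker processes, messages appear in order
    verbose=5,            # one line per fit with fold, parameters, score and time
    error_score="raise",  # fail loudly on the first parameter combination that errors
)
grid.fit(X, Y)
print(grid.best_params_)
```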