Python: running a loop on multiple CPUs
Question:
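I wrote a small piece of code similar to sklearn's grid search. For each combination in a set of hyperparameters, it trains a model on the training data (X and y in the code below), computes several metrics on the validation data (Xt, yt_class), and stores the results in a pandas DataFrame.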
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import ParameterGrid  # sklearn.grid_search was removed in scikit-learn 0.20
from sklearn.metrics import precision_score, f1_score

grid = {'C': [1, 10.0, 50, 100.0], 'gamma': [0.00001, 0.0001, 0.001, 0.01, 0.1]}
param_grid = ParameterGrid(grid)
results = pd.DataFrame(list(param_grid))
precision = []
f1 = []
for params in param_grid:
    model = SVC(kernel='rbf', cache_size=1000, class_weight='balanced', **params)
    model.fit(X, y)
    yt_pred = model.predict(Xt)  # predict once and reuse it for both metrics
    p = precision_score(yt_class, yt_pred, average='weighted')
    f = f1_score(yt_class, yt_pred, average='weighted')
    precision.append(p)
    f1.append(f)
    print(params)
    print(p)
    print(f)
results['precision'] = precision
results['f1'] = f1
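The grid above expands to 4 × 5 = 20 parameter combinations, so the loop fits 20 models. That count is also why the error reported further down mentions 20 arguments. As a rough, stdlib-only sketch of the expansion, `itertools.product` can stand in for `ParameterGrid`. The sorted-key order used here is an assumption chosen for reproducibility, not a claim about `ParameterGrid` internals.

```python
from itertools import product

grid = {'C': [1, 10.0, 50, 100.0], 'gamma': [0.00001, 0.0001, 0.001, 0.01, 0.1]}

# cartesian product of the value lists; keys are sorted so the order is stable
keys = sorted(grid)
combinations = [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

print(len(combinations))  # 20
print(combinations[0])    # {'C': 1, 'gamma': 1e-05}
```

Each of these dicts corresponds to one iteration of the loop, which makes it the natural unit of work to hand to a separate process.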
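Now I am trying to make this loop run on multiple CPUs. I tried the following basic example with the multiprocessing module, but as someone new to Python and to programming in general, I can't figure out how to make it work in my case.
Example (this does not work):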
import multiprocessing as mp
pool = mp.Pool(processes=8)
def get_scores(param_grid):
    precision = []
    f1 = []
    for params in param_grid:
        model = SVC(kernel='rbf', cache_size=1000, class_weight='balanced', **params)
        model.fit(X, y)
        model.predict(Xt)
        precision.append(precision_score(yt_class, model.predict(Xt), average='weighted'))
        f1.append(f1_score(yt_class, model.predict(Xt), average='weighted'))
    return precision, f1
scores = pool.apply(get_scores, param_grid)
Answer:
Your get_scores function should contain only the body of the loop, not the loop itself. Try this:
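import multiprocessing as mp
from sklearn.svm import SVC
from sklearn.model_selection import ParameterGrid  # formerly sklearn.grid_search
from sklearn.metrics import precision_score, f1_score

# X, y, Xt and yt_class must be defined at module level (outside the
# __main__ guard below) so that the worker processes can see them

def get_scores(params):
    # one iteration of the original loop: fit and score a single combination
    model = SVC(kernel='rbf', cache_size=1000, class_weight='balanced', **params)
    model.fit(X, y)
    yt_pred = model.predict(Xt)
    precision = precision_score(yt_class, yt_pred, average='weighted')
    f1 = f1_score(yt_class, yt_pred, average='weighted')
    return precision, f1

if __name__ == '__main__':  # required when workers are started with "spawn" (e.g. Windows)
    grid = {'C': [1, 10.0, 50, 100.0], 'gamma': [0.00001, 0.0001, 0.001, 0.01, 0.1]}
    param_grid = ParameterGrid(grid)
    with mp.Pool(processes=8) as pool:
        # map_async calls get_scores once per element of param_grid
        scores = pool.map_async(get_scores, param_grid).get()
    # scores is a list of tuples [(precision_1, f1_1), (precision_2, f1_2), ...]
    # you can "unzip" it like this
    precision, f1 = zip(*scores)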
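Because `map` (and `map_async(...).get()`) returns results in the same order as its input, the unzipped tuples line up row for row with the parameter combinations. That is what lets them be attached as the `precision` and `f1` columns of the results DataFrame from the question. A pandas-free sketch of that step, using a few parameter dicts and made-up placeholder scores (not real results):

```python
# hypothetical inputs: three parameter combinations and made-up (precision, f1)
# pairs, in the same order that pool.map would return them
param_grid = [{'C': 1, 'gamma': 1e-05}, {'C': 1, 'gamma': 0.0001}, {'C': 1, 'gamma': 0.001}]
scores = [(0.61, 0.58), (0.72, 0.70), (0.80, 0.79)]

# zip(*scores) transposes the list of pairs into two tuples
precision, f1 = zip(*scores)

# map() keeps input order, so entry i belongs to param_grid[i]; this is a
# plain-Python stand-in for results['precision'] = precision, results['f1'] = f1
results = [dict(params, precision=p, f1=f) for params, p, f in zip(param_grid, precision, f1)]
for row in results:
    print(row)
```

If you switch to `imap_unordered`, this alignment no longer holds, and you would need to return `params` from `get_scores` alongside the scores.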
After running your code I get the following error (the same one I got with my own code): 'TypeError: get_scores() takes 1 positional argument but 20 were given'
Right, it should be map_async, not apply_async.
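To see where that TypeError comes from, here is a small sketch of the calling conventions. It uses `multiprocessing.pool.ThreadPool`, which the Python docs describe as interface-compatible with `Pool`, only so the example runs without worker processes. It does not spread work across CPUs. The `get_scores` here is a hypothetical stand-in that just echoes its parameters.

```python
from multiprocessing.pool import ThreadPool

def get_scores(params):
    # hypothetical stand-in: echoes its parameters instead of fitting a model
    return (params['C'], params['gamma'])

param_grid = [{'C': c, 'gamma': g} for c in (1, 10) for g in (0.1, 0.01)]
error_message = None

with ThreadPool(processes=2) as pool:
    # apply(func, args) calls func(*args): every dict in param_grid becomes a
    # separate positional argument, so get_scores receives 4 arguments
    try:
        pool.apply(get_scores, param_grid)
    except TypeError as exc:
        error_message = str(exc)

    # map(func, iterable) calls func(item) once per element, keeping input order
    scores = pool.map(get_scores, param_grid)

    # apply still works for a single call: wrap the one argument in a tuple
    single = pool.apply(get_scores, (param_grid[0],))

print(error_message)  # get_scores() takes 1 positional argument but 4 were given
print(scores)         # [(1, 0.1), (1, 0.01), (10, 0.1), (10, 0.01)]
```

With the real 20-entry grid, the same `apply` call produces the "20 were given" message quoted above.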