Pandas groupby: handling bad rows

Question:

Is there a way to force pandas.groupby to return a DataFrame? Here is an example that illustrates my problem.

Toy DataFrame:

import pandas as pd

df = pd.DataFrame(data=dict(a=[1, 1, 1, 2, 2, 2, 3, 3, 3],
                            b=[1, 1, 1, 2, 2, 2, 4, 4, 4]))

This function returns the expected DataFrame:

def fcn_good(d): 
    return pd.Series(data=dict(mean=d.b.mean(), std=d.b.std())) 
print(df.groupby('a').apply(fcn_good))

with the output

   mean  std
a
1   1.0  0.0
2   2.0  0.0
3   4.0  0.0

Now here is the problem. In my real code, the computation fails for some groupby keys. I would like the output to be:

   mean  std
a
1   1.0  0.0
2   NaN  NaN
3   4.0  0.0

However, this code

def fcn_bad(d):
    if int(d.a.unique()[0]) == 2:  # Simulate failure
        return pd.Series()
    return pd.Series(data=dict(mean=d.b.mean(), std=d.b.std()))
print(df.groupby('a').apply(fcn_bad))

returns a Series instead, with the failed key dropped entirely:

a
1  mean    1.0
   std     0.0
3  mean    4.0
   std     0.0
dtype: float64

Does anyone know how to get this to work?

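Answer

You can use unstack followed by reindex on the unique values of column a, since the groupby is on column a. unstack pivots the inner index level (mean, std) into columns, and reindex restores the missing key as a row of NaN: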

def fcn_bad(d):
    if int(d.a.unique()[0]) == 2:  # Simulate failure
        return pd.Series()
    return pd.Series(data=dict(mean=d.b.mean(), std=d.b.std()))
print(df.groupby('a').apply(fcn_bad).unstack().reindex(df.a.unique()))
   mean  std
a
1   1.0  0.0
2   NaN  NaN
3   4.0  0.0
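
The same reindex idea can be sketched without relying on how apply combines an empty Series with the other results. The snippet below is only an illustration under assumptions: the explicit loop, the names rows, partial and result, and the key == 2 check standing in for a real failure are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 1, 1, 2, 2, 2, 3, 3, 3],
                       b=[1, 1, 1, 2, 2, 2, 4, 4, 4]))

# Collect statistics only for the groups that succeed.
rows = {}
for key, group in df.groupby("a"):
    if key == 2:  # hypothetical stand-in for a failing computation
        continue
    rows[key] = {"mean": group["b"].mean(), "std": group["b"].std()}

# Build a frame from the successful groups, then restore the missing keys.
# reindex inserts an all-NaN row for every key absent from `partial`.
partial = pd.DataFrame.from_dict(rows, orient="index")
result = partial.reindex(df["a"].unique()).rename_axis("a")
print(result)
```

Here the failed key never reaches the result at all, so reindex is what puts it back as a NaN row.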

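Alternatively, if the fallback Series is given an index matching the final column names, as in pd.Series(index=['mean', 'std'], dtype='float64'), apply returns a DataFrame directly (the explicit dtype avoids relying on the default dtype of a Series created without data):

def fcn_bad(d):
    if int(d.a.unique()[0]) == 2:  # Simulate failure
        return pd.Series(index=['mean', 'std'], dtype='float64')
    return pd.Series(data=dict(mean=d.b.mean(), std=d.b.std()))
print(df.groupby('a').apply(fcn_bad))
   mean  std
a
1   1.0  0.0
2   NaN  NaN
3   4.0  0.0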
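
In real code the failure is usually an exception rather than a known key, so the fixed-index fallback can be combined with try/except. This is a minimal sketch, not the answerer's code: safe_stats, COLUMNS and the simulated ValueError are hypothetical names chosen for illustration, and selecting [["b"]] before apply is an assumption made so that the function only sees the data column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(a=[1, 1, 1, 2, 2, 2, 3, 3, 3],
                       b=[1, 1, 1, 2, 2, 2, 4, 4, 4]))

COLUMNS = ["mean", "std"]  # hypothetical: the columns every group must return


def safe_stats(d):
    """Return mean/std of column b, or a NaN row if the computation fails."""
    try:
        if d["b"].iloc[0] == 2:  # simulate a failure for the group where b == 2
            raise ValueError("simulated failure")
        return pd.Series({"mean": d["b"].mean(), "std": d["b"].std()})
    except ValueError:
        # Same index as the success case, so apply still builds a DataFrame.
        return pd.Series(np.nan, index=COLUMNS, dtype="float64")


result = df.groupby("a")[["b"]].apply(safe_stats)
print(result)
```

Because every group returns a Series with the same index, the failed group shows up as a row of NaN instead of changing the shape of the result.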
