How do I group by a column and assign an array to another column in python-pandas?

Problem description:

Given a DataFrame df like this:

a  b  
2  nan 
3  nan 
3  nan 
4  nan 
4  nan 
4  nan 
5  nan 
5  nan 
5  nan 
5  nan 
... 

One key rule is that each number n in a is repeated on n-1 rows. My expected output is:

a  b  
2  1 
3  1 
3  2 
4  1 
4  2 
4  3 
5  1 
5  2 
5  3 
5  4 
... 

So for each number n in a, column b should hold the values 1 to n-1. I tried this:

df.groupby('a').apply(lambda x: np.asarray(range(x['a'].unique()[0]))) 

But the result is a list in a single row per group, which is not what I want.

Could you tell me how to achieve this? Thanks in advance!

You need cumcount:

df['b'] = df.groupby('a').cumcount() + 1 
print (df) 
    a b 
0 2 1 
1 3 1 
2 3 2 
3 4 1 
4 4 2 
5 4 3 
6 5 1 
7 5 2 
8 5 3 
9 5 4 
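
For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch. The way the sample frame is built with np.repeat is my own assumption about how such data could be generated, since the question only shows the frame itself.

```python
import numpy as np
import pandas as pd

# Assumed construction of the sample data: each value n in 'a' appears n-1 times.
values = np.arange(2, 6)                      # 2, 3, 4, 5
df = pd.DataFrame({"a": np.repeat(values, values - 1)})

# cumcount numbers the rows inside each group starting at 0, so add 1
# to get a counter that starts at 1.
df["b"] = df.groupby("a").cumcount() + 1
print(df)
```

Because cumcount returns a Series aligned with the original index, the result can be assigned straight back as a column, with no apply and no reshaping.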

# make a column that is 0 on the first occurrence of a number in a and 1 after 
df['is_duplicated'] = df.duplicated(['a']).astype(int) 

# group by values of a and get the cumulative sum of duplicates 
# add one since the cumulative sum is 0 on the first occurrence 
# selecting the single column keeps the result a Series 
df['b'] = df.groupby('a')['is_duplicated'].cumsum() + 1 
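
As a quick sanity check, the sketch below compares this duplicated/cumsum approach with cumcount on the sample data from the question. The variable names via_cumsum and via_cumcount are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [2, 3, 3, 4, 4, 4, 5, 5, 5, 5]})

# 0 on the first occurrence of each value in 'a', 1 on every later one.
is_dup = df.duplicated(["a"]).astype(int)

# Running total of duplicates within each group, shifted to start at 1.
via_cumsum = is_dup.groupby(df["a"]).cumsum() + 1

# The cumcount approach, for comparison.
via_cumcount = df.groupby("a").cumcount() + 1

print((via_cumsum == via_cumcount).all())
```

Both give the same per-group counter on this data. cumcount does it in one step without a helper column.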

Thanks for the great answer! Wonderful!