How do I group by and assign an array to a column in python-pandas?
Problem description:
Given a DataFrame df like this:
a b
2 nan
3 nan
3 nan
4 nan
4 nan
4 nan
5 nan
5 nan
5 nan
5 nan
...
A key rule is that each number n in column a is repeated on n-1 rows. My expected result is:
a b
2 1
3 1
3 2
4 1
4 2
4 3
5 1
5 2
5 3
5 4
...
So, for each value n in a, column b should run from 1 to n-1. I tried this:
df.groupby('a').apply(lambda x: np.asarray(range(x['a'].unique()[0])))
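But the result is one list per group, each on a single row, which is not what I want.
Could you tell me how to achieve this? Thanks in advance!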
Answer
You need cumcount:
df['b'] = df.groupby('a').cumcount() + 1
print (df)
a b
0 2 1
1 3 1
2 3 2
3 4 1
4 4 2
5 4 3
6 5 1
7 5 2
8 5 3
9 5 4
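For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch. Building the frame with np.repeat is my own assumption about how the example data could be generated; the question only shows the resulting rows.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example: each value n in column a
# appears on n-1 rows (2 once, 3 twice, 4 three times, 5 four times).
values = np.arange(2, 6)
df = pd.DataFrame({"a": np.repeat(values, values - 1)})

# cumcount numbers the rows inside each group starting at 0,
# so adding 1 gives the 1..n-1 sequence asked for.
df["b"] = df.groupby("a").cumcount() + 1
print(df)
```

Because cumcount returns a Series aligned to the original index, it can be assigned straight back as a column, which is what the apply attempt in the question could not do.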
Answer
# make a column that is 0 on the first occurrence of a number in a and 1 after
df['is_duplicated'] = df.duplicated(['a']).astype(int)
# group by values of a and get the cumulative sum of duplicates
# add one since the first occurrence in each group has a cumulative sum of 0
df['b'] = df.groupby('a')['is_duplicated'].cumsum() + 1
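As a quick sanity check, the duplicated/cumsum approach should give the same numbering as cumcount on the example data. This is only a sketch: the frame is rebuilt with np.repeat (my assumption, not from the question), and the names b_cumsum and b_cumcount are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame from the question.
values = np.arange(2, 6)
df = pd.DataFrame({"a": np.repeat(values, values - 1)})

# 0 on the first occurrence of each value in a, 1 on every later one.
df["is_duplicated"] = df.duplicated(["a"]).astype(int)

# Running total of repeats within each group, shifted to start at 1.
b_cumsum = df.groupby("a")["is_duplicated"].cumsum() + 1

# The same numbering obtained directly with cumcount.
b_cumcount = df.groupby("a").cumcount() + 1

same = bool((b_cumsum == b_cumcount).all())
print(same)
```

Both give the same column here; cumcount just gets there without the helper column.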
Thanks for the great answers! Brilliant!