Computing percentages in a higher-dimensional crosstab
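Question:
I built a crosstab over three variables (position, offer, group). How can I compute percentages by summing over a single variable, offer, rather than over the margins (i.e. normalizing by columns)? In other words, accept and reject should add up to 1 within each position for every group.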
df = pd.crosstab(df.group, [df.position, df.offer], margins = True)
df:
pid  offer   position  group
1    accept  left      group1
1    accept  left      group1
1    accept  right     group2
1    reject  right     group2
1    reject  right     group1
2    reject  right     group1
2    reject  left      group2
2    accept  left      group3
3    accept  right     group3
3    reject  right     group1
3    reject  right     group2
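For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample data as a DataFrame and produces the raw counts without margins. The names `rows` and `counts` are only illustrative labels, not part of the original post.

```python
import pandas as pd

# Rebuild the sample table from the question, row by row.
rows = [
    (1, "accept", "left", "group1"),
    (1, "accept", "left", "group1"),
    (1, "accept", "right", "group2"),
    (1, "reject", "right", "group2"),
    (1, "reject", "right", "group1"),
    (2, "reject", "right", "group1"),
    (2, "reject", "left", "group2"),
    (2, "accept", "left", "group3"),
    (3, "accept", "right", "group3"),
    (3, "reject", "right", "group1"),
    (3, "reject", "right", "group2"),
]
df = pd.DataFrame(rows, columns=["pid", "offer", "position", "group"])

# Counts per group (rows) and per position/offer pair (columns), no margins.
counts = pd.crosstab(df["group"], [df["position"], df["offer"]])
print(counts)
```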
My current crosstab:
position  left            right           All
offer     accept  reject  accept  reject
group1    2       0       0       3       5
group2    0       1       1       2       4
group3    1       0       1       0       2
All       3       1       2       5       11
Expected result:
position  left            right
offer     accept  reject  accept  reject
group1    1       0       0       1
group2    0       1       0.33    0.66
group3    1       0       1       0
Thanks!
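Answer
Go one step further: groupby along level 0 of the columns (position), then divide the counts by the resulting sum, so each cell is divided by the total for its position within its row.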
c = pd.crosstab(df.group, [df.position, df.offer])
df = c/c.groupby(level=0, axis=1).sum()
print(df)
position  left            right
offer     accept  reject  accept    reject
group
group1    1.0     0.0     0.000000  1.000000
group2    0.0     1.0     0.333333  0.666667
group3    1.0     0.0     1.000000  0.000000
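Recent pandas releases deprecate groupby(..., axis=1), so the line above may emit a warning or fail there. The following is a minimal sketch of the same division that groups the transposed table instead. It assumes the sample data from the question, and `totals` and `shares` are illustrative names rather than anything from the original answer.

```python
import pandas as pd

# Sample data from the question (pid is not needed for the counts).
df = pd.DataFrame({
    "offer": ["accept", "accept", "accept", "reject", "reject", "reject",
              "reject", "accept", "accept", "reject", "reject"],
    "position": ["left", "left", "right", "right", "right", "right",
                 "left", "left", "right", "right", "right"],
    "group": ["group1", "group1", "group2", "group2", "group1", "group1",
              "group2", "group3", "group3", "group1", "group2"],
})
c = pd.crosstab(df["group"], [df["position"], df["offer"]])

# Transpose so the column levels become the row index, sum within each
# position, and broadcast those sums back onto the original shape.
totals = c.T.groupby(level="position").transform("sum").T

# Element-wise division: both frames share the same rows and columns.
shares = c / totals
print(shares.round(3))
```

Because `totals` has exactly the same column MultiIndex as `c`, the division does not rely on aligning a single-level index against a two-level one.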
If you are as much of a perfectionist as I am, you may want the whole numbers displayed as integers. That can be done like this:
df = c.div(c.groupby(level=0, axis=1).sum()).astype(object)
print(df)
position  left            right
offer     accept  reject  accept    reject
group
group1    1       0       0         1
group2    0       1       0.333333  0.666667
group3    1       0       1         0
@COLDSPEED, how can I groupby on multiple levels? 'groupby([level=0, level=1], axis=1)' doesn't seem to work. Thanks! – Kay
@Kay 'groupby(level=[0, 1], axis=1)'
Answer
You can use groupby with size to build the counts, then divide each row by its total:
In [4013]: dfa = df.groupby(['group', 'position', 'offer']).size().unstack(fill_value=0)
In [4014]: dfa.div(dfa.sum(axis=1), axis=0).unstack()
Out[4014]:
offer     accept          reject
position  left  right     left  right
group
group1    1.0   0.000000  0.0   1.000000
group2    0.0   0.333333  1.0   0.666667
group3    1.0   1.000000  0.0   0.000000
The counts table can also be obtained from pivot_table:
df.pivot_table(index=['group', 'position'], columns='offer', aggfunc=len)['pid']
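The pivot_table call yields counts only, with NaN where a combination never occurs. Here is a minimal sketch of carrying it through to percentages, assuming the sample data from the question. It swaps in aggfunc="count" on pid with fill_value=0 in place of len, and `counts` and `shares` are illustrative names.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "pid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "offer": ["accept", "accept", "accept", "reject", "reject", "reject",
              "reject", "accept", "accept", "reject", "reject"],
    "position": ["left", "left", "right", "right", "right", "right",
                 "left", "left", "right", "right", "right"],
    "group": ["group1", "group1", "group2", "group2", "group1", "group1",
              "group2", "group3", "group3", "group1", "group2"],
})

# One row per (group, position), one column per offer; missing cells become 0.
counts = df.pivot_table(index=["group", "position"], columns="offer",
                        values="pid", aggfunc="count", fill_value=0)

# Divide each row by its total, then move position into the columns.
shares = counts.div(counts.sum(axis=1), axis=0).unstack("position")
print(shares.round(3))
```

The resulting columns are ordered as (offer, position), matching the groupby output shown in this answer rather than the (position, offer) layout of the crosstab.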
What does 'df' look like?