Pandas aggregation
Question:
I'm a Pandas newbie. I've been using it for a class, but I'm certainly not fluent in Panda-ese.
Say I have a dataframe such as:
import pandas as pd

accord = pd.Series({'Manufacturer': 'Honda',
                    'Model': 'Accord',
                    'Drivetrain': 'FWD'})
civic = pd.Series({'Manufacturer': 'Honda',
                   'Model': 'Civic',
                   'Drivetrain': 'FWD'})
focus = pd.Series({'Manufacturer': 'Ford',
                   'Model': 'Focus',
                   'Drivetrain': 'FWD'})
mustang = pd.Series({'Manufacturer': 'Ford',
                     'Model': 'Mustang',
                     'Drivetrain': 'RWD'})
cars_df = pd.DataFrame([accord, civic, focus, mustang])
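What I ultimately want is a list of each manufacturer with the total number of models it makes and how many of those are front-wheel-drive vehicles.
So I pull out the list of manufacturers and make a new dataframe: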
manufacturer_s = cars_df['Manufacturer'].unique()
manufacturer_df = pd.DataFrame(index=manufacturer_s)
I add empty columns for the information I'm after:
manufacturer_df['FWD MODEL COUNT'] = 0
manufacturer_df['MODEL COUNT'] = 0
And I use iterrows to fill in the data like this:
for manufacturer, row in manufacturer_df.iterrows():
    # Note: iterrows may yield copies, so writing into `row` is not
    # guaranteed to update manufacturer_df.
    row['MODEL COUNT'] = len(
        cars_df[cars_df['Manufacturer'] == manufacturer])
    row['FWD MODEL COUNT'] = len(
        cars_df[(cars_df['Manufacturer'] == manufacturer) &
                (cars_df['Drivetrain'] == 'FWD')])
Now my output is as follows:
       FWD MODEL COUNT  MODEL COUNT
Honda                2            2
Ford                 1            2
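(Edit: I found a typo, so this part works.) Now, not only is this verbose (and probably slow), it doesn't feel "Pandas-esque".
Alternatively, I tried the following: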
manufacturer_df['MODEL COUNT'] = manufacturer_df.apply(
    lambda car: len(cars_df[cars_df['Manufacturer'] == car.index]),
    axis=1)
manufacturer_df['FWD MODEL COUNT'] = manufacturer_df.apply(
    lambda car: len(cars_df[(cars_df['Manufacturer'] == car.index) &
                            (cars_df['Drivetrain'] == 'FWD')]),
    axis=1)
This doesn't work at all... So, how should I be doing this, and (also) what exactly am I doing wrong?
Answer
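You can use groupby().agg(), which lets you apply a different aggregation function to each column. Count the unique models per manufacturer with pd.Series.nunique, and count the number of True values in x == "FWD" to get the total number of FWD vehicles in each group: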
(cars_df.groupby("Manufacturer")
        .agg({"Model": "nunique",
              "Drivetrain": lambda x: (x == "FWD").sum()}))
#               Model  Drivetrain
# Manufacturer
# Ford              2           1
# Honda             2           2
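If the result should carry the column names from the question ('FWD MODEL COUNT' and 'MODEL COUNT') rather than the source column names, named aggregation is one way to get there. The following is a minimal sketch, assuming pandas 0.25 or later (where named aggregation is available); the variable name `summary` is only illustrative, and the data is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Same four cars as in the question, rebuilt so this snippet is self-contained.
cars_df = pd.DataFrame({
    "Manufacturer": ["Honda", "Honda", "Ford", "Ford"],
    "Model": ["Accord", "Civic", "Focus", "Mustang"],
    "Drivetrain": ["FWD", "FWD", "FWD", "RWD"],
})

# Named aggregation: output_column=(input_column, aggregation).
# The ** dict form is needed because the output names contain spaces.
summary = cars_df.groupby("Manufacturer").agg(**{
    "FWD MODEL COUNT": ("Drivetrain", lambda s: (s == "FWD").sum()),
    "MODEL COUNT": ("Model", "nunique"),
})
print(summary)
```

The groupby result is indexed by manufacturer, so no separate manufacturer_df has to be built or filled in by hand.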
You probably want *sum* rather than *len*. Think about it.
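As for what goes wrong in the apply attempt from the question: with axis=1, each `car` passed to the lambda is a row Series, so `car.index` holds that row's column labels, not the manufacturer. The row's own label is available as `car.name`. The sketch below is one possible repair under that reading, not necessarily what the commenter had in mind. It also shows the sum/len difference on a boolean mask: `len` counts every entry, while `sum` counts only the True ones.

```python
import pandas as pd

cars_df = pd.DataFrame({
    "Manufacturer": ["Honda", "Honda", "Ford", "Ford"],
    "Model": ["Accord", "Civic", "Focus", "Mustang"],
    "Drivetrain": ["FWD", "FWD", "FWD", "RWD"],
})

# Boolean mask: True where the car is front-wheel drive.
is_fwd = cars_df["Drivetrain"] == "FWD"
print(len(is_fwd))    # 4: every row, True or False
print(is_fwd.sum())   # 3: only the True rows

# Rebuild the per-manufacturer frame as in the question.
manufacturer_df = pd.DataFrame(index=cars_df["Manufacturer"].unique())
manufacturer_df["FWD MODEL COUNT"] = 0
manufacturer_df["MODEL COUNT"] = 0

# car.name is the row label (the manufacturer); summing a boolean
# mask counts the matching rows.
manufacturer_df["MODEL COUNT"] = manufacturer_df.apply(
    lambda car: (cars_df["Manufacturer"] == car.name).sum(), axis=1)
manufacturer_df["FWD MODEL COUNT"] = manufacturer_df.apply(
    lambda car: ((cars_df["Manufacturer"] == car.name) & is_fwd).sum(),
    axis=1)
print(manufacturer_df)
```

This keeps the shape of the original attempt for comparison; the groupby approach in the answer remains the shorter route.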