Create a new column in a dataframe using matching values from another dataframe
Question:
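I have two dataframes: one with a small subset of the information (df1) and one with all the data (df2). I am trying to create a new column in df1 that looks up the Total2 value for each Name and fills it in. Note that every name in df1 always has a match among the names in df2. Is there a pandas function that already does this? My end goal is to create a bar chart.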
import pandas as pd

alldatapath = "all_data.csv"
filteredpath = "filtered.csv"

df1 = pd.read_csv(
    filteredpath,    # file name
    sep=',',         # column separator
    quotechar='"',   # quoting character
    na_values="NA",  # treat the string "NA" as a missing value (NaN)
    usecols=[0, 1],  # columns to use
    decimal='.')     # symbol for decimals
df2 = pd.read_csv(
    alldatapath,     # file name
    sep=',',         # column separator
    quotechar='"',   # quoting character
    na_values="NA",  # treat the string "NA" as a missing value (NaN)
    usecols=[0, 1],  # columns to use
    decimal='.')     # symbol for decimals
df1 = df1.head(5)  # keep only the first 5 rows
print(df1)
print(df2)
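The CSV files themselves are not shown in the question. The snippet below is a self-contained sketch that rebuilds them in memory with io.StringIO from the printed output, so the example runs without filtered.csv and all_data.csv. FILTERED_CSV and ALL_DATA_CSV are hypothetical names, and the real filtered.csv may well contain more than five rows before head(5) trims it.

```python
import io

import pandas as pd

# Hypothetical CSV contents reconstructed from the printed output;
# in the question these come from filtered.csv and all_data.csv.
FILTERED_CSV = """Name,Total
Accounting,3
Reporting,1
Finance,1
Audit,1
Template,2
"""

ALL_DATA_CSV = """Name,Total2
Reporting,100
Accounting,120
Finance,400
Audit,500
Information,50
Template,1200
KnowHow,2000
"""

# Same read_csv options as in the question, but reading in-memory text.
df1 = pd.read_csv(io.StringIO(FILTERED_CSV), sep=",", quotechar='"',
                  na_values="NA", usecols=[0, 1], decimal=".")
df2 = pd.read_csv(io.StringIO(ALL_DATA_CSV), sep=",", quotechar='"',
                  na_values="NA", usecols=[0, 1], decimal=".")

df1 = df1.head(5)  # keep only the first 5 rows
print(df1)
print(df2)
```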
Output (df1):
Name Total
0 Accounting 3
1 Reporting 1
2 Finance 1
3 Audit 1
4 Template 2
Output (df2):
Name Total2
0 Reporting 100
1 Accounting 120
2 Finance 400
3 Audit 500
4 Information 50
5 Template 1200
6 KnowHow 2000
The final df1 should look like this:
Name Total Total2(new column)
0 Accounting 3 120
1 Reporting 1 100
2 Finance 1 400
3 Audit 1 500
4 Template 2 1200
Answer
First, create the new column with map, passing a Series built from df2 with Name as the index:
df1['Total2'] = df1['Name'].map(df2.set_index('Name')['Total2'])
print (df1)
Name Total Total2
0 Accounting 3 120
1 Reporting 1 100
2 Finance 1 400
3 Audit 1 500
4 Template 2 1200
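Below is a self-contained sketch of the same lookup using the sample data from the question. For comparison, it also shows a left merge, an alternative this answer does not use. Both assume each Name appears only once in df2; merge's validate="many_to_one" turns that assumption into an explicit check. The names mapped, merged and missing are illustrative only.

```python
import pandas as pd

# Sample data copied from the question's printed output.
df1 = pd.DataFrame({"Name": ["Accounting", "Reporting", "Finance", "Audit", "Template"],
                    "Total": [3, 1, 1, 1, 2]})
df2 = pd.DataFrame({"Name": ["Reporting", "Accounting", "Finance", "Audit",
                             "Information", "Template", "KnowHow"],
                    "Total2": [100, 120, 400, 500, 50, 1200, 2000]})

# Approach from the answer: look up Total2 through a Name-indexed Series.
# This expects the Name values in df2 to be unique.
lookup = df2.set_index("Name")["Total2"]
mapped = df1.assign(Total2=df1["Name"].map(lookup))

# Alternative (not used in the answer): a left merge keeps every row of df1
# in its original order; validate raises MergeError if a Name repeats in df2.
merged = df1.merge(df2, on="Name", how="left", validate="many_to_one")

# A name missing from df2 would get NaN in Total2 with either approach.
missing = mapped.loc[mapped["Total2"].isna(), "Name"].tolist()

print(mapped)
print(missing)
print(mapped["Total2"].tolist() == merged["Total2"].tolist())
```

map keeps df1's row order because it works element by element on df1['Name']. If a name were ever absent from df2, that row's Total2 would be NaN and the column would become float.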
Then use set_index with DataFrame.plot.bar:
df1.set_index('Name').plot.bar()
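Here is a runnable sketch of the plotting step, assuming matplotlib is installed. The Agg backend and rendering to an in-memory PNG buffer are choices made here so the code runs without a display. The y-axis label and the rot value are hypothetical.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import pandas as pd

# df1 after the map step, copied from the answer's output.
df1 = pd.DataFrame({"Name": ["Accounting", "Reporting", "Finance", "Audit", "Template"],
                    "Total": [3, 1, 1, 1, 2],
                    "Total2": [120, 100, 400, 500, 1200]})

# Name becomes the x-axis labels; each numeric column gets its own bar per row.
ax = df1.set_index("Name").plot.bar(rot=45)
ax.set_ylabel("Count")  # hypothetical axis label
plt.tight_layout()

# Render to an in-memory PNG instead of opening a window.
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
plt.close(ax.figure)
```

Total (1 to 3) and Total2 (100 to 2000) differ by orders of magnitude, so the Total bars are barely visible on a shared linear axis. Passing subplots=True to plot.bar gives each column its own axes and y-scale, which may read better.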
Thanks! I'll look into these functions and apply them to my full code. – Gonzalo