Merging a large dataframe based on the values in every other column
Question:
How can I do this? I have the following dataset in a .csv file:
+------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
| Date | NBDG LN Equity | Date | P2P LN Equity | Date | HWSL LN Equity | Date | BPCR LN Equity | Date | AXI LN Equity |
+------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
| 09-08-2017 | 78,5 | 09-08-2017 | 877,061 | 09-08-2017 | 107,082 | 09-08-2017 | 1,0981 | 08-08-2017 | 94 |
| 08-08-2017 | 78,5 | 08-08-2017 | 878,7899 | 08-08-2017 | 106,5 | 08-08-2017 | 1,1021 | 07-08-2017 | 94 |
| 03-08-2017 | 78,5 | 07-08-2017 | 879,709 | 07-08-2017 | 106,2 | 07-08-2017 | 1,0945 | 02-08-2017 | 98,2472 |
| 01-08-2017 | 78,5 | 04-08-2017 | 879,6708 | 04-08-2017 | 105,4882 | 04-08-2017 | 1,0932 | 27-07-2017 | 98,5 |
+------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
which I want to "merge" into the following format:
+------------+----------------+---------------+----------------+----------------+---------------+
| Date | NBDG LN Equity | P2P LN Equity | HWSL LN Equity | BPCR LN Equity | AXI LN Equity |
+------------+----------------+---------------+----------------+----------------+---------------+
| 09-08-2017 | 78,5 | 877,061 | 107,082 | 1,0981 | NA |
| 08-08-2017 | 78,5 | 878,7899 | 106,5 | 1,1021 | 94 |
| 07-08-2017 | NA | 879,709 | 106,2 | 1,0945 | 94 |
| 04-08-2017 | NA | 879,6708 | 105,4882 | 1,0932 | NA |
| 03-08-2017 | 78,5 | NA | NA | NA | NA |
| 02-08-2017 | NA | NA | NA | NA | 98,2472 |
| 01-08-2017 | 78,5 | NA | NA | NA | NA |
| 27-07-2017 | NA | NA | NA | NA | 98,5 |
+------------+----------------+---------------+----------------+----------------+---------------+
How can I do this without hard-coding too much? I started with
dfData = local_csv('Data.csv', timezone='DK', sep=';')
lDateColumns = [col for col in dfData.columns if 'Date' in col]
dfData[dfData[lDateColumns].apply(pd.Series.nunique, axis=1)==1]
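until I noticed that the dates are sometimes offset relative to each other across the columns, so this filter keeps only the rows where every date column agrees, which leaves only 4 rows.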
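To make the problem concrete, here is a minimal sketch of that row filter on a tiny made-up frame (the columns `A` and `B` and their values are illustrative assumptions, not the real data). Once the two date columns drift out of step, the row is dropped even though both of its values are valid observations for their own dates.

```python
import pandas as pd

# Tiny made-up frame: the two date columns agree on rows 0 and 1
# but drift out of step on row 2.
df = pd.DataFrame({
    "Date": ["09-08-2017", "08-08-2017", "03-08-2017"],
    "A": [1.0, 2.0, 3.0],
    "Date.1": ["09-08-2017", "08-08-2017", "07-08-2017"],
    "B": [10.0, 20.0, 30.0],
})

date_cols = [col for col in df.columns if "Date" in col]

# True only where every date column holds the same date on that row.
aligned = df[date_cols].nunique(axis=1) == 1

# Row 2 is discarded, so the values 3.0 and 30.0 are lost.
kept = df[aligned]
print(kept)
```

This is why a positional, row-wise comparison is not enough here: each value has to be matched to the date sitting next to it, not to its row number.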
Thanks
Answer
I would break the dataframe down into pieces (more precisely, two columns at a time) and then merge everything back together:
In [103]: df
Out[103]:
Date NBDG LN Equity Date.1 P2P LN Equity Date.2 \
0 09-08-2017 78,5 09-08-2017 877,061 09-08-2017
1 08-08-2017 78,5 08-08-2017 878,7899 08-08-2017
2 03-08-2017 78,5 07-08-2017 879,709 07-08-2017
3 01-08-2017 78,5 04-08-2017 879,6708 04-08-2017
HWSL LN Equity Date.3 BPCR LN Equity Date.4 AXI LN Equity
0 107,082 09-08-2017 1,0981 08-08-2017 94
1 106,5 08-08-2017 1,1021 07-08-2017 94
2 106,2 07-08-2017 1,0945 02-08-2017 98,2472
3 105,4882 04-08-2017 1,0932 27-07-2017 98,5
In [114]: res = []
In [115]: for i in range(5):
     ...:     df_temp = pd.concat([df.iloc[:, 2*i], df.iloc[:, 2*i+1]], axis=1)
     ...:     df_temp.columns = ['Date', df_temp.columns[1]]
     ...:     res.append(df_temp)
     ...:
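The loop above hard-codes five pairs. Since the question asks to avoid hard-coding, one possible variation is to derive the number of pairs from the column count. The sketch below assumes the columns always alternate date, value, date, value; the sample frame is a cut-down copy of the question's data, and the `dropna` step is an added assumption for files where some series are shorter than others.

```python
import pandas as pd

# Cut-down sample with two (date, value) pairs, named the way pandas
# renames duplicate "Date" headers when reading a CSV.
df = pd.DataFrame(
    [["09-08-2017", "78,5", "09-08-2017", "877,061"],
     ["08-08-2017", "78,5", "08-08-2017", "878,7899"],
     ["03-08-2017", "78,5", "07-08-2017", "879,709"]],
    columns=["Date", "NBDG LN Equity", "Date.1", "P2P LN Equity"],
)

res = []
# Step through the columns two at a time instead of using range(5).
for i in range(0, df.shape[1], 2):
    pair = df.iloc[:, i:i + 2].copy()
    # Give every date column the same name so the pieces can be merged on it.
    pair.columns = ["Date", pair.columns[1]]
    # Assumption: shorter series are padded with empty cells; drop those rows.
    res.append(pair.dropna(subset=["Date"]))

print(len(res))
```

With the full ten-column frame this produces the same five pieces as the explicit loop, and it keeps working if more equities are added to the file.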
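We now have a list of dataframes, each with the date as its first column (always named 'Date') and the associated measure as its second. We can merge them all together using functools.reduce: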
In [117]: from functools import reduce
In [120]: reduce(lambda df1,df2: df1.merge(df2, on='Date', how='outer'), res)
Out[120]:
Date NBDG LN Equity P2P LN Equity HWSL LN Equity BPCR LN Equity \
0 09-08-2017 78,5 877,061 107,082 1,0981
1 08-08-2017 78,5 878,7899 106,5 1,1021
2 03-08-2017 78,5 NaN NaN NaN
3 01-08-2017 78,5 NaN NaN NaN
4 07-08-2017 NaN 879,709 106,2 1,0945
5 04-08-2017 NaN 879,6708 105,4882 1,0932
6 02-08-2017 NaN NaN NaN NaN
7 27-07-2017 NaN NaN NaN NaN
AXI LN Equity
0 NaN
1 94
2 NaN
3 NaN
4 94
5 NaN
6 98,2472
7 98,5
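The merged result above is not yet in the order the question asked for, and both the dates and the values are still strings. A possible follow-up, sketched here on two small made-up pieces rather than the full data, is to parse the dates, convert the decimal commas, and sort newest first. The `%d-%m-%Y` format is an assumption based on the sample rows.

```python
from functools import reduce

import pandas as pd

# Two small pieces in the same shape as the entries of `res`.
res = [
    pd.DataFrame({"Date": ["09-08-2017", "08-08-2017", "03-08-2017"],
                  "NBDG LN Equity": ["78,5", "78,5", "78,5"]}),
    pd.DataFrame({"Date": ["08-08-2017", "07-08-2017", "02-08-2017"],
                  "AXI LN Equity": ["94", "94", "98,2472"]}),
]

merged = reduce(lambda left, right: left.merge(right, on="Date", how="outer"), res)

# Parse day-month-year strings so that sorting is chronological, not textual.
merged["Date"] = pd.to_datetime(merged["Date"], format="%d-%m-%Y")

# Turn decimal commas into decimal points and convert to floats;
# missing values stay NaN.
for col in merged.columns.drop("Date"):
    merged[col] = pd.to_numeric(merged[col].str.replace(",", ".", regex=False))

# Newest date first, as in the desired output.
merged = merged.sort_values("Date", ascending=False).reset_index(drop=True)
print(merged)
```

If the file is read with `pd.read_csv`, passing `decimal=','` should make the explicit comma replacement unnecessary, but each date column still needs to be parsed.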
What have you tried so far? Please post your code. – James