Creating a peak column in a DataFrame
Question:
I have a DataFrame df which looks like this:
date perf cumulative_perf
29/11/2005 36528.11368 36528.11368
30/11/2005 29034.77194 65562.88563
01/12/2005 47923.50416 113486.3898
02/12/2005 52740.69331 166227.0831
05/12/2005 -3185.762137 163041.0321
06/12/2005 -25084.55935 137956.7616
07/12/2005 3551.701267 141508.4629
08/12/2005 22039.83875 163548.3016
09/12/2005 58217.58428 221765.8859
12/12/2005 -2906.995835 218858.8901
13/12/2005 -31979.02878 186879.8613
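I want to add a column named peak that looks at cumulative_perf for each date, compares it with the peak as of the previous day, and returns the higher of the two. The desired output looks like this: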
date perf cumulative_perf peak
29/11/2005 36528.11368 36528.11368 36528.11368
30/11/2005 29034.77194 65562.88563 65562.88563
01/12/2005 47923.50416 113486.3898 113486.3898
02/12/2005 52740.69331 166227.0831 166227.0831
05/12/2005 -3185.762137 163041.0321 166227.0831
06/12/2005 -25084.55935 137956.7616 166227.0831
07/12/2005 3551.701267 141508.4629 166227.0831
08/12/2005 22039.83875 163548.3016 166227.0831
09/12/2005 58217.58428 221765.8859 221765.8859
12/12/2005 -2906.995835 218858.8901 221765.8859
13/12/2005 -31979.02878 186879.8613 221765.8859
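Could someone please let me know how I can reference the current day's cumulative_perf together with the previous day's peak in order to determine the current day's value of the peak column?
Any help would be much appreciated.
Thanks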
Answer
I think you need Series.cummax:
# running maximum of cumulative_perf up to and including each row
df['peak'] = df['cumulative_perf'].cummax()
print(df)
date perf cumulative_perf peak
0 29/11/2005 36528.113680 36528.11368 36528.11368
1 30/11/2005 29034.771940 65562.88563 65562.88563
2 01/12/2005 47923.504160 113486.38980 113486.38980
3 02/12/2005 52740.693310 166227.08310 166227.08310
4 05/12/2005 -3185.762137 163041.03210 166227.08310
5 06/12/2005 -25084.559350 137956.76160 166227.08310
6 07/12/2005 3551.701267 141508.46290 166227.08310
7 08/12/2005 22039.838750 163548.30160 166227.08310
8 09/12/2005 58217.584280 221765.88590 221765.88590
9 12/12/2005 -2906.995835 218858.89010 221765.88590
10 13/12/2005 -31979.028780 186879.86130 221765.88590
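For anyone who wants to reproduce this without the original file, here is a minimal, self-contained sketch. It assumes the first six rows of the question's data are typed in by hand; the column names follow the question, everything else is illustrative.

```python
import pandas as pd

# First six rows of the sample data from the question (shortened for brevity)
df = pd.DataFrame({
    "date": ["29/11/2005", "30/11/2005", "01/12/2005",
             "02/12/2005", "05/12/2005", "06/12/2005"],
    "cumulative_perf": [36528.11368, 65562.88563, 113486.3898,
                        166227.0831, 163041.0321, 137956.7616],
})

# For each row, cummax() keeps the largest value seen so far,
# so the peak stays flat while cumulative_perf is below its previous high.
df["peak"] = df["cumulative_perf"].cummax()
print(df)
```

The last two rows keep the peak of 166227.0831 reached on 02/12/2005, which matches the desired output in the question.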
Or a NumPy solution:
import numpy as np
df['peak'] = np.maximum.accumulate(df['cumulative_perf'])
print(df)
date perf cumulative_perf peak
0 29/11/2005 36528.113680 36528.11368 36528.11368
1 30/11/2005 29034.771940 65562.88563 65562.88563
2 01/12/2005 47923.504160 113486.38980 113486.38980
3 02/12/2005 52740.693310 166227.08310 166227.08310
4 05/12/2005 -3185.762137 163041.03210 166227.08310
5 06/12/2005 -25084.559350 137956.76160 166227.08310
6 07/12/2005 3551.701267 141508.46290 166227.08310
7 08/12/2005 22039.838750 163548.30160 166227.08310
8 09/12/2005 58217.584280 221765.88590 221765.88590
9 12/12/2005 -2906.995835 218858.89010 221765.88590
10 13/12/2005 -31979.028780 186879.86130 221765.88590
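The two approaches agree on data like the question's, but as far as I know they differ when the column contains NaN: cummax skips missing values by default, while np.maximum propagates them. A small sketch with made-up values (not from the question) to illustrate:

```python
import numpy as np
import pandas as pd

# Hypothetical series without missing values: both methods agree
s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])
pandas_peak = s.cummax()
numpy_peak = np.maximum.accumulate(s.to_numpy())

# Hypothetical series with a gap: the results diverge after the NaN
s_nan = pd.Series([1.0, np.nan, 2.0])
pandas_nan = s_nan.cummax()                            # NaN only at the gap
numpy_nan = np.maximum.accumulate(s_nan.to_numpy())    # NaN from the gap onward

print(pandas_peak.tolist(), numpy_peak.tolist())
print(pandas_nan.tolist(), numpy_nan.tolist())
```

If the real data may contain gaps, cummax is probably the safer default.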
Answer
Well, we can also use rolling and max:
df['cumulative_perf'].rolling(window=len(df), min_periods=1).max()
Out[487]:
0 36528.11368
1 65562.88563
2 113486.38980
3 166227.08310
4 166227.08310
5 166227.08310
6 166227.08310
7 166227.08310
8 221765.88590
9 221765.88590
10 221765.88590
Name: cumulative_perf, dtype: float64
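The rolling version works because the window spans the whole frame and min_periods=1 lets it start from the first row; a fixed window of 2 would only compare each row with its immediate predecessor. A rough sketch with made-up numbers (not the question's data), which also shows expanding() as what I believe is the more direct spelling of the same idea:

```python
import pandas as pd

# Hypothetical values chosen so the peak occurs early
s = pd.Series([1.0, 5.0, 2.0, 3.0, 4.0])

# Window as long as the data: every row sees all earlier rows
full_window = s.rolling(window=len(s), min_periods=1).max()

# expanding() grows the window from the first row onward
expanding_max = s.expanding().max()

# Window of 2: only the current and previous row are compared,
# so the early peak of 5.0 is forgotten after two rows
two_rows = s.rolling(2).max()

print(full_window.tolist())
print(expanding_max.tolist())
print(two_rows.tolist())
```

Only the first two variants reproduce the running peak; the two-row window drops back to 3.0 and 4.0 at the end.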
cummax is awesome. I was thinking of 'rolling(2).apply(max)', but that doesn't work – Dark
@Bharath It still works, check my answer :-) – Wen