Creating a peak column in a DataFrame

Problem description:

I have a DataFrame df which looks like this:

date    perf cumulative_perf   
29/11/2005 36528.11368  36528.11368  
30/11/2005 29034.77194  65562.88563  
01/12/2005 47923.50416  113486.3898  
02/12/2005 52740.69331  166227.0831  
05/12/2005 -3185.762137  163041.0321  
06/12/2005 -25084.55935  137956.7616  
07/12/2005 3551.701267  141508.4629  
08/12/2005 22039.83875  163548.3016  
09/12/2005 58217.58428  221765.8859  
12/12/2005 -2906.995835  218858.8901  
13/12/2005 -31979.02878  186879.8613 

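I would like to add a column named peak that looks at cumulative_perf for each date, compares it with the peak from the previous day, and returns the higher of the two. The resulting output would hopefully look like this: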

date    perf cumulative_perf  peak 
29/11/2005 36528.11368 36528.11368  36528.11368 
30/11/2005 29034.77194 65562.88563  65562.88563 
01/12/2005 47923.50416 113486.3898  113486.3898 
02/12/2005 52740.69331 166227.0831  166227.0831 
05/12/2005 -3185.762137 163041.0321  166227.0831 
06/12/2005 -25084.55935 137956.7616  166227.0831 
07/12/2005 3551.701267 141508.4629  166227.0831 
08/12/2005 22039.83875 163548.3016  166227.0831 
09/12/2005 58217.58428 221765.8859  221765.8859 
12/12/2005 -2906.995835 218858.8901  221765.8859 
13/12/2005 -31979.02878 186879.8613  221765.8859 

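Could someone please let me know how I can reference the cumulative_perf column for the current day and the peak from the previous day, in order to determine the value of the peak column for the current day?

Any help would be greatly appreciated.

Thanks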

I think you need Series.cummax:

df['peak'] = df['cumulative_perf'].cummax() 
print (df) 
      date   perf cumulative_perf   peak 
0 29/11/2005 36528.113680  36528.11368 36528.11368 
1 30/11/2005 29034.771940  65562.88563 65562.88563 
2 01/12/2005 47923.504160  113486.38980 113486.38980 
3 02/12/2005 52740.693310  166227.08310 166227.08310 
4 05/12/2005 -3185.762137  163041.03210 166227.08310 
5 06/12/2005 -25084.559350  137956.76160 166227.08310 
6 07/12/2005 3551.701267  141508.46290 166227.08310 
7 08/12/2005 22039.838750  163548.30160 166227.08310 
8 09/12/2005 58217.584280  221765.88590 221765.88590 
9 12/12/2005 -2906.995835  218858.89010 221765.88590 
10 13/12/2005 -31979.028780  186879.86130 221765.88590 
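
To reproduce the result above, here is a minimal self-contained sketch. It assumes the sample values from the question are typed in by hand (the perf column is left out for brevity), and the explicit loop is my own illustration, added only to show that cummax follows the rule asked for: each day's peak is the larger of that day's cumulative_perf and the previous day's peak.

```python
import pandas as pd

# Sample values copied from the question (perf column omitted for brevity).
df = pd.DataFrame({
    "date": ["29/11/2005", "30/11/2005", "01/12/2005", "02/12/2005",
             "05/12/2005", "06/12/2005", "07/12/2005", "08/12/2005",
             "09/12/2005", "12/12/2005", "13/12/2005"],
    "cumulative_perf": [36528.11368, 65562.88563, 113486.3898, 166227.0831,
                        163041.0321, 137956.7616, 141508.4629, 163548.3016,
                        221765.8859, 218858.8901, 186879.8613],
})

# Running maximum of the column.
df["peak"] = df["cumulative_perf"].cummax()

# The same rule written out by hand: max(today's value, yesterday's peak).
peak_loop = []
for value in df["cumulative_perf"]:
    previous_peak = peak_loop[-1] if peak_loop else value
    peak_loop.append(max(value, previous_peak))

# Both approaches should agree row by row.
assert df["peak"].tolist() == peak_loop
```

The loop is only there for explanation; on real data the vectorised cummax call is the one to use.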

Or, as a numpy solution:

import numpy as np

df['peak'] = np.maximum.accumulate(df['cumulative_perf']) 
print (df) 
      date   perf cumulative_perf   peak 
0 29/11/2005 36528.113680  36528.11368 36528.11368 
1 30/11/2005 29034.771940  65562.88563 65562.88563 
2 01/12/2005 47923.504160  113486.38980 113486.38980 
3 02/12/2005 52740.693310  166227.08310 166227.08310 
4 05/12/2005 -3185.762137  163041.03210 166227.08310 
5 06/12/2005 -25084.559350  137956.76160 166227.08310 
6 07/12/2005 3551.701267  141508.46290 166227.08310 
7 08/12/2005 22039.838750  163548.30160 166227.08310 
8 09/12/2005 58217.584280  221765.88590 221765.88590 
9 12/12/2005 -2906.995835  218858.89010 221765.88590 
10 13/12/2005 -31979.028780  186879.86130 221765.88590 
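
As a quick check that the two approaches agree, here is a small sketch using the cumulative_perf values from the question. Converting the column with to_numpy() before calling np.maximum.accumulate is my own choice, made so that the function receives a plain array; the answer above passes the Series directly.

```python
import numpy as np
import pandas as pd

# cumulative_perf values copied from the question.
cumulative_perf = pd.Series(
    [36528.11368, 65562.88563, 113486.3898, 166227.0831,
     163041.0321, 137956.7616, 141508.4629, 163548.3016,
     221765.8859, 218858.8901, 186879.8613],
    name="cumulative_perf",
)

# Running maximum computed by numpy on the underlying array.
peak_numpy = np.maximum.accumulate(cumulative_perf.to_numpy())

# Running maximum computed by pandas.
peak_pandas = cumulative_perf.cummax().to_numpy()

# The two results should be identical.
assert np.array_equal(peak_numpy, peak_pandas)
```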

cummax is awesome. I tried 'rolling(2).apply(max)', but that doesn't work. – Dark

@Bharath It still works, check my answer :-) – Wen

Well, here we are using a rolling max:

df['cumulative_perf'].rolling(window=len(df), min_periods=1).max() 
Out[487]: 
0  36528.11368 
1  65562.88563 
2  113486.38980 
3  166227.08310 
4  166227.08310 
5  166227.08310 
6  166227.08310 
7  166227.08310 
8  221765.88590 
9  221765.88590 
10 221765.88590 
Name: cumulative_perf, dtype: float64 
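
To see why the window size and min_periods matter here, a minimal sketch using the same cumulative_perf values. The comparison with expanding().max() is my own addition and not part of the answer; it is simply another way to write a window that grows to cover every row so far.

```python
import pandas as pd

# cumulative_perf values copied from the question.
s = pd.Series(
    [36528.11368, 65562.88563, 113486.3898, 166227.0831,
     163041.0321, 137956.7616, 141508.4629, 163548.3016,
     221765.8859, 218858.8901, 186879.8613],
    name="cumulative_perf",
)

# Window as long as the data; min_periods=1 lets the first rows produce a value.
full_window = s.rolling(window=len(s), min_periods=1).max()

# An expanding window covers all rows up to the current one.
expanding = s.expanding().max()

# A two-row window only compares today with yesterday, so it loses older peaks.
two_row = s.rolling(2).max()

# The full-length and expanding windows both match cummax.
assert full_window.tolist() == s.cummax().tolist()
assert expanding.tolist() == s.cummax().tolist()

# Row 5 (06/12/2005): the two-row window misses the peak set on 02/12/2005.
assert two_row.iloc[5] < full_window.iloc[5]
```

A two-row window also returns NaN for the first row, because min_periods defaults to the window size.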

'min_periods' is what I missed. When I tried it, I did think of len(df). – Dark

@Bharath Ah, :-) – Wen