How do I find per-minute averages from time-series data in Python?
Question:
I have an array of tuples like this:
import datetime

timeseries_array = [
    (datetime.datetime(2017, 4, 18, 16, 57, 21, 888778), 10),
    (datetime.datetime(2017, 4, 18, 16, 57, 35, 712351), 36),
    (datetime.datetime(2017, 4, 18, 16, 57, 46, 831850), 70),
    (datetime.datetime(2017, 4, 18, 16, 58, 0, 255499), 52),
    (datetime.datetime(2017, 4, 18, 16, 58, 11, 138477), 34),
    (datetime.datetime(2017, 4, 18, 16, 58, 22, 902610), 44),
    (datetime.datetime(2017, 4, 18, 16, 58, 38, 206132), 106),
    (datetime.datetime(2017, 4, 18, 16, 58, 53, 624415), 81),
    (datetime.datetime(2017, 4, 18, 16, 59, 6, 301157), 56),
]
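Each tuple is a (date, value) pair.
Whenever a new reading is appended to this array, I want to look at the last two minutes and compare the averages of their data.
So in this example, once data for minute 59 is added, I want to compute the averages of the values recorded in minutes 58 and 57 and compare the two.
The average for minute 57 is 38.7 and the average for minute 58 is 63.4.
What is the best way to do this? Is there perhaps a Python library I should use?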
Answer
Here is one way to do it:
from __future__ import division

def timeseries_averages(timeseries_array):
    # Note: grouping on .minute alone merges the same minute of different hours.
    unique_minutes = set(m[0].minute for m in timeseries_array)
    for v in sorted(unique_minutes):
        values = [m[1] for m in timeseries_array if m[0].minute == v]
        print('The average of the {} minute value is {}'.format(v, sum(values) / len(values)))

timeseries_averages(timeseries_array)

Output under Python 3:
The average of the 57 minute value is 38.666666666666664
The average of the 58 minute value is 63.4
The average of the 59 minute value is 56.0
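As a variation on the approach above, here is a minimal sketch that groups the values in a single pass with collections.defaultdict. It keys each bucket on the timestamp truncated to the minute instead of on .minute alone, so 16:57 and 17:57 are not merged. The function name minute_averages and the sample list (including the made-up 17:57 reading) are assumptions for illustration, not part of the answer.

```python
import datetime
from collections import defaultdict


def minute_averages(timeseries):
    """Return {minute_start: average} for a list of (datetime, value) tuples."""
    buckets = defaultdict(list)
    for stamp, value in timeseries:
        # Truncate to the start of the minute so different hours stay separate.
        buckets[stamp.replace(second=0, microsecond=0)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


# Hypothetical sample: the last reading is invented to show the hour handling.
sample = [
    (datetime.datetime(2017, 4, 18, 16, 57, 21, 888778), 10),
    (datetime.datetime(2017, 4, 18, 16, 57, 35, 712351), 36),
    (datetime.datetime(2017, 4, 18, 16, 57, 46, 831850), 70),
    (datetime.datetime(2017, 4, 18, 16, 58, 0, 255499), 52),
    (datetime.datetime(2017, 4, 18, 17, 57, 0, 0), 100),
]

for minute_start, avg in sorted(minute_averages(sample).items()):
    print(minute_start.strftime("%H:%M"), avg)
```

Each value is visited once, whereas the list comprehension above rescans the whole list for every distinct minute.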
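Answer
This is a perfect application for itertools.groupby [1]. As you add values to the list, keep a "high-water mark" of the minutes you have already processed. When a value for a new minute arrives, use groupby to process only the tail of the list (the indices past the high-water mark) and split it into chunks, one per minute. Compute whatever you want on each chunk.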
import datetime
import itertools

timeseries_array = []
timeseries_mark = len(timeseries_array)  # high-water mark: index of the first unprocessed item
last_minute = 57

for new_value in [
    (datetime.datetime(2017, 4, 18, 16, 57, 21, 888778), 10),
    (datetime.datetime(2017, 4, 18, 16, 57, 35, 712351), 36),
    (datetime.datetime(2017, 4, 18, 16, 57, 46, 831850), 70),
    (datetime.datetime(2017, 4, 18, 16, 58, 0, 255499), 52),
    (datetime.datetime(2017, 4, 18, 16, 58, 11, 138477), 34),
    (datetime.datetime(2017, 4, 18, 16, 58, 22, 902610), 44),
    (datetime.datetime(2017, 4, 18, 16, 58, 38, 206132), 106),
    (datetime.datetime(2017, 4, 18, 16, 58, 53, 624415), 81),
    (datetime.datetime(2017, 4, 18, 16, 59, 6, 301157), 56),
    # Minute 00
    (datetime.datetime(2017, 4, 18, 17, 0, 1, 0), 33),
]:
    minute = new_value[0].minute
    if minute != last_minute:
        # A new minute has started: process everything past the mark.
        tail = timeseries_array[timeseries_mark:]
        advance = None
        for m, group in itertools.groupby(tail, key=lambda tpl: tpl[0].minute):
            values = list(group)
            total = sum([tpl[1] for tpl in values])
            avg = total / len(values)
            print("Average at minute {} is {}".format(m, avg))
            if advance is None:
                # Only the first (oldest) group moves the mark forward.
                advance = len(values)
        if advance is None:
            print("Advance is none. Why?")
        else:
            timeseries_mark += advance
        last_minute = minute
    timeseries_array.append(new_value)
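To get the comparison the question asks for (the two most recent complete minutes), the same groupby idea can be wrapped in a small helper. This is only a sketch under stated assumptions: the list is sorted by time, the final minute is still being filled and is therefore excluded, and the whole list is rescanned for simplicity rather than tracking a high-water mark. The name last_two_minute_averages is hypothetical.

```python
import datetime
import itertools


def last_two_minute_averages(timeseries):
    """Average the two most recent complete minutes.

    Assumes `timeseries` is sorted by time and that its final minute is
    still being filled, so that minute is excluded from the result.
    """
    def minute_start(tpl):
        # Truncate to the minute so the key also distinguishes hours and days.
        return tpl[0].replace(second=0, microsecond=0)

    averages = []
    for key, group in itertools.groupby(timeseries, key=minute_start):
        values = [value for _, value in group]
        averages.append((key, sum(values) / len(values)))
    # Drop the in-progress minute, then keep the two before it.
    return averages[:-1][-2:]


readings = [
    (datetime.datetime(2017, 4, 18, 16, 57, 21, 888778), 10),
    (datetime.datetime(2017, 4, 18, 16, 57, 35, 712351), 36),
    (datetime.datetime(2017, 4, 18, 16, 57, 46, 831850), 70),
    (datetime.datetime(2017, 4, 18, 16, 58, 0, 255499), 52),
    (datetime.datetime(2017, 4, 18, 16, 58, 11, 138477), 34),
    (datetime.datetime(2017, 4, 18, 16, 58, 22, 902610), 44),
    (datetime.datetime(2017, 4, 18, 16, 58, 38, 206132), 106),
    (datetime.datetime(2017, 4, 18, 16, 58, 53, 624415), 81),
    (datetime.datetime(2017, 4, 18, 16, 59, 6, 301157), 56),
]

for minute, avg in last_two_minute_averages(readings):
    print("{}: {:.1f}".format(minute.strftime("%H:%M"), avg))
```

With the question's data this yields the averages for 16:57 and 16:58, which can then be compared directly.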
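What do you mean by average? You can't exactly add datetime objects –
@DmitryPolonskiy, you can average the second value in each tuple. For the last two items in the list, the values at position `(,)[1]` are `81` and `56` – Jon
@Jon I didn't realize that was what we were averaging. –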