MATLAB: compute the mean of each 1-minute interval of a time series

Question:

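I have a bunch of time series, each described by two components: a timestamp vector (in seconds) and a vector of measured values. The time vector is non-uniform (i.e. sampled at irregular intervals).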

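I am trying to compute the mean/SD of the values in each 1-minute interval (take an X-minute interval, compute its mean, take the next interval, ...).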

My current implementation uses a loop. This is a sample of what I have so far:

t = (100:999)' + rand(900,1);  %' non-uniform time 
x = 5*rand(900,1) + 10;    % x(i) is the value at time t(i) 

interval = 1;   % 1-min interval 
tt = (floor(t(1)):interval*60:ceil(t(end)))'; %' stopping points of each interval 
N = length(tt)-1; 

mu = zeros(N,1); 
sd = zeros(N,1); 

for i=1:N 
    indices = (tt(i) <= t & t < tt(i+1)); % find t between tt(i) and tt(i+1) 
    mu(i) = mean(x(indices)); 
    sd(i) = std(x(indices)); 
end

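I would like to know if there is a faster, vectorized solution. This matters because I have a large number of time series to process, each much longer than the sample shown above.

Any help is welcome.

Thank you all for the feedback.

I corrected the way t is generated so that it is always monotonically increasing (sorted); this was not really an issue.

Also, I may not have stated this clearly, but my intention is to have a solution for any interval length in minutes (1 minute was just an example).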

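Answer

The only logical solution seems to be...

Ok. I find it funny that to me there is only one logical solution, yet many others find other solutions. Regardless, the solution does seem simple. Given the vectors x and t, and a set of equally spaced break points tt,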

t = sort((100:999)' + 3*rand(900,1));  % non-uniform time 
x = 5*rand(900,1) + 10;    % x(i) is the value at time t(i) 

tt = (floor(t(1)):1*60:ceil(t(end)))';

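(Note that I sorted t above.)

I would do this in three fully vectorized lines of code. First, if the breaks were arbitrary and potentially unequal in spacing, I would use histc to determine which interval each sample falls in. Given that they are uniform, just do this: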

int = 1 + floor((t - t(1))/60);

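Again, if the elements of t were not known to be sorted, I would use min(t) instead of t(1). Having done that, use accumarray to reduce the results into a mean and a standard deviation.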

mu = accumarray(int,x,[],@mean); 
sd = accumarray(int,x,[],@std);
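
For the non-uniform case mentioned above, a minimal sketch might look like the following. The `edges` vector is a made-up set of unequally spaced break points, and the `valid` mask is an added assumption about how to treat samples that fall outside the edges (here they are simply dropped). Note that newer MATLAB releases mark histc as not recommended.

```matlab
t = sort((100:999)' + 3*rand(900,1));  % non-uniform time (seconds)
x = 5*rand(900,1) + 10;                % x(i) is the value at time t(i)

edges = [100; 130; 250; 600; 1100];    % hypothetical, unequally spaced break points

% bin(k) = i when edges(i) <= t(k) < edges(i+1); bin(k) = 0 when t(k) is out of range
[~, bin] = histc(t, edges);

% keep only samples inside a real interval (histc puts t == edges(end) in an extra last bin)
valid = bin > 0 & bin < numel(edges);
nBins = numel(edges) - 1;

mu = accumarray(bin(valid), x(valid), [nBins 1], @mean);
sd = accumarray(bin(valid), x(valid), [nBins 1], @std);
```

Fixing the output size to `[nBins 1]` keeps `mu` and `sd` aligned with the intervals even when the last ones are empty; accumarray fills empty intervals with 0 by default.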

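+1: For some reason I completely overlooked ACCUMARRAY. – gnovice 2010-02-24 17:21:06

Thanks, this is both concise and easy to read – merv 2010-02-24 22:31:08

I didn't even know about accumarray. Thanks for demonstrating how useful it is! – Jonas 2010-02-25 01:54:30

Answer

You can try creating a cell array and applying mean and std via cellfun. For 900 entries it is about 10% slower than your solution, but for 90000 entries it is about 10 times faster.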

[t,sortIdx]=sort(t); %# we only need to sort in case t is not monotonously increasing 
x = x(sortIdx); 

tIdx = floor(t/60); %# convert seconds to minutes - can also convert to 5 mins by dividing by 300 
tIdx = tIdx - min(tIdx) + 1; %# tIdx now is a vector of indices - i.e. it starts at 1, and should go like your iteration variable. 

%# the next few commands are to count how many 1's 2's 3's etc are in tIdx 
dt = [tIdx(2:end)-tIdx(1:end-1);1]; 
stepIdx = [0;find(dt>0)]; 
nIdx = stepIdx(2:end) - stepIdx(1:end-1); %# number of times each index appears 

%# convert to cell array 
xCell = mat2cell(x,nIdx,1); 

%# use cellfun to calculate the mean and sd 
%# index by the interval number of each group's first sample, since there may be missing steps
mu(tIdx(stepIdx(1:end-1)+1)) = cellfun(@mean,xCell);
sd(tIdx(stepIdx(1:end-1)+1)) = cellfun(@std,xCell);

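Note: My solution does not give exactly the same results as yours, since you skip a few time values at the end (1:60:90 is [1,61]), and since the start of the intervals is not exactly the same.

Thanks! I have a few points: [1] You are right, the way I generate 't' might not always be monotonically increasing, which was not the intention! [2] Although I am still deciphering the code, I do need the interval length to be a parameter (5 minutes is what I am working with now, but it should be easy to change)... – merv 2010-02-24 03:10:10

[3] The truth is I got a bit lost after you compute 'stepIdx' :) Can you explain what 'nIdx' represents? I get the part where you compute the minute part of each timestamp and then use the differences to find where it changes, marking the next 1-minute interval, but I could not follow after that. – merv 2010-02-24 03:11:08

nIdx is the number of times each index appears. I need this to be able to use mat2cell, which puts the first n values into the first cell, the next n values into the second cell, and so on, thus grouping the values that belong to each time interval. I hope the additional comments help make it clearer. Sorry for writing hard-to-read code. I should be (should have been) doing something different, so I answered this in a hurry :) – Jonas 2010-02-24 03:27:13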
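
To make the role of nIdx concrete, here is a tiny worked sketch with made-up numbers (six samples, with interval 3 left empty on purpose). The values of `tIdx` and `x` are illustrative assumptions, not data from the question; the group labels are read at the first sample of each group.

```matlab
tIdx = [1;1;1;2;2;4];          % made-up interval number of each (sorted) sample
x    = (1:6)';                 % made-up measurements

dt      = [diff(tIdx); 1];     % > 0 at the last sample of each group
stepIdx = [0; find(dt > 0)];   % [0;3;5;6]: end position of each group, preceded by 0
nIdx    = diff(stepIdx);       % [3;2;1]: how many samples each group holds

xCell = mat2cell(x, nIdx, 1);  % {[1;2;3]; [4;5]; 6}: one cell per non-empty interval

mu = zeros(max(tIdx), 1);      % preallocate so that empty intervals stay 0
mu(tIdx(stepIdx(1:end-1)+1)) = cellfun(@mean, xCell);   % mu = [2; 4.5; 0; 6]
```

The third entry of `mu` stays at its preallocated 0 because no sample falls in interval 3, which is why the result is indexed by interval number rather than by group position.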

Answer

You can compute the indices in one shot using bsxfun:

indices = (bsxfun(@ge, t, tt(1:end-1)') & bsxfun(@lt, t, tt(2:end)'));

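This is faster than looping, but requires storing all of them at once (a time vs. space tradeoff).

I like this one. The only problem is that I cannot use the indices directly without a for loop: doing 'x(indices)' does not work, I have to do: 'for i=1:N, x(indices(:,i)), end' – merv 2010-02-24 22:38:57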
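
One possible way around the loop, sketched here as an assumption rather than as part of the original answer, is to treat the logical matrix as a 0/1 weight matrix and obtain per-interval counts, sums and sums of squares with matrix products. The names `n`, `s1` and `s2` are introduced only for illustration.

```matlab
t  = sort((100:999)' + 3*rand(900,1));  % non-uniform time (seconds)
x  = 5*rand(900,1) + 10;                % x(i) is the value at time t(i)
tt = (floor(t(1)):60:ceil(t(end)))';    % break points of each interval

% indices(k,i) is true when tt(i) <= t(k) < tt(i+1)
indices = bsxfun(@ge, t, tt(1:end-1)') & bsxfun(@lt, t, tt(2:end)');

W  = double(indices)';   % N-by-numel(t) matrix of 0/1 weights
n  = sum(W, 2);          % number of samples per interval
s1 = W * x;              % sum of x per interval
s2 = W * x.^2;           % sum of x.^2 per interval

mu = s1 ./ n;                                    % NaN for empty intervals
sd = sqrt(max(s2 - s1.^2 ./ n, 0) ./ (n - 1));   % max(...,0) guards against rounding below zero
```

This keeps the same memory cost as the logical matrix itself, so it only pays off when numel(t) times the number of intervals still fits comfortably in memory.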

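Answer

Here is a method that uses binary search. It is 6-10 times faster for 9900 elements, and about 64 times faster for 99900 elements. It was hard to get reliable timings with only 900 elements, so I am not sure which is faster at that size. It uses almost no extra memory if you consider building tx directly from the generated data. Other than that, it only has four extra floating-point variables (prevind, first, mid and last).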

% Sort the data so that we can use binary search (sorting takes O(N log N) time).
tx = sortrows([t x]);

prevind = 1;

for i=1:N
    % Binary search for the first row whose time is >= tt(i+1),
    % i.e. one past the end of this section
    first = prevind;
    last = size(tx,1) + 1; % one past the last row, so trailing samples are not dropped
    while first ~= last
        mid = floor((first+last)/2);
        if tt(i+1) > tx(mid,1)
            first = mid+1;
        else
            last = mid;
        end
    end
    mu(i) = mean(tx(prevind:last-1,2));
    sd(i) = std(tx(prevind:last-1,2));
    prevind = last;
end

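It uses all of the variables that you had originally. I hope it suits your needs. It is faster because it takes O(log N) to find the indices with binary search, but O(N) to find them the way you were doing it.

This should be even faster if you preallocate mu and sd instead of growing them inside the loop. – Jonas 2010-02-24 12:11:44

@Jonas I assumed that was implied, since it is in the asker's code. This just replaces the last 5 lines of the asker's code. I think those last 5 lines are the slow part. – 2010-02-24 15:42:58

Is binary search (with a loop) faster than the vectorized comparison I started with? – merv 2010-02-24 22:35:28

Answer

Disclaimer: I worked this out on paper, but have not yet had the opportunity to check it "in silico"...

You may be able to avoid loops or cell arrays by doing some tricky cumulative sums and indexing, and computing the means and standard deviations yourself. Here is some code that I believe will work, although I am not sure how it compares speed-wise with the other solutions: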

[t,sortIndex] = sort(t); %# Sort the time points
x = x(sortIndex);        %# Sort the data values
interval = 60;           %# Interval size, in seconds

intervalIndex = floor((t-t(1))./interval)+1; %# Collect t into intervals
nIntervals = max(intervalIndex);             %# The number of intervals
mu = zeros(nIntervals,1);                    %# Preallocate mu
sd = zeros(nIntervals,1);                    %# Preallocate sd

%# t and x are column vectors, so everything below is concatenated vertically
sumIndex = [find(diff(intervalIndex)); ...
            numel(intervalIndex)];   %# Find indices of the interval ends
n = diff([0; sumIndex]);             %# Number of samples per interval
xSum = cumsum(x);                    %# Cumulative sum of x
xSum = diff([0; xSum(sumIndex)]);    %# Sum per interval
xxSum = cumsum(x.^2);                %# Cumulative sum of x^2
xxSum = diff([0; xxSum(sumIndex)]);  %# Squared sum per interval

intervalIndex = intervalIndex(sumIndex); %# Find index into mu and sd
mu(intervalIndex) = xSum./n;             %# Compute mean
sd(intervalIndex) = sqrt((xxSum-xSum.*xSum./n)./(n-1)); %# Compute std dev

The standard deviation is computed using the simplification of the formula found on this Wikipedia page.
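
As a quick sanity check of that simplification, the sketch below compares it with the built-in std on a small made-up sample. It relies on the identity sum((x - mean(x)).^2) = sum(x.^2) - sum(x)^2/n; the variable names `sdShortcut` and `sdBuiltin` are illustrative only.

```matlab
x = [2; 4; 4; 4; 5; 5; 7; 9];   % made-up sample
n = numel(x);

xSum  = sum(x);      % 40
xxSum = sum(x.^2);   % 232

% sum of squared deviations = xxSum - xSum^2/n = 232 - 200 = 32
sdShortcut = sqrt((xxSum - xSum*xSum/n) / (n - 1));   % sqrt(32/7)
sdBuiltin  = std(x);                                  % two-pass reference value
```

The two agree here, but the shortcut subtracts two large, nearly equal numbers, so it can lose precision when the mean is large relative to the spread of the data.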

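Thanks for the response, I think it would be interesting to compare the timing with the other solutions. – merv 2010-02-24 22:41:39

Answer

Same answer as above, but with a parametric interval (window_size). It also resolves the vector length issue.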

window_size = 60; % in seconds; can be any value (60, 5, 0.1, ...), which was not covered above

t = sort((100:999)' + 3*rand(900,1));  % non-uniform time 
x = 5*rand(900,1) + 10;     % x(i) is the value at time t(i) 

int = 1 + floor((t - t(1))/window_size); 
tt = (floor(t(1)):window_size:ceil(t(end)))'; 



% mean val and std dev of the accelerations at speed 
mu = accumarray(int,x,[],@mean); 
sd = accumarray(int,x,[],@std); 

% resolve the size mismatch between tt and mu (e.g. for window_size = 1 instead of 60)
while (sum(size(tt) > size(mu)) > 0) 
    tt(end)=[]; 
end 

errorbar(tt,mu,sd);
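
An alternative to trimming tt afterwards, sketched here under the assumption that the intervals should start at floor(t(1)), is to bin the samples and build the interval start times from the same origin so that their lengths match by construction. The names `t0` and `tStart` are introduced only for this illustration.

```matlab
window_size = 5*60;   % interval length in seconds (5 minutes here, any positive value works)

t = sort((100:999)' + 3*rand(900,1));  % non-uniform time (seconds)
x = 5*rand(900,1) + 10;                % x(i) is the value at time t(i)

t0  = floor(t(1));                         % common origin for binning and for the time axis
int = 1 + floor((t - t0)/window_size);     % interval number of each sample

mu = accumarray(int, x, [], @mean);
sd = accumarray(int, x, [], @std);

tStart = t0 + (0:numel(mu)-1)' * window_size;   % start time of each interval, same length as mu
```

With this, `errorbar(tStart, mu, sd)` can be called directly, without the while loop that shortens tt.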
