Using if statements and loops to append new data to the original data frame

Problem description:

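I have some model output that I want to merge back into the original data file. I can do this with nested ifelse(), but I am looking for a way to generalize the process so that I can run it as a batch process across multiple datasets. This is what I tried initially.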

The model output corresponds to blocks of time, while each raw data point is associated with a discrete time.

When I ran one day at a time manually (here is an example for one parameter on the first day), this very large, ugly ifelse aggregated the data correctly.

track[,"phase"]= ifelse((phaseTable1$start[1]<=track$Time)& (track$Time< phaseTable1$end[1]), phaseTable1$phase[1], 
        ifelse((phaseTable1$start[2]<=track$Time)& (track$Time< phaseTable1$end[2]), phaseTable1$phase[2], 
         ifelse((phaseTable1$start[3]<=track$Time)& (track$Time< phaseTable1$end[3]), phaseTable1$phase[3], 
           ifelse((phaseTable1$start[4]<=track$Time)& (track$Time< phaseTable1$end[4]), phaseTable1$phase[4], 
             ifelse((phaseTable1$start[5]<=track$Time)& (track$Time< phaseTable1$end[5]), phaseTable1$phase[5], 
               ifelse((phaseTable1$start[6]<=track$Time)& (track$Time< phaseTable1$end[6]), phaseTable1$phase[6], 
                ifelse((phaseTable1$start[7]<=track$Time)& (track$Time< phaseTable1$end[7]), phaseTable1$phase[7], 
                  ifelse((phaseTable1$start[8]<=track$Time)& (track$Time< phaseTable1$end[8]), phaseTable1$phase[8], 
                    ifelse((phaseTable1$start[9]<=track$Time)& (track$Time< phaseTable1$end[9]), phaseTable1$phase[9], 
                      ifelse((phaseTable1$start[10]<=track$Time)& (track$Time< phaseTable1$end[10]), phaseTable1$phase[10], 
                       ifelse((phaseTable1$start[11]<=track$Time)& (track$Time< phaseTable1$end[11]), phaseTable1$phase[11], 
                         ifelse((phaseTable1$start[12]<=track$Time)& (track$Time< phaseTable1$end[12]), phaseTable1$phase[12], 
                           ifelse((phaseTable1$start[13]<=track$Time)& (track$Time< phaseTable1$end[13]), phaseTable1$phase[13], 
                             ifelse((phaseTable1$start[14]<=track$Time)& (track$Time<phaseTable1$end[14]), phaseTable1$phase[14], 
                              ifelse((phaseTable1$start[15]<=track$Time)& (track$Time< phaseTable1$end[15]), phaseTable1$phase[15], 
                                ifelse((phaseTable1$start[16]<=track$Time)& (track$Time< phaseTable1$end[16]), phaseTable1$phase[16], 
                                  ifelse((phaseTable1$start[17]<=track$Time)& (track$Time< phaseTable1$end[17]), phaseTable1$phase[17], 
                                    ifelse((phaseTable1$start[18]<=track$Time)& (track$Time< phaseTable1$end[18]), phaseTable1$phase[18], 
                                     ifelse((phaseTable1$start[19]<=track$Time)& (track$Time< phaseTable1$end[19]), phaseTable1$phase[19], 
                                       ifelse((phaseTable1$start[20]<=track$Time)& (track$Time< phaseTable1$end[20]), phaseTable1$phase[20], 
                                         ifelse((phaseTable1$start[21]<=track$Time)& (track$Time< phaseTable1$end[21]), phaseTable1$phase[21], 
                                           ifelse((phaseTable1$start[22]<=track$Time)& (track$Time< phaseTable1$end[22]), phaseTable1$phase[22], 
                                            ifelse((phaseTable1$start[23]<=track$Time)& (track$Time< phaseTable1$end[23]), phaseTable1$phase[23], 
                                              ifelse((phaseTable1$start[24]<=track$Time)& (track$Time< phaseTable1$end[24]), phaseTable1$phase[24], 
                                                ifelse((phaseTable1$start[25]<=track$Time)& (track$Time< phaseTable1$end[25]), phaseTable1$phase[25], 
                                                  ifelse((phaseTable1$start[26]<=track$Time)& (track$Time< phaseTable1$end[26]), phaseTable1$phase[26], 
                                                   ifelse((phaseTable1$start[27]<=track$Time)& (track$Time< phaseTable1$end[27]), phaseTable1$phase[27], 
                                                     ifelse((phaseTable1$start[28]<=track$Time)& (track$Time< phaseTable1$end[28]), phaseTable1$phase[28], 
                                                       ifelse((phaseTable1$start[29]<=track$Time)& (track$Time< phaseTable1$end[29]), phaseTable1$phase[29], 
                                                         ifelse((phaseTable1$start[30]<=track$Time)& (track$Time< phaseTable1$end[30]), phaseTable1$phase[30], 
                                                          ifelse((phaseTable1$start[31]<=track$Time)& (track$Time< phaseTable1$end[31]), phaseTable1$phase[31], 
                                                            ifelse((phaseTable1$start[32]<=track$Time)& (track$Time< phaseTable1$end[32]), phaseTable1$phase[32], 
                                                              ifelse((phaseTable1$start[33]<=track$Time)& (track$Time< phaseTable1$end[33]), phaseTable1$phase[33], 
                                                                ifelse((phaseTable1$start[34]<=track$Time)& (track$Time< phaseTable1$end[34]), phaseTable1$phase[34], 
                                                                 ifelse((phaseTable1$start[35]<=track$Time)& (track$Time< phaseTable1$end[35]), phaseTable1$phase[35],phaseTable1$phase[35] 

                                                        ))))))))))))))))))))))))))))))))))) 

This works, but it is rather clumsy, and the number of nested conditions varies from day to day within the data.

I tried to rework this into a more practical loop:

for (j in 1:nrow(phaseTable1)){ 
if((phaseTable1$start[j]<=track$Time)&(track$Time< phaseTable1$end[j])){track$tau== phaseTable1$tau[j]} 

} 

and kept getting this warning, which resulted in no data:

In if ((phaseTable1$start[j] <= track$Time) & (track$Time < ... the condition has length > 1 and only the first element will be used 

I tried aggregating it like this:

for (j in 1:nrow(phaseTable1)){ 
     track$phase<-ifelse(((phaseTable1$star [j]<=track$Time)&(track$Time< phaseTable1$end[j])), phaseTable1$phase[j],""))) 
} 

New columns appear in the data frame, but they are empty.

I tried again using the wrapper from a package suggested in a blog post on thatssorandom, which also resulted in errors.

for (j in 1:nrow(phaseTable1)){ 
ie(
    i(((phaseTable1$start[j]<=track$Time)&(track$Time< phaseTable1$end[j])),track$phase<- phaseTable1$phase[j]), 
e("na")) 

    } 

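Is there an obvious mistake in what I am doing, or is there another solution that achieves what I am trying to do? I admit I am a relatively amateur user. I have explored other forum questions about ifelse, but I have not been able to figure out what I am doing wrong. I have a working loop that lets me run my model for each day in the data frame. If I can get this next loop to run, I will be able to nest it inside the first loop and aggregate the data in batch. Any insight into a solution would be greatly appreciated!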

Without a dataset to work with, this could be done with findInterval:

df1 <- data.frame(start = seq(as.POSIXct("2017-08-07 00:00:00"), by = "hour", length.out = 24)) 
df1$end <- df1$start + 3600 
df1$phase <- letters[seq_len(nrow(df1))] 

v <- findInterval(c(as.POSIXct("2017-08-07 02:38:24"), as.POSIXct("2017-08-07 21:59:59")), df1$start) 
df1$phase[v] 
[1] "c" "v" 

Unless there are gaps between the time intervals, the end time is not needed.
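If the intervals do have gaps, one possible approach is to keep using findInterval on the start times and then check each match against the corresponding end time. The sketch below is only an illustration under assumptions: the `phases` table, the `times` vector, and the gap between 02:00 and 03:00 are all made up, and the start times are assumed to be sorted.

```r
# Hypothetical lookup table with a gap between 02:00 and 03:00
phases <- data.frame(
  start = as.POSIXct(c("2017-08-07 00:00:00", "2017-08-07 01:00:00",
                       "2017-08-07 03:00:00"), tz = "UTC"),
  end   = as.POSIXct(c("2017-08-07 01:00:00", "2017-08-07 02:00:00",
                       "2017-08-07 04:00:00"), tz = "UTC"),
  phase = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical observation times; the second one falls into the gap
times <- as.POSIXct(c("2017-08-07 00:30:00", "2017-08-07 02:30:00",
                      "2017-08-07 03:15:00"), tz = "UTC")

# Index of the last start <= time (0 if the time precedes every start)
idx <- findInterval(as.numeric(times), as.numeric(phases$start))
idx[idx == 0] <- NA

# Keep a match only if the time also lies before that interval's end
inside <- !is.na(idx) & times < phases$end[idx]
phase  <- ifelse(inside, phases$phase[idx], NA_character_)
```

Times that fall into a gap, or before the first start, end up as NA instead of silently inheriting the previous interval's phase.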


For the first error, see ?&

& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.
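To make the quoted distinction concrete, here is a small sketch. The `time_points` vector and the bounds 10 and 20 are hypothetical stand-ins for track$Time and one row of phaseTable1. The comparison with & returns one logical value per element, which is why passing it to if() triggers the "condition has length > 1" message (newer versions of R treat this as an error rather than a warning).

```r
# Hypothetical stand-in for track$Time
time_points <- c(5, 15, 25)

# Elementwise comparison: one TRUE/FALSE per element
in_block <- (10 <= time_points) & (time_points < 20)

# if() needs a single TRUE/FALSE, so test one element at a time ...
labels <- character(length(time_points))
for (k in seq_along(time_points)) {
  if (10 <= time_points[k] && time_points[k] < 20) {
    labels[k] <- "in block"
  }
}

# ... or skip if() entirely and assign through the logical vector
labels2 <- character(length(time_points))
labels2[in_block] <- "in block"
```

Both variants label only the second element. The logical-index form avoids the inner loop over rows.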

Second error: typo, phaseTable1$star [j] should be phaseTable1$start[j]

Third error: typo, i should be if


This is great, thank you! I did not know about findInterval(). The distinction between the long and short forms is also helpful. – sea83

I found a solution that seems to work. I had to rethink how I set up the loop.

for (j in 1:nrow(phaseTable1)){ 
for (k in 1:nrow(track)){ 
if((phaseTable1$start[j]<=track$Time[k])&(track$Time[k]< phaseTable1$end[j])){track$model[k]= phaseTable1$model[j]} 

if((phaseTable1$start[j]<=track$Time[k])&(track$Time[k]< phaseTable1$end[j])){track$phase[k]= phaseTable1$phase[j]} 

if((phaseTable1$start[j]<=track$Time[k])&(track$Time[k]< phaseTable1$end[j])){track$tau[k]= phaseTable1$tau[j]} 

if((phaseTable1$start[j]<=track$Time[k])&(track$Time[k]< phaseTable1$end[j])){track$eta[k]= phaseTable1$eta[j]} 

} 

} 

Nested 'for' loops should not be needed for this. If you post some sample data with the desired output, there is probably a better way to do this. – manotheshark
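As a rough illustration of a loop-free version, the sketch below computes a single row index with findInterval and copies all four columns at once. The data here are invented stand-ins for phaseTable1 and track (numeric times instead of POSIXct, three blocks only), and the approach assumes the blocks are sorted by start and contiguous, so the end column is not consulted.

```r
# Hypothetical stand-ins for phaseTable1 and track
phaseTable1 <- data.frame(
  start = c(0, 10, 20),
  end   = c(10, 20, 30),
  model = c("m1", "m2", "m3"),
  phase = c("a", "b", "c"),
  tau   = c(0.1, 0.2, 0.3),
  eta   = c(1, 2, 3),
  stringsAsFactors = FALSE
)
track <- data.frame(Time = c(3, 12, 19, 27))

# Row of phaseTable1 whose block contains each track$Time
idx <- findInterval(track$Time, phaseTable1$start)
idx[idx == 0] <- NA   # times before the first start match no block

# Copy every model column in one step instead of looping over j and k
cols <- c("model", "phase", "tau", "eta")
track[cols] <- phaseTable1[idx, cols]
```

Wrapping these few lines in a function that takes a track and a phase table would make it straightforward to call once per day inside an outer batch loop.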