Creating a dynamic number of columns from a data frame based on character vectors

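Problem description:

I have a list in which each element names one or more columns of a data frame. For each element, I want to create a summary column containing the sum of the columns it names.

For example: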

library(dplyr)  # provides %>%, group_by() and summarize()

set.seed(3550) 
# Creates data frame 
month <- seq.Date(from = as.Date("2012-09-01"), by = "month", length.out = 50) 
a <- rpois(50, 5000) 
b <- rpois(50, 3000) 
c <- rpois(50, 500) 
d <- rpois(50, 1000) 

df <- data.frame(month, a, b, c, d) 
# Creates list of vectors 
mylist <- list(this = "this", that = "that", other = "other") 
mylist$this <- c("a") 
mylist$that <- c("a", "b") 
mylist$other <- c("a", "c", "d") 

I can get the result I want with the following code:

my_df <- df %>% 
    group_by(month) %>% 
    summarize(this = sum(!!!rlang::syms(mylist$this), na.rm = TRUE), 
      that = sum(!!!rlang::syms(mylist$that), na.rm = TRUE), 
      other = sum(!!!rlang::syms(mylist$other), na.rm = TRUE)) 

The output is:

# A tibble: 50 x 4 
     month this that other 
     <date> <int> <int> <int> 
1 2012-09-01 4958 7858 6480 
2 2012-10-01 4969 7915 6497 
3 2012-11-01 5012 7978 6483 
4 2012-12-01 4982 7881 6460 
5 2013-01-01 4838 7880 6346 
6 2013-02-01 5090 8089 6589 
7 2013-03-01 5013 8044 6582 
8 2013-04-01 4947 7942 6388 
9 2013-05-01 5065 8124 6506 
10 2013-06-01 5020 8086 6521 
# ... with 40 more rows 

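Where I am stuck is working out how to create the summary columns dynamically, so that their number follows the length of the list. I thought a loop inside the summarize call might work, but it does not.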

combine_iterations <- function(x, iter_list){ 
    a <- rlang::syms(names(iter_list)) 
    b <- x %>% 
    group_by(month) %>% 
    summarize(for (i in 1:length(a)){ 
     a[[i]] = sum(!!!rlang::syms(iter_list[i]), na.rm = TRUE) 
    }) 
} 

Output:

Error in lapply(.x, .f, ...) : object 'i' not found 
Called from: lapply(.x, .f, ...) 

What in the world is '!!!'?

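@KyleWeise It is part of the quote/unquote mechanism that was added to dplyr when the standard-evaluation functions were deprecated. Specifically, '!!!' is the unquote-splice operator.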
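
To make the splicing concrete, here is a minimal sketch of what '!!!' does to a call. The variable names `cols` and `spliced` are only illustrative; `expr()` and `syms()` come from rlang.

```r
library(rlang)

cols <- c("a", "b")

# syms() turns the character vector into a list of symbols, and !!!
# splices that list into the call as separate arguments.
spliced <- expr(sum(!!!syms(cols), na.rm = TRUE))

print(spliced)
# sum(a, b, na.rm = TRUE)
```

So `sum(!!!rlang::syms(mylist$that), na.rm = TRUE)` in the question is evaluated as if `sum(a, b, na.rm = TRUE)` had been typed by hand.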

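You are making this a bit more complicated than it needs to be. If you want a custom summary, you can use group_by %>% do, which avoids the rlang quote/unquote problem altogether: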

combine_iterations <- function(x, iter_list){ 
    x %>% 
     group_by(month) %>% 
     do(
      as.data.frame(lapply(iter_list, function(cols) sum(.[cols]))) 
    ) 
} 

combine_iterations(df, mylist) 
# A tibble: 50 x 4 
# Groups: month [50] 
#  month this that other 
#  <date> <int> <int> <int> 
# 1 2012-09-01 5144 8186 6683 
# 2 2012-10-01 5134 8090 6640 
# 3 2012-11-01 4949 7917 6453 
# 4 2012-12-01 5040 8203 6539 
# 5 2013-01-01 4971 7938 6474 
# 6 2013-02-01 5050 7924 6541 
# 7 2013-03-01 5018 8022 6579 
# 8 2013-04-01 4945 7987 6476 
# 9 2013-05-01 5134 8114 6590 
#10 2013-06-01 4984 8011 6476 
# ... with 40 more rows 

identical(
    df %>% 
     group_by(month) %>% 
     summarise(this = sum(a), that = sum(a, b), other = sum(a, c, d)), 

    ungroup(combine_iterations(df, mylist)) 
) 
# [1] TRUE 

Another option is to build the data frame inside do with purrr::map_df:

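library(purrr)  # provides map_df()

combine_iterations <- function(x, iter_list){ 
    x %>% 
     group_by(month) %>% 
     do({ 
      # Save the current group's data frame, because inside the ~ lambda
      # `.` refers to the lambda's own argument.
      g <- . 
      map_df(iter_list, ~ sum(g[.x])) 
     }) 
} 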

I also saw your solution with purrr::map_df() in it. Why is this one better? Just because it is done in base R?

I actually prefer 'map_df' for its conciseness, but thought it might cause confusion. I have added it as a second option. – Psidom
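
For completeness, the original summarize approach can probably also be made dynamic by building the expressions first and splicing them into a single summarise call. This is only a sketch, not part of the answer above: the function name `combine_iterations2` is hypothetical, and the data setup is repeated from the question so the block runs on its own.

```r
library(dplyr)
library(purrr)
library(rlang)

set.seed(3550)
df <- data.frame(
  month = seq.Date(from = as.Date("2012-09-01"), by = "month", length.out = 50),
  a = rpois(50, 5000),
  b = rpois(50, 3000),
  c = rpois(50, 500),
  d = rpois(50, 1000)
)
mylist <- list(this = "a", that = c("a", "b"), other = c("a", "c", "d"))

# combine_iterations2 is a hypothetical name used for illustration only.
combine_iterations2 <- function(x, iter_list) {
  # Build one unevaluated call per list element, e.g. sum(a, b, na.rm = TRUE).
  # map() keeps the names of iter_list, which become the column names.
  sum_exprs <- map(iter_list, ~ expr(sum(!!!syms(.x), na.rm = TRUE)))

  x %>%
    group_by(month) %>%
    summarise(!!!sum_exprs)  # splice the named expressions in as arguments
}

result <- combine_iterations2(df, mylist)
```

The number of output columns then follows the length of `iter_list`, with no loop inside summarise.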