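Creating a dynamic number of columns from a data frame based on character vectors
Question:
Given a list of character vectors that name one or more columns of a data frame, I am trying to create one summary column per list element, each containing the sum of the named columns.
For example: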
library(dplyr)
set.seed(3550)
# Create the data frame
month <- seq.Date(from = as.Date("2012-09-01"), by = "month", length.out = 50)
a <- rpois(50, 5000)
b <- rpois(50, 3000)
c <- rpois(50, 500)
d <- rpois(50, 1000)
df <- data.frame(month, a, b, c, d)
# Creates list of vectors
mylist <- list(this = "this", that = "that", other = "other")
mylist$this <- c("a")
mylist$that <- c("a", "b")
mylist$other <- c("a", "c", "d")
I can get the result I want with the following code:
my_df <- df %>%
  group_by(month) %>%
  summarize(this = sum(!!!rlang::syms(mylist$this), na.rm = TRUE),
            that = sum(!!!rlang::syms(mylist$that), na.rm = TRUE),
            other = sum(!!!rlang::syms(mylist$other), na.rm = TRUE))
The output is:
# A tibble: 50 x 4
month this that other
<date> <int> <int> <int>
1 2012-09-01 4958 7858 6480
2 2012-10-01 4969 7915 6497
3 2012-11-01 5012 7978 6483
4 2012-12-01 4982 7881 6460
5 2013-01-01 4838 7880 6346
6 2013-02-01 5090 8089 6589
7 2013-03-01 5013 8044 6582
8 2013-04-01 4947 7942 6388
9 2013-05-01 5065 8124 6506
10 2013-06-01 5020 8086 6521
# ... with 40 more rows
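Where I am stuck is working out how to create the summary columns dynamically, so that their number follows the length of the list. I thought a loop inside the summarize call might work, but it does not.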
combine_iterations <- function(x, iter_list){
  a <- rlang::syms(names(iter_list))
  b <- x %>%
    group_by(month) %>%
    summarize(for (i in 1:length(a)){
      a[[i]] = sum(!!!rlang::syms(iter_list[i]), na.rm = TRUE)
    })
}
Output:
Error in lapply(.x, .f, ...) : object 'i' not found
Called from: lapply(.x, .f, ...)
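Answer
You are making this a bit more complicated than it needs to be. If you want a customized summary, you can use group_by %>% do, which avoids the rlang quote/unquote issue altogether: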
combine_iterations <- function(x, iter_list){
  x %>%
    group_by(month) %>%
    do(
      as.data.frame(lapply(iter_list, function(cols) sum(.[cols])))
    )
}
combine_iterations(df, mylist)
# A tibble: 50 x 4
# Groups: month [50]
# month this that other
# <date> <int> <int> <int>
# 1 2012-09-01 5144 8186 6683
# 2 2012-10-01 5134 8090 6640
# 3 2012-11-01 4949 7917 6453
# 4 2012-12-01 5040 8203 6539
# 5 2013-01-01 4971 7938 6474
# 6 2013-02-01 5050 7924 6541
# 7 2013-03-01 5018 8022 6579
# 8 2013-04-01 4945 7987 6476
# 9 2013-05-01 5134 8114 6590
#10 2013-06-01 4984 8011 6476
# ... with 40 more rows
identical(
  df %>%
    group_by(month) %>%
    summarise(this = sum(a), that = sum(a, b), other = sum(a, c, d)),
  ungroup(combine_iterations(df, mylist))
)
# [1] TRUE
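The key step in the function above is sum(.[cols]): inside do, the dot is the data frame for the current group, and indexing it with a character vector keeps only the named columns before everything is summed. A minimal sketch of that step in isolation, using a small hypothetical data frame g that is not part of the original post:

```r
# Hypothetical stand-in for one group's data frame
g <- data.frame(a = c(1, 2), b = c(10, 20), c = c(100, 200))
cols <- c("a", "b")

# Indexing with a character vector keeps only the named columns;
# sum() then adds up every value in the resulting data frame
total <- sum(g[cols])
print(total)
#> [1] 33
```

Note that, unlike the code in the question, this does not pass na.rm = TRUE, so a missing value in any selected column would make the sum NA.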
Alternatively, you can use purrr::map_df inside do to build the data frame:
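combine_iterations <- function(x, iter_list){
  x %>%
    group_by(month) %>%
    do({
      # Save the current group's data frame, because `.` is not
      # available under that name inside the formula below
      g = .
      purrr::map_df(iter_list, ~ sum(g[.x]))
    })
}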
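For comparison, the rlang style used in the question can also be made dynamic. The sketch below is not from the original answer: it assumes that summarize accepts a named list of expressions spliced in with !!!, and it uses small hypothetical objects (df_small, cols_list, sum_exprs) in place of the question's data.

```r
library(dplyr)

# Hypothetical toy data, smaller than the data frame in the question
df_small <- data.frame(
  month = as.Date(c("2012-09-01", "2012-09-01", "2012-10-01")),
  a = c(1, 2, 4),
  b = c(10, 20, 30),
  c = c(100, 200, 300)
)
cols_list <- list(this = "a", that = c("a", "b"), other = c("a", "c"))

# Build one unevaluated sum(...) call per list element;
# lapply keeps the list names, which become the output column names
sum_exprs <- lapply(cols_list, function(cols) {
  rlang::expr(sum(!!!rlang::syms(cols), na.rm = TRUE))
})

# Splice the named expressions into summarize as separate arguments
result <- df_small %>%
  group_by(month) %>%
  summarize(!!!sum_exprs)
```

Here result has one row per month and the columns month, this, that and other, so the number of summary columns follows the length of cols_list.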
I also saw your solution with purrr::map_df() in it. Why is this one better? Just because it is done in base R?
I actually prefer 'map_df' for its conciseness, but thought it might cause confusion. I have added it as a second option. – Psidom
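What in the world is '!!!'?
@KyleWeise It is part of the quote/unquote mechanism that was added to dplyr when the standard evaluation functions were deprecated. Specifically, it is the splicing operator.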
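To make the splicing comment concrete, here is a minimal sketch, assuming the rlang package is installed. rlang::syms turns a character vector into a list of symbols, and !!! inserts each element of that list as its own argument into the call being built. The names cols and call_spliced are illustrative only.

```r
library(rlang)

cols <- c("a", "c", "d")

# syms() gives a list of three symbols; !!! splices them in as
# three separate arguments of sum()
call_spliced <- expr(sum(!!!syms(cols), na.rm = TRUE))
print(call_spliced)
#> sum(a, c, d, na.rm = TRUE)

# Evaluating the call with values for a, c and d
value <- eval(call_spliced, list(a = 1, c = 2, d = 3))
print(value)
#> [1] 6
```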