How do I compute yearly averages for unbalanced panel data in R?

Problem description:

I have quarterly unbalanced panel data that looks like this:

Firm Date Var_1    
AAA 19701130 24.46    
AAA 19701231 NA    
AAA 19710131 NA    
AAA 19710228 34.19325     
AAA 19710331 NA    
AAA 19710430 NA    
AAA 19710531 29.0235    
AAA 19710630 NA    
AAA 19710731 NA    
AAA 19710831 16.256875    
AAA 19710930 NA    
AAA 19711031 NA    
AAA 19711130 17.22125     
AAA 19711231 NA    
BBB 19730630 4.57     
BBB 19730731 NA    
BBB 19730831 NA    
BBB 19730930 8.736    
BBB 19731031 NA    
BBB 19731130 NA    
BBB 19731231 4.97     
BBB 19740131 NA    
BBB 19740228 NA    
BBB 19740331 6.85125    
BBB 19740430 NA    
BBB 19740531 NA    
BBB 19740630 6.87225    
BBB 19740731 NA    
BBB 19740831 NA    
BBB 19740930 5.454875     
BBB 19741031 NA    
BBB 19741130 NA    
BBB 19741231 4.56875    
BBB 19750131 NA    
BBB 19750228 NA    
BBB 19750331 6.276    
BBB 19750430 NA    
BBB 19750531 NA    
BBB 19750630 6.0145    
BBB 19750731 NA    
BBB 19750831 NA    
BBB 19750930 8.376    
BBB 19751031 NA    
BBB 19751130 NA    
BBB 19751231 9.17875    

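The real data runs to tens of thousands of rows. Note that each firm reports at different month-ends. How can I compute the mean of Var_1 for each firm in each year? The final result should be yearly rather than quarterly. Ideally it would look like this: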

Firm Date Var_1 
AAA  1970 24.46 
AAA  1971 24.17 
BBB  1973 6.09  
BBB  1974 5.94  
BBB  1975 7.46  

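We can use one of the group-by functions. Group by 'Firm' and by the first four characters of 'Date' (the year), then take the mean of 'Var_1', ignoring NA values: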

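library(dplyr) 
df1 %>% 
    # substr(Date, 1, 4) keeps the yyyy part of the yyyymmdd date
    group_by(Firm, Date = substr(Date, 1,4)) %>% 
    # na.rm = TRUE skips the months in which nothing was reported
    summarise(Var_1 = round(mean(Var_1, na.rm = TRUE), 2)) 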
# Firm Date Var_1 
# <chr> <chr> <dbl> 
#1 AAA 1970 24.46 
#2 AAA 1971 24.17 
#3 BBB 1973 6.09 
#4 BBB 1974 5.94 
#5 BBB 1975 7.46 
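
The pipeline above assumes a data frame named df1 already exists. As a minimal self-contained sketch, the following rebuilds a small subset of the question's rows (the subset and the names `Year` and `yearly` are choices made here for illustration) and runs the same grouping. It assumes dplyr 1.0.0 or later, where `summarise()` accepts `.groups`.

```r
library(dplyr)

# Subset of the rows from the question, read from inline text.
# Date is read as an integer; substr() coerces it to character.
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Firm Date Var_1
AAA 19701130 24.46
AAA 19710131 NA
AAA 19710228 34.19325
AAA 19710531 29.0235
AAA 19710831 16.256875
AAA 19711130 17.22125
BBB 19730630 4.57
BBB 19730930 8.736
BBB 19731231 4.97")

yearly <- df1 %>%
  # Group by firm and by the yyyy prefix of the yyyymmdd date
  group_by(Firm, Year = substr(Date, 1, 4)) %>%
  # Average the reported values; .groups = "drop" returns an ungrouped tibble
  summarise(Var_1 = round(mean(Var_1, na.rm = TRUE), 2), .groups = "drop")

print(yearly)
```

Naming the new column `Year` rather than overwriting `Date` keeps the original column intact; this is a stylistic choice, not something the answer requires.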

Or with aggregate from base R:

aggregate(Var_1~., transform(df1, Date = substr(Date, 1, 4)), FUN = mean, na.rm = TRUE)
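
One detail of the base R route may be worth checking on real data: the formula interface of `aggregate` uses `na.action = na.omit` by default, so rows with NA are dropped before grouping, and a firm-year with no reported value at all disappears from the result. The sketch below illustrates this on a small subset of the question's rows plus one hypothetical all-NA firm (`CCC`, which is not in the original data); the names `Year`, `dropped`, and `kept` are also introduced here for illustration.

```r
# Subset of the question's rows, plus a hypothetical firm CCC with no reported values
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Firm Date Var_1
AAA 19701130 24.46
AAA 19710131 NA
AAA 19710228 34.19325
AAA 19710531 29.0235
AAA 19710831 16.256875
AAA 19711130 17.22125
BBB 19730630 4.57
BBB 19730930 8.736
BBB 19731231 4.97
CCC 19800131 NA")

# Year is the yyyy prefix of the yyyymmdd date
df1$Year <- substr(df1$Date, 1, 4)

# Default na.action = na.omit: NA rows are removed first, so CCC/1980 vanishes
dropped <- aggregate(Var_1 ~ Firm + Year, data = df1, FUN = mean)
dropped$Var_1 <- round(dropped$Var_1, 2)

# na.action = na.pass keeps NA rows; na.rm = TRUE is then passed on to mean(),
# and the all-NA group CCC/1980 shows up with NaN
kept <- aggregate(Var_1 ~ Firm + Year, data = df1, FUN = mean,
                  na.rm = TRUE, na.action = na.pass)
kept$Var_1 <- round(kept$Var_1, 2)

print(dropped)
print(kept)
```

Whether an empty firm-year should be omitted or reported as NaN depends on the downstream analysis; the dplyr version with `na.rm = TRUE` behaves like the second call.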