tapply only produces missing values (NA)
Question:
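I am trying to generate estimates of the percentage of Catholics within given municipalities of a country, using multilevel regression and poststratification (MRP) of survey data.
The method fits a multilevel logit model and generates predicted probabilities of the dependent variable. These probabilities are then weighted to census data using poststratification of the sample.
I can generate the initial estimates (essentially the predicted probability of being Catholic for a given individual in the survey data). However, when I try to take the average with the last line of code below, it only returns NA for every municipality. The initial cell predictions have some missing values, but nowhere near a majority.
I don't understand why I can't generate the municipal weighted averages, since I am following the same procedure I have used with different data. Any help would be greatly appreciated.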
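To make the poststratification step concrete, here is a minimal sketch with made-up numbers. The data frame `cells` and its columns `muni`, `pred` and `share` are hypothetical stand-ins for the census cells, the model's cell predictions and each cell's share of its municipality's population. Each cell's prediction is multiplied by its population share, and the products are summed within each municipality, which is what the final `tapply()` line of the question's code does.

```r
# Hypothetical census cells: two municipalities, two demographic cells each.
cells <- data.frame(
  muni  = c(1, 1, 2, 2),             # municipality index
  pred  = c(0.80, 0.60, 0.50, 0.70), # predicted P(catholic) for each cell
  share = c(0.25, 0.75, 0.40, 0.60)  # cell's share of its municipality (sums to 1)
)

# Poststratified estimate: population-weighted average of the cell predictions.
muni_est <- tapply(cells$pred * cells$share, cells$muni, sum)
round(100 * muni_est, 1)  # municipality 1: 65, municipality 2: 62
```

This only works as a weighted average if the shares within each municipality add up to 1 and no cell prediction is missing.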
rm(list=ls(all=TRUE))
library("arm")
library("foreign")
#read in megapoll (survey data)
ES.data <- read.dta("ES4.dta", convert.underscore = TRUE)
#read in municipal-level dataset
munilevel <- read.dta("election.dta",convert.underscore = TRUE)
munilevel <- munilevel[order(munilevel$municode),]
#read in Census data
Census <- read.dta("poststratification4.dta",convert.underscore = TRUE)
Census <- Census[order(Census$municode),]
Census$municode <- match(Census$municode, munilevel$municode)
#Create index variables
#At level of megapoll
ES.data$ur.female <- (ES.data$female *2) + ES.data$ur
ES.data$age.edr <- 6 * (ES.data$age -1) + ES.data$edr
#At census level (same coding as above for all variables)
Census$cur.cfemale <- (Census$cfemale *2) + Census$cur
Census$cage.cedr <- 6 * (Census$cage -1) + Census$cedr
##Municipal level variables
Census$c.arena<- munilevel$c.arena[Census$municode]
Census$c.fmln <- munilevel$c.fmln[Census$municode]
#run individual-level opinion model
individual.model1 <- glmer(formula = catholic ~ (1|ur.female) + (1|age)
                           + (1|edr) + (1|age.edr) + (1|municode) + p.arena + p.fmln,
                           data = ES.data, family = binomial(link = "logit"))
display(individual.model1)
#examine random effects and standard errors for urban-female
ranef(individual.model1)$ur.female
se.ranef(individual.model1)$ur.female
#create vector of municipality ranefs and then fill in missing ones
muni.ranefs <- array(NA, c(66, 1))
dimnames(muni.ranefs) <- list(c(munilevel$municode), "effect")
for (i in munilevel$municode) {
  muni.ranefs[i, 1] <- ranef(individual.model1)$municode[i, 1]
}
muni.ranefs[, 1][is.na(muni.ranefs[, 1])] <- 0 #set municipalities with missing REs (b/c not in data) to zero
#create a prediction for each cell in Census data
cellpred1 <- invlogit(fixef(individual.model1)["(Intercept)"]
                      + ranef(individual.model1)$ur.female[Census$cur.cfemale, 1]
                      + ranef(individual.model1)$age[Census$cage, 1]
                      + ranef(individual.model1)$edr[Census$cedr, 1]
                      + ranef(individual.model1)$age.edr[Census$cage.cedr, 1]
                      + muni.ranefs[Census$municode, 1]
                      + (fixef(individual.model1)["p.fmln"] * Census$c.fmln)    # municipal level
                      + (fixef(individual.model1)["p.arena"] * Census$c.arena)) # municipal level
#weights the prediction by the freq of cell
cellpredweighted1 <- cellpred1 * Census$cpercent.muni
#calculates the percent within each municipality (weighted average of responses)
munipred <- 100* as.vector(tapply(cellpredweighted1, Census$municode, sum))
munipred
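Answer:
That's a lot of code, and without the data it is largely redundant! I suspect you have NAs in the object cellpredweighted1. By default sum() propagates NAs to the result: if one or more elements of a vector are NA, then by definition the sum of those elements is also NA.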
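A two-line illustration of that behaviour, using a made-up vector `x`:

```r
x <- c(0.2, NA, 0.45)

sum(x)               # NA: one missing element makes the whole sum NA
sum(x, na.rm = TRUE) # 0.65: missing elements are dropped before summing
```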
If that is what is happening here, simply adding na.rm = TRUE to the tapply() call should solve the problem:
tapply(cellpredweighted1, Census$municode, sum, na.rm = TRUE)
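One thing worth checking before relying on na.rm = TRUE in this setting: in a poststratified sum the cell shares are meant to add up to 1 within each municipality, so silently dropping NA cells pulls that municipality's estimate toward zero. A minimal sketch on hypothetical toy vectors (`pred`, `share`, `muni`) compares the plain na.rm = TRUE sum with one option, renormalising by the share of the cells that were actually used:

```r
pred  <- c(0.80, NA, 0.50, 0.70)   # hypothetical cell predictions, one missing
share <- c(0.25, 0.75, 0.40, 0.60) # cell shares of municipal population
muni  <- c(1, 1, 2, 2)             # municipality of each cell

# Plain na.rm = TRUE: municipality 1 keeps only 0.25 of its total weight.
naive <- tapply(pred * share, muni, sum, na.rm = TRUE)

# Renormalise by the total share of the non-missing cells.
ok <- !is.na(pred)
renorm <- tapply((pred * share)[ok], muni[ok], sum) /
  tapply(share[ok], muni[ok], sum)

naive   # municipality 1: 0.20, municipality 2: 0.62
renorm  # municipality 1: 0.80, municipality 2: 0.62
```

With renormalisation, a municipality whose cells are all missing drops out of `renorm` entirely instead of being reported as 0, which makes the gap visible. Either way, the missing cells are better fixed at their source.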
But you should ask yourself why these results contain NAs at this stage at all, in case they stem from an error earlier in the process.
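Two common sources of such NAs in code like the question's are out-of-range positional indexing and unmatched codes. Indexing a random-effects data frame by position, as in ranef(individual.model1)$age[Census$cage,1], returns NA (not an error) for any census code larger than the number of levels observed in the survey, and match() returns NA for codes with no counterpart. The sketch below uses hypothetical toy objects (`age_re`, `census_age`) in place of the real ones. Applying the same is.na() count to each term of cellpred1, and to Census$cpercent.muni, would show which one introduces the missing values.

```r
# Hypothetical stand-in for ranef(model)$age: only 3 age levels seen in the survey.
age_re <- data.frame("(Intercept)" = c(-0.1, 0.0, 0.2), check.names = FALSE)
census_age <- c(1, 2, 3, 4)  # the census has a 4th category absent from the survey

# Out-of-range row index on a data frame gives NA rather than an error.
age_vals <- age_re[census_age, 1]   # -0.1 0.0 0.2 NA

# match() likewise yields NA for a code with no counterpart.
muni_idx <- match(c(101, 102, 999), c(101, 102, 103))  # 1 2 NA

# Count NAs per term to locate where they first appear.
sapply(list(age = age_vals, municode = muni_idx), function(v) sum(is.na(v)))
```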