Replace NA values in one column based on another column
Question:
Id authId sessionId
139 "56763313.wrpy" "4233a31b52f92c6fe8af4f04f2116657"
123 "221156400" "ae04ddacadaa3429ca77dab674a008bf"
126 "221156400" "ae04ddacadaa3429ca77dab674a008bf"
144 "221156400" "ae04ddacadaa3429ca77dab674a008bf"
143 "221156400" "ae04ddacadaa3429ca77dab674a008bf"
118 NA "ae04ddacadaa3429ca77dab674a008bf"
121 NA "ae04ddacadaa3429ca77dab674a008bf"
122 NA "ae04ddacadaa3429ca77dab674a008bf"
75 "5676614888888" "ca673b5e60a6f70963bf3017e3cb0780"
276 "56711325.cc79" "f6075188c0f479d7a423744f6c8655b3"
256 "56711325.cc79" "f6075188c0f479d7a423744f6c8655b3"
275 "56711325.cc79" "f6075188c0f479d7a423744f6c8655b3"
152 NA "f6075188c0f479d7a423744f6c8655b3"
158 NA "f6075188c0f479d7a423744f6c8655b3"
28 "221124184" "fc71064548bb35d05293bd67d55f1693"
31 "221124184" "fc71064548bb35d05293bd67d55f1693"
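I want to fill in the missing authId values based on sessionId: every row with an NA authId should get the authId of the other rows that share its sessionId. I'm trying to do this without using a loop. For example, row 118 should end up with the same authId as row 143: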
143 "221156400" "ae04ddacadaa3429ca77dab674a008bf"
118 "221156400" "ae04ddacadaa3429ca77dab674a008bf"
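Answer:
First, create a data frame of the unique combinations of authId and sessionId, leaving out the rows where authId is NA. Then find the sessionId of every row whose authId is NA, and use the table of unique combinations to look up the authId associated with each of those sessionIds: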
df <- read.table(text="Id authId sessionId
139 56763313.wrpy 4233a31b52f92c6fe8af4f04f2116657
123 221156400 ae04ddacadaa3429ca77dab674a008bf
126 221156400 ae04ddacadaa3429ca77dab674a008bf
144 221156400 ae04ddacadaa3429ca77dab674a008bf
143 221156400 ae04ddacadaa3429ca77dab674a008bf
118 NA ae04ddacadaa3429ca77dab674a008bf
121 NA ae04ddacadaa3429ca77dab674a008bf
122 NA ae04ddacadaa3429ca77dab674a008bf
75 5676614888888 ca673b5e60a6f70963bf3017e3cb0780
276 56711325.cc79 f6075188c0f479d7a423744f6c8655b3
256 56711325.cc79 f6075188c0f479d7a423744f6c8655b3
275 56711325.cc79 f6075188c0f479d7a423744f6c8655b3
152 NA f6075188c0f479d7a423744f6c8655b3
158 NA f6075188c0f479d7a423744f6c8655b3
28 221124184 fc71064548bb35d05293bd67d55f1693
31 221124184 fc71064548bb35d05293bd67d55f1693", header=TRUE, stringsAsFactors=FALSE)
# find unique combinations of authId and sessionId, but not when authId is NA
uniques <- unique(df[c("authId", "sessionId")])
uniques <- uniques[!is.na(uniques$authId),]
# replace authIds that are NA with the authId associated with the same sessionId
na.authId <- which(is.na(df$authId))      # row indices where authId is missing
na.sessionId <- df$sessionId[na.authId]   # sessionIds of those rows
df$authId[na.authId] <- uniques$authId[match(na.sessionId, uniques$sessionId)]
# Id authId sessionId
# 1 139 56763313.wrpy 4233a31b52f92c6fe8af4f04f2116657
# 2 123 221156400 ae04ddacadaa3429ca77dab674a008bf
# 3 126 221156400 ae04ddacadaa3429ca77dab674a008bf
# 4 144 221156400 ae04ddacadaa3429ca77dab674a008bf
# 5 143 221156400 ae04ddacadaa3429ca77dab674a008bf
# 6 118 221156400 ae04ddacadaa3429ca77dab674a008bf
# 7 121 221156400 ae04ddacadaa3429ca77dab674a008bf
# 8 122 221156400 ae04ddacadaa3429ca77dab674a008bf
# 9 75 5676614888888 ca673b5e60a6f70963bf3017e3cb0780
# 10 276 56711325.cc79 f6075188c0f479d7a423744f6c8655b3
# 11 256 56711325.cc79 f6075188c0f479d7a423744f6c8655b3
# 12 275 56711325.cc79 f6075188c0f479d7a423744f6c8655b3
# 13 152 56711325.cc79 f6075188c0f479d7a423744f6c8655b3
# 14 158 56711325.cc79 f6075188c0f479d7a423744f6c8655b3
# 15 28 221124184 fc71064548bb35d05293bd67d55f1693
# 16 31 221124184 fc71064548bb35d05293bd67d55f1693
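The same per-session fill can also be sketched with base R's ave(), which applies a function within each group. This is only an alternative sketch, not part of the answer above: the helper name fill_first_known is hypothetical, the shortened sessionId values are made up for illustration, and it assumes each sessionId maps to at most one authId (if there are several, the first non-NA value wins, just as match() picks the first hit).

```r
# Small illustrative data frame (shortened, made-up sessionIds)
df <- data.frame(
  Id = c(143, 118, 75, 152),
  authId = c("221156400", NA, "5676614888888", NA),
  sessionId = c("ae04", "ae04", "ca67", "f607"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: within one group, replace NAs with the first known value.
# If the group has no known value at all, the NAs are left untouched.
fill_first_known <- function(x) {
  known <- x[!is.na(x)]
  if (length(known) > 0) {
    x[is.na(x)] <- known[1]
  }
  x
}

# ave() splits authId by sessionId, applies the helper to each group,
# and puts the results back in the original row order
df$authId <- ave(df$authId, df$sessionId, FUN = fill_first_known)
```

Here row 118 picks up "221156400" from row 143, while row 152 stays NA because no row in its session has a known authId.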
FYI: there's no need to add the language to the title. That's what tags are for. – joran