Group individuals that have identical rows

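Problem description:

I am working with data on 10,000 individuals. The data has 8 binary (0, 1) variables. Each variable is an indicator of whether a survey module is present (== 1) or not (== 0). In total, each individual can show one of 2^8 = 256 possible combinations of 0s and 1s.

Goal: I want to group together the individuals that have identical rows (that is, individuals who took part in the same modules).

My data looks like the following example, with only three variables: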

# example 
dat <- data.frame(id = 1:8,   # unique ID 
        v1 = rep(0:1, 4), 
        v2 = rep(1:0, 4), 
        v3 = rep(1:1, 4)) 

# I can find the unique rows 
unique(dat[ , -1]) 

# I can also count the occurrences of each unique row (as suggested by http://stackoverflow.com/questions/12495345/find-indices-of-duplicated-rows) 
library(plyr) 
ddply(dat[ , -1], .(v1, v2, v3), nrow) 

# But I need the group membership at the individual level, like this: 
dat$v4 <- rep(c("group1", "group2"), 4) 

# The row count alone is not sufficient, because different combinations can have the same count

'interaction(dat[-1], drop = TRUE)' – user20650

Can't you just use 'with(dat, v1 + 2 * v2 + 4 * v3)' as a grouping variable? –

Thanks @user20650!!! That helps, and it is a very simple solution – maller
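The two suggestions from the comments can be tried on the example data as follows. This is only a sketch: the column names `grp_interaction`, `grp_code`, and `grp_code_all`, and the matrix-product generalisation to any number of indicator columns, are assumptions added for illustration and are not part of the original comments.

```r
# Example data from the question (v3 is constant, as in the original)
dat <- data.frame(id = 1:8,
                  v1 = rep(0:1, 4),
                  v2 = rep(1:0, 4),
                  v3 = rep(1, 4))

# Suggestion 1: a factor with one level per observed combination.
# drop = TRUE discards combinations that never occur in the data.
dat$grp_interaction <- interaction(dat[c("v1", "v2", "v3")], drop = TRUE)

# Suggestion 2: read each row as a binary number, so every distinct
# combination of 0s and 1s maps to a distinct integer code.
dat$grp_code <- with(dat, v1 + 2 * v2 + 4 * v3)

# Hypothetical generalisation: weight column k by 2^(k - 1), which
# avoids typing out all 8 indicator columns by hand.
vars <- c("v1", "v2", "v3")
weights <- 2^(seq_along(vars) - 1)
dat$grp_code_all <- as.vector(as.matrix(dat[vars]) %*% weights)

# Group sizes per combination
table(dat$grp_code)
```

The binary code is unique per combination only because every variable is strictly 0 or 1; with missing values or other codes, the `interaction()` approach is likely the safer of the two.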

Answer

I suggest using .GRP from "data.table" for this:

> library(data.table)
> as.data.table(dat)[, v4 := sprintf("group_%s", .GRP), .(v1, v2, v3)][]
   id v1 v2 v3      v4
1:  1  0  1  1 group_1
2:  2  1  0  1 group_2
3:  3  0  1  1 group_1
4:  4  1  0  1 group_2
5:  5  0  1  1 group_1
6:  6  1  0  1 group_2
7:  7  0  1  1 group_1
8:  8  1  0  1 group_2
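If "data.table" is not available, roughly the same labelling can be sketched in base R. This is a different technique from the answer above, not the answerer's method: each row is pasted into a string key, and groups are numbered in order of first appearance, which mimics how .GRP counts groups. The helper names `vars`, `key`, `grp`, and the extra column `n` are assumptions made for this illustration.

```r
# Example data from the question
dat <- data.frame(id = 1:8,
                  v1 = rep(0:1, 4),
                  v2 = rep(1:0, 4),
                  v3 = rep(1, 4))

# All indicator columns, i.e. everything except the ID
vars <- setdiff(names(dat), "id")

# One string key per row, e.g. "0_1_1"
key <- do.call(paste, c(dat[vars], sep = "_"))

# Number the groups by order of first appearance
grp <- match(key, unique(key))
dat$v4 <- sprintf("group_%s", grp)

# Optional: attach the size of each individual's group
dat$n <- ave(grp, grp, FUN = length)

dat
```

Because `vars` is derived from the column names, the same lines should work unchanged for the full data with 8 indicator variables, provided the ID column is named `id`.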
