R data.table重塑数据
问题描述:
我使用data.table重塑数据。R data.table重塑数据
library(data.table)
market <- data.table(
stkcd=c(1,2),
type =c(1,0),
roa2013=c(2,3),
roa2014=c(4,5),
lev2013=c(6,7),
lev2016=c(8,9))
market
# stkcd type roa2013 roa2014 lev2013 lev2016
# 1: 1 1 2 4 6 8
# 2: 2 0 3 5 7 9
melt(market,
measure.vars = patterns("^roa", "^lev"),
variable.name = "year",
value.name = c("roa","lev"))
# stkcd type year roa lev
# 1: 1 1 1 2 6
# 2: 2 0 1 3 7
# 3: 1 1 2 4 8
# 4: 2 0 2 5 9
这就是最终数据的样子。
# stkcd type year roa lev
# 1 1 1 2013 2 6
# 2 1 1 2014 4 NA
# 3 1 1 2016 NA 8
# 4 2 0 2013 3 7
# 5 2 0 2014 5 NA
# 6 2 0 2016 NA 9
有没有人有任何好的方法呢? 谢谢。
答
我们可以用splitstackshape
轻松做到这一点。创建感兴趣的列的数字和非数字部分之间的分隔符,然后使用merged.stack
重塑成“长”和改变” .time_1`列名‘年’
library(splitstackshape)
names(market) <- sub("(\\d+)", "_\\1", names(market))
res <- merged.stack(market, var.stubs = c("roa", "lev"), sep="_")
setnames(res, ".time_1", "year")
res
# stkcd type year roa lev
#1: 1 1 2013 2 6
#2: 1 1 2014 4 NA
#3: 1 1 2016 NA 8
#4: 2 0 2013 3 7
#5: 2 0 2014 5 NA
#6: 2 0 2016 NA 9
答
1.使用重塑{}统计,
library(data.table)
market <- data.table(
stkcd=c(1,2),
type =c(1,0),
roa2013=c(2,3),
roa2014=c(4,5),
lev2013=c(6,7),
lev2016=c(8,9))
market[,`:=`(roa2016=NA,lev2014=NA)]
long <- reshape(market,
idvar = "stkcd",
varying = c("roa2013","lev2013",
"roa2014","lev2014",
"roa2016","lev2016"),
sep = "",
timevar = "year",
direction = "long")
setorder(long,stkcd,year)
long
# stkcd type year roa lev
# 1: 1 1 2013 2 6
# 2: 1 1 2014 4 NA
# 3: 1 1 2016 NA 8
# 4: 2 0 2013 3 7
# 5: 2 0 2014 5 NA
# 6: 2 0 2016 NA 9
2.str_extract海峡
library(data.table)
library(stringr)
market <- data.table(
stkcd=c(1,2),
type =c(1,0),
roa2013=c(2,3),
roa2014=c(4,5),
lev2013=c(6,7),
lev2016=c(8,9))
market
long <- melt(market,
id.vars = c("stkcd","type"))
long[,`:=`(year=str_extract(variable,pattern = "[0-9]{4}"),
vars=str_extract(variable,pattern = "[a-zA-Z]{1,}"))][,variable:=NULL]
long <- dcast(long, stkcd + type + year ~ vars, value.var = "value")
long
# stkcd type year lev roa
# 1: 1 1 2013 6 2
# 2: 1 1 2014 NA 4
# 3: 1 1 2016 8 NA
# 4: 2 0 2013 7 3
# 5: 2 0 2014 NA 5
# 6: 2 0 2016 9 NA
...
对于命名的“年“的值,请参阅[在融化使用模式之后将'变量'列的数值表示转换为原始字符串](https://stackoverflow.com/questions/41883573/convert-numeric-representation-of-variable-column-to-original -string-以下)。 – Henrik
谢谢。我会尝试重塑{统计数据}。 – Cheng