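Extract the characters that differ between two strings, row by row
Question:
I have two columns of strings in a data frame, and for each row I want to see the characters that differ between them.
For example, given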
Lines <- "
a b
cat car
dog ding
cow haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
I would like to get back
a b diff
cat car t
dog ding o
cow haw co
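I have seen two related questions that offer some neat solutions, but they either work on a single pair of strings (the first reference) or work row-wise in a clever way that is not quite what I want (the second reference).
Ideally I would like to use something like this: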
Reduce(setdiff, strsplit(c(a, b), split = ""))
I tried:
apply(df, function(a,b) Reduce(setdiff, strsplit(c(a, b), split = "")))
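but to no avail.
How can this be done?
P.S. I would particularly like to do this with dplyr if possible, but only so that the final display format stays consistent.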
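Answer
Assuming df is as shown in the Note at the end, define a function Diff that takes two character vectors, runs setdiff on them and pastes the result together. Then use mapply to run it over the two columns after each has been split into single characters.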
Diff <- function(x, y) paste(setdiff(x, y), collapse = "")
transform(df, diff = mapply(Diff, strsplit(a, ""), strsplit(b, "")))
giving:
a b diff
1 cat car t
2 dog ding o
3 cow haw co
Note: the input df used above is:
Lines <- "
a b
cat car
dog ding
cow haw"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
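One thing worth checking before relying on setdiff for this: it treats each string as a set of characters, so argument order matters, while repeated characters and character positions are ignored. The sketch below illustrates this with a hypothetical single-pair helper, char_diff, which is not part of the answer above and is only there to make the behaviour easy to try.

```r
# Hypothetical helper (an assumption for illustration, not from the answer):
# characters of x that do not occur anywhere in y
char_diff <- function(x, y) {
  paste(setdiff(strsplit(x, "")[[1]], strsplit(y, "")[[1]]), collapse = "")
}

char_diff("cat", "car")   # "t": characters of "cat" not found in "car"
char_diff("car", "cat")   # "r": swapping the arguments changes the result
char_diff("aab", "ab")    # "":  a repeated character is not reported
char_diff("abc", "cba")   # "":  order of characters is ignored
```

If position-aware or count-aware differences are needed, a different comparison than setdiff would be required.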
Answer
Here is another base R method, using Map.
diffList <- Map(setdiff, strsplit(dat[[1]], ""), strsplit(dat[[2]], ""))
diffList
[[1]]
[1] "t"
[[2]]
[1] "o"
[[3]]
[1] "c" "o"
You can wrap this in sapply to get a character vector that fits into your data.frame:
dat$charDiffs <- sapply(diffList, paste, collapse="")
which returns
dat
a b charDiffs
1 cat car t
2 dog ding o
3 cow haw co
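For comparison, the apply attempt from the question can probably be made to work as well. It appears to fail for two reasons: apply needs a MARGIN argument, and it hands each row to the function as one character vector rather than as two separate arguments a and b. A minimal sketch under those assumptions follows; row_diff is a hypothetical name, and the data are rebuilt here so the block stands alone.

```r
dat <- data.frame(a = c("cat", "dog", "cow"),
                  b = c("car", "ding", "haw"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: r is one row, arriving as a character vector c(a, b).
# strsplit() turns it into a list of two character vectors, and
# Reduce(setdiff, ...) applies setdiff to them in order.
row_diff <- function(r) {
  paste(Reduce(setdiff, strsplit(r, split = "")), collapse = "")
}

# MARGIN = 1 iterates over rows; unname() drops any row names apply() attaches
dat$diff <- unname(apply(dat, 1, row_diff))
dat
```

Note that apply converts the data frame to a character matrix first, so this is only a reasonable choice when every column is a string; Map or mapply avoids that conversion.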
Data (from dput)
dat <-
structure(list(a = c("cat", "dog", "cow"), b = c("car", "ding",
"haw")), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
Answer
A solution using tidyverse and stringr.
library(tidyverse)
library(stringr)
dt2 <- dt %>%
  mutate(a_list = str_split(a, pattern = ""), b_list = str_split(b, pattern = "")) %>%
  mutate(diff = map2(a_list, b_list, setdiff)) %>%
  mutate(diff = map_chr(diff, ~paste(., collapse = ""))) %>%
  select_if(~!is.list(.))
dt2
# A tibble: 3 x 3
a b diff
<chr> <chr> <chr>
1 cat car t
2 dog ding o
3 cow haw co
Data
dt <- read.table(text = "a b
cat car
dog ding
cow haw",
header = TRUE, stringsAsFactors = FALSE)
Answer
Using dplyr:
library(dplyr)
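ff <- data.frame(a = c("dog", "chair", "love"), b = c("dot", "liar", "over"), stringsAsFactors = FALSE)
# split both columns into characters, take the set difference per row, paste it back together
st <- ff %>% mutate(diff = sapply(Map(setdiff, strsplit(a, ""), strsplit(b, "")), paste, collapse = ""))
st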
a b diff
1 dog dot g
2 chair liar ch
3 love over l
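Your example is not reproducible. Please consider using 'dput'. That way we could see, for example, whether your columns actually hold character vectors or factors, which is a common source of confusion. – lmo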
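The factor issue raised in this comment can be sketched as follows. Before R 4.0.0, data.frame and read.table converted strings to factors by default, and strsplit refuses factor input. The example below forces factors with stringsAsFactors = TRUE to reproduce that situation; the data values are borrowed from the dplyr answer, and the conversion step is one possible workaround rather than the only one.

```r
# Force factor columns, as older R versions did by default
ff <- data.frame(a = c("dog", "chair"), b = c("dot", "liar"),
                 stringsAsFactors = TRUE)

is.factor(ff$a)   # TRUE

# strsplit() errors on a factor; capture the message instead of stopping
res <- tryCatch(strsplit(ff$a, ""), error = function(e) conditionMessage(e))
res               # "non-character argument"

# Converting every column to character first avoids the error
ff[] <- lapply(ff, as.character)
ff$diff <- mapply(function(x, y) paste(setdiff(x, y), collapse = ""),
                  strsplit(ff$a, ""), strsplit(ff$b, ""))
ff
```

Posting the data with dput makes this kind of difference visible up front, because the output shows whether each column is a character vector or a factor.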