Split string recursively
Say I have text like this:
pattern = "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"
The challenge is to split it into words using any of the following word separators:
c(" ","-","/","\\","_",":","(",")",".",",")
Desired result:
"This" "is" "some" "word" "expression" "I'd" "like" "to" "parse" "intelligently" "using" "special" "symbols" "like"
Approach:
I can do it with sapply or a for loop:
keywords = unlist(strsplit(pattern," "))
keywords = unlist(strsplit(keywords,"-"))
# etc.
Question:
But what would a solution using Reduce(f, x, init, accumulate = TRUE) look like?
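A minimal sketch of what a Reduce version could look like, assuming each separator should be matched literally (fixed = TRUE) and that splitting on one separator at a time is acceptable. The helper name split_on is hypothetical. Each Reduce step takes the current character vector, splits every element on the next separator, and flattens the result with unlist.

```r
pattern <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"
separators <- c(" ", "-", "/", "\\", "_", ":", "(", ")", ".", ",")

# Hypothetical helper: split every current token on one separator,
# matched literally (fixed = TRUE), and flatten back to a character vector.
split_on <- function(tokens, sep) unlist(strsplit(tokens, sep, fixed = TRUE))

# Fold the separators over the text, starting from the whole string.
words <- Reduce(split_on, separators, init = pattern)

# Drop empty strings that adjacent separators could leave behind.
words <- words[nzchar(words)]
print(words)
```

For this input the result matches the desired output except for two trailing "'" tokens, since the apostrophe is not one of the separators. With accumulate = TRUE, as in the question, Reduce instead returns a list of every intermediate vector, and the final split is its last element.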
You can use the option perl = TRUE and split on punctuation or whitespace:
> strsplit(pattern, '[[:punct:]]|[[:space:]]', perl = TRUE)
[[1]]
[1] "This" "is" "some" "word" "expression"
[6] "I" "d" "like" "to" "parse"
[11] "intelligently" "using" "special" "symbols" "like"
[16] ""
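The empty string at the end of that output comes from consecutive separators near the end of the text (" '.')"). A small follow-up sketch, not part of the original answer, that simply drops empty tokens with nzchar; note that this regex still splits "I'd" into "I" and "d", because the apostrophe counts as punctuation.

```r
pattern <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"

# Split on any single punctuation or whitespace character, as in the answer above.
words <- strsplit(pattern, '[[:punct:]]|[[:space:]]', perl = TRUE)[[1]]

# Keep only non-empty tokens.
words <- words[nzchar(words)]
print(words)
```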
You shouldn't need Reduce here. You should be able to do something like the following:
splitters <- c(" ","/","\\","_",":","(",")",".",",","-") # dash should come last
pattern <- paste0("[", paste(splitters, collapse = ""), "]")
string <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"
strsplit(string, pattern)[[1]]
# [1] "This" "is" "some" "word"
# [5] "expression" "I'd" "like" "to"
# [9] "parse" "intelligently" "using" "special"
# [13] "symbols" "like" "'" "'"
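Note that in a regular expression character class, "-" should be placed first or last (otherwise it defines a range), so I've edited the "splitters" vector accordingly. Also, you may want to add "+" at the end of "pattern" if you want a run of consecutive separators (for example, several spaces) to be treated as a single separator.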
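A sketch of the "+" suggestion, using a hypothetical input with doubled separators (the question's string contains no adjacent separators from this set, so it would not show the difference). With "+", a run of separators acts as one delimiter, so no empty strings appear between words; a separator at the very start of a string would still produce a leading empty string.

```r
splitters <- c(" ", "/", "\\", "_", ":", "(", ")", ".", ",", "-")  # dash last
single <- paste0("[", paste(splitters, collapse = ""), "]")   # exactly one separator
run    <- paste0(single, "+")                                 # one or more separators

x <- "one  two--three (four)"  # hypothetical input with doubled separators

strsplit(x, single)[[1]]  # "one" "" "two" "" "three" "" "four"
strsplit(x, run)[[1]]     # "one" "two" "three" "four"
```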
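@DavidArenburg, it does get closer. – A5C1D2H2I1M1N2O1R2T1 2014-09-02 10:42:14
Very helpful when custom separators need to be added on top of the other answers – 2014-09-02 10:49:10
Any particular reason why "dash should come last"? – 2014-09-02 12:47:14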
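One way to see why the dash placement matters, as a sketch: pasting the question's original order c(" ", "-", "/") into a character class gives [ -/], which the regex engine reads as a range from space (U+0020) to slash (U+002F). That range also contains the apostrophe (U+0027), so "I'd" gets split. perl = TRUE is used here so the range is interpreted by code point.

```r
# Dash in the middle: "[ -/]" is the range U+0020..U+002F, which includes "'".
in_middle <- paste0("[", paste(c(" ", "-", "/"), collapse = ""), "]")
# Dash last: "[ /-]" matches only a space, a slash, or a literal dash.
at_end <- paste0("[", paste(c(" ", "/", "-"), collapse = ""), "]")

strsplit("I'd like", in_middle, perl = TRUE)[[1]]  # "I" "d" "like"
strsplit("I'd like", at_end, perl = TRUE)[[1]]     # "I'd" "like"
```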
I would go with this (it keeps "I'd" together):
strsplit(pattern, "[^[:alnum:][:digit:]']")
## [[1]]
## [1] "This" "is" "some" "word" "expression" "I'd" "like" "to" "parse"
## [10] "intelligently" "using" "special" "symbols" "like" "'" "'"
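The two trailing "'" tokens come from the apostrophes around the final '.'. A possible follow-up, assuming every real word contains at least one letter or digit, is to drop tokens that contain none; [:alnum:] already covers digits, so [:digit:] is redundant in the class and is left out here. With that filter, the result for this input matches the desired output from the question.

```r
pattern <- "This_is some word/expression I'd like to parse:intelligently(using special symbols-like '.')"

# Split on anything that is not a letter, digit, or apostrophe.
words <- strsplit(pattern, "[^[:alnum:]']")[[1]]

# Keep only tokens that contain at least one letter or digit,
# which removes the stray "'" tokens (and any empty strings).
words <- words[grepl("[[:alnum:]]", words)]
print(words)
```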
Very elegant indeed! – 2014-09-02 10:25:39
Although... – 2014-09-02 10:34:21
Actually, I don't mind "I" + "d" vs. "I'd". For simplicity, I will – 2014-09-02 10:45:56