如何将字符串拆分为重复的子字符串

问题描述：

我有一些字符串，其中每个字符串都是一个或多个字符串的副本。例如：如何将字符串拆分为重复的子字符串

L = "hellohellohello" 
M = "good" 
N = "wherewhere" 
O = "antant"

我想这样的字符串分割成一个列表，以便每个元素只是有重复的部分。例如：

splitstring(L) ---> ["hello", "hello", "hello"] 
splitstring(M) ---> ["good"] 
splitstring(N) ---> ["where", "where"] 
splitstring(O) ---> ["ant", "ant"]

由于字符串每个字符长度大约为1000个字符，所以如果这个字符相当快，那也是很好的。

请注意，在我的情况下，重复都从字符串的开始处开始，并且它们之间没有间隔，所以它比在字符串中查找最大重复的一般问题要简单得多。

如何做到这一点？

看看这个[问题]（http://stackoverflow.com/questions/11090289/find-longest-repetitive-sequence-i n-a-string）我认为你正在寻找类似的东西？此外，这种方法给出的复杂度是O（n），所以它应该是非常快的按照您的要求。 –

@MridulKashyap我的问题非常简单，因为我的重复从字符串的开头开始，并且在它们之间没有任何间隔。 – eleanora

答

使用正则表达式来查找重复的话，那么只需要创建一个适当长度的列表：

def splitstring(string): 
    match= re.match(r'(.*?)(?:\1)*$', string) 
    word= match.group(1) 
    return [word] * (len(string)//len(word))

好主意，我也想过做这样的事情。 – alex

答

该方法我会用：

import re 

L = "hellohellohello" 
N = "good" 
N = "wherewhere" 

cnt = 0 
result = '' 
for i in range(1,len(L)+1): 
    if cnt <= len(re.findall(L[0:i],L)): 
     cnt = len(re.findall(L[0:i],L)) 
     result = re.findall(L[0:i],L)[0] 

print(result)

给出与相应的可变以下输出：

hello 
good 
where

答

尝试此。它不是缩减列表，而是专注于找到最短的模式，然后通过重复该模式适当的次数创建一个新列表。

def splitstring(s): 
    # searching the number of characters to split on 
    proposed_pattern = s[0] 
    for i, c in enumerate(s[1:], 1): 
     if proposed_pattern == s[i:(i+len(proposed_pattern))]: 
      # found it 
      break 
     else: 
      proposed_pattern += c 
    else: 
     print 'found no pattern' 
     exit(1) 
    # generating the list 
    n = len(proposed_pattern) 
    return [proposed_pattern]*(len(s)//n) 


if __name__ == '__main__': 
    L = 'hellohellohellohello' 
    print splitstring(L) # prints ['hello', 'hello', 'hello', 'hello']

我不知道这三件事中的任何一件，谢谢先生。我会测试这个并编辑 – BusyAnt

答

假设重复单词的长度大于1长这会工作：

a = "hellohellohello" 

def splitstring(string): 
    for number in range(1, len(string)): 
     if string[:number] == string[number:number+number]: 
      return string[:number] 
    #in case there is no repetition 
    return string 

splitstring(a)

它在“aabaab”上失败。 – cogitovita

答

#_*_ coding:utf-8 _*_ 
import re 
''' 
refer to the code of Gábor Erds below 
''' 

N = "wherewhere" 
cnt = 0 
result = '' 
countN = 0 
showresult = [] 

for i in range(1,len(N)+1): 
    if cnt <= len(re.findall(N[0:i],N)): 
     cnt = len(re.findall(N[0:i],N)) 
     result = re.findall(N[0:i],N)[0] 
     countN = len(N)/len(result) 
for i in range(0,countN): 
    showresult.append(result) 
print showresult

请为您的代码与[GáborErdős的帖子]（http://stackoverflow.com/a/38608219/5488275）不同之处添加说明。 –

如何将字符串拆分为重复的子字符串

相关推荐