Detect whether 2 strings are the same but in a different order

Question:

My goal is to detect whether 2 strings contain the same words, just in a different order.

Example 
"hello world my name is foobar" is the same as "my name is foobar world hello"

What I have tried is splitting both strings into lists and comparing them inside a loop.

text = "hello world my name is foobar" 
textSplit = text.split() 

pattern = "foobar is my name world hello" 
pattern = pattern.split() 

count = 0 
for substring in pattern:
    if substring in textSplit:
        count += 1

if count == len(pattern):
    print("same string detected")

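It returns what I intended, but is this really a correct and efficient way to do it? Maybe there is another approach. Any suggestions for journal articles on the topic would be great.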

Edit 1: repeated words matter

text = "fish the fish the fish fish fish" 
pattern = "the fish"

It must return false.
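To make the problem concrete, here is a minimal sketch that wraps the loop from the question in a function and runs it on the example from the edit. The function name `loop_check` is made up for illustration; the logic is the loop shown earlier, unchanged.

```python
def loop_check(text, pattern):
    """The membership loop from the question, wrapped in a function."""
    text_split = text.split()
    pattern_split = pattern.split()
    count = 0
    for substring in pattern_split:
        # Only asks "does this word occur at all?", never "how many times?"
        if substring in text_split:
            count += 1
    return count == len(pattern_split)


print(loop_check("fish the fish the fish fish fish", "the fish"))
# True, although this case is supposed to be False
```

The loop only tests membership, so it cannot see repeated words or extra words in text.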

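What about repeated words? Is "fish" the same as "fish fish fish fish"?

'sorted(text) == sorted(pattern)' maybe? It is not very efficient, but it is quite easy to implement. – ozgur

If dups don't matter, 'len(set(text).difference(pattern)) == 0' – Vinny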

Answer

If you want to check that 2 sentences have the same words (with the same number of occurrences), you can split the sentences into words and sort them:

>>> sorted("hello world my name is foobar".split()) 
['foobar', 'hello', 'is', 'my', 'name', 'world'] 
>>> sorted("my name is foobar world hello".split()) 
['foobar', 'hello', 'is', 'my', 'name', 'world']

You can define the check in a function:

def have_same_words(sentence1, sentence2): 
    return sorted(sentence1.split()) == sorted(sentence2.split()) 

print(have_same_words("hello world my name is foobar", "my name is foobar world hello")) 
# True 

print(have_same_words("hello", "hello hello")) 
# False 

print(have_same_words("hello", "holle")) 
# False

If case doesn't matter, you can compare the lowercased sentences:

def have_same_words(sentence1, sentence2): 
    return sorted(sentence1.lower().split()) == sorted(sentence2.lower().split()) 

print(have_same_words("Hello world", "World hello")) 
# True

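Note: you could also use collections.Counter instead of sorted. The complexity would be O(n) instead of O(n log n), which isn't a big difference anyway. import collections might take longer than sorting the strings: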

from collections import Counter 

def have_same_words(sentence1, sentence2): 
    return Counter(sentence1.lower().split()) == Counter(sentence2.lower().split()) 

print(have_same_words("Hello world", "World hello")) 
# True 

print(have_same_words("hello world my name is foobar", "my name is foobar world hello")) 
# True 

print(have_same_words("hello", "hello hello")) 
# False 

print(have_same_words("hello", "holle")) 
# False
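As a rough illustration of why the counting approach is linear, here is a hand-written sketch of the same idea using a plain dict. The names `count_words` and `have_same_words_by_count` are hypothetical; this is not how Counter is implemented internally, only the same counting technique spelled out.

```python
def count_words(sentence):
    """Count each word in a single pass over the sentence."""
    counts = {}
    for word in sentence.lower().split():
        # One dict lookup and one store per word, constant time on average
        counts[word] = counts.get(word, 0) + 1
    return counts


def have_same_words_by_count(sentence1, sentence2):
    # Two dicts are equal when they hold the same words with the same counts
    return count_words(sentence1) == count_words(sentence2)


print(have_same_words_by_count("Hello world", "World hello"))
# True

print(have_same_words_by_count("hello", "hello hello"))
# False
```

Each word is visited once, so the work grows in proportion to the number of words, whereas sorting has to compare words against each other.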

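Thanks. It works as expected. Could you summarize, or link to, what the complexity is, what the worst case is, and why/how the complexity is defined as O(n)? That would be very helpful.

Sorting is 'O(n log n)', counting is 'O(n)'. That aside: given the size of the sentences, we shouldn't care about complexity.

Looks like I need to start figuring out what those notations mean, haha.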

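Answer

You can build a list from each string and compute the set intersection between them; if its length is the same as the length of the first list, then they are the same.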

text = "hello world my name is foobar"
pattern = "foobar is my name world hello"
text = text.split(" ")
pattern = pattern.split(" ")
result = True
if len(text) != len(pattern):
    result = False
else:
    # Words that occur in both lists; set() drops repeated words
    l = list(set(text) & set(pattern))
    if len(l) != len(text):
        result = False
if result:
    print("same string detected")
else:
    print("Not the same string")

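You need to be careful with your length check there, 'if len(l) != len(text)': 'l' has had its duplicates removed, so if 'text' contains repeated words this check will not work reliably.

'set(text)' and 'set(pattern)' remove duplicates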
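One possible way to keep the intersection idea while respecting repeated words is a multiset intersection with collections.Counter. This is a sketch under that assumption, not the answerer's code, and the function name `same_words_by_intersection` is made up for illustration.

```python
from collections import Counter


def same_words_by_intersection(text, pattern):
    text_words = text.split()
    pattern_words = pattern.split()
    # Counter & Counter keeps, for each shared word, the smaller of its two counts
    common = Counter(text_words) & Counter(pattern_words)
    shared = sum(common.values())
    # Same words only if every word on both sides is matched
    return shared == len(text_words) == len(pattern_words)


print(same_words_by_intersection("hello hello world", "world hello hello"))
# True

print(same_words_by_intersection("fish the fish the fish fish fish", "the fish"))
# False
```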

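Answer

With your implementation, extra words in text are ignored (maybe that is intended?).

That is, if text = "a b" and pattern = "a", you print "same string detected".

Here is what I would do: a comparison where order doesn't matter makes me think of sets. So a solution with sets would be: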

same = set(text.split()) == set(pattern.split())

Edit: taking into account the question's edit about repeated words:

from collections import Counter 
split_text = text.split() 
split_pattern = pattern.split() 
same = (Counter(split_text) == Counter(split_pattern))
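A small side-by-side sketch of the two variants above may help show where they differ. The wrapper names `same_as_sets` and `same_as_counters` are hypothetical; the bodies are the two one-liners from this answer.

```python
from collections import Counter


def same_as_sets(text, pattern):
    # A set forgets how often each word occurs
    return set(text.split()) == set(pattern.split())


def same_as_counters(text, pattern):
    # A Counter keeps the number of occurrences of each word
    return Counter(text.split()) == Counter(pattern.split())


print(same_as_sets("a b", "a"))
# False: the extra word in text is no longer ignored

print(same_as_sets("hello", "hello hello"))
# True: repeated words collapse into one

print(same_as_counters("hello", "hello hello"))
# False: the counts differ
```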

Your solution considers 'hello' and 'hello hello' to be equal. It's not clear whether this is the desired behavior.

@Eric In that case, swap the 'set' for 'collections.Counter'...

Question updated

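Answer

You can also join the strings you want to compare into one new list of words, str12, and then compare its length with 2 * (the length of str12 with duplicates removed):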

str1 = "hello world my name is foobar" 
str2 = "my name is foobar world hello" 


str12 = (str1 + " " +str2).split(" ") 

str12_remove_duplicate = list(set(str12)) 

if len(str12) == 2 * len(str12_remove_duplicate): 
    print("String '%s' and '%s' are SAME but different order" % (str1, str2)) 
else: 
    print("String '%s' and '%s' are NOT SAME" % (str1, str2))
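It is worth checking this length test against inputs that contain repeated words. The sketch below wraps the same comparison in a function (the name `length_trick` is made up for illustration) and tries a few such cases.

```python
def length_trick(str1, str2):
    """The length comparison from the snippet above, as a function."""
    str12 = (str1 + " " + str2).split(" ")
    return len(str12) == 2 * len(set(str12))


print(length_trick("hello world", "world hello"))
# True

print(length_trick("a a", "b b"))
# True, although the two strings share no words

print(length_trick("a a", "a a"))
# False, although the two strings are identical
```

So the test seems to hold only when neither string repeats a word; with repeated words, comparing collections.Counter objects is the safer option.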
