Compare two strings and return the most similar one

Question:

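I have to write a function that takes a string as an argument, compares it with other strings, and returns the most similar string together with the number of differences. For example: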

def func(word):
    lst = ["JIBM", "NUNE", "NUMB"]

so func("LUMB") should return:
("NUMB", 1)

What I tried:

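def f(word):
    lst = ["JIBM", "NUNE", "NUMB"]
    for i in lst:
        d = k(word, lst)       # bug: compares against the whole list instead of the candidate i
        return differences     # bug: 'differences' is never defined, and returning here ends the loop on the first pass
        for n in d:            # never reached because of the return above
            print min(sum(n))  # bug: n is a single 0/1 value, so sum(n) raises a TypeError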

where:

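def k(word1, word2):
    # 1 for each position where the characters differ, 0 where they match
    # (assumes word2 is at least as long as word1)
    L = []
    for w in range(len(word1)):
        if word1[w] != word2[w]:
            L.append(1)
        else:
            L.append(0)
    return L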

which gives me a list such as [1, 0, 0, 0] if word1 = "NUMB" and word2 = "LUMB".
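For what it's worth, here is a minimal sketch of how a helper like k could be combined into a working f, assuming all candidates have the same length as the input word (this is a position-by-position, Hamming-style count, not an edit distance). The compact k below is an equivalent rewrite for equal-length strings; note that zip silently ignores any extra trailing characters if the lengths differ.

```python
def k(word1, word2):
    # 1 for each position where the characters differ, 0 where they match.
    # Assumes equal-length strings: zip stops at the shorter one.
    return [1 if a != b else 0 for a, b in zip(word1, word2)]


def f(word):
    lst = ["JIBM", "NUNE", "NUMB"]
    # Pair every candidate with its number of differing positions...
    scored = [(candidate, sum(k(word, candidate))) for candidate in lst]
    # ...then keep the pair with the smallest count.
    return min(scored, key=lambda pair: pair[1])


print(f("LUMB"))  # ('NUMB', 1)
```

The main fixes relative to the attempt: compare against each candidate rather than the whole list, sum the 0/1 list per candidate, and call min once over all candidates instead of returning inside the loop.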

Have you seen [Text difference algorithm](http://stackoverflow.com/questions/145607/text-difference-algorithm) and [Good Python modules for fuzzy string comparison](http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison)? – Chris 2011-12-15 11:21:26

Many answers are available at this link: http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison – 2011-12-15 11:32:09

Answer

It looks like Shawn Chin has provided the best solution, but if you are prevented from using non-built-in modules, get_close_matches from difflib might help:

import difflib 
difflib.get_close_matches("LUMB", ["JIBM", "NUNE", "NUMB"], 1)

The number of differences can be obtained with SequenceMatcher's get_opcodes method by working with its return value.
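A minimal sketch of that idea, using only the standard library. The helper names count_differences and closest_by_opcodes are hypothetical. Each non-"equal" opcode is counted as the larger of the two spans it touches (so a one-character replace counts as 1). Be aware that SequenceMatcher does not compute a minimal edit script, so this is an approximation of the edit distance, not the Levenshtein distance itself.

```python
from difflib import SequenceMatcher


def count_differences(a, b):
    """Approximate number of edits between a and b, derived from get_opcodes()."""
    diffs = 0
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":
            # 'replace', 'delete' and 'insert' blocks: count the longer side.
            diffs += max(i2 - i1, j2 - j1)
    return diffs


def closest_by_opcodes(word, candidates):
    """Return (candidate, difference count) for the candidate with the fewest differences."""
    return min(((c, count_differences(word, c)) for c in candidates),
               key=lambda pair: pair[1])


print(closest_by_opcodes("LUMB", ["JIBM", "NUNE", "NUMB"]))  # ('NUMB', 1)
```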

Answer

Use pylevenshtein to calculate the Levenshtein distance:

>>> from Levenshtein import distance 
>>> from operator import itemgetter 
>>> lst = ["JIBM", "NUNE", "NUMB"] 
>>> min([(x, distance("LUMB", x)) for x in lst], key=itemgetter(1)) 
('NUMB', 1)

Or, as a function:

from Levenshtein import distance
from operator import itemgetter

def closest(word, lst):
    # pair each candidate with its edit distance to word, keep the smallest
    return min([(x, distance(word, x)) for x in lst], key=itemgetter(1))

print(closest("NUMB", ["JIBM", "NUNE", "NUMB"]))

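P.S. If you want to avoid an extra dependency, you can implement your own function to compute the distance. For example, Wikibooks offers several versions, each with its own pros and cons.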
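As a rough illustration, here is a pure-Python sketch of the classic dynamic-programming Levenshtein distance (insertions, deletions and substitutions each cost 1), standing in for Levenshtein.distance. It is not one of the Wikibooks versions verbatim, and the names levenshtein and closest_word are hypothetical. It keeps only two rows of the DP table, so memory is proportional to the shorter string.

```python
def levenshtein(a, b):
    """Edit distance between a and b using two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the rows as short as possible
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,          # insertion
                previous[j] + 1,             # deletion
                previous[j - 1] + (ca != cb) # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def closest_word(word, candidates):
    """Return (candidate, distance) for the candidate nearest to word."""
    return min(((c, levenshtein(word, c)) for c in candidates),
               key=lambda pair: pair[1])


print(closest_word("LUMB", ["JIBM", "NUNE", "NUMB"]))  # ('NUMB', 1)
```

This runs in O(len(a) * len(b)) time in pure Python, which is why the next paragraph suggests a dedicated module when performance matters.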

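However, if performance is a concern, consider sticking with a dedicated module. Apart from pylevenshtein, there are also python-levenshtein and nltk.metrics.distance (if you already happen to use NLTK).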
