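Generating and applying diffs in Python

Question:

Is there an "out-of-the-box" way in Python to generate a list of differences between two texts, and then apply that diff to one file later on to obtain the other?

I want to keep a revision history of a text, but I don't want to save the entire text for every revision if only a single line was edited. I looked at difflib, but I couldn't see how to generate a list of just the edited lines that can still be used to modify one text to obtain the other.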

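Answer

Have you had a look at diff-match-patch from Google? Apparently Google Docs uses this set of algorithms. It includes not only a diff module but also a patch module, so you can generate the newest file from an older file plus the diffs.

A Python version is included.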

http://code.google.com/p/google-diff-match-patch/

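Exactly what I was looking for! I tried googling different combinations of "python", "diff", "patch" and "revision", but hadn't found it yet. – noio 2010-02-22 13:08:36

Google's diff-match-patch seems to store the entire file. It keeps all the elements in tuples: (0, 'stuff') means that 'stuff' appears in both strings. The system is very simple: it stores literally every character so that it can iterate over them and modify the text as needed. – Paragon 2012-05-05 00:54:05

How do I use this API with Python? An example would be appreciated, if possible. – qre0ct 2013-05-15 04:47:32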

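Answer

Does it have to be a Python solution?
My first thought would be to use a version control system (Subversion, Git, etc.) or the diff/patch utilities, which are standard on Unix systems and available as part of Cygwin on Windows.

It has to be a pure Python solution, because I want to deploy it on App Engine. 'diff'/'patch' would be ideal, but in Python. – noio 2010-02-22 11:24:48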

Note that this kind of computation is usually slow, so it may be worth doing it at a lower level instead! – Pithikos 2016-10-10 19:05:28

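Answer

As far as I know, most diff algorithms use a simple Longest Common Subsequence match to find the part that two texts have in common; whatever is left over is considered the difference. It shouldn't be too difficult to write your own dynamic programming algorithm for this in Python, and the Wikipedia page on the Longest Common Subsequence problem provides the algorithm too.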
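To make the LCS idea concrete, here is a minimal sketch of a line-based diff built on the textbook dynamic-programming table. The names lcs_table and lcs_diff are made up for this illustration, and the quadratic table is only practical for fairly small inputs; real diff tools use more refined algorithms.

```python
def lcs_table(a, b):
    """Build a table where table[i][j] is the LCS length of a[i:] and b[j:]."""
    rows, cols = len(a), len(b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def lcs_diff(a, b):
    """Return (tag, item) pairs: ' ' for common, '-' only in a, '+' only in b."""
    table = lcs_table(a, b)
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append((' ', a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] keeps the longest remaining common subsequence.
            result.append(('-', a[i]))
            i += 1
        else:
            result.append(('+', b[j]))
            j += 1
    # Whatever remains on either side is pure deletion or pure insertion.
    result.extend(('-', item) for item in a[i:])
    result.extend(('+', item) for item in b[j:])
    return result


old = ["one", "two", "three"]
new = ["one", "2", "three", "four"]
delta = lcs_diff(old, new)
for tag, item in delta:
    print(tag, item)
```

Dropping the '-' entries from the result gives back the new text, and dropping the '+' entries gives back the old one. Note that this delta still contains the common lines, so it illustrates the algorithm rather than a compact storage format.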

Answer

Does difflib.unified_diff do what you want? There is an example here.
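For readers who have not used it, a small sketch of what difflib.unified_diff produces may help. The sample lines and the file names old.txt and new.txt are invented for the example; n=0 asks for no context lines, so only the edited lines appear in the hunks.

```python
import difflib

old_lines = ["alpha\n", "beta\n", "gamma\n"]
new_lines = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]

# unified_diff returns a generator of diff lines: two header lines,
# then one "@@ ... @@" hunk header per changed region followed by
# the removed ('-') and added ('+') lines of that region.
diff_lines = list(difflib.unified_diff(
    old_lines, new_lines, fromfile="old.txt", tofile="new.txt", n=0))

diff_text = "".join(diff_lines)
print(diff_text)
```

The resulting text is in the same unified format that the Unix diff and patch tools use, so it can be stored as a revision record.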

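Upvoted your answer. The built-in difflib looks powerful but a bit confusing; it is just a matter of the learning curve. See my similar post here: http://stackoverflow.com/questions/4743359/python-difflib-deltas-and-compare-ndiff/4743621#4743621 – NealWalters 2011-01-21 16:38:21

The library has no way to apply the output of 'difflib.unified_diff'. It has 'diff' but no 'patch'. So if you want to stay within Python, 'difflib.unified_diff' is of no use. – 2016-01-07 05:23:52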
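One way to stay within the standard library, sketched here under the assumption that only the old-to-new direction is needed, is to skip the textual diff and store the non-equal opcodes from difflib.SequenceMatcher together with the lines they introduce. The helper names make_delta and apply_delta are hypothetical, not part of difflib.

```python
import difflib


def make_delta(old_lines, new_lines):
    """Record only the changed regions, plus the new lines that fill them."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # old_lines[i1:i2] is dropped; new_lines[j1:j2] takes its place.
            delta.append((i1, i2, new_lines[j1:j2]))
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the new line list from the old one and a make_delta result."""
    result = []
    pos = 0
    for i1, i2, replacement in delta:
        result.extend(old_lines[pos:i1])  # unchanged lines before the edit
        result.extend(replacement)        # inserted or replacing lines
        pos = i2                          # skip the deleted or replaced lines
    result.extend(old_lines[pos:])        # unchanged tail
    return result


old = "a\nb\nc\nd\n".splitlines(True)
new = "a\nB\nc\nd\ne\n".splitlines(True)
delta = make_delta(old, new)
rebuilt = apply_delta(old, delta)
print(delta)
print(rebuilt == new)
```

Because the delta does not keep the deleted lines, it cannot be reversed to recover the old text from the new one; storing old_lines[i1:i2] alongside each entry would be one way to allow that.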

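Answer

Maybe you can use unified_diff to generate the list of differences between the files. Only the text that changed is then written to a new text file for future reference. Here is code that writes just the differences to a new file. I hope this is what you were asking for!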

import difflib

# old_file and new_file are lists of lines, e.g. the result of readlines().
diff = difflib.unified_diff(old_file, new_file, lineterm='')
lines = list(diff)[2:]  # drop the '---' and '+++' header lines
added = [line for line in lines if line.startswith('+')]
removed = [line for line in lines if line.startswith('-')]

# Write the added lines first, then the removed lines, one per line.
with open("output.txt", "w") as fh:
    for line in added + removed:
        fh.write(line.rstrip('\n') + '\n')
print('+', added)
print('-', removed)

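Use this in your code to save only the diff output!

Answer

I've implemented a pure Python function that applies diff patches to recover either of the input strings; I hope someone finds it useful. It parses the Unified diff format.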

import re

# Matches a unified diff hunk header such as "@@ -3,2 +3,4 @@".
_hdr_pat = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@$")

def apply_patch(s, patch, revert=False):
    """
    Apply unified diff patch to string s to recover newer string.
    If revert is True, treat s as the newer string, recover older string.
    """
    s = s.splitlines(True)
    p = patch.splitlines(True)
    t = ''
    i = sl = 0
    (midx, sign) = (1, '+') if not revert else (3, '-')
    while i < len(p) and p[i].startswith(("---", "+++")):
        i += 1  # skip header lines
    while i < len(p):
        m = _hdr_pat.match(p[i])
        if not m:
            raise Exception("Cannot process diff")
        i += 1
        # 0-based start of this hunk in s; a zero-length range names the
        # line before the hunk, hence the +1 adjustment.
        l = int(m.group(midx)) - 1 + (m.group(midx + 1) == '0')
        t += ''.join(s[sl:l])
        sl = l
        while i < len(p) and p[i][0] != '@':
            if i + 1 < len(p) and p[i + 1][0] == '\\':
                # Next line is "\ No newline at end of file": drop the EOL.
                line = p[i][:-1]
                i += 2
            else:
                line = p[i]
                i += 1
            if len(line) > 0:
                if line[0] == sign or line[0] == ' ':
                    t += line[1:]
                sl += (line[0] != sign)
    t += ''.join(s[sl:])
    return t

If there are header lines ("--- ...\n", "+++ ...\n"), it skips over them. If we have a unified diff string diffstr representing the diff between oldstr and newstr:

# recreate `newstr` from `oldstr`+patch 
newstr = apply_patch(oldstr, diffstr) 
# recreate `oldstr` from `newstr`+patch 
oldstr = apply_patch(newstr, diffstr, True)

In Python you can generate a unified diff of two strings using difflib (part of the standard library):

import difflib

_no_eol = "\\ No newline at end of file"

def make_patch(a, b):
    """
    Get unified string diff between two strings. Trims top two lines.
    Returns empty string if strings are identical.
    """
    diffs = difflib.unified_diff(a.splitlines(True), b.splitlines(True), n=0)
    try:
        _, _ = next(diffs), next(diffs)  # drop the "---" and "+++" header lines
    except StopIteration:
        pass
    return ''.join([d if d[-1] == '\n' else d + '\n' + _no_eol + '\n' for d in diffs])

On Unix: diff -U0 a.txt b.txt

The code is on GitHub here, along with tests using ASCII and random Unicode characters: https://gist.github.com/noporpoise/16e731849eb1231e86d78f9dfeca3abc
