Matching lists of dictionaries. The most elegant solution. Python

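Problem description:

Given two lists of dictionaries, new and old. The dictionaries represent the same objects in both lists. I need to find the differences and produce a new list of dictionaries that contains only the objects from the new list, with attributes updated from the old dictionaries.
Example: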

list_new = [
    {'id': 1,
     'name': 'bob',
     'desc': 'cool gay'},
    {'id': 2,
     'name': 'Bill',
     'desc': 'bad gay'},
    {'id': 3,
     'name': 'Vasya',
     'desc': None},
]

list_old = [
    {'id': 1,
     'name': 'boby',
     'desc': 'cool gay',
     'some_data': '12345'},
    {'id': 2,
     'name': 'Bill',
     'desc': 'cool gay',
     'some_data': '12345'},
    {'id': 3,
     'name': 'vasya',
     'desc': 'the man',
     'some_data': '12345'},
    {'id': 4,
     'name': 'Elvis',
     'desc': 'singer',
     'some_data': '12345'},
]

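So in this example I want to produce a new list that contains only the guys from list_new, with updated data, matched by id. So bob becomes boby, Bill's desc becomes 'cool gay', Vasya becomes 'the man', and Elvis must be absent.

Give me an elegant solution, with as few loop iterations as possible.

Here is the way I solve it now, which is not the best: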

def match_dict(new_list, old_list):
    ids_new = []
    for item in new_list:
        ids_new.append(item['id'])
    result = []
    for item_old in old_list:
        if item_old['id'] in ids_new:
            # inner loop: scan new_list again for the matching id
            for item_new in new_list:
                if item_new['id'] == item_old['id']:
                    item_new['some_data'] = item_old['some_data']
                    result.append(item_new)
    return result

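I have doubts about it because there is a loop inside a loop. With lists of 2000 items, the process would take a while.

Just wondering why you spell 'guy' as 'gay'? – DTing 2011-03-09 21:49:08

Please let Elvis stay :) – 2011-03-09 21:49:42

Are you retrieving this list from somewhere? Can you restructure the list of dictionaries into a dictionary that uses id as the key? – 2011-03-09 21:51:44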

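Answer

Steps:

Create a lookup dictionary for list_old, keyed by id
Iterate over the dictionaries in list_new, creating a merged dictionary for each one that also exists in the old list

Code: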

def match_dict(new_list, old_list): 
    old = dict((v['id'], v) for v in old_list) 
    return [dict(d, **old[d['id']]) for d in new_list if d['id'] in old]

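Edit: the variables inside the function were named incorrectly.

I like this solution. Beautiful. But it was already given by koblas. Thanks. – Pol 2011-03-10 15:21:23

The only problem is that it does not keep new objects (items with no match in the old list). Those can occur too, but I didn't mention it. – Pol 2011-03-10 17:50:49

FYI, this function does not return results that match the original match_dict() function, because its lists are reversed. – koblas 2011-03-14 16:40:34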
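Regarding Pol's comment about new objects, here is a minimal sketch of a variant that keeps items from the new list even when they have no match in the old list. The name match_dict_keep_new and the sample data are hypothetical, and the sketch assumes that id is unique within each list.

```python
def match_dict_keep_new(new_list, old_list):
    # Index the old items by id so that each lookup is O(1).
    old_by_id = dict((item['id'], item) for item in old_list)
    result = []
    for item in new_list:
        merged = dict(item)  # copy, so the input dictionaries are not mutated
        # Old values win on conflicts; unmatched new items are kept unchanged.
        merged.update(old_by_id.get(item['id'], {}))
        result.append(merged)
    return result


# Hypothetical sample data: id 5 exists only in the new list.
list_new = [{'id': 1, 'name': 'bob'}, {'id': 5, 'name': 'Neo'}]
list_old = [{'id': 1, 'name': 'boby', 'some_data': '12345'},
            {'id': 4, 'name': 'Elvis', 'some_data': '12345'}]
result = match_dict_keep_new(list_new, list_old)
```

The result follows the order of new_list, which is also why the ordering can differ from the original match_dict(), which iterates over the old list.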

Answer

For each dictionary in old_list, search new_list for the dictionary with the same id, then do: old_dict.update(new_dict)

After updating, remove each new_dict from new_list, and after the loop append the remaining, unused dictionaries.
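One possible reading of these steps, as a rough sketch. The helper name update_old_with_new and the sample data are made up for illustration, and the sketch works on copies rather than updating in place. Note that, read literally, this keeps unmatched old entries (such as Elvis) and lets new values overwrite old ones, which is not quite what the question asked for.

```python
def update_old_with_new(old_list, new_list):
    # Index the new items by id; matched entries are popped as they are used.
    unused_new = dict((item['id'], item) for item in new_list)
    result = []
    for old_dict in old_list:
        merged = dict(old_dict)  # copy instead of updating in place
        new_dict = unused_new.pop(old_dict['id'], None)
        if new_dict is not None:
            merged.update(new_dict)  # new values overwrite old ones
        result.append(merged)
    # Append the new dictionaries that matched nothing in old_list.
    result.extend(dict(item) for item in new_list if item['id'] in unused_new)
    return result


# Hypothetical sample data.
old = [{'id': 1, 'name': 'boby', 'some_data': '12345'},
       {'id': 4, 'name': 'Elvis', 'some_data': '12345'}]
new = [{'id': 1, 'name': 'bob'}, {'id': 5, 'name': 'Neo'}]
merged_list = update_old_with_new(old, new)
```

Using a dictionary for the "unused" bookkeeping avoids searching new_list once per old item.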

Answer

Something like this is what you need:

merged = []
for d in list_old:
    for e in list_new:
        if e['id'] == d['id']:
            # copy e, then let the old values in d overwrite it
            merged.append(dict(e, **d))
print(merged)

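Read about how to merge dictionaries here.

Answer

You would be much better off if your top-level data structure were a dictionary rather than a list. Then it would be: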

dict_new.update(dict_old)

However, with what you do have, try this:

result_list = []
for item in list_new:
    found_item = [d for d in list_old if d["id"] == item["id"]]
    if found_item:
        result_list.append(dict(item, **found_item[0]))

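This is actually still a loop within a loop (the inner loop is "hidden" in the list comprehension), so it is still O(N**2). On large data sets it will undoubtedly be faster to convert the data to a dictionary and then convert it back to a list.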
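To make the comparison concrete, here is a small sketch that puts the nested-scan version next to a dictionary-indexed one and checks that they agree. The function names and the synthetic data are assumptions for illustration, and id is assumed to be unique within each list.

```python
def merge_quadratic(list_new, list_old):
    # The inner list comprehension scans list_old for every new item: O(N * M).
    result = []
    for item in list_new:
        found = [d for d in list_old if d['id'] == item['id']]
        if found:
            result.append(dict(item, **found[0]))
    return result


def merge_indexed(list_new, list_old):
    # One pass to build the index, one pass to merge: O(N + M).
    old_by_id = dict((d['id'], d) for d in list_old)
    return [dict(item, **old_by_id[item['id']])
            for item in list_new if item['id'] in old_by_id]


# Synthetic data (made up): 2000 old items, 1000 new items, some unmatched.
list_old = [{'id': i, 'name': 'old%d' % i, 'some_data': '12345'} for i in range(2000)]
list_new = [{'id': i, 'name': 'new%d' % i} for i in range(0, 3000, 3)]

same = merge_quadratic(list_new, list_old) == merge_indexed(list_new, list_old)
```

Wrapping each call in timeit.timeit(..., number=10) is one way to measure the difference on your own data.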

Answer

You could do something like this:

def match_dict(new_list, old_list):
    new_dict = dict((obj['id'], obj) for obj in new_list)
    old_dict = dict((obj['id'], obj) for obj in old_list)
    # iterate over a copy of the keys, because deleting from a dict
    # while iterating over it raises RuntimeError
    for k in list(new_dict):
        if k in old_dict:
            new_dict[k].update(old_dict[k])
        else:
            del new_dict[k]
    return list(new_dict.values())

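If you are doing this a lot, I would suggest storing your data as dictionaries keyed by id instead of as lists, so that you don't have to convert every time.

Edit: the following example shows how to store the data in a dictionary.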

list_new = [
    {'desc': 'cool guy', 'id': 1, 'name': 'bob'},
    {'desc': 'bad guy', 'id': 2, 'name': 'Bill'},
    {'desc': None, 'id': 3, 'name': 'Vasya'},
]
# create a dictionary with the value of 'id' as the key
dict_new = dict((obj['id'], obj) for obj in list_new)
# now you can access entries by their id instead of having to loop through the list
print(dict_new[2])
# {'desc': 'bad guy', 'id': 2, 'name': 'Bill'}

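What do you mean by a dictionary keyed by id? Can I have a link to some docs? Or see some examples? – Pol 2011-03-10 14:59:59

Answer

I can't quite get it down to one line, but here is a simple version: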

def match_new(new_list, old_list):
    ids = dict((item['id'], item) for item in new_list)
    return [ids[item['id']] for item in old_list if item['id'] in ids]

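Answer

Not knowing the constraints of your data, I will assume that id is unique within each list, and that your lists contain only immutable types (string, int, ...), which are hashable.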

# first index each list by id 
new = {item['id']: item for item in list_new} 
old = {item['id']: item for item in list_old} 

# now you can see which ids appeared in the new list 
created = set(new.keys())-set(old.keys()) 
# or which ids were deleted 
deleted = set(old.keys())-set(new.keys()) 
# or which ids exists in the 2 lists 
intersect = set(new.keys()).intersection(set(old.keys())) 

# using the same 'conversion to set' trick, 
# you can see what is different for each item 
diff = {id: dict(set(new[id].items())-set(old[id].items())) for id in intersect} 

# using your example data set, diff now contains the differences for items which exists in the two lists: 
# {1: {'name': 'bob'}, 2: {'desc': 'bad gay'}, 3: {'name': 'Vasya', 'desc': None}} 

# you can now add the new ids to this diff 
diff.update({id: new[id] for id in created}) 
# and get your data back into the original format: 
list_diff = [dict(data, **{'id': id}) for id,data in diff.items()]

This uses Python 3 syntax, but it should be easy to port to Python 2.

Edit: here is the same code written for Python 2.5 (note how the code is less readable without dict comprehensions):

new = dict((item['id'], item) for item in list_new)
old = dict((item['id'], item) for item in list_old)
created = set(new.keys()) - set(old.keys())
deleted = set(old.keys()) - set(new.keys())
intersect = set(new.keys()).intersection(set(old.keys()))
diff = dict((id, dict(set(new[id].items()) - set(old[id].items()))) for id in intersect)
diff.update(dict((id, new[id]) for id in created))
list_diff = [dict(data, **{'id': id}) for id, data in diff.items()]

Yes, id is unique. Python 2.6 – Pol 2011-03-10 05:37:34

This is nice. There are 5 loops, but x * 5 is less than x * x, given that x can sometimes be 300. Thanks. – Pol 2011-03-10 14:27:16
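The arithmetic behind that last comment, worked out as a rough sketch. The "five passes" figure is Pol's own estimate of the set-based answer, not a measured count, and the variable names are made up here.

```python
x = 300                 # list size mentioned in the comment
linear_steps = 5 * x    # roughly five separate passes over the data
nested_steps = x * x    # one loop nested inside another
ratio = nested_steps // linear_steps
print(linear_steps, nested_steps, ratio)  # prints: 1500 90000 60 (Python 3)
```

So at 300 items the nested approach does about 60 times as many steps, and the gap widens linearly as the lists grow.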

Answer

You might like this. Please take a look, thanks.

def match_dict(new_list, old_list):
    id_new = [item_new.get("id") for item_new in new_list]
    id_old = [item_old.get("id") for item_old in old_list]

    for idx_old in id_old:
        if idx_old in id_new:
            # updates the matching dictionary in new_list in place
            new_list[id_new.index(idx_old)].update(old_list[id_old.index(idx_old)])

    return new_list

from pprint import pprint
pprint(match_dict(list_new, list_old))

Output:

[{'desc': 'cool gay', 'id': 1, 'name': 'boby', 'some_data': '12345'}, 
{'desc': 'cool gay', 'id': 2, 'name': 'Bill', 'some_data': '12345'}, 
{'desc': 'the man', 'id': 3, 'name': 'vasya', 'some_data': '12345'}]

Answer

[od for od in list_old if od['id'] in {nd['id'] for nd in list_new}]

This one does not update the new dictionaries with the other data that comes along. – Pol 2011-03-10 07:33:34
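A possible way to address that comment, as a sketch only: index the new dictionaries by id once, outside the comprehension, and merge each old dictionary onto its new counterpart so that fields present only in the new item survive. The name match_by_old_order and the 'email' field are hypothetical.

```python
def match_by_old_order(list_new, list_old):
    # Built once, rather than once per element of list_old.
    new_by_id = dict((nd['id'], nd) for nd in list_new)
    # Copy the new item, then let the old values overwrite it.
    return [dict(new_by_id[od['id']], **od)
            for od in list_old if od['id'] in new_by_id]


# Hypothetical sample data: 'email' exists only on the new item.
list_new = [{'id': 1, 'name': 'bob', 'email': 'bob@example.com'},
            {'id': 3, 'name': 'Vasya'}]
list_old = [{'id': 3, 'name': 'vasya', 'some_data': '12345'},
            {'id': 1, 'name': 'boby', 'some_data': '12345'},
            {'id': 4, 'name': 'Elvis', 'some_data': '12345'}]
matched = match_by_old_order(list_new, list_old)
```

The result follows the order of list_old, like the one-liner above.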
