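Matching sets of dictionaries. Best solution. Python
Given two lists of dictionaries, new and old. The dictionaries represent the same objects in both lists. I need to find the differences and produce a new list of dictionaries that contains only the objects from the new list, with attributes updated from the old ones.
Example: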
list_new = [
    {'id': 1,
     'name': 'bob',
     'desc': 'cool gay'
    },
    {'id': 2,
     'name': 'Bill',
     'desc': 'bad gay'
    },
    {'id': 3,
     'name': 'Vasya',
     'desc': None
    },
]
list_old = [
    {'id': 1,
     'name': 'boby',
     'desc': 'cool gay',
     'some_data': '12345'
    },
    {'id': 2,
     'name': 'Bill',
     'desc': 'cool gay',
     'some_data': '12345'
    },
    {'id': 3,
     'name': 'vasya',
     'desc': 'the man',
     'some_data': '12345'
    },
    {'id': 4,
     'name': 'Elvis',
     'desc': 'singer',
     'some_data': '12345'
    },
]
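So in this example I want to produce a new list that contains only the guys from list_new, with updated data, matched by id.
So bob becomes boby, Bill's desc becomes 'cool gay', and Vasya's desc becomes 'the man'. Elvis, at the end, has to be absent.
Give me an elegant solution, with as few loop iterations as possible.
There is a way to solve this, though it is not the best: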
def match_dict(new_list, old_list):
    # collect the ids present in the new list
    ids_new = []
    for item in new_list:
        ids_new.append(item['id'])
    result = []
    for item_old in old_list:
        if item_old['id'] in ids_new:
            # inner loop: find the matching new item again
            for item_new in new_list:
                if item_new['id'] == item_old['id']:
                    item_new['some_data'] = item_old['some_data']
                    result.append(item_new)
    return result
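I'm asking because there is a loop inside a loop. If the lists have 2000 items each, the process will take a while.
Steps:
- Create a lookup dictionary for list_old, keyed by id
- Iterate over the dictionaries in list_new and, for each one that also exists in old, create a merged dictionary
Code: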
def match_dict(new_list, old_list):
    old = dict((v['id'], v) for v in old_list)
    return [dict(d, **old[d['id']]) for d in new_list if d['id'] in old]
Edit: I had named the variables inside the function incorrectly.
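As a quick check of the two steps above, here is a sketch that runs the same idea on a trimmed copy of the question's sample data. It assumes Python 3.5+ for the `{**a, **b}` merge syntax, and the names match_dict_py3 and old_by_id are only illustrative. Values from the old record win on conflict, which is what the question asks for.

```python
def match_dict_py3(new_list, old_list):
    # Step 1: index the old records by id (one pass over old_list).
    old_by_id = {item['id']: item for item in old_list}
    # Step 2: one pass over new_list; keep only ids that also exist in old,
    # letting the old record's values override the new ones.
    return [{**item, **old_by_id[item['id']]}
            for item in new_list if item['id'] in old_by_id]


list_new = [
    {'id': 1, 'name': 'bob', 'desc': 'cool gay'},
    {'id': 3, 'name': 'Vasya', 'desc': None},
]
list_old = [
    {'id': 1, 'name': 'boby', 'desc': 'cool gay', 'some_data': '12345'},
    {'id': 3, 'name': 'vasya', 'desc': 'the man', 'some_data': '12345'},
    {'id': 4, 'name': 'Elvis', 'desc': 'singer', 'some_data': '12345'},
]

result = match_dict_py3(list_new, list_old)
print(result)
```

Elvis (id 4) does not appear in the result because the comprehension only walks list_new.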
For each dictionary in old_list, search new_list for the dictionary with the same id, then do: old_dict.update(new_dict)
Remove each new_dict from new_list after the update, and after the loop append the remaining, unused dictionaries.
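One possible reading of that description, as a rough sketch. The function name merge_keep_all and the use of a lookup dictionary instead of a linear search are assumptions made here for illustration. Note that with old_dict.update(new_dict) the new values win and unmatched records from both lists are kept, so this does not drop Elvis as the question requires.

```python
def merge_keep_all(old_list, new_list):
    # Index the new records by id so each lookup is a single dict access.
    remaining = {d['id']: d for d in new_list}
    result = []
    for old_dict in old_list:
        merged = dict(old_dict)  # copy, so the input is not mutated
        # pop() both finds the match and marks it as used
        new_dict = remaining.pop(old_dict['id'], None)
        if new_dict is not None:
            merged.update(new_dict)  # new values override old ones
        result.append(merged)
    # whatever was never matched is appended at the end
    result.extend(remaining.values())
    return result


old = [{'id': 1, 'name': 'boby', 'some_data': '12345'},
       {'id': 4, 'name': 'Elvis'}]
new = [{'id': 1, 'name': 'bob'},
       {'id': 5, 'name': 'Zed'}]
merged_all = merge_keep_all(old, new)
print(merged_all)
```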
Something like this is what you need:
l = []
for d in list_old:
    for e in list_new:
        if e['id'] == d['id']:
            # copy e, then let d's values override it
            l.append(dict(e, **d))
print(l)
Read about how to merge dictionaries here.
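For reference, a small sketch of the merge semantics that dict(e, **d) relies on, next to two equivalent spellings. The variable names are illustrative, and the `{**e, **d}` form assumes Python 3.5 or later.

```python
e = {'id': 1, 'name': 'bob'}
d = {'id': 1, 'name': 'boby', 'some_data': '12345'}

# dict(e, **d) copies e, then applies d's items as keyword arguments,
# so d wins on shared keys. Every key of d must be a string.
merged_a = dict(e, **d)

# Unpacking syntax (Python 3.5+); also works when keys are not strings.
merged_b = {**e, **d}

# Copy-then-update; works on any Python version.
merged_c = e.copy()
merged_c.update(d)

print(merged_a)
print(merged_a == merged_b == merged_c)
print(e)  # the inputs are left unmodified
```

All three build a new dictionary, whereas calling e.update(d) directly would modify e in place.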
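If your top-level data structure were a dictionary rather than a list, you would be much better off. Then it would be:
dict_new.update(dict_old)
But with what you do have, try this: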
result_list = []
for item in list_new:
    found_item = [d for d in list_old if d["id"] == item["id"]]
    if found_item:
        result_list.append(dict(item, **found_item[0]))
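This is actually still a loop inside a loop (the inner loop is "hidden" in the list comprehension), so it is still O(N ** 2). On large data sets it would undoubtedly be faster to convert to a dictionary first and then turn the result back into a list.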
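To make the difference concrete, here is a rough sketch that counts the id comparisons done by the scanning version and checks that an indexed version returns the same result. The function names and the generated test data are made up for illustration.

```python
def match_nested(new_list, old_list):
    """Scan all of old_list for every new item, counting id comparisons."""
    comparisons = 0
    result = []
    for item in new_list:
        found = []
        for d in old_list:
            comparisons += 1
            if d['id'] == item['id']:
                found.append(d)
        if found:
            result.append(dict(item, **found[0]))
    return result, comparisons


def match_indexed(new_list, old_list):
    """Build a lookup dictionary once, then do one pass over new_list."""
    old_by_id = {d['id']: d for d in old_list}
    return [dict(item, **old_by_id[item['id']])
            for item in new_list if item['id'] in old_by_id]


n = 2000
list_old = [{'id': i, 'some_data': str(i)} for i in range(n)]
list_new = [{'id': i, 'name': 'user%d' % i} for i in range(0, n, 2)]

nested_result, comparisons = match_nested(list_new, list_old)
indexed_result = match_indexed(list_new, list_old)

print(comparisons)  # equals len(list_new) * len(list_old)
print(nested_result == indexed_result)
```

With 1000 new items and 2000 old items the scan performs 1000 * 2000 comparisons, while the indexed version does roughly one dictionary operation per item in each list.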
You can do something like this:
def match_dict(new_list, old_list):
    new_dict = dict((obj['id'], obj) for obj in new_list)
    old_dict = dict((obj['id'], obj) for obj in old_list)
    # iterate over a copy of the keys, because entries are deleted inside the loop
    for k in list(new_dict):
        if k in old_dict:
            new_dict[k].update(old_dict[k])
        else:
            del new_dict[k]
    return list(new_dict.values())
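If you are doing this often, I would suggest storing the data as a dictionary keyed by id rather than as a list, so that you don't have to do the conversion every time.
Edit: the following example shows how to store the data in a dictionary.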
list_new = [{'desc': 'cool guy', 'id': 1, 'name': 'bob'}, {'desc': 'bad guy', 'id': 2, 'name': 'Bill'}, {'desc': None, 'id': 3, 'name': 'Vasya'}]
# create a dictionary with the value of 'id' as the key
dict_new = dict((obj['id'], obj) for obj in list_new)
# now you can access entries by their id instead of having to loop through the list
print(dict_new[2])
# {'desc': 'bad guy', 'id': 2, 'name': 'Bill'}
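Once both collections are stored keyed by id, the matching step itself needs no conversion. A minimal sketch, assuming Python 3 (where key views support set operations); dict_old, shared_ids and merged are names invented for this illustration.

```python
dict_new = {
    1: {'id': 1, 'name': 'bob', 'desc': 'cool guy'},
    2: {'id': 2, 'name': 'Bill', 'desc': 'bad guy'},
}
dict_old = {
    1: {'id': 1, 'name': 'boby', 'desc': 'cool guy', 'some_data': '12345'},
    4: {'id': 4, 'name': 'Elvis', 'desc': 'singer', 'some_data': '12345'},
}

# Key views behave like sets in Python 3, so the ids present in both
# collections are simply the intersection of the two key views.
shared_ids = dict_new.keys() & dict_old.keys()

merged = {}
for key in shared_ids:
    record = dict(dict_new[key])   # copy the new record
    record.update(dict_old[key])   # old values override, as in the question
    merged[key] = record

print(merged)
```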
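What do you mean by a dictionary with the id as the key? Can I have a link to some documentation? Or see some examples? – Pol 2011-03-10 14:59:59
I couldn't quite get it into one line, but here is a simple version:
def match_new(new_list, old_list):
    ids = dict((item['id'], item) for item in new_list)
    return [ids[item['id']] for item in old_list if item['id'] in ids]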
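Not knowing the constraints on your data, I will assume that id is unique within each list,
and that your dictionaries contain only immutable types (string, int, ...), which are hashable.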
# first index each list by id
new = {item['id']: item for item in list_new}
old = {item['id']: item for item in list_old}
# now you can see which ids appeared in the new list
created = set(new.keys())-set(old.keys())
# or which ids were deleted
deleted = set(old.keys())-set(new.keys())
# or which ids exists in the 2 lists
intersect = set(new.keys()).intersection(set(old.keys()))
# using the same 'conversion to set' trick,
# you can see what is different for each item
diff = {id: dict(set(new[id].items())-set(old[id].items())) for id in intersect}
# using your example data set, diff now contains the differences for items which exists in the two lists:
# {1: {'name': 'bob'}, 2: {'desc': 'bad gay'}, 3: {'name': 'Vasya', 'desc': None}}
# you can now add the new ids to this diff
diff.update({id: new[id] for id in created})
# and get your data back into the original format:
list_diff = [dict(data, **{'id': id}) for id,data in diff.items()]
This uses Python 3 syntax, but it should be easy to port to Python 2.
Edit: here is the same code written for Python 2.5 (note how the code is less readable without dict comprehensions):
new = dict((item['id'], item) for item in list_new)
old = dict((item['id'], item) for item in list_old)
created = set(new.keys()) - set(old.keys())
deleted = set(old.keys()) - set(new.keys())
intersect = set(new.keys()).intersection(set(old.keys()))
diff = dict((id, dict(set(new[id].items()) - set(old[id].items()))) for id in intersect)
diff.update(dict((id, new[id]) for id in created))
list_diff = [dict(data, **{'id': id}) for id, data in diff.items()]
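The set-based diff above depends on the assumption that every value is hashable. The sketch below shows what happens when that does not hold, together with one possible fallback that compares key by key; changed_fields is a hypothetical helper, not part of the answer.

```python
new_item = {'id': 1, 'name': 'bob', 'tags': ['a', 'b']}
old_item = {'id': 1, 'name': 'boby', 'tags': ['a', 'b']}

try:
    set(new_item.items()) - set(old_item.items())
    set_diff_worked = True
except TypeError:
    # a list value makes the (key, value) tuple unhashable
    set_diff_worked = False

# Sentinel that can never equal a real value, so keys missing from
# the old record are reported as changed.
_missing = object()


def changed_fields(new, old):
    """Return the items of new whose key is absent from old or whose value differs."""
    return {k: v for k, v in new.items() if old.get(k, _missing) != v}


print(set_diff_worked)
print(changed_fields(new_item, old_item))
```

The fallback only needs the values to support comparison with `!=`, not hashing, at the cost of a slightly longer expression.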
You might like this one. Please take a look, thanks.
def match_dict(new_list, old_list):
    id_new = [item_new.get("id") for item_new in new_list]
    id_old = [item_old.get("id") for item_old in old_list]
    for idx_old in id_old:
        if idx_old in id_new:
            # update the new item in place with the matching old item
            new_list[id_new.index(idx_old)].update(old_list[id_old.index(idx_old)])
    return new_list
from pprint import pprint
pprint(match_dict(list_new, list_old))
Output:
[{'desc': 'cool gay', 'id': 1, 'name': 'boby', 'some_data': '12345'},
{'desc': 'cool gay', 'id': 2, 'name': 'Bill', 'some_data': '12345'},
{'desc': 'the man', 'id': 3, 'name': 'vasya', 'some_data': '12345'}]
[od for od in list_old if od['id'] in {nd['id'] for nd in list_new}]
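This one doesn't bring over the other data that comes with the new dictionaries. – Pol 2011-03-10 07:33:34
Just wondering why you spell 'guy' as 'gay'? – DTing 2011-03-09 21:49:08
Please let Elvis stay :) – 2011-03-09 21:49:42
Are you retrieving this list from somewhere? Could you restructure the list of dictionaries into a dictionary that uses id as the key? – 2011-03-09 21:51:44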