XPath to dict: Python, lxml and XML
Question:
Is there a quick way to convert the XML below into a dictionary using lxml's XPath in Python? Or is there some other efficient way?
<rec item="1">
<tag name="atr1">random text</tag>
<tag name="atr2">random text</tag>
..................................
</rec>
<rec item="2">
<tag name="atr1">random text2</tag>
<tag name="atr2">random text2</tag>
..................................
</rec>
<rec item="3">
<tag name="atr1">random text3</tag>
<tag name="atr2">random text3</tag>
..................................
</rec>
I need a dictionary like this, or something similar:
dic = [
{
'attr1':'random text',
'attr2':'random text'
},
{
'attr1':'random text2',
'attr2':'random text2'
},
{
'attr1':'random text3',
'attr2':'random text3'
}
]
Answer
You can use a list comprehension together with a dict comprehension:
[{ tag.xpath('string(@name)') : tag.xpath('string()') for tag in record.xpath('tag')} for record in records.xpath('//rec')]
Here is a complete example:
from lxml import etree as ET
xml = '''<records>
<rec item="1">
<tag name="atr1">random text</tag>
<tag name="atr2">random text</tag>
..................................
</rec>
<rec item="2">
<tag name="atr1">random text2</tag>
<tag name="atr2">random text2</tag>
..................................
</rec>
<rec item="3">
<tag name="atr1">random text3</tag>
<tag name="atr2">random text3</tag>
..................................
</rec>
</records>'''
records = ET.fromstring(xml)
rec_list = [{ tag.xpath('string(@name)') : tag.xpath('string()') for tag in rec.xpath('tag') } for rec in records.xpath('rec')]
print(rec_list)
Output:
[{'atr1': 'random text', 'atr2': 'random text'}, {'atr1': 'random text2', 'atr2': 'random text2'}, {'atr1': 'random text3', 'atr2': 'random text3'}]
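As a possible alternative, the same list can be built without XPath string functions, using only the ElementTree-style methods `iterfind()`, `get()` and `.text`. This is a minimal sketch, not part of the original answer: the helper name `records_to_dicts`, the shortened sample data and the `<records>` wrapper are assumptions made for illustration. The import fallback is there only because this subset of the API is shared with the standard library.

```python
try:
    from lxml import etree  # the library used in the answer above
except ImportError:
    # Fallback: the standard library offers the same iterfind/get/text API.
    import xml.etree.ElementTree as etree

# Shortened sample data; the <records> root element is an assumption,
# added because an XML document needs a single root.
XML = """<records>
<rec item="1">
<tag name="atr1">random text</tag>
<tag name="atr2">random text</tag>
</rec>
<rec item="2">
<tag name="atr1">random text2</tag>
<tag name="atr2">random text2</tag>
</rec>
</records>"""


def records_to_dicts(root):
    """Return one dict per <rec>, mapping each tag's name attribute to its text."""
    return [
        {tag.get("name"): tag.text for tag in rec.iterfind("tag")}
        for rec in root.iterfind("rec")
    ]


root = etree.fromstring(XML)
rec_dicts = records_to_dicts(root)
print(rec_dicts)
```

One difference to keep in mind: `tag.text` is `None` for an empty element and only covers the text before the first child element, whereas `string()` returns an empty string for an empty element and concatenates all descendant text.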
Answer
You can try the following code:
import lxml.etree
source = lxml.etree.fromstring('xml_source_is_here')  # placeholder: pass the XML string here
[{attr: text} for attr, text in zip(source.xpath('//tag/@name'), source.xpath('//tag/text()'))]
Output:
[{'atr1': 'random text'}, {'atr2': 'random text'},
{'atr1': 'random text2'}, {'atr2': 'random text2'},
{'atr1': 'random text3'}, {'atr2': 'random text3'}]
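It works! Now I'm looking into how to improve the output. Since I know the attribute names in advance (name="attr1"), a more efficient approach might be to use the following structure: – bogumbiker
attribute_name = {'atr1','atr2'} attribute_values = [{'random text','random text'},{'random text2','random text2'},{'random text3','random text3'}] But I'm not sure what value that would bring? – bogumbiker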
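For what it's worth, the structure suggested in the comment could be sketched roughly as follows. This is an illustration only: `ATTRIBUTE_NAMES`, `records_to_rows` and the shortened sample data are hypothetical names and values, not something from the thread. Tuples are used instead of sets because a set such as `{'random text', 'random text'}` collapses to a single element and has no defined order, so values could no longer be matched to their attribute names.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback: the standard library offers the same iterfind/get/text API.
    import xml.etree.ElementTree as etree

# Shortened sample data; the <records> root element is an assumption.
XML = """<records>
<rec item="1">
<tag name="atr1">random text</tag>
<tag name="atr2">random text</tag>
</rec>
<rec item="2">
<tag name="atr1">random text2</tag>
<tag name="atr2">random text2</tag>
</rec>
</records>"""

# Attribute names known in advance, as the comment assumes.
ATTRIBUTE_NAMES = ("atr1", "atr2")


def records_to_rows(root, names=ATTRIBUTE_NAMES):
    """Return one tuple of values per <rec>, ordered like `names`."""
    rows = []
    for rec in root.iterfind("rec"):
        by_name = {tag.get("name"): tag.text for tag in rec.iterfind("tag")}
        # A name missing from a record yields None in that position.
        rows.append(tuple(by_name.get(name) for name in names))
    return rows


root = etree.fromstring(XML)
attribute_values = records_to_rows(root)
print(attribute_values)
```

Whether this is worth it probably depends on the use: the rows are more compact and map directly onto CSV or database columns, while the list of dicts is self-describing and tolerates records with differing tags.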