XPath to dict in Python, lxml and XML

Question:

Is there a quick way to convert the XML below into a dictionary using lxml's XPath in Python? Or some other efficient way?

<rec item="1"> 
    <tag name="atr1">random text</tag> 
    <tag name="atr2">random text</tag> 
    ..................................   
</rec> 
<rec item="2"> 
    <tag name="atr1">random text2</tag> 
    <tag name="atr2">random text2</tag> 
    ..................................   
</rec> 
<rec item="3"> 
    <tag name="atr1">random text3</tag> 
    <tag name="atr2">random text3</tag> 
    ..................................   
</rec> 

I need a dictionary like this, or something similar:

dic = [ 
    {  
     'atr1':'random text', 
     'atr2':'random text' 
    }, 
    {  
     'atr1':'random text2', 
     'atr2':'random text2' 
    }, 
    {  
     'atr1':'random text3', 
     'atr2':'random text3' 
    } 
] 

You can use a list comprehension combined with a dictionary comprehension:

[{ tag.xpath('string(@name)') : tag.xpath('string()') for tag in record.xpath('tag')} for record in records.xpath('//rec')] 

Here is a complete example:

from lxml import etree as ET 
xml = '''<records> 
<rec item="1"> 
    <tag name="atr1">random text</tag> 
    <tag name="atr2">random text</tag> 
    ..................................   
</rec> 
<rec item="2"> 
    <tag name="atr1">random text2</tag> 
    <tag name="atr2">random text2</tag> 
    ..................................   
</rec> 
<rec item="3"> 
    <tag name="atr1">random text3</tag> 
    <tag name="atr2">random text3</tag> 
    ..................................   
</rec> 
</records>''' 
records = ET.fromstring(xml) 
# One dict per <rec>: key = the tag's name attribute, value = the tag's full text content
rec_list = [{ tag.xpath('string(@name)') : tag.xpath('string()') for tag in rec.xpath('tag') } for rec in records.xpath('rec')] 
print(rec_list) 

Output:

[{'atr1': 'random text', 'atr2': 'random text'}, {'atr1': 'random text2', 'atr2': 'random text2'}, {'atr1': 'random text3', 'atr2': 'random text3'}] 
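Since the question also asks about other efficient ways: each `tag.xpath(...)` call above evaluates a separate XPath expression, so reading the attribute and text directly is likely cheaper. The sketch below does that with the standard library's `xml.etree.ElementTree` (the same `findall`, `get` and `text` calls also exist on `lxml.etree` elements). It assumes the `<tag>` elements have no child elements, in which case `.text` gives the same result as XPath `string()`. The `.....` filler lines from the question are left out of the sample.

```python
import xml.etree.ElementTree as ET

# Sample input with the same structure as the question (filler lines omitted).
XML = """<records>
<rec item="1">
    <tag name="atr1">random text</tag>
    <tag name="atr2">random text</tag>
</rec>
<rec item="2">
    <tag name="atr1">random text2</tag>
    <tag name="atr2">random text2</tag>
</rec>
<rec item="3">
    <tag name="atr1">random text3</tag>
    <tag name="atr2">random text3</tag>
</rec>
</records>"""


def records_to_dicts(xml_text):
    """Return one {name: text} dict per <rec> element."""
    root = ET.fromstring(xml_text)
    # .get("name") reads the attribute; .text is the element's leading text,
    # which equals string() here only because <tag> has no child elements.
    return [
        {tag.get("name"): tag.text for tag in rec.findall("tag")}
        for rec in root.findall("rec")
    ]


rec_list = records_to_dicts(XML)
print(rec_list)
```

An empty `<tag/>` has `.text == None` here, whereas `string()` returns an empty string; use `tag.text or ""` if that difference matters.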

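It works! Now I'm looking at how to improve the output. Since I know the attribute names (name="atr1") in advance, a more efficient approach might be to use the following structure: – bogumbiker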


attribute_name = {'atr1','atr2'} attribute_values = [{'random text','random text'},{'random text2','random text2'},{'random text3','random text3'}] but I'm not sure what benefit that would bring. – bogumbiker
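A minimal sketch of the structure proposed in the comment above, with one change: it uses a list and tuples instead of sets, because a Python set such as `{'random text', 'random text'}` collapses to a single element and has no defined order, so values could no longer be matched to names. It assumes (hypothetically) that the first `<rec>` contains every attribute name of interest, and it uses the standard library's `xml.etree.ElementTree` so it runs without lxml.

```python
import xml.etree.ElementTree as ET

# Sample input with the same structure as the question (filler lines omitted).
XML = """<records>
<rec item="1"><tag name="atr1">random text</tag><tag name="atr2">random text</tag></rec>
<rec item="2"><tag name="atr1">random text2</tag><tag name="atr2">random text2</tag></rec>
<rec item="3"><tag name="atr1">random text3</tag><tag name="atr2">random text3</tag></rec>
</records>"""

root = ET.fromstring(XML)

# Assumption: the first <rec> lists every attribute name that matters.
attribute_names = [tag.get("name") for tag in root.find("rec").findall("tag")]

attribute_values = []
for rec in root.findall("rec"):
    # Look values up by name so a different tag order inside a record still lines up;
    # a tag missing from a record becomes None.
    by_name = {tag.get("name"): tag.text for tag in rec.findall("tag")}
    # Tuples keep both order and duplicate values, unlike sets.
    attribute_values.append(tuple(by_name.get(name) for name in attribute_names))

print(attribute_names)
print(attribute_values)
```

Storing the names once mainly saves memory when there are many records; for lookups by name, the list of dicts from the accepted answer is usually more convenient.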

You can try the following code:

import lxml.etree

source = lxml.etree.fromstring('xml_source_is_here') 
# Pairs the N-th name attribute with the N-th text node in the whole document
[{attr:text} for attr,text in zip(source.xpath('//tag/@name'), source.xpath('//tag/text()'))] 

Output:

[{'atr1': 'random text'}, {'atr2': 'random text'}, 
{'atr1': 'random text2'}, {'atr2': 'random text2'}, 
{'atr1': 'random text3'}, {'atr2': 'random text3'}]
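Note that this output has one dict per `<tag>`, not one per `<rec>`. Also, zipping two independent XPath results only works while every `<tag>` has text: `//tag/text()` returns no node for an empty element, so every later name gets paired with the wrong value and the last name is dropped. The sketch below keeps each name paired with the text of its own element. It uses the standard library's `xml.etree.ElementTree`, and its input is a hypothetical variant of the question's XML in which one tag is empty.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the second <tag> of the first record is empty.
XML = """<records>
<rec item="1">
    <tag name="atr1">random text</tag>
    <tag name="atr2"></tag>
</rec>
<rec item="2">
    <tag name="atr1">random text2</tag>
    <tag name="atr2">random text2</tag>
</rec>
</records>"""

root = ET.fromstring(XML)
# Read name and text from the same element, so an empty tag yields None
# instead of shifting the remaining values as two zipped lists would.
pairs = [{tag.get("name"): tag.text} for tag in root.iter("tag")]
print(pairs)
```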