How to parse a hierarchical XML string

Question:

I have an XML string that I need to parse in Python. It looks like this:

<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"> 
    <s:Body> 
     <PostLoadsResponse xmlns="http://webservices.truckstop.com/v11"> 
      <PostLoadsResult xmlns:a="http://schemas.datacontract.org/2004/07/WebServices.Objects" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"> 
       <Errors xmlns="http://schemas.datacontract.org/2004/07/WebServices"> 
        <Error> 
         <ErrorMessage>Invalid Location</ErrorMessage> 
        </Error> 
       </Errors> 
      </PostLoadsResult> 
     </PostLoadsResponse> 
    </s:Body> 
</s:Envelope>

I'm having trouble using ElementTree to get to the error message in this tree without resorting to something like this:

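import xml.etree.ElementTree as ET

# Element.getchildren() was removed in Python 3.9; indexing an element returns its child instead.
body = ET.fromstring(text).findall('{http://schemas.xmlsoap.org/soap/envelope/}Body')[0]
list(body[0][0])  # children of PostLoadsResult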

Answer

Use ElementTree's partial XPath support:

ET.fromstring(text).find('.//{http://schemas.datacontract.org/2004/07/WebServices}ErrorMessage')

This tells it to find the first element named ErrorMessage, at any depth, whose namespace is http://schemas.datacontract.org/2004/07/WebServices.

However, it may be faster to use something like

ET.fromstring(text).find('{http://schemas.xmlsoap.org/soap/envelope/}Body').find('{http://webservices.truckstop.com/v11}PostLoadsResponse').find('{http://webservices.truckstop.com/v11}PostLoadsResult').find('{http://schemas.datacontract.org/2004/07/WebServices}Errors').find('{http://schemas.datacontract.org/2004/07/WebServices}Error').find('{http://schemas.datacontract.org/2004/07/WebServices}ErrorMessage')

if you know that your message will always contain these elements.
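The explicit chain can also be written as one slash-separated path, which avoids an AttributeError when an intermediate element is missing. The following is only a sketch: the constant names, the `first_error_message` helper, and the trimmed-down `SAMPLE` document are made up for illustration and are not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the XML in the question
SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
V11 = "http://webservices.truckstop.com/v11"
WS = "http://schemas.datacontract.org/2004/07/WebServices"

# Shortened copy of the question's document (illustrative only)
SAMPLE = f"""<s:Envelope xmlns:s="{SOAP}">
  <s:Body>
    <PostLoadsResponse xmlns="{V11}">
      <PostLoadsResult>
        <Errors xmlns="{WS}">
          <Error>
            <ErrorMessage>Invalid Location</ErrorMessage>
          </Error>
        </Errors>
      </PostLoadsResult>
    </PostLoadsResponse>
  </s:Body>
</s:Envelope>"""


def first_error_message(xml_text):
    """Return the text of the first ErrorMessage, or None if the path is absent."""
    root = ET.fromstring(xml_text)
    # One path from the root element down to ErrorMessage, one step per level
    path = "/".join([
        f"{{{SOAP}}}Body",
        f"{{{V11}}}PostLoadsResponse",
        f"{{{V11}}}PostLoadsResult",
        f"{{{WS}}}Errors",
        f"{{{WS}}}Error",
        f"{{{WS}}}ErrorMessage",
    ])
    node = root.find(path)
    return node.text if node is not None else None


print(first_error_message(SAMPLE))
```

Because find returns None when any step of the path does not match, the helper returns None for a response without errors instead of raising.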

Answer

You need to handle namespaces, and you can do that with xml.etree.ElementTree:

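import xml.etree.ElementTree as ET

# data is the XML string from the question
tree = ET.fromstring(data)

# Map short prefixes of our choosing to the namespace URIs used in the document
namespaces = {
    's': 'http://schemas.xmlsoap.org/soap/envelope/',
    'd': 'http://schemas.datacontract.org/2004/07/WebServices',
}
print(tree.find('.//d:ErrorMessage', namespaces=namespaces).text)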

This prints Invalid Location.
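If a response can carry more than one error, the same prefix map works with findall. This is a minimal sketch: the `error_messages` helper and the two-error `SAMPLE` document (including the "Invalid Date" message) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Prefixes are local to this script; only the URIs must match the document
NAMESPACES = {
    "s": "http://schemas.xmlsoap.org/soap/envelope/",
    "d": "http://schemas.datacontract.org/2004/07/WebServices",
}

# Made-up response containing two errors
SAMPLE = """<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Body>
    <Errors xmlns="http://schemas.datacontract.org/2004/07/WebServices">
      <Error><ErrorMessage>Invalid Location</ErrorMessage></Error>
      <Error><ErrorMessage>Invalid Date</ErrorMessage></Error>
    </Errors>
  </s:Body>
</s:Envelope>"""


def error_messages(xml_text):
    """Return the text of every d:ErrorMessage element, in document order."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(".//d:ErrorMessage", NAMESPACES)]


print(error_messages(SAMPLE))
```

findall returns an empty list when nothing matches, so no None check is needed here.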

Answer

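You can use the tree's iter method (called getiterator in older Python versions; it was removed in Python 3.9) to walk over every element. Check each element's tag to see whether it is the one you want.

>>> err = [node.text for node in tree.iter() if node.tag.endswith('ErrorMessage')]
>>> err
['Invalid Location']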
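One caveat with a suffix test is that it also matches tags such as LastErrorMessage. Two stricter variants are sketched below: comparing the local name after the closing brace, and the `{*}` namespace wildcard that ElementTree's find methods accept from Python 3.8 onward. The helper names and the `SAMPLE` document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up document: same local name in two namespaces, none, and a look-alike tag
SAMPLE = """<r xmlns:a="urn:a" xmlns:b="urn:b">
  <a:ErrorMessage>one</a:ErrorMessage>
  <b:LastErrorMessage>two</b:LastErrorMessage>
  <ErrorMessage>three</ErrorMessage>
</r>"""


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rpartition("}")[2]


def texts_by_local_name(xml_text, name):
    """Collect text from elements whose local name equals `name` exactly."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter() if local_name(el.tag) == name]


def texts_by_wildcard(xml_text, name):
    """Same idea using the '{*}' wildcard (requires Python 3.8 or later)."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(f".//{{*}}{name}")]


print(texts_by_local_name(SAMPLE, "ErrorMessage"))
print(texts_by_wildcard(SAMPLE, "ErrorMessage"))
```

Note that iter() also visits the root element itself, while a `.//` search only looks at its descendants; for the documents discussed here that difference does not matter.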
