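Parsing XML with the same tag but different attributes
I have a network file created from OSM using netconvert. The elements directly under the root are edges, and they carry different attributes. For example, in the first part of the file, the edges are organized as follows.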
<edge id=":367367171_1" function="internal">
<lane id=":367367171_1_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="15.86" shape="7413.68,8096.43 7409.39,8098.94 7406.50,8098.93 7405.03,8096.39 7404.96,8091.32"/>
</edge>
<edge id=":367367171_2" function="internal">
<lane id=":367367171_2_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="9.40" shape="7413.68,8096.43 7412.34,8099.01 7410.83,8099.98 7409.14,8099.36 7407.28,8097.13"/>
</edge>
<edge id=":367367171_3" function="internal">
<lane id=":367367171_3_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="5.56" shape="7408.25,8091.65 7407.28,8097.13"/>
</edge>
<edge id=":367367171_4" function="internal">
<lane id=":367367171_4_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="5.69" shape="7408.25,8091.65 7408.69,8097.32"/>
</edge>
In the second part of the file the edges carry different attributes, and look like this:
<edge id="102323265#13" from="1181188708" to="1181188720" priority="1" type="highway.cycleway">
<lane id="102323265#13_0" index="0" allow="bicycle" speed="5.56" length="1.96" width="1.00" shape="14310.67,8986.24 14309.63,8984.59"/>
</edge>
<edge id="102323265#2" from="2577245263" to="1721713370" priority="1" type="highway.cycleway" shape="14903.54,9214.01 14891.64,9210.58 14796.11,9178.46 14789.16,9175.24">
<lane id="102323265#2_0" index="0" allow="bicycle" speed="5.56" length="113.82" width="1.00" shape="14898.81,9213.21 14891.49,9211.10 14795.93,9178.98 14791.04,9176.72"/>
</edge>
<edge id="102323265#3" from="1721713370" to="1193980046" priority="1" type="highway.cycleway" shape="14789.16,9175.24 14783.34,9171.87 14779.91,9168.83 14776.75,9165.32">
<lane id="102323265#3_0" index="0" allow="bicycle" speed="5.56" length="9.86" width="1.00" shape="14786.63,9174.41 14783.01,9172.31 14779.55,9169.24 14778.85,9168.47"/>
</edge>
<edge id="102323265#4" from="1193980046" to="1193980047" priority="1" type="highway.cycleway" shape="14776.75,9165.32 14764.89,9151.27 14762.54,9144.61">
<lane id="102323265#4_0" index="0" allow="bicycle" speed="5.56" length="20.05" width="1.00" shape="14774.71,9163.77 14764.40,9151.55 14763.05,9147.72"/>
</edge>
<edge id="102323265#5" from="1193980047" to="1193980057" priority="1" type="highway.cycleway" shape="14762.54,9144.61 14760.31,9140.42 14753.93,9131.92 14749.20,9127.42 14743.90,9123.46 14738.81,9120.77 14731.67,9118.17 14707.61,9110.82">
<lane id="102323265#5_0" index="0" allow="bicycle" speed="5.56" length="60.21" width="1.00" shape="14760.51,9141.98 14759.82,9140.67 14753.49,9132.25 14748.82,9127.82 14743.57,9123.90 14738.55,9121.26 14731.49,9118.68 14710.43,9112.25"/>
</edge>
As you can see, the edge elements have different attributes. When I try to access the elements with the following code,
for elem in netFile.iter(tag='edge'):
    print(elem.attrib['from'])
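I get a KeyError: 'from'. When I change the key to 'function' instead of 'from', the code prints several lines of 'internal' and then, as it reaches the end of the first part, raises KeyError: 'function' instead.
I understand that I have to iterate selectively over the edges where the attribute 'from' exists, but I don't know how to proceed. Can someone help?
Thanks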
Python's dictionary get() method is very useful in these cases, because it returns None when a key is not found in the dict.
for elem in netFile.iter(tag='edge'):
    if elem.attrib.get('from'):
        # from stuff
        pass
    else:
        # other stuff
        pass
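A minimal, self-contained sketch of the get() approach, assuming the file is parsed with the standard library's xml.etree.ElementTree. The inline sample, its <net> wrapper element, and the helper name split_edges are illustrative assumptions, not part of the original question.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the real network file (the <net> wrapper is assumed).
SAMPLE = """<net>
  <edge id=":367367171_1" function="internal"/>
  <edge id="102323265#13" from="1181188708" to="1181188720"
        priority="1" type="highway.cycleway"/>
</net>"""


def split_edges(root):
    """Return (ids of edges with 'from', ids of edges without it)."""
    with_from, without_from = [], []
    for elem in root.iter('edge'):
        # elem.get() returns None instead of raising KeyError.
        if elem.get('from') is not None:
            with_from.append(elem.get('id'))
        else:
            without_from.append(elem.get('id'))
    return with_from, without_from


root = ET.fromstring(SAMPLE)
normal, internal = split_edges(root)
print(normal)
print(internal)
```

Comparing against None rather than relying on truthiness also keeps edges whose 'from' attribute is present but empty in the first group.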
You can detect which part of the file you are processing from the attributes that are present, for example:
# The !required! attributes for each part
part1_attributes = ["id", "function"]
part2_attributes = ["id", "from", "to", "priority", "type"]
for elem in netFile.iter(tag='edge'):
    if all([attr in elem.attrib for attr in part1_attributes]):
        # part 1
        print("function: " + elem.attrib["function"])
    elif all([attr in elem.attrib for attr in part2_attributes]):
        # part 2
        print("from: " + elem.attrib["from"])
    else:
        print("Unknown part found while parsing xml")
        # or raise Exception("message...") or exit program etc.
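If one of the edges suddenly lacks one of the required attributes, this will catch it and report an error (or just print and continue), rather than silently returning None as in gr1zzly be4r's answer.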
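The same check can be sketched with sets, which some readers may find easier to scan. This is only one possible variant, assuming xml.etree.ElementTree; the function name classify_edge, the constant names, and the choice of ValueError are hypothetical.

```python
import xml.etree.ElementTree as ET

# Required attributes for each part of the file (same lists as above).
PART1_REQUIRED = {"id", "function"}
PART2_REQUIRED = {"id", "from", "to", "priority", "type"}


def classify_edge(elem):
    """Return 'part1' or 'part2', or raise ValueError for an unknown edge."""
    # issubset() accepts any iterable; iterating elem.attrib yields its keys.
    if PART1_REQUIRED.issubset(elem.attrib):
        return "part1"
    if PART2_REQUIRED.issubset(elem.attrib):
        return "part2"
    raise ValueError("Unknown edge layout: %r" % sorted(elem.attrib))


internal_edge = ET.fromstring('<edge id=":367367171_1" function="internal"/>')
print(classify_edge(internal_edge))
```

Raising an exception instead of printing makes a malformed edge hard to miss; catch it in the calling loop if you would rather log and continue.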
This solution is very elegant and fits my needs exactly. Thank you – Mechanic
Thanks, glad I could help :) –
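There is a simpler way to iterate selectively over the edges where the 'from' attribute exists. Since you have tagged this lxml, you can use the following XPath to find all edges that have a from attribute: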
for e in root.xpath("//edge[@from]"):
    print(e.get("from"))
If you want to check for multiple attributes, you can use and:
.xpath("//edge[@from and @function]")
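If lxml is not available, a similar selection can be sketched with the standard library. xml.etree.ElementTree supports only a subset of XPath: the [@attrib] predicate works, but the and operator does not, so the sketch chains two predicates instead. The sample data, including the edge with id "half" and its <net> wrapper, is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one internal edge, one complete edge, one with only 'from'.
SAMPLE = """<net>
  <edge id=":367367171_1" function="internal"/>
  <edge id="102323265#13" from="1181188708" to="1181188720"/>
  <edge id="half" from="1181188708"/>
</net>"""

root = ET.fromstring(SAMPLE)

# Edges that have a 'from' attribute.
with_from = [e.get("id") for e in root.findall(".//edge[@from]")]

# Edges that have both 'from' and 'to': chained predicates act like "and".
with_from_and_to = [e.get("id") for e in root.findall(".//edge[@from][@to]")]

print(with_from)
print(with_from_and_to)
```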
Thanks for the simple solution. – Mechanic