Extracting information from XML with Python and returning it as a list

Question:

I want to extract data from this XML document and have the output be a single list.

For example:

['10-Yard Fight (USA, Europe)', '1942 (Japan, USA)', .......] 

I can only figure out how to make many separate lists.

For example:

['10-Yard Fight (USA, Europe)'] 
['1942 (Japan, USA)'] 
[.......] 

XML sample:

<?xml version="1.0"?> 
<menu> 
<header> 
    <listname>Nintendo Entertainment System</listname> 
    <id>003</id> 
    <lastlistupdate>10/16/2014</lastlistupdate> 
    <listversion>1.1 Final</listversion> 
    <manufacturer>Nintendo</manufacturer> 
    <media> 
     <artwork></artwork> 
     <video></video> 
    </media> 
    <exporterversion>HyperList XML Exporter Version 1.3 Copywrite (c) 2009-2011 William Strong</exporterversion> 
</header> 
<game name="10-Yard Fight (USA, Europe)" index="true" image="1" id="0034232"> 
    <description>10-Yard Fight (USA, Europe)</description> 
    <cloneof></cloneof> 
    <crc>3D564757</crc> 
    <manufacturer>Nintendo</manufacturer> 
    <year>1985</year> 
    <genre>Football/Sports</genre> 
    <rating>HSRS - GA (General Audience)</rating> 
    <enabled>Yes</enabled> 
</game> 
<game name="1942 (Japan, USA)" index="" image=""> 
    <description>1942 (Japan, USA)</description> 
    <cloneof></cloneof> 
    <crc>171251E3</crc> 
    <manufacturer>Capcom</manufacturer> 
    <year>1986</year> 
    <genre>Shoot-&apos;Em-Up</genre> 
    <rating>HSRS - GA (General Audience)</rating> 
    <enabled>Yes</enabled> 
</game> 
<game name="1943 - The Battle of Midway (USA)" index="" image=""> 
    <description>1943 - The Battle of Midway (USA)</description> 
    <cloneof></cloneof> 
    <crc>12C6D5C7</crc> 
    <manufacturer>Capcom</manufacturer> 
    <year>1988</year> 
    <genre>Shoot-&apos;Em-Up</genre> 
    <rating>HSRS - GA (General Audience)</rating> 
    <enabled>Yes</enabled> 
</game> 
</menu> 

My sample Python code:

from xml.dom import minidom 

def databaseGameExtraction(xml): 
    xmldoc = minidom.parse(xml) 
    games = xmldoc.getElementsByTagName('game') 
    for game in games:
        romKey = game.attributes['name']
        roms = [romKey.value]
        print(roms)
    return roms 

databaseGameExtraction('Nintendo Entertainment System.xml') 

Also, I would like the value 'Nintendo Entertainment System' to be returned as well.

In a perfect world, when called from another function, the function would return the roms as a list along with the system name.

Thanks,

  • A very beginner coder

I think you need:

roms = [] 

for game in games: 
    romKey = game.attributes['name'] 
    roms.append(romKey.value) 

print("all roms:", roms) 
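To see why the question's loop produced one list per game while this version produces a single list, here is a small self-contained comparison. It is a sketch that assumes a trimmed copy of the question's XML parsed with minidom.parseString instead of a file; the function names names_rebinding and names_appending are hypothetical, chosen only for illustration.

```python
from xml.dom import minidom

# Trimmed copy of the question's XML (only the <game> name attributes are used).
SAMPLE_XML = """<?xml version="1.0"?>
<menu>
<game name="10-Yard Fight (USA, Europe)"/>
<game name="1942 (Japan, USA)"/>
<game name="1943 - The Battle of Midway (USA)"/>
</menu>"""

def names_rebinding(games):
    # Mirrors the question's loop: roms is replaced by a brand-new
    # one-element list on every pass, so only the last name survives.
    roms = []
    for game in games:
        roms = [game.attributes['name'].value]
    return roms

def names_appending(games):
    # The fix: create the list once, before the loop, then append to it.
    roms = []
    for game in games:
        roms.append(game.attributes['name'].value)
    return roms

games = minidom.parseString(SAMPLE_XML).getElementsByTagName('game')
print(names_rebinding(games))  # ['1943 - The Battle of Midway (USA)']
print(names_appending(games))  # ['10-Yard Fight (USA, Europe)', '1942 (Japan, USA)', '1943 - The Battle of Midway (USA)']
```

The key point is where the list is created: `roms = [...]` inside the loop builds a new list each time, while `roms = []` outside the loop plus `append` grows one list.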

This worked, thank you so much!

You need to build the roms list up iteratively from the XML:

roms = [] 
for game in games: 
    rom_key = game.attributes['name'] 
    roms.append(rom_key.value) 

or, better, write it as a list comprehension:

roms = [game.attributes['name'].value for game in games] 
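A quick way to check the list comprehension is to compare it with an equivalent call. In this sketch (assuming an inline, trimmed XML string rather than the question's file), `game.attributes['name']` is a minidom.Attr object, so `.value` is needed to get the string; leaving `.value` off gives a list of Attr objects instead of names. The DOM method `Element.getAttribute`, swapped in here as an alternative, returns the string directly.

```python
from xml.dom import minidom

# Trimmed, inline stand-in for the question's XML file.
SAMPLE_XML = """<menu>
<game name="10-Yard Fight (USA, Europe)"/>
<game name="1942 (Japan, USA)"/>
</menu>"""

games = minidom.parseString(SAMPLE_XML).getElementsByTagName('game')

# attributes['name'] is a minidom.Attr; .value is its string content.
roms = [game.attributes['name'].value for game in games]

# Element.getAttribute returns the string itself ('' if the attribute is missing).
roms_alt = [game.getAttribute('name') for game in games]

print(roms)  # ['10-Yard Fight (USA, Europe)', '1942 (Japan, USA)']
print(roms == roms_alt)  # True
```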

You can also extract "Nintendo Entertainment System" using:

xmldoc.getElementsByTagName('listname')[0].firstChild.data 
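The `.firstChild.data` lookup works because the text inside `<listname>` is stored as a child Text node. One caveat, sketched below under the assumption of a trimmed inline copy of the header: an empty element such as `<artwork></artwork>` has no child nodes, so `firstChild` is `None` and `.data` would raise AttributeError. The helper element_text is hypothetical, shown only as one way to guard against that.

```python
from xml.dom import minidom

# Trimmed, inline copy of the question's <header> section.
SAMPLE_XML = """<menu>
<header>
    <listname>Nintendo Entertainment System</listname>
    <media><artwork></artwork></media>
</header>
</menu>"""

xmldoc = minidom.parseString(SAMPLE_XML)

# firstChild of <listname> is the Text node holding the system name.
system_name = xmldoc.getElementsByTagName('listname')[0].firstChild.data
print(system_name)  # Nintendo Entertainment System

def element_text(doc, tag):
    """Hypothetical helper: text of the first <tag> element, or '' if it is empty."""
    node = doc.getElementsByTagName(tag)[0]
    # Join the data of any Text children; an empty element has none,
    # so this returns '' instead of failing on firstChild being None.
    return ''.join(child.data for child in node.childNodes
                   if child.nodeType == minidom.Node.TEXT_NODE)

print(repr(element_text(xmldoc, 'artwork')))  # ''
```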

That leaves us with:

from xml.dom import minidom

def databaseGameExtraction(xml):
    xmldoc = minidom.parse(xml)
    # Collect the name attribute of every <game> element as a string
    roms = [game.attributes['name'].value
            for game in xmldoc.getElementsByTagName('game')]
    # Text of the first <listname> element, e.g. "Nintendo Entertainment System"
    system_name = xmldoc.getElementsByTagName('listname')[0].childNodes[0].data
    return roms, system_name

roms, system_name = databaseGameExtraction('Nintendo Entertainment System.xml')
print(system_name)
print(roms)
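For comparison, the same two values can be pulled out with the standard library's xml.etree.ElementTree module, swapped in here as an alternative to minidom rather than as what the answer used. This sketch assumes an inline, trimmed copy of the XML; database_game_extraction is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Trimmed, inline copy of the question's XML.
SAMPLE_XML = """<?xml version="1.0"?>
<menu>
<header><listname>Nintendo Entertainment System</listname></header>
<game name="10-Yard Fight (USA, Europe)"/>
<game name="1942 (Japan, USA)"/>
<game name="1943 - The Battle of Midway (USA)"/>
</menu>"""

def database_game_extraction(root):
    # root.iter('game') visits every <game> element; .get reads an attribute as a str.
    roms = [game.get('name') for game in root.iter('game')]
    # findtext returns the text of the first matching element (None if nothing matches).
    system_name = root.findtext('header/listname')
    return roms, system_name

root = ET.fromstring(SAMPLE_XML)
roms, system_name = database_game_extraction(root)
print(system_name)
print(roms)
```

To read the real file instead of a string, `ET.parse('Nintendo Entertainment System.xml').getroot()` would give the same kind of root element.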

Thanks, that was part of it. I couldn't get the list comprehension in the sample code to work, but I got the result I wanted.


@BenElder Whoops. I forgot to use '.value' to extract the string from the 'minidom.Attr' object. Fixed now.