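Converting XML to CSV using Python
Question:
I am using Python to convert an XML file to CSV. I need the content of the TestitemName tags as the CSV header and the content of the Testvalue tags as the CSV values. Can someone help me with this?
Sample XML file (input)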
<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
<sample:TestData>
<sample:Testitem>
<sample:TestitemName>Field1</sample:TestitemName>
<sample:Testvalue>1</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field2</sample:TestitemName>
<sample:Testvalue>Hi</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field3</sample:TestitemName>
<sample:Testvalue>1234</sample:Testvalue>
</sample:Testitem>
</sample:TestData>
<sample:TestData>
<sample:Testitem>
<sample:TestitemName>Field1</sample:TestitemName>
<sample:Testvalue>3</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field2</sample:TestitemName>
<sample:Testvalue>Hello</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field3</sample:TestitemName>
<sample:Testvalue>999</sample:Testvalue>
</sample:Testitem>
</sample:TestData>
</sample:batch>
Desired CSV file (output)
Field1,Field2,Field3 (Header field names)
1,Hi,1234 (1st record)
3,Hello,999 (2nd record)
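Answer
BeautifulSoup can be used to parse the XML data. Because the data is well organized, you can simply loop over the nested tag types and collect the data as you go.
Code: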
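# The original answer used the legacy BeautifulSoup 3 import
# (from BeautifulSoup import BeautifulSoup as Soup); bs4 is its maintained successor.
from bs4 import BeautifulSoup as Soup

def parse_xml(file_like):
    data = []   # one dict per <sample:TestData> record
    names = []  # header names in first-seen order
    # html.parser lowercases tag names, hence the lowercase search strings below.
    soup = Soup(file_like, 'html.parser')
    for batch in soup.find_all('sample:batch'):
        for test_data in batch.find_all('sample:testdata'):
            item = {}
            for test_item in test_data.find_all('sample:testitem'):
                name = test_item.find('sample:testitemname').text
                value = test_item.find('sample:testvalue').text
                item[name] = value
                if name not in names:
                    names.append(name)
            data.append(item)
    # Header row first, then one row per record; fields missing from a record are None.
    return [names] + [[datum.get(name) for name in names] for datum in data]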
Test code:
data = parse_xml(xml_data)
for datum in data:
    print(','.join(datum))
Test data:
from io import StringIO
xml_data = StringIO(u"""
<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
<sample:TestData>
<sample:Testitem>
<sample:TestitemName>Field1</sample:TestitemName>
<sample:Testvalue>1</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field2</sample:TestitemName>
<sample:Testvalue>Hi</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field3</sample:TestitemName>
<sample:Testvalue>1234</sample:Testvalue>
</sample:Testitem>
</sample:TestData>
<sample:TestData>
<sample:Testitem>
<sample:TestitemName>Field1</sample:TestitemName>
<sample:Testvalue>3</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field2</sample:TestitemName>
<sample:Testvalue>Hello</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field3</sample:TestitemName>
<sample:Testvalue>999</sample:Testvalue>
</sample:Testitem>
</sample:TestData>
</sample:batch>
""")
Result:
Field1,Field2,Field3
1,Hi,1234
3,Hello,999
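For strictly well-formed input, the same nested loop can also be sketched with the standard library's xml.etree.ElementTree instead of BeautifulSoup. This is an alternative, not the method used in the answer; the function name parse_xml_etree, the "s" prefix, and the shortened SAMPLE string are assumptions made for illustration. Unlike the lenient parse above, ElementTree is case-sensitive and rejects unclosed tags.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the sample document; the "s" prefix is an arbitrary local alias.
NS = {"s": "http://sample.com/schema/sampleimport"}

# Shortened, well-formed version of the sample input (illustrative only).
SAMPLE = """<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
<sample:TestData>
<sample:Testitem>
<sample:TestitemName>Field1</sample:TestitemName>
<sample:Testvalue>1</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field2</sample:TestitemName>
<sample:Testvalue>Hi</sample:Testvalue>
</sample:Testitem>
</sample:TestData>
<sample:TestData>
<sample:Testitem>
<sample:TestitemName>Field1</sample:TestitemName>
<sample:Testvalue>3</sample:Testvalue>
</sample:Testitem>
<sample:Testitem>
<sample:TestitemName>Field2</sample:TestitemName>
<sample:Testvalue>Hello</sample:Testvalue>
</sample:Testitem>
</sample:TestData>
</sample:batch>"""


def parse_xml_etree(xml_text):
    """Return [header_row, record_row, ...] from a well-formed XML string."""
    root = ET.fromstring(xml_text)  # root is the <sample:batch> element
    names = []    # header names in first-seen order
    records = []  # one dict per <sample:TestData> element
    for test_data in root.findall("s:TestData", NS):
        record = {}
        for test_item in test_data.findall("s:Testitem", NS):
            name = test_item.findtext("s:TestitemName", default="", namespaces=NS)
            value = test_item.findtext("s:Testvalue", default="", namespaces=NS)
            record[name] = value
            if name not in names:
                names.append(name)
        records.append(record)
    # Fields missing from a record become empty strings.
    return [names] + [[record.get(name, "") for name in names] for record in records]


rows = parse_xml_etree(SAMPLE)
for row in rows:
    print(",".join(row))
```

The tag names here keep their original case (TestData, Testitem), because ElementTree does not lowercase them.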
Thanks Stephen, it works! I want to write the output to a CSV file. Can you help me one more time? – Santhosh
The output I showed is already CSV ... just write it to a file instead of printing it to the screen –
What have you tried so far? – sgrg
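As a rough sketch of the "write to a file instead of printing" suggestion, the rows can be passed to the standard library's csv module. The helper names write_rows and write_csv_file are hypothetical, and the rows literal simply mirrors the shape that parse_xml returns (header row first); nothing here comes from the original answer.

```python
import csv
import io


def write_rows(rows, out):
    """Write rows (header first) to an already-open text file object."""
    # csv.writer quotes fields that contain commas or quotes,
    # and writes None as an empty field.
    csv.writer(out).writerows(rows)


def write_csv_file(rows, path):
    """Write rows to a CSV file at the given path."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        write_rows(rows, handle)


# Rows in the shape returned by parse_xml: header row, then one row per record.
rows = [
    ["Field1", "Field2", "Field3"],
    ["1", "Hi", "1234"],
    ["3", "Hello", "999"],
]

# Demonstrate with an in-memory buffer so nothing is written to disk here.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
```

Calling write_csv_file(rows, "output.csv") would write the same content to disk; the filename is only an example. Compared with ','.join, the csv module also copes with values that themselves contain commas.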