Converting XML to CSV with Python

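Problem description:

I am using Python to convert an XML file to CSV. I need the content of the TestitemName tags as the CSV header and the content of the Testvalue tags as the CSV values. Can someone help me with this?

Sample XML file (input)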

<sample:batch xmlns:sample="http://sample.com/schema/sampleimport"> 
    <sample:TestData> 
     <sample:Testitem> 
      <sample:TestitemName>Field1</sample:TestitemName> 
      <sample:Testvalue>1</sample:Testvalue> 
     </sample:Testitem> 
     <sample:Testitem> 
      <sample:TestitemName>Field2</sample:TestitemName> 
      <sample:Testvalue>Hi</sample:Testvalue> 
     </sample:Testitem> 
     <sample:Testitem> 
      <sample:TestitemName>Field3</sample:TestitemName> 
      <sample:Testvalue>1234</sample:Testvalue> 
     </sample:Testitem> 
     </sample:TestData> 
     <sample:TestData> 
     <sample:Testitem> 
      <sample:TestitemName>Field1</sample:TestitemName> 
      <sample:Testvalue>3</sample:Testvalue> 
     </sample:Testitem> 
     <sample:Testitem> 
      <sample:TestitemName>Field2</sample:TestitemName> 
      <sample:Testvalue>Hello</sample:Testvalue> 
     </sample:Testitem> 
     <sample:Testitem> 
      <sample:TestitemName>Field3</sample:TestitemName> 
      <sample:Testvalue>999</sample:Testvalue> 
     </sample:Testitem> 
     </sample:TestData> 
</sample:batch> 

Desired CSV file (output)

Field1,Field2,Field3 (Header field names) 
1,Hi,1234 (1st record) 
3,Hello,999 (2nd record) 

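What have you tried so far? – sgrg

BeautifulSoup can be used to parse the XML data. Because the data is well organized, you can simply loop over the nested tag types and collect the values as you go.

Code: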

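# BeautifulSoup 4 (package "bs4"); the original import targeted the
# Python 2-only BeautifulSoup 3 module.
from bs4 import BeautifulSoup as Soup

def parse_xml(file_like):
    data = []   # one dict per <sample:TestData> record
    names = []  # column names, in first-seen order
    # html.parser lowercases tag names, hence the lowercase lookups below
    soup = Soup(file_like, 'html.parser')
    for batch in soup.find_all('sample:batch'):
        for test_data in batch.find_all('sample:testdata'):
            item = {}
            for test_item in test_data.find_all('sample:testitem'):
                name = test_item.find('sample:testitemname').text
                value = test_item.find('sample:testvalue').text
                item[name] = value
                if name not in names:
                    names.append(name)
            data.append(item)

    # Header row first, then one row per record; a missing field becomes ''
    return [names] + [[datum.get(name, '') for name in names] for datum in data]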

Test code:

data = parse_xml(xml_data) 
for datum in data: 
    print(','.join(datum)) 

Test data:

from io import StringIO 
xml_data = StringIO(u""" 
    <sample:batch xmlns:sample="http://sample.com/schema/sampleimport"> 
     <sample:TestData> 
      <sample:Testitem> 
       <sample:TestitemName>Field1</sample:TestitemName> 
       <sample:Testvalue>1</sample:Testvalue> 
      </sample:Testitem> 
      <sample:Testitem> 
       <sample:TestitemName>Field2</sample:TestitemName> 
       <sample:Testvalue>Hi</sample:Testvalue> 
      </sample:Testitem> 
      <sample:Testitem> 
       <sample:TestitemName>Field3</sample:TestitemName> 
       <sample:Testvalue>1234</sample:Testvalue> 
      </sample:Testitem> 
     </sample:TestData> 
     <sample:TestData> 
      <sample:Testitem> 
       <sample:TestitemName>Field1</sample:TestitemName> 
       <sample:Testvalue>3</sample:Testvalue> 
      </sample:Testitem> 
      <sample:Testitem> 
       <sample:TestitemName>Field2</sample:TestitemName> 
       <sample:Testvalue>Hello</sample:Testvalue> 
      </sample:Testitem> 
      <sample:Testitem> 
       <sample:TestitemName>Field3</sample:TestitemName> 
       <sample:Testvalue>999</sample:Testvalue> 
      </sample:Testitem> 
     </sample:TestData> 
    </sample:batch> 
""") 

Result:

Field1,Field2,Field3 
1,Hi,1234 
3,Hello,999 
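
For input that is strictly well-formed XML, the same nested loop could also be sketched with the standard library's xml.etree.ElementTree instead of BeautifulSoup. This is an alternative technique, not the method used in the answer above. The function name parse_xml_strict and the shortened sample string are made up for illustration, and this parser raises an error on unclosed or mismatched tags rather than tolerating them.

```python
import xml.etree.ElementTree as ET

# Maps the "sample" prefix to the namespace URI declared in the XML.
NS = {"sample": "http://sample.com/schema/sampleimport"}

# Shortened, well-formed sample (hypothetical; two fields per record).
SAMPLE_XML = """<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
  <sample:TestData>
    <sample:Testitem>
      <sample:TestitemName>Field1</sample:TestitemName>
      <sample:Testvalue>1</sample:Testvalue>
    </sample:Testitem>
    <sample:Testitem>
      <sample:TestitemName>Field2</sample:TestitemName>
      <sample:Testvalue>Hi</sample:Testvalue>
    </sample:Testitem>
  </sample:TestData>
  <sample:TestData>
    <sample:Testitem>
      <sample:TestitemName>Field1</sample:TestitemName>
      <sample:Testvalue>3</sample:Testvalue>
    </sample:Testitem>
    <sample:Testitem>
      <sample:TestitemName>Field2</sample:TestitemName>
      <sample:Testvalue>Hello</sample:Testvalue>
    </sample:Testitem>
  </sample:TestData>
</sample:batch>"""


def parse_xml_strict(xml_text):
    """Return [header_row, record_row, ...] from well-formed XML text."""
    root = ET.fromstring(xml_text)  # root element is <sample:batch>
    names = []    # column names, in first-seen order
    records = []  # one dict per <sample:TestData>
    for test_data in root.findall("sample:TestData", NS):
        record = {}
        for item in test_data.findall("sample:Testitem", NS):
            name = item.findtext("sample:TestitemName", default="", namespaces=NS)
            value = item.findtext("sample:Testvalue", default="", namespaces=NS)
            record[name] = value
            if name not in names:
                names.append(name)
        records.append(record)
    # Header first, then one row per record; missing fields become ''
    return [names] + [[r.get(n, "") for n in names] for r in records]


rows = parse_xml_strict(SAMPLE_XML)
for row in rows:
    print(",".join(row))
```

Unlike the BeautifulSoup version, tag names here are case-sensitive and must match the XML exactly.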

Thanks Stephen, it works! I would like to write the output to a CSV file. Could you help me once more? – Santhosh

The output I showed is already CSV ... just write it to a file instead of printing it to the screen –
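
To make that last step concrete, here is a minimal sketch of writing the rows to a file with the standard csv module. It assumes the rows have the same shape that parse_xml() returns (header row first, then one list per record). The helper name write_csv, the file name output.csv, and the hard-coded sample rows are placeholders for illustration.

```python
import csv


def write_csv(rows, path):
    """Write a list of rows (header row first) to a CSV file at `path`."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


# Hypothetical sample rows, shaped like the return value of parse_xml()
rows = [
    ["Field1", "Field2", "Field3"],
    ["1", "Hi", "1234"],
    ["3", "Hello", "999"],
]

write_csv(rows, "output.csv")
```

Using csv.writer rather than ','.join also takes care of quoting if a value ever contains a comma or a quote character.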