lxml: Add namespace to input file

Question:

I parse an XML file generated by an external program. I then want to add custom annotations to this file, using my own namespace. My input looks like this:

<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4"> 
    <model metaid="untitled" id="untitled"> 
    <annotation>...</annotation> 
    <listOfUnitDefinitions>...</listOfUnitDefinitions> 
    <listOfCompartments>...</listOfCompartments> 
    <listOfSpecies> 
     <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0"> 
     <annotation> 
      <celldesigner:extension>...</celldesigner:extension> 
     </annotation> 
     </species> 
     <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0"> 
     <annotation> 
      <celldesigner:extension>...</celldesigner:extension> 
     </annotation> 
     </species> 
    </listOfSpecies> 
    <listOfReactions>...</listOfReactions> 
    </model> 
</sbml>

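The problem is that lxml declares the namespace only where it is used. As a result, the declaration is repeated many times, like this (simplified):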

<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4"> 
    <listOfSpecies> 
    <species> 
     <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/> 
     <celldesigner:data>Some important data which must be kept</celldesigner:data> 
    </species> 
    <species> 
     <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/> 
    </species> 
    .... 
    </listOfSpecies> 
</sbml>
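The following sketch reproduces this behaviour. It assumes Python 3 and a reasonably recent lxml. The two-species document and the SBML_NS/KJW_NS constants are a cut-down, hypothetical stand-in for the real input. Each appended kjw:test element keeps its own xmlns:kjw declaration, because no ancestor declares that namespace when the element is attached:

```python
from lxml import etree

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
KJW_NS = "http://this.is.some/custom_namespace"

# Cut-down, hypothetical stand-in for the SBML input.
DOC = b"""<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="untitled">
    <listOfSpecies>
      <species id="s1"/>
      <species id="s2"/>
    </listOfSpecies>
  </model>
</sbml>"""

# As in the question: make lxml use the 'kjw' prefix for this namespace.
etree.register_namespace("kjw", KJW_NS)

root = etree.fromstring(DOC)
for species in root.iter("{%s}species" % SBML_NS):
    # No ancestor declares KJW_NS, so each new element keeps its own xmlns:kjw.
    species.append(etree.Element("{%s}test" % KJW_NS))

out = etree.tostring(root, encoding="unicode")
print(out)
```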

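Is it possible to force lxml to write this declaration only once, in a parent element such as sbml or listOfSpecies? Or is there a good reason not to? The result I want is: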

<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4" xmlns:kjw="http://this.is.some/custom_namespace"> 
    <listOfSpecies> 
    <species> 
     <kjw:test/> 
     <celldesigner:data>Some important data which must be kept</celldesigner:data> 
    </species> 
    <species> 
     <kjw:test/> 
    </species> 
    .... 
    </listOfSpecies> 
</sbml>

The important point is that this is existing data read from a file, and it must be kept. So I can't just create a new root element (I think?).

Edit: code attached below.

def annotateSbml(sbml_input): 
    from lxml import etree 

    checkSbml(sbml_input) # Makes sure the input is valid sbml/xml. 

    ns = "http://this.is.some/custom_namespace" 
    etree.register_namespace('kjw', ns) 

    sbml_doc = etree.ElementTree() 
    root = sbml_doc.parse(sbml_input, etree.XMLParser(remove_blank_text=True)) 
    nsmap = root.nsmap 
    nsmap['sbml'] = nsmap[None] # Makes code more readable, but seems ugly. Any alternatives to this? 
    nsmap['kjw'] = ns 
    ns = '{' + ns + '}' 
    sbmlns = '{' + nsmap['sbml'] + '}' 

    for species in root.findall('sbml:model/sbml:listOfSpecies/sbml:species', nsmap): 
        species.append(etree.Element(ns + 'test')) 

    sbml_doc.write("test.sbml.xml", pretty_print=True, xml_declaration=True) 

    return

Show your code. – Marcin 2012-07-05 14:36:18

@Marcin: Done. Any tips? – kai 2012-07-05 16:06:25

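@mzjin My input contains everything except the '' tags. The goal is to insert such a tag (or a similar one, e.g. 'kjw:score' or 'kjw:length') for every species in this list. Does that make sense, or should I post the whole file? I thought my original question was long enough. – kai 2012-07-05 16:30:44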

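Answer

Modifying the namespace mapping of an existing node is not possible in lxml. See this open ticket, which lists the feature as a wishlist item.

The ticket originated from this thread on the lxml mailing list, where a workaround replacing the root node is given as an alternative. Replacing the root node has some issues, though: see the ticket above.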
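This sketch shows that the mapping cannot be changed through the nsmap property. In lxml, nsmap returns a freshly built dict on each access. Mutating the returned dict (as nsmap['kjw'] = ns does in the question's code) therefore has no effect on the document. It assumes Python 3; the one-element document is a hypothetical stand-in:

```python
from lxml import etree

KJW_NS = "http://this.is.some/custom_namespace"

# Hypothetical one-element stand-in for the SBML root.
root = etree.fromstring(b'<sbml xmlns="http://www.sbml.org/sbml/level2/version4"/>')

nsmap = root.nsmap       # a new dict built from the element's in-scope declarations
nsmap["kjw"] = KJW_NS    # modifies only that copy

print("kjw" in nsmap)        # the copy has the new prefix
print("kjw" in root.nsmap)   # the element itself is unchanged
print(etree.tostring(root))  # no xmlns:kjw is serialized
```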

For completeness, here is the code for the suggested root-replacement workaround:

>>> DOC = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4"> 
... <model metaid="untitled" id="untitled"> 
...  <annotation>...</annotation> 
...  <listOfUnitDefinitions>...</listOfUnitDefinitions> 
...  <listOfCompartments>...</listOfCompartments> 
...  <listOfSpecies> 
...  <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0"> 
...   <annotation> 
...   <celldesigner:extension>...</celldesigner:extension> 
...   </annotation> 
...  </species> 
...  <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0"> 
...   <annotation> 
...   <celldesigner:extension>...</celldesigner:extension> 
...   </annotation> 
...  </species> 
...  </listOfSpecies> 
...  <listOfReactions>...</listOfReactions> 
... </model> 
... </sbml>""" 
>>> 
>>> from lxml import etree 
>>> from io import StringIO 
>>> NS = "http://this.is.some/custom_namespace" 
>>> tree = etree.ElementTree(element=None, file=StringIO(DOC)) 
>>> root = tree.getroot() 
>>> nsmap = root.nsmap 
>>> nsmap['kjw'] = NS 
>>> new_root = etree.Element(root.tag, nsmap=nsmap) 
>>> new_root[:] = root[:] 
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test'))) 
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test'))) 
>>> print(etree.tostring(new_root, pretty_print=True, encoding='unicode'))
<sbml xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" xmlns:kjw="http://this.is.some/custom_namespace" xmlns="http://www.sbml.org/sbml/level2/version4"><model metaid="untitled" id="untitled"> 
    <annotation>...</annotation> 
    <listOfUnitDefinitions>...</listOfUnitDefinitions> 
    <listOfCompartments>...</listOfCompartments> 
    <listOfSpecies> 
     <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0"> 
     <annotation> 
      <celldesigner:extension>...</celldesigner:extension> 
     </annotation> 
     </species> 
     <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0"> 
     <annotation> 
      <celldesigner:extension>...</celldesigner:extension> 
     </annotation> 
     </species> 
    </listOfSpecies> 
    <listOfReactions>...</listOfReactions> 
    </model> 
<kjw:test/><kjw:test/></sbml>

For future reference, this needs a small modification (at least on Python 3.2). Otherwise '**root.nsmap' raises a TypeError when it hits the None: 'namespace' entry, because None is not a string. Using 'nsmap = root.nsmap; nsmap['kjw'] = NS; new_root = etree.Element(root.tag, nsmap=nsmap)' works. – kai 2012-07-05 20:25:09

Good catch, updated. – jterrace 2012-07-05 20:27:41

You also need to copy attrib, text and (unlikely to matter, but for completeness) tail. 'nsmap=dict(kjw=NS, nsmap=nsmap)' is wrong; it should be just 'nsmap=nsmap'. – jfs 2012-07-05 20:50:35
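The sketch below combines the answer and the comments into a Python 3 version of the root-replacement workaround. It also copies attrib, text and tail, and it appends kjw:test inside each species (as the question intends) rather than directly under the root. The helper name replace_root_with_nsmap and the cut-down document are hypothetical, for illustration only:

```python
from lxml import etree

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
KJW_NS = "http://this.is.some/custom_namespace"

# Cut-down, hypothetical stand-in for the SBML input.
DOC = b"""<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="untitled">
    <listOfSpecies>
      <species id="s1"/>
      <species id="s2"/>
    </listOfSpecies>
  </model>
</sbml>"""


def replace_root_with_nsmap(root, extra_nsmap):
    """Hypothetical helper: return a new root element that declares
    root.nsmap plus extra_nsmap, carrying over root's attrib, text,
    tail and children (children are moved, not copied)."""
    nsmap = root.nsmap            # a fresh dict; may contain None for the default namespace
    nsmap.update(extra_nsmap)
    new_root = etree.Element(root.tag, attrib=dict(root.attrib), nsmap=nsmap)
    new_root.text = root.text
    new_root.tail = root.tail
    new_root[:] = root[:]         # moves all child nodes under the new root
    return new_root


root = replace_root_with_nsmap(etree.fromstring(DOC), {"kjw": KJW_NS})
for species in root.iter("{%s}species" % SBML_NS):
    # The new element reuses the xmlns:kjw declaration already on the root.
    species.append(etree.Element("{%s}test" % KJW_NS))

out = etree.tostring(root, encoding="unicode")
print(out)
```

The function returns a new element, so an ElementTree that still holds the old root does not see the change. Wrap the result again (e.g. etree.ElementTree(root)) before calling write().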

Answer

You can replace the root element to add 'kjw' to its nsmap. The xmlns declaration will then appear only on the root element.

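Answer

Instead of working with the raw XML directly, you could also look at LibSBML. It is a library for manipulating SBML documents, with bindings for several languages, including Python. You could use it like this: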

 
>>> from libsbml import * 
>>> doc = readSBML('Dropbox/SBML Models/BorisEJB.xml') 
>>> species = doc.getModel().getSpecies('MAPK') 
>>> species.appendAnnotation('<kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>') 
0 
>>> species.toSBML() 
'<species id="MAPK" compartment="compartment" initialConcentration="280" boundaryCondition="false">\n <annotation>\n <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>\n </annotation>\n</species>' 
>>>

Answer

Temporarily adding a namespaced attribute to the root node does the trick:

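from lxml import etree

ns = '{http://this.is.some/custom_namespace}' 
sbmlns = '{http://www.sbml.org/sbml/level2/version4}'

# add a temporary 'kjw:foobar' attribute to the root node; this declares the
# namespace on the root (with the 'kjw' prefix if it was registered via
# etree.register_namespace, as in the question)
root.set(ns+'foobar', 'foobar') 

# add kjw namespace elements (or attributes) elsewhere; they reuse the
# declaration that is now on the root
for species in root.iter(sbmlns + 'species'):
    species.append(etree.Element(ns + 'test')) 

# remove the temporary namespaced attribute; the declaration stays on the root
del root.attrib[ns+'foobar']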
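The following is a self-contained version of the temporary-attribute trick. It assumes Python 3 and a cut-down, hypothetical document as a stand-in for the real input. As in the question, etree.register_namespace('kjw', ...) is called first. Without it, lxml would generate a prefix such as ns0 when it declares the namespace on the root:

```python
from lxml import etree

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
KJW_NS = "http://this.is.some/custom_namespace"

# Cut-down, hypothetical stand-in for the SBML input.
DOC = b"""<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="untitled">
    <listOfSpecies>
      <species id="s1"/>
      <species id="s2"/>
    </listOfSpecies>
  </model>
</sbml>"""

etree.register_namespace("kjw", KJW_NS)

root = etree.fromstring(DOC)
tmp_attr = "{%s}foobar" % KJW_NS     # hypothetical temporary attribute name

root.set(tmp_attr, "foobar")         # declares xmlns:kjw on the root element
for species in root.iter("{%s}species" % SBML_NS):
    # On append, lxml drops the element's own declaration in favour of the root's.
    species.append(etree.Element("{%s}test" % KJW_NS))
del root.attrib[tmp_attr]            # the xmlns:kjw declaration stays behind

out = etree.tostring(root, encoding="unicode")
print(out)
```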

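Answer

I know this is an old question, but it is still relevant. As of lxml 3.5.0, there is probably a better solution to this problem.

cleanup_namespaces() accepts a new argument, top_nsmap, which moves the definitions of the provided prefix-to-namespace mapping to the top of the tree.

So the namespace map can now be moved up with a simple call like this: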

nsmap = {'kjw': 'http://this.is.some/custom_namespace'} 
etree.cleanup_namespaces(root, top_nsmap=nsmap)
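Here is a sketch of the top_nsmap approach. It assumes lxml 3.5.0 or later and a cut-down, hypothetical document. The kjw:test elements are added first, and each one initially carries its own declaration. A single cleanup_namespaces() call then hoists the declaration to the root:

```python
from lxml import etree

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
KJW_NS = "http://this.is.some/custom_namespace"

# Cut-down, hypothetical stand-in for the SBML input.
DOC = b"""<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="untitled">
    <listOfSpecies>
      <species id="s1"/>
      <species id="s2"/>
    </listOfSpecies>
  </model>
</sbml>"""

root = etree.fromstring(DOC)
for species in root.iter("{%s}species" % SBML_NS):
    # Each new element is created with its own xmlns:kjw declaration.
    species.append(etree.Element("{%s}test" % KJW_NS, nsmap={"kjw": KJW_NS}))

before = etree.tostring(root, encoding="unicode")

# Declare kjw on the root and drop the now-redundant declarations below it.
etree.cleanup_namespaces(root, top_nsmap={"kjw": KJW_NS})

after = etree.tostring(root, encoding="unicode")
print(after)
```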
