Selecting the sibling nodes between two nodes
Question:
I need to collect every category name together with all the divs beneath it whose class starts with 'config-entry'.
<h2>category 1</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<h2>category 2</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<h2>category 3</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<h2>category 4</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
I'm using the XPath //h2[1]/following-sibling::h2[1]/preceding-sibling::div[starts-with(@class,'config-entry')]
like this:
categories = root.xpath("//h2")
for i in range(len(categories)):
    print("----%s----" % categories[i].text)
    contents = root.xpath("//h2[1]/following-sibling::h2[1]/preceding-sibling::div[starts-with(@class,'config-entry')]")
    print(len(contents))
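This code only works for category 1: it correctly selects all the divs between category 1 and category 2, but it breaks for every category after that. I've played with h2[1], changing the index to 0, 2 and 3, but nothing worked. Any clues?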
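Answer
I suggest selecting the union of the h2 tags and the div tags. XPath returns them in document order, so as you process them, each div "belongs" to the last h2 you saw.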
E.g.
'//h2|//div[contains(@class,"config-entry")]'
Working example:
from lxml import etree
doc = etree.HTML("""
<html>
<h2>category 1</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<h2>category 2</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<h2>category 3</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<h2>category 4</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
</html>""")
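category = None
for ele in doc.xpath('//h2|//div[contains(@class,"config-entry")]'):
    if ele.tag == 'h2':
        # Remember the most recent heading seen in document order.
        category = str(ele.text)
    elif category:
        # Every matching div belongs to the last h2 seen before it.
        print("%s: %s, %r" % (category, ele.tag, ele.attrib))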
Output:
category 1: div, {'class': 'config-entry selected-block'}
category 1: div, {'class': 'config-entry '}
category 1: div, {'class': 'config-entry '}
category 1: div, {'class': 'config-entry '}
category 2: div, {'class': 'config-entry selected-block'}
category 2: div, {'class': 'config-entry '}
category 2: div, {'class': 'config-entry '}
category 2: div, {'class': 'config-entry '}
category 2: div, {'class': 'config-entry '}
category 3: div, {'class': 'config-entry selected-block'}
category 3: div, {'class': 'config-entry '}
category 4: div, {'class': 'config-entry selected-block'}
category 4: div, {'class': 'config-entry '}
category 4: div, {'class': 'config-entry '}
category 4: div, {'class': 'config-entry '}
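If you would rather get a separate list of divs per heading, as the question tried to, one possible variation is to count how many h2 siblings precede each div. This is only a sketch: the helper name entries_by_category and the shortened markup are made up for illustration, and it assumes every h2 and div shares the same parent, as in the question's HTML.

```python
from lxml import etree

# Shortened, hypothetical markup with the same shape as the question's HTML.
html = """
<html><body>
<h2>category 1</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>a</div>
<div class='config-entry '>b</div>
<h2>category 2</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>c</div>
</body></html>"""

doc = etree.HTML(html)


def entries_by_category(root):
    """Map each h2's text to the config-entry divs that follow it.

    Hypothetical helper; assumes the h2 and div elements are siblings.
    """
    grouped = {}
    for h2 in root.xpath('//h2'):
        # Count the h2 siblings before this one, then add this one itself.
        position = len(h2.xpath('preceding-sibling::h2')) + 1
        # A div belongs to this h2 when exactly `position` h2 siblings
        # precede it, i.e. no later h2 sits between the two.
        divs = h2.xpath(
            'following-sibling::div[starts-with(@class, "config-entry")]'
            '[count(preceding-sibling::h2) = $n]',
            n=position)
        grouped[h2.text] = divs
    return grouped


result = entries_by_category(doc)
for name, divs in result.items():
    print("----%s----" % name)
    print(len(divs))
```

The original expression always failed after the first category because //h2[1] is fixed to the first heading regardless of the loop index; evaluating a relative XPath from each h2 element avoids that. The single-pass union approach above is still simpler when you only need to walk the document once.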
+1 That works.. – jerrymouse 2012-03-28 10:12:42