Extracting text from elements with a specific attribute
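Question:
I want to write an XPath that selects the text under div[@class="content"], but only from the <h3>, <ul> and <p> tags and the nodes nested inside them, and among the <p> elements only those matching p[position() > 1 and position() < last() - 1].
So far I have this: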
//div[@class="content"]/*[self::h3 or self::ul or self::p[position() > 1 and position() < last() - 1]]//text()
But it doesn't work.
Answer:
Your XML was not well-formed, so I fixed that first.
<?xml version="1.0" encoding="UTF-8"?>
<div class="content">
<h1/>
<h2>
<p>Certified Nursing Assistant - Full Time</p>
Job Summary</h2>
<p>Responsible for providing personal care and assistance for residents in long
term care facility.</p>
<h2>
</h2>
<h3>Essential Functions:</h3>
<ul>
<li>
<span style="line-height: 1.5;">Responsible</span> for providing
personal care and assistance to residents </li>
<li>Assist residents in and out of bed, dressing, feeding, grooming and
personal hygiene. </li>
<li>Provide basic treatments as required and directed by nursing staff.
</li>
<li>Responsible for observing and reporting changes in residents' physical
and emotional conditions to charge nurse. </li>
</ul>
<h3>Qualifications: </h3>
<p>Education:</p>
<ul>
<li>High school diploma or equivalent </li>
<li>Successful completion of state approved certified nursing assistance
course </li>
</ul>
<p>Experience:</p>
<ul>
<li>Previous health care related experience preferred </li>
</ul>
<a id="ctl00_ctl01_namelink" class="btn" href="employment-application.aspx?
positionid=34">Apply Online</a>
<br/>
<br/>
<h2>
Apply in Person</h2>
<p>
To apply in persion please stop by Shenandoah Medical Center to pick up a job
application.</p>
<h2>
Apply by Mail</h2>
<p>
To apply by mail, download and print <a target="_blank" href="/filesimages/Careers/SMC
Employment Application.pdf">
this form</a>. Please fill out the application and then mail to:<br/>
<br/>
<strong>Shenandoah Medical Center, Human Resources<br/>
</strong>300 Pershing Avenue<br/>
Shenandoah, IA 51601</p>
</div>
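Now, if I understand your question correctly, you want to find all the h3, ul and p tags that are children of div[@class="content"], and each selected child must satisfy the condition [position() > 1 and position() < last() - 1]. For that, I think this single XPath will do: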
//div[@class="content"]/h3[position() > 1 and position() < last() - 1] |
//div[@class="content"]/p[position() > 1 and position() < last() - 1] |
//div[@class="content"]/ul[position() > 1 and position() < last() - 1]
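To see what the union above selects, it helps to remember that in a step such as p[...], position() and last() count only the p children of the div, not all of its children. This is also a likely reason the expression in the question finds no p text: inside self::p[...], the self axis holds exactly one node, so position() is always 1 and position() > 1 can never be true. The following is a minimal sketch, not a definitive implementation. It uses only Python's standard xml.etree.ElementTree, which does not support position() comparisons, so the predicate is emulated with a list slice. The reduced sample document and the helper names middle, select and text_of are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Reduced, hypothetical sample with the same child layout as the posting:
# two h3, five p and three ul elements directly under the div.
SAMPLE = """
<div class="content">
  <p>intro</p>
  <h3>Essential Functions:</h3>
  <ul><li>first duty</li></ul>
  <h3>Qualifications:</h3>
  <p>Education:</p>
  <ul><li>diploma</li></ul>
  <p>Experience:</p>
  <ul><li>experience</li></ul>
  <p>Apply in person</p>
  <p>Apply by mail</p>
</div>
"""


def middle(elements):
    """Emulate [position() > 1 and position() < last() - 1] on a list.

    Positions are 1-based in XPath, so this drops the first item
    and the last two items.
    """
    return elements[1:-2]


def select(div):
    """Emulate the three-way union by filtering each tag name separately.

    Unlike a real XPath union, results are grouped by tag name rather
    than returned in document order.
    """
    picked = []
    for tag in ("h3", "p", "ul"):
        # findall(tag) returns only direct children with that name.
        picked.extend(middle(div.findall(tag)))
    return picked


def text_of(element):
    """Join all nested text, similar to //text(), and normalise whitespace."""
    return " ".join("".join(element.itertext()).split())


# The div is the document root here, so it stands in for
# //div[@class="content"].
root = ET.fromstring(SAMPLE.strip())
selected = [text_of(e) for e in select(root)]
print(selected)
```

On this sample only the second and third p elements survive. With two h3 and three ul children, no position is both greater than 1 and less than last() - 1, so those two branches of the union come back empty. If every h3 and ul should be kept and only the p elements filtered, the predicate would have to be dropped from the h3 and ul branches. With lxml, the XPath string itself could be passed to an element's xpath() method instead of being emulated.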
I'm working in 'firebug + firepath'. Have you tried 'import lxml.html'? – kev 2013-03-15 06:27:40