PHP XPath query: get specific characters from the href of <a> tags
Question:
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
How can I use an XPath query on the <a> tags to get the ID numbers out of the href?
The result I want looks like this, for example:
5809,124,196,24/11/1859
PHP code:
$url = 'http://www.example.com/Books/Default.aspx';
// Suppress libxml warnings caused by malformed real-world HTML
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
// One query per kind of link; the patterns have to match the actual href format
$elements1 = $xpath->query('//a[contains(@href, "/book")]');
$elements2 = $xpath->query('//a[contains(@href, "/author/id=")]');
$elements3 = $xpath->query('//a[contains(@href, "/genres")]');
$elements4 = $xpath->query('//span[@class="Xbkznofv"]');
foreach ([$elements1, $elements2, $elements3, $elements4] as $elements) {
    // query() returns false if the expression is invalid
    if ($elements === false) {
        continue;
    }
    foreach ($elements as $element) {
        // This prints the link text, not the ID inside the href
        echo $element->nodeValue, "\n";
    }
}
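Answer:
XPath 1.0 has some limited string functions, but at some point it becomes easier to read the attribute and extract the value with a regular expression.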
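A minimal sketch of that regex route, assuming the three href formats shown in the question are the only ones that occur: XPath selects the href attributes, and preg_match() pulls the numbers out. The two patterns are assumptions derived from the sample markup, not from the real site.

```php
<?php
// Sample markup copied from the question
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$values = [];
// XPath returns the href attribute nodes (DOMAttr) in document order
foreach ($xpath->evaluate('//a/@href') as $href) {
    // Assumed formats: ".../<id>/book", ".../<id>/genres" and ".../author/id=<id>"
    if (preg_match('~/(\d+)/(?:book|genres)$~', $href->value, $m)
        || preg_match('~/id=(\d+)$~', $href->value, $m)) {
        $values[] = $m[1];
    }
}
// The date is simply the text content of the span
$values[] = $xpath->evaluate('string(//span[@class="Xbkznofv"])');

echo implode(',', $values), "\n"; // 5809,124,196,24/11/1859
```

The regex keeps the XPath trivial and is easier to adjust if the URL format changes.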
Still, here is an example that uses only XPath:
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;
$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);
$data = [
    'book_title' => $xpath->evaluate(
        'string(//a[contains(@href, "www.example.com") and contains(@href, "/book")])'
    ),
    'book_id' => $xpath->evaluate(
        'substring-before(
            substring-after(
                //a[contains(@href, "www.example.com") and contains(@href, "/book")]/@href,
                "www.example.com/"
            ),
            "/"
        )'
    ),
    'author_id' => $xpath->evaluate(
        'substring-after(
            //a[contains(@href, "www.example.com/author/id=")]/@href,
            "/id="
        )'
    )
];
var_dump($data);
Output:
array(3) {
["book_title"]=>
string(17) "Origin of Species"
["book_id"]=>
string(4) "5809"
["author_id"]=>
string(3) "124"
}
These expressions only work with DOMXPath::evaluate(); DOMXPath::query() can only return node lists.
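The example above stops at the book and author IDs. A sketch extending the same evaluate()-only approach to the genre ID and the date, then joining everything with commas, might look like this; it assumes the sample markup from the question and that each kind of link appears once.

```php
<?php
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$row = [
    // "5809" from ".../5809/book": text between the host and the next "/"
    'book_id' => $xpath->evaluate(
        'substring-before(substring-after(//a[contains(@href, "/book")]/@href, "www.example.com/"), "/")'
    ),
    // "124" from ".../author/id=124": everything after "/id="
    'author_id' => $xpath->evaluate(
        'substring-after(//a[contains(@href, "/author/id=")]/@href, "/id=")'
    ),
    // "196" from ".../196/genres": same pattern as the book id
    'genre_id' => $xpath->evaluate(
        'substring-before(substring-after(//a[contains(@href, "/genres")]/@href, "www.example.com/"), "/")'
    ),
    // The date is the string value of the span
    'date' => $xpath->evaluate('string(//span[@class="Xbkznofv"])'),
];

echo implode(',', $row), "\n"; // 5809,124,196,24/11/1859
```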
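In most cases you will use one expression to fetch a list of nodes, iterate over them, and use further expressions relative to each node to fetch the values. Here is a simple example: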
$html = <<<'HTML'
<div class="book">
<a href="#1">Origin of Species</a>
</div>
<div class="book">
<a href="#2">On the Shoulders of Giants</a>
</div>
HTML;
$document = new DOMDocument();
$document->loadHtml($html);
$xpath = new DOMXpath($document);
foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
    var_dump(
        $xpath->evaluate('string(.//a)', $book),
        $xpath->evaluate('string(.//a/@href)', $book)
    );
}
Output:
string(17) "Origin of Species"
string(2) "#1"
string(26) "On the Shoulders of Giants"
string(2) "#2"
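Thank you very much. One more thing... how can I make this example work in a foreach loop for multiple books, authors, ... and get the result in comma-separated format? –
The second argument of DOMXPath::evaluate() is the context node. You need to do something like $xpath->evaluate('string(.//a)', $outerNode). .// selects descendants of the current context node. For writing CSV, look up fputcsv(). – ThW
I'm just starting to learn. Could you write a simple example... if you're not busy? It's very important to me. –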
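A sketch of what that comment describes, under a loudly stated assumption: the markup is hypothetical and wraps each book in a <div class="book">, because the real page structure is not shown in the question. The second book's IDs and date are made up for illustration. The outer expression selects the book containers, each relative expression (starting with ".") is evaluated with the container as context node, and fputcsv() writes one comma-separated line per book.

```php
<?php
// Hypothetical markup: assumes each book is wrapped in <div class="book">.
// The second book is invented sample data.
$html = <<<'HTML'
<div class="book">
  <a href="http://www.example.com/5809/book">Origin of Species</a>
  <a href="http://www.example.com/author/id=124">Darwin</a>
  <a href="http://www.example.com/196/genres">Science, Biology</a>
  <span class="Xbkznofv">24/11/1859</span>
</div>
<div class="book">
  <a href="http://www.example.com/1234/book">Another Book</a>
  <a href="http://www.example.com/author/id=56">Another Author</a>
  <a href="http://www.example.com/78/genres">Another Genre</a>
  <span class="Xbkznofv">01/01/1900</span>
</div>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Relative expressions: ".//" searches only below the current context node
$expressions = [
    'substring-before(substring-after(.//a[contains(@href, "/book")]/@href, "www.example.com/"), "/")',
    'substring-after(.//a[contains(@href, "/author/id=")]/@href, "/id=")',
    'substring-before(substring-after(.//a[contains(@href, "/genres")]/@href, "www.example.com/"), "/")',
    'string(.//span[@class="Xbkznofv"])',
];

$rows = [];
$out = fopen('php://output', 'w');
foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
    $row = [];
    foreach ($expressions as $expression) {
        // Second argument = context node, so each book is read on its own
        $row[] = $xpath->evaluate($expression, $book);
    }
    $rows[] = $row;
    // Explicit separator, enclosure and empty escape character
    fputcsv($out, $row, ',', '"', '');
}
fclose($out);
```

Run against this markup, it prints one line per book (5809,124,196,24/11/1859 for the first). To write a file instead, open a file path with fopen() in place of php://output.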