PHP XPath query to extract specific characters from a tag's href

Question:

<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>

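How can I use an XPath query on these tags to get the ID numbers out of the href?

The result I want would look like this, for example:

5809, 124, 196, 24/11/1859

PHP code: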

$url = 'http://www.example.com/Books/Default.aspx'; 
libxml_use_internal_errors(true); 
$doc = new DOMDocument(); 
$doc->loadHTMLFile($url); 
$xpath = new DOMXpath($doc); 

$elements1 = $xpath->query('//a[contains(@href, "www.example.com/Book/")]'); 
$elements2 = $xpath->query('//a[contains(@href, "www.example.com/author/id=")]'); 
$elements3 = $xpath->query('//a[contains(@href, "www.example.com/genres/")]'); 
$elements4 = $xpath->query('//span[contains(@class, "")]'); 

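foreach ([$elements1, $elements2, $elements3, $elements4] as $elements) {
    if ($elements === false) {
        continue; // query() returns false for an invalid expression
    }
    foreach ($elements as $element) {
        echo "<br/>";
        foreach ($element->childNodes as $node) {
            echo $node->nodeValue . "\n";
        }
    }
}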

Answer

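XPath 1.0 has a limited set of string functions, but at some point it becomes easier to read the attribute and extract the value with a regular expression.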
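For the regular-expression route, a minimal sketch could look like the following. It assumes the ID is always the first run of digits that follows a "/" or "=" in the href; that pattern is my guess from the four sample lines, not something the real page guarantees.

```php
<?php
// Sample markup from the question, with the href quotes closed.
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$values = [];

// Let XPath locate the attributes, then let a regular expression pick out the digits.
foreach ($xpath->query('//a/@href') as $href) {
    // Assumed pattern: the ID is the first run of digits after "/" or "=".
    if (preg_match('~[/=](\d+)~', $href->value, $match)) {
        $values[] = $match[1];
    }
}

// The date is the plain text content of the span, so no pattern is needed.
$values[] = $xpath->evaluate('string(//span)');

echo implode(', ', $values), "\n"; // 5809, 124, 196, 24/11/1859
```

If the real hrefs can contain other digits (a port number, say), the pattern would need to be tightened.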

That said, here is an example that uses only XPath:

$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$data = [
    'book_title' => $xpath->evaluate(
        'string(//a[contains(@href, "www.example.com") and contains(@href, "/book")])'
    ),
    'book_id' => $xpath->evaluate(
        'substring-before(
            substring-after(
                //a[contains(@href, "www.example.com") and contains(@href, "/book")]/@href,
                "www.example.com/"
            ),
            "/"
        )'
    ),
    'author_id' => $xpath->evaluate(
        'substring-after(
            //a[contains(@href, "www.example.com/author/id=")]/@href,
            "/id="
        )'
    ),
];

var_dump($data);

Output:

array(3) { 
    ["book_title"]=> 
    string(17) "Origin of Species" 
    ["book_id"]=> 
    string(4) "5809" 
    ["author_id"]=> 
    string(3) "124" 
}

These expressions only work with DOMXPath::evaluate(); DOMXPath::query() can only return node lists.
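To make that difference concrete, here is a small sketch using one link from the question. The variable names are only for illustration.

```php
<?php
$document = new DOMDocument();
$document->loadHTML('<a href="http://www.example.com/author/id=124">Darwin</a>');
$xpath = new DOMXPath($document);

// A location path selects nodes, so query() hands back a DOMNodeList.
$nodes = $xpath->query('//a/@href');
var_dump($nodes instanceof DOMNodeList); // bool(true)
var_dump($nodes->length);                // int(1)

// A string function produces a scalar, which evaluate() returns directly.
$authorId = $xpath->evaluate('substring-after(//a/@href, "/id=")');
var_dump($authorId);                     // string(3) "124"
```

evaluate() also accepts plain location paths, so it can be used for both kinds of expression.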

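In most cases you will use one expression to fetch a node list, iterate over it, and use several further expressions to fetch the values. Here is a simple example: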

$html = <<<'HTML'
<div class="book">
    <a href="#1">Origin of Species</a>
</div>
<div class="book">
    <a href="#2">On the Shoulders of Giants</a>
</div>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
    var_dump(
        $xpath->evaluate('string(.//a)', $book),
        $xpath->evaluate('string(.//a/@href)', $book)
    );
}

Output:

string(17) "Origin of Species" 
string(2) "#1" 
string(26) "On the Shoulders of Giants" 
string(2) "#2"

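Thank you very much. One more thing... how can I make this example work in a foreach loop for multiple books, authors, and so on, with the results in comma-separated format? –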

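The second argument of DOMXPath::evaluate() is the context node. You need to do something like `$xpath->evaluate('string(.//a)', $outerNode)`. `.//` selects the descendants of the current context node. For writing CSV, look up `fputcsv()`. – ThW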

I have only just started learning. Could you write a simple example, if you are not busy? This is important to me. –
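A rough sketch of what the comment above describes might look like this. The `<div class="book">` wrapper and the second book entry are made up for illustration, since the question does not show how the real page groups its links; the expressions are the ones from the answer, made relative to the context node with a leading `.`.

```php
<?php
// Hypothetical markup: one wrapper element per book (an assumption,
// not the structure of the page in the question).
$html = <<<'HTML'
<div class="book">
  <a href="http://www.example.com/5809/book">Origin of Species</a>
  <a href="http://www.example.com/author/id=124">Darwin</a>
</div>
<div class="book">
  <a href="http://www.example.com/5810/book">The Descent of Man</a>
  <a href="http://www.example.com/author/id=124">Darwin</a>
</div>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$rows = [];
// The outer expression fetches the book nodes; the inner ones run
// against each $book as the context node (second argument).
foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
    $rows[] = [
        $xpath->evaluate(
            'substring-before(
                substring-after(.//a[contains(@href, "/book")]/@href, "www.example.com/"),
                "/"
            )',
            $book
        ),
        $xpath->evaluate('string(.//a[contains(@href, "/book")])', $book),
        $xpath->evaluate(
            'substring-after(.//a[contains(@href, "/author/id=")]/@href, "/id=")',
            $book
        ),
    ];
}

// Write one CSV line per book: book ID, title, author ID.
$out = fopen('php://output', 'w');
foreach ($rows as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

Replacing 'php://output' with a file name writes the same lines to a file instead of printing them.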
