How can I parse the content of an HTML file using cURL?

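Question:

I want to parse XHTML content using cURL. How can I scrape the transaction number, weight, length, height, and width from between the <table> tags? How do I extract only that content from this HTML document and get it as an array?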

transactions.php 

<table border=0 cellspacing=0 width=100%> 
     <tr> 
     <td colspan="2">&nbsp;</td> 
     </tr> 
     <tr> 
     <td width="30%" class="Mellemrubrikker">Transaction Number::</td> 
     <td width="70%">24752734576547IN</td> 
     </tr> 
     <tr> 
     <td width="30%" class="Mellemrubrikker">Weight:</td> 
     <td width="70%">0.85 kg</td> 
     </tr> 
     <tr> 
     <td width="30%" class="Mellemrubrikker">Length:</td> 
     <td width="70%">543 mm.</td> 
     </tr> 
     <tr> 
     <td width="30%" class="Mellemrubrikker">Height:</td> 
     <td width="70%">156 mm.</td> 
     </tr> 
     <tr> 
     <td width="30%" class="Mellemrubrikker">Width:</td> 
     <td width="70%">61 mm.</td> 
     </tr> 
     <tr> 
     <td colspan="2">&nbsp;</td> 
     </tr>  
    </table>

index.php

<?php
// Fetch transactions.php and keep the response body in $output.
$url = "http://localhost/htmlparse/transactions.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
$output = curl_exec($ch); // false if the request fails
$info = curl_getinfo($ch);
curl_close($ch);
//print_r($output);
echo $output;
?>

This code fetches the entire HTML content of transactions.php. How can I get the data inside <table> as array values?

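This isn't a "do my work for me" site. What have you tried, and what didn't work as expected? – Randy

Yes, I tried using cURL, but I'm not familiar with preg_match. –

On parsing HTML with regular expressions, see ["RegEx match open tags except XHTML self-contained tags"](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – outis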

Answer

I would use the Document Object Model rather than writing your own parsing code or (god forbid!) regular expressions.

Here is a PHP example: PHP Parse HTML code
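
A minimal sketch of the DOM approach, assuming the markup matches the transactions.php shown in the question. The function name parseTransactionTable is hypothetical, and the sample HTML is inlined here only so the snippet runs on its own; in index.php you would pass the $output returned by curl_exec() instead.

```php
<?php
// Sample markup mirroring transactions.php from the question (inlined for illustration).
$output = <<<'HTML'
<table border=0 cellspacing=0 width=100%>
  <tr><td colspan="2">&nbsp;</td></tr>
  <tr><td width="30%" class="Mellemrubrikker">Transaction Number::</td><td width="70%">24752734576547IN</td></tr>
  <tr><td width="30%" class="Mellemrubrikker">Weight:</td><td width="70%">0.85 kg</td></tr>
  <tr><td width="30%" class="Mellemrubrikker">Length:</td><td width="70%">543 mm.</td></tr>
  <tr><td width="30%" class="Mellemrubrikker">Height:</td><td width="70%">156 mm.</td></tr>
  <tr><td width="30%" class="Mellemrubrikker">Width:</td><td width="70%">61 mm.</td></tr>
  <tr><td colspan="2">&nbsp;</td></tr>
</table>
HTML;

// Hypothetical helper: returns the label/value rows as an associative array.
function parseTransactionTable(string $html): array
{
    $doc = new DOMDocument();

    // Scraped HTML is often not well formed; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $data = [];

    // Label cells carry class="Mellemrubrikker"; the value is the next <td> in the same row.
    foreach ($xpath->query('//td[@class="Mellemrubrikker"]') as $labelCell) {
        $valueCell = $xpath->query('following-sibling::td[1]', $labelCell)->item(0);
        if ($valueCell === null) {
            continue; // label without a value cell: skip it
        }
        // Drop trailing colons so "Transaction Number::" becomes "Transaction Number".
        $label = rtrim(trim($labelCell->textContent), ':');
        $data[$label] = trim($valueCell->textContent);
    }

    return $data;
}

$data = parseTransactionTable($output);
print_r($data);
```

With the sample above, $data should hold five entries keyed by "Transaction Number", "Weight", "Length", "Height", and "Width". Selecting on the class attribute is a choice that fits this particular page; if the real page uses that class elsewhere, a narrower XPath expression would be needed.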

Answer

Try Simple HTML DOM from http://simplehtmldom.sourceforge.net/

If you don't mind using Python or Perl, you can use BeautifulSoup or WWW::Mechanize.

Came here to suggest the same. :) – iHaveacomputer
