How to scrape and download all PDF files from the links on an HTML page?

Problem description:

Here is my code for scraping all the PDF links, but it doesn't work. How can I download the files from these links and save them to a folder on my computer?

<?php 
set_time_limit(0); 
include 'simple_html_dom.php'; 

$url = 'http://example.com'; 
$html = file_get_html($url) or die ('invalid url'); 

// extract PDF links
foreach($html->find('a[href=[^"]*\.pdf]') as $element) 
echo $element->href.'<br>'; 
?>
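A likely cause of the blank output is the selector itself. simple_html_dom's `find()` takes CSS-style selectors, not regular expressions, so `[^"]*\.pdf` inside the attribute brackets probably never matches anything (its "ends with" form should be `a[href$=.pdf]`). The sketch below avoids the third-party library and uses PHP's built-in `DOMDocument` and `DOMXPath` instead. The function name `extractPdfLinks` is hypothetical, and the ".pdf at the end of the URL path" rule is an assumption about how the target site names its files.

```php
<?php
// Sketch: collect PDF links with PHP's built-in DOM extension instead of
// simple_html_dom. extractPdfLinks is a hypothetical helper name.

/**
 * Return every href in $html whose URL path ends in ".pdf" (case-insensitive),
 * without duplicates, in document order.
 */
function extractPdfLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml's parse warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    // XPath 1.0 has no ends-with(), so select every <a href> and filter in PHP.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Check only the path, so "file.pdf?download=1" still counts.
        $path = (string) parse_url($href, PHP_URL_PATH);
        if (preg_match('/\.pdf$/i', $path)) {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}
```

With the question's `$url`, `extractPdfLinks(file_get_contents($url))` would return the raw `href` values; relative links such as `/docs/a.pdf` still have to be resolved against the page URL before they can be downloaded.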

It looks like you have a typo: in the foreach loop, $htnl should be $html. If that isn't in your original code, what exactly is the error you're getting? – ggreiner 2012-02-01 22:05:48

"does not work" > – 2012-02-01 22:12:15

@ggreiner There's no typo in my original code, sorry; I introduced it when posting here. The result on my page is blank. – puresmile 2012-02-01 22:54:06

Answer

foreach($htnl->find('a[href=[^"]*\.pdf]') as element) 
      ^---typo. should be an 'm'  ^---typo. need a $ here

Is your code "not working" because of the typos above, or is it something else?

Oops, sorry, there's no typo in my original code -.-. It doesn't work: the result on my page is blank. – puresmile 2012-02-01 22:52:41

Answer

Have you looked at phpQuery? http://code.google.com/p/phpquery/
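For the second part of the question, saving the PDFs to a folder on disk, here is a minimal sketch. It assumes `allow_url_fopen` is enabled (so `file_get_contents` can read `http://` URLs), that the links are already absolute, and that a failed download should be skipped rather than abort the whole batch. The function name `downloadPdfs` and that skip policy are hypothetical choices for illustration.

```php
<?php
// Sketch: save a list of PDF URLs into a local folder. downloadPdfs is a
// hypothetical helper; failed downloads are skipped (an assumed policy).

/**
 * Download each URL in $urls into $targetDir and return the saved file paths.
 * URLs must already be absolute (or local file paths); relative hrefs scraped
 * from a page have to be resolved against the page URL first.
 */
function downloadPdfs(array $urls, string $targetDir): array
{
    // Create the target folder (and any missing parents) if needed.
    if (!is_dir($targetDir) && !mkdir($targetDir, 0777, true)) {
        throw new RuntimeException("Cannot create directory: $targetDir");
    }
    $saved = [];
    foreach ($urls as $url) {
        // Use the last path segment as the local file name.
        $name = basename((string) parse_url($url, PHP_URL_PATH));
        if ($name === '') {
            continue; // no usable file name in this URL
        }
        // @ suppresses the warning; a false return means the read failed.
        $data = @file_get_contents($url);
        if ($data === false) {
            continue;
        }
        $path = rtrim($targetDir, '/\\') . DIRECTORY_SEPARATOR . $name;
        if (file_put_contents($path, $data) !== false) {
            $saved[] = $path;
        }
    }
    return $saved;
}
```

Naming files by their last path segment keeps the sketch short, but two URLs ending in the same file name will overwrite each other; a real scraper might add a counter or a hash of the URL to the name.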

