Clicking an AJAX button using Perl WWW::Mechanize
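Question:
I'm working on a project for a client who needs to be able to scrape the directory listings on specific pages. I modified his existing code to run in a loop, because there are now multiple pages to pull content from. One of the pages I'm trying to scrape is https://marriage.ag.gov.au/marriagecelebrants/civil
You can see that there are 162 pages, which appear to use AJAX to load the next batch of content. The existing code clicks based on the input's name attribute:
ctl00$MainContent$gridCelebrants$ctl00$ctl02$ctl00$ctl04
So far, all my code really does is refresh the page and scrape the same content 162 times.
Here is the current snippet: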
use warnings;
use WWW::Mechanize;
use Data::Dumper;
use HTML::TableExtract;
use Spreadsheet::WriteExcel;
#header();
# create max page array to handle civil and other page.
# number indicates how many times to click through
# first item in array is https://marriage.ag.gov.au/marriagecelebrants/civil
# second item is https://marriage.ag.gov.au/marriagecelebrants/other
my @max_page_array = qw(
162
11
);
# create URL array for the 2 pages to scrape
my @url_array = qw(
https://marriage.ag.gov.au/marriagecelebrants/civil
https://marriage.ag.gov.au/marriagecelebrants/other
);
# get size of array
my $url_array_size = scalar @url_array;
# declare vars
my $n = 0;
my $i = 0;
# time to loop through the url's
while($i < $url_array_size){
    open (raw, ">output-dev-$i.txt");
    close(raw);
    $n = 0;
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($url_array[$i]);
    open (raw, ">>output-dev-$i.txt");
    while($n < $max_page_array[$i]){
        my $c = $mech->content;
        my $te = HTML::TableExtract->new(br_translate => 1,keep_html => 0);
        $te->parse($c);
        foreach my $ts ($te->tables) {
            foreach my $row ($ts->rows) {
                print raw join(',', @$row);
            }
        }
        #this was existing code
        #$mech->click("ctl00\$MainContent\$gridCelebrants\$ctl00\$ctl02\$ctl00\$ctl04");
        #tried multiple variations based on documentation and got nowhere
        $mech->click_button('ctl00$MainContent$gridCelebrants$ctl00$ctl02$ctl00$ctl04');
        $n++;
    }
    close raw;
    $i++;
} # while loop - url array size
My question is: when the Next button is clicked, how can I get my Perl script to load the next page and scrape the next set of data?
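Answer:
"My question is: when the Next button is clicked, how can I get my Perl script to load the next page and scrape the next set of data?"
According to the WWW::Mechanize FAQ, WWW::Mechanize does not support JavaScript. The FAQ provides a list of alternatives that do; see also WWW::Mechanize::PhantomJS.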
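One workaround that the answer above does not mention, and that sometimes avoids the need for a JavaScript engine: on ASP.NET WebForms pages, pager controls often call __doPostBack, which only fills the hidden fields __EVENTTARGET and __EVENTARGUMENT and submits the form. The sketch below assumes this page works that way; that has not been verified here, and an AJAX grid may expect extra fields. The helper name build_postback_fields is hypothetical, and the regex scan is a dependency-free simplification of what an HTML parser should do.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): collect the hidden
# <input> fields from a page and set the ASP.NET postback target, returning
# a hash reference of the fields to POST back to the same URL.
sub build_postback_fields {
    my ($html, $event_target, $event_argument) = @_;
    my %fields;

    # Naive regex scan for hidden inputs. Good enough for a sketch; a real
    # script should use an HTML parser and decode HTML entities in values.
    while ($html =~ /<input\b([^>]*)>/gi) {
        my $attrs = $1;
        next unless $attrs =~ /\btype\s*=\s*["']?hidden\b/i;
        my ($name)  = $attrs =~ /\bname\s*=\s*"([^"]*)"/i;
        my ($value) = $attrs =~ /\bvalue\s*=\s*"([^"]*)"/i;
        next unless defined $name;
        $fields{$name} = defined $value ? $value : '';
    }

    # Emulate what __doPostBack(target, argument) would set in the browser.
    $fields{__EVENTTARGET}   = $event_target;
    $fields{__EVENTARGUMENT} = defined $event_argument ? $event_argument : '';
    return \%fields;
}

# Made-up sample markup standing in for the real page content.
my $html = <<'HTML';
<form method="post" action="./civil">
<input type="hidden" name="__VIEWSTATE" value="abc123" />
<input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
<input type="text" name="search" value="ignored" />
</form>
HTML

my $fields = build_postback_fields(
    $html, 'ctl00$MainContent$gridCelebrants$ctl00$ctl02$ctl00$ctl04');
print "$_=$fields->{$_}\n" for sort keys %$fields;
```

Inside the question's inner loop, the resulting hash could then replace the click_button call with something like $mech->post($mech->uri, $fields), rebuilding the fields from $mech->content on every iteration because the hidden state changes with each response. If the server still returns the same page, the pager most likely depends on client-side script, and a JavaScript-capable tool such as WWW::Mechanize::PhantomJS is the safer route.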