Clicking an AJAX button with Perl WWW::Mechanize

Problem description:

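I am working on a project for a client who needs to be able to scrape the directory on specific pages. I modified his existing code to run in a loop, because there are now multiple pages to pull content from. One of the pages I am trying to scrape: https://marriage.ag.gov.au/marriagecelebrants/civil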

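You can see that there are 162 pages, which appear to use AJAX to load the next batch of content. The existing code is based on clicking the input with this name attribute: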

ctl00$MainContent$gridCelebrants$ctl00$ctl02$ctl00$ctl04

So far, my code essentially just refreshes the page and scrapes the same content 162 times.

Here is the current snippet:

use strict;
use warnings;
use WWW::Mechanize; 
use Data::Dumper; 
use HTML::TableExtract; 
use Spreadsheet::WriteExcel; 

#header(); 
# create max page array to handle civil and other page. 
# number indicates how many times to click through 
# first item in array is https://marriage.ag.gov.au/marriagecelebrants/civil 
# second item is   https://marriage.ag.gov.au/marriagecelebrants/other 
my @max_page_array = qw(
    162 
    11 
); 

# create URL array for the 2 pages to scrape 
my @url_array = qw(
    https://marriage.ag.gov.au/marriagecelebrants/civil 
    https://marriage.ag.gov.au/marriagecelebrants/other 
); 
# get size of array 
my $url_array_size = scalar @url_array; 

# declare vars 
my $n = 0; 
my $i = 0; 
# time to loop through the url's 
while ($i < $url_array_size) {
    $n = 0;
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($url_array[$i]);

    # '>' truncates any previous output for this URL
    open(my $raw, '>', "output-dev-$i.txt")
        or die "Cannot open output-dev-$i.txt: $!";
    while ($n < $max_page_array[$i]) {
        my $c = $mech->content;
        my $te = HTML::TableExtract->new(br_translate => 1, keep_html => 0);
        $te->parse($c);
        foreach my $ts ($te->tables) {
            foreach my $row ($ts->rows) {
                # one output line per table row; cells may be undef
                print $raw join(',', map { defined $_ ? $_ : '' } @$row), "\n";
            }
        }

        # this was existing code
        #$mech->click("ctl00\$MainContent\$gridCelebrants\$ctl00\$ctl02\$ctl00\$ctl04");

        # tried multiple variations based on documentation and got nowhere
        $mech->click_button('ctl00$MainContent$gridCelebrants$ctl00$ctl02$ctl00$ctl04');
        $n++;
    }
    close $raw;
    $i++;
} # while loop - url array size

My question is: when "next" is clicked, how can I get my Perl script to load the next page and scrape the next set of data?

According to the FAQ, WWW::Mechanize does not support JavaScript. The FAQ provides a list of alternatives that do; see also WWW::Mechanize::PhantomJS.
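
As a rough sketch of what switching to a JavaScript-capable driver might look like, here is the question's loop reworked around WWW::Mechanize::PhantomJS. Several things here are assumptions rather than facts about the site or the module: that the PhantomJS binary is installed, that `click` accepts a hash ref with an `xpath` key (that is how I recall its documentation, so verify it before relying on it), that the pager control is an `input` carrying the name from the question, and that a fixed two-second pause is long enough for the AJAX update to finish. `row_to_line` and `scrape_pages` are names made up for this example.

```perl
use strict;
use warnings;

# Name of the "next page" input, copied from the question's code.
my $NEXT_BUTTON = 'ctl00$MainContent$gridCelebrants$ctl00$ctl02$ctl00$ctl04';

# Turn one table row (array ref, cells may be undef) into a comma-joined line.
sub row_to_line {
    my ($row) = @_;
    return join(',', map { defined $_ ? $_ : '' } @$row) . "\n";
}

# Scrape up to $max_pages pages of $url, writing rows to $out_fh.
sub scrape_pages {
    my ($url, $max_pages, $out_fh) = @_;

    # Loaded lazily so row_to_line can be used without PhantomJS installed.
    require WWW::Mechanize::PhantomJS;
    require HTML::TableExtract;

    my $mech = WWW::Mechanize::PhantomJS->new();
    $mech->get($url);

    for my $page (1 .. $max_pages) {
        my $te = HTML::TableExtract->new(br_translate => 1, keep_html => 0);
        $te->parse($mech->content);
        for my $ts ($te->tables) {
            print {$out_fh} row_to_line($_) for $ts->rows;
        }
        last if $page == $max_pages;

        # Assumption: the pager input can be found by its name attribute.
        my $xpath = sprintf q{//input[@name='%s']}, $NEXT_BUTTON;
        $mech->click({ xpath => $xpath });

        # Assumption: a fixed pause is enough for the AJAX update to land.
        sleep 2;
    }
    return;
}

# Usage: perl scrape.pl <url> <max_pages>
if (@ARGV == 2) {
    my ($url, $max_pages) = @ARGV;
    scrape_pages($url, $max_pages, \*STDOUT);
}
```

Run it with something like `perl scrape.pl https://marriage.ag.gov.au/marriagecelebrants/civil 162 > output-dev-0.txt`. A fixed sleep is the crudest way to wait; checking whether the table content actually changed before moving on would be more reliable.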