Perl XML::DOM::Parser

Problem description:

I am trying to get some information from an RSS feed using Perl, XML::DOM and XML::Parser. I am having a hard time finding documentation on XML::DOM and XML::Parser :(

This is the output of the RSS feed.

<rss version="2.0"> 
<channel> 
    <item> 
     <title>The title numer 1</title> 
     <link> 
     http://www.example.com/link1.php?getfile=1&sha=1234567890 
     </link> 
     <description> 
     File 1 
     </description> 
    </item> 
    <item> 
     <title>The title numer 2</title> 
     <link> 
     http://www.example.com/link1.php?getfile=2&sha=0192837465 
     </link> 
     <description> 
     File 2 
     </description> 
    </item> 
     <item> 
     <title>The title numer 3</title> 
     <link> 
     http://www.example.com/link1.php?getfile=1&sha=0987654321 
     </link> 
     <description> 
     File 3 
     </description> 
    </item> 
</channel> 

So I want to get the "title" and the "link" of each item in this RSS feed.

I cannot use XML::LibXML, XML::Simple, or XML::RSS.

I get errors when trying to install it, so this is untested, but it would look something like this:

use XML::DOM   qw();   # XML::DOM::Parser is defined inside XML::DOM
use XML::XQL   qw(); 
use XML::XQL::DOM qw(); 

my $parser = XML::DOM::Parser->new(); 
my $doc = $parser->parsefile("file.xml"); 

for my $item_node ($doc->xql('/rss/channel/item')) {   # the root element is <rss>
    my $title = join '', $item_node->xql('title/textNode()'); 
    my $link = join '', $item_node->xql('link/textNode()'); 
    ... 
} 

Why the downvote? – ikegami

There is a problem parsing your RSS XML file. For a file such as

<xml> 
<channel> 
    <item> 
     <title>The title numer 1</title> 
     </item> 

    <item> 
     <title>The title numer 2</title> 
     </item> 
</channel> 
</xml> 

you can do

use strict; 
use warnings; 
use XML::Parser; 
use Data::Dumper; 
use XML::DOM::Lite qw(Parser XPath); 

my $parser = Parser->new(); 
my $doc = $parser->parseFile('2.xml', whitespace => 'strip'); 


#XML::DOM::Lite::NodeList - blessed array ref for containing Node objects 
my $nlist = $doc->selectNodes('/xml/channel/item/title'); 


foreach my $node (@{$nlist}) 
{ 
    print $node->firstChild()->nodeValue() . "\n"; 
} 

There is a problem with your XML data (unescaped '&' characters):

Lines such as

...getfile=1&sha... 

must be written as

...getfile=1&amp;sha... 
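
If the feed cannot be corrected at its source, the raw text can be patched before it reaches the parser. The following is only a sketch using core Perl: the helper name escape_bare_ampersands is made up for illustration, and the regex assumes that any '&' not followed by something shaped like an entity or character reference (a name, #digits, or #xhex, then ';') is meant literally. A query string that happens to contain something like "&copy;" would be left alone by this heuristic.

```perl
use strict;
use warnings;

# Hypothetical helper: escape every '&' that does not already start
# an entity reference (&name;) or a character reference (&#38; / &#x26;).
sub escape_bare_ampersands {
    my ($text) = @_;
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $text;
}

# One of the link lines from the feed in the question.
my $raw = 'http://www.example.com/link1.php?getfile=1&sha=1234567890';
print escape_bare_ampersands($raw), "\n";
```

Running the helper over text that is already escaped leaves it unchanged, so it is safe to apply more than once.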

Once that is fixed, you can use XML::Reader::PP to parse the XML:

use strict; 
use warnings; 

use XML::Reader::PP; 

my $rdr = XML::Reader::PP->new(\*DATA, { mode => 'branches' }, 
    { root => '/rss/channel/item', branch => [ '/title', '/link' ] }); 

while ($rdr->iterate) { 
    my ($title, $link) = $rdr->value; 

    for ($title, $link) { 
     $_ = '' unless defined $_; 
    } 

    print "title = '$title'\n"; 
    print "link = '$link'\n"; 
} 

__DATA__ 
<rss version="2.0"> 
    <channel> 
    <item> 
     <title>The title numer 1</title> 
     <link> 
     http://www.example.com/link1.php?getfile=1&amp;sha=1234567890 
     </link> 
     <description> 
     File 1 
     </description> 
    </item> 
    <item> 
     <title>The title numer 2</title> 
     <link> 
     http://www.example.com/link1.php?getfile=2&amp;sha=0192837465 
     </link> 
     <description> 
     File 2 
     </description> 
    </item> 
     <item> 
     <title>The title numer 3</title> 
     <link> 
     http://www.example.com/link1.php?getfile=1&amp;sha=0987654321 
     </link> 
     <description> 
     File 3 
     </description> 
    </item> 
    </channel> 
</rss>
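
Since the question asks about XML::DOM specifically, here is a sketch of the same extraction using only the DOM methods of XML::DOM, without XQL. It assumes the ampersands in the feed are already escaped. The helper names element_text and extract_items are invented for this example; the calls on the parser and on the nodes (parse, getElementsByTagName, getChildNodes, getNodeType, getNodeValue, dispose) are XML::DOM methods.

```perl
use strict;
use warnings;
use XML::DOM;    # also defines XML::DOM::Parser

# Hypothetical helper: join the text-node children of an element
# and trim the surrounding whitespace.
sub element_text {
    my ($element) = @_;
    my $text = join '',
        map  { $_->getNodeValue }
        grep { $_->getNodeType == XML::DOM::TEXT_NODE() } $element->getChildNodes;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical helper: return one [title, link] pair per <item>.
sub extract_items {
    my ($xml) = @_;
    my $doc = XML::DOM::Parser->new->parse($xml);
    my @items;
    # In list context getElementsByTagName returns a plain Perl list.
    for my $item ($doc->getElementsByTagName('item')) {
        my ($title) = $item->getElementsByTagName('title');
        my ($link)  = $item->getElementsByTagName('link');
        push @items, [ map { defined $_ ? element_text($_) : '' } $title, $link ];
    }
    $doc->dispose;    # break circular references so the tree can be freed
    return @items;
}

my $feed = <<'XML';
<rss version="2.0">
  <channel>
    <item>
      <title>The title numer 1</title>
      <link>
      http://www.example.com/link1.php?getfile=1&amp;sha=1234567890
      </link>
    </item>
  </channel>
</rss>
XML

for my $pair (extract_items($feed)) {
    my ($title, $link) = @$pair;
    print "title = '$title'\n";
    print "link = '$link'\n";
}
```

To read from a file instead of a string, parsefile("file.xml") can replace parse($xml), as in the XQL example earlier in the thread.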