Perl XML::DOM::Parser
Question:
I am trying to get information from an RSS feed using Perl, XML::DOM, and XML::Parser. I am having a hard time with the documentation for XML::DOM and XML::Parser :(
Here is the RSS feed:
<rss version="2.0">
<channel>
<item>
<title>The title numer 1</title>
<link>
http://www.example.com/link1.php?getfile=1&sha=1234567890
</link>
<description>
File 1
</description>
</item>
<item>
<title>The title numer 2</title>
<link>
http://www.example.com/link1.php?getfile=2&sha=0192837465
</link>
<description>
File 2
</description>
</item>
<item>
<title>The title numer 3</title>
<link>
http://www.example.com/link1.php?getfile=1&sha=0987654321
</link>
<description>
File 3
</description>
</item>
</channel>
</rss>
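So I want to get the "title" and the "link" of each item from this RSS feed.
I cannot use XML::LibXML, XML::Simple, or XML::RSS.
Answer:
I get errors trying to install it, but it would look something like this: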
use XML::DOM qw();       # XML::DOM::Parser is defined inside XML::DOM
use XML::XQL qw();
use XML::XQL::DOM qw();  # adds the xql() method to DOM nodes

my $parser = XML::DOM::Parser->new();
my $doc = $parser->parsefile("file.xml");

# The root element is <rss>, so the path starts there.
for my $item_node ($doc->xql('/rss/channel/item')) {
    my $title = join '', $item_node->xql('title/textNode()');
    my $link = join '', $item_node->xql('link/textNode()');
    ...
}
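If XML::XQL will not install either, the same walk can probably be done with XML::DOM alone. The following is only a minimal sketch: the feed is inlined as a string with the '&' characters already escaped, and `child_text` is a hypothetical helper name introduced here, not part of XML::DOM.

```perl
use strict;
use warnings;
use XML::DOM;

# Sample feed inlined for illustration (an assumption of this sketch);
# the '&' in each link is escaped as '&amp;' so the XML is well-formed.
my $xml = <<'XML';
<rss version="2.0">
<channel>
<item>
<title>The title numer 1</title>
<link>
http://www.example.com/link1.php?getfile=1&amp;sha=1234567890
</link>
</item>
<item>
<title>The title numer 2</title>
<link>
http://www.example.com/link1.php?getfile=2&amp;sha=0192837465
</link>
</item>
</channel>
</rss>
XML

# Hypothetical helper: join the text children of the first <$tag> element
# found below $node and trim the surrounding whitespace.
sub child_text {
    my ($node, $tag) = @_;
    my ($elem) = $node->getElementsByTagName($tag);
    return '' unless defined $elem;
    my $text = join '',
        map  { $_->getData }
        grep { $_->isa('XML::DOM::Text') } $elem->getChildNodes;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse($xml);

# Collect one hash per <item>, then print title and link.
my @items;
for my $item ($doc->getElementsByTagName('item')) {
    push @items, {
        title => child_text($item, 'title'),
        link  => child_text($item, 'link'),
    };
}
print "title = '$_->{title}'\nlink = '$_->{link}'\n" for @items;

$doc->dispose;    # break the circular references held by the DOM tree
```

To read from a file instead, `$parser->parsefile("file.xml")` can replace the `parse` call, as in the snippet above.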
Answer:
There is a problem parsing your RSS XML file. For the file
<xml>
<channel>
<item>
<title>The title numer 1</title>
</item>
<item>
<title>The title numer 2</title>
</item>
</channel>
</xml>
you can do
use strict;
use warnings;
use XML::Parser;
use Data::Dumper;
use XML::DOM::Lite qw(Parser XPath);
my $parser = Parser->new();
my $doc = $parser->parseFile('2.xml', whitespace => 'strip');
#XML::DOM::Lite::NodeList - blessed array ref for containing Node objects
my $nlist = $doc->selectNodes('/xml/channel/item/title');
foreach my $node (@{$nlist})
{
    print $node->firstChild()->nodeValue() . "\n";
}
Answer:
There is a problem with your XML data (unescaped '&' characters):
Lines such as
...getfile=1&sha...
must be written as
...getfile=1&amp;sha...
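If you cannot fix the feed at its source, one possible workaround is to escape the bare ampersands yourself before handing the text to a parser. This is a rough sketch, not a complete solution: `escape_bare_ampersands` is a hypothetical helper name, and the regex is a heuristic that leaves anything that already looks like an entity or character reference untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: escape every '&' that does not already start an
# entity or character reference such as &amp;, &#38; or &#x26;.
sub escape_bare_ampersands {
    my ($xml) = @_;
    $xml =~ s/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $xml;
}

my $raw   = 'http://www.example.com/link1.php?getfile=1&sha=1234567890';
my $fixed = escape_bare_ampersands($raw);
print "$fixed\n";
```

Running the helper a second time on its own output should change nothing, because `&amp;` is already a valid reference.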
Once this is fixed, you can use XML::Reader::PP to parse the XML:
use strict;
use warnings;
use XML::Reader::PP;
my $rdr = XML::Reader::PP->new(\*DATA, { mode => 'branches' },
    { root => '/rss/channel/item', branch => [ '/title', '/link' ] });

while ($rdr->iterate) {
    my ($title, $link) = $rdr->value;
    for ($title, $link) {
        $_ = '' unless defined $_;
    }
    print "title = '$title'\n";
    print "link = '$link'\n";
}
__DATA__
<rss version="2.0">
<channel>
<item>
<title>The title numer 1</title>
<link>
http://www.example.com/link1.php?getfile=1&amp;sha=1234567890
</link>
<description>
File 1
</description>
</item>
<item>
<title>The title numer 2</title>
<link>
http://www.example.com/link1.php?getfile=2&amp;sha=0192837465
</link>
<description>
File 2
</description>
</item>
<item>
<title>The title numer 3</title>
<link>
http://www.example.com/link1.php?getfile=1&amp;sha=0987654321
</link>
<description>
File 3
</description>
</item>
</channel>
</rss>
Why the downvote? – ikegami