Parsing a Photobucket RSS feed with Perl?
I am trying to read and parse an RSS feed from Photobucket, and I am having a hard time getting at an element's child elements. Here is a sample of the RSS XML:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/">
<channel>
<title>BlahBlah's Photobucket websitePic album media</title>
<description>A feed of BlahBlah's images and videos for this album</description>
<pubDate>Sun, 7 Aug 2011 20:11:31 MDT</pubDate>
<link>http://s1100.photobucket.com/albums/g409/BlahBlah/websitePic/?sort=ascending</link>
<lastBuildDate>Mon, 13 Feb 2012 21:04:43 MST</lastBuildDate>
<generator>Photobucket feed generator</generator>
<language>en-us</language>
<ttl>60</ttl>
<item>
<title>F1 sidecar</title>
<link>http://s1100.photobucket.com/albums/g409/BlahBlah/websitePic/?action=view&current=IMG_0673.jpg&sort=ascending</link>
<dc:creator>BlahBlah</dc:creator>
<description><p><a href="http://s1100.photobucket.com/albums/g409/BlahBlah/">BlahBlah</a> posted a photo</a></p><p><a href="http://s1100.photobucket.com/albums/g409/BlahBlah/websitePic/?action=view&current=IMG_0673.jpg&sort=ascending" title="IMG_0673.jpg"><img src="http://i1100.photobucket.com/albums/g409/BlahBlah/websitePic/th_IMG_0673.jpg" alt="IMG_0673.jpg" /></a><br>F1 sidecar - IMG_0673.jpg</p></description>
<guid>http://i1100.photobucket.com/albums/g409/BlahBlah/websitePic/IMG_0673.jpg</guid>
<enclosure type="image/jpeg" url="http://i1100.photobucket.com/albums/g409/BlahBlah/websitePic/IMG_0673.jpg" />
<media:content medium="image" type="image/jpeg" url="http://i1100.photobucket.com/albums/g409/BlahBlah/websitePic/IMG_0673.jpg">
<media:title>F1 car</media:title>
<media:description />
<media:thumbnail url="http://i1100.photobucket.com/albums/g409/BlahBlah/websitePic/th_IMG_0673.jpg" />
</media:content>
<pubDate>Sun, 7 Aug 2011 20:11:31 MDT</pubDate>
</item>
I want to reach the media:thumbnail element so that I can read its value. Here is my code, which does not work:
use strict;
use CGI;
use XML::RSS;
use LWP::Simple;
my $test = CGI->new;
my $url = "http://feed1100.photobucket.com/albums/g409/BlahBlah/websitePic/feed.rss";
my $rss = XML::RSS->new();
my $data = get($url);
$rss->parse($data);
$rss->add_module(prefix=>'media', uri=>'http://search.yahoo.com/mrss/');
print $test->header("text/html");
my $channel = $rss->{channel};
foreach my $item (@{ $rss->{items} })
{
    my $link = $item->{link};
    my $title = $item->{title};
    my $thumb = '';
    foreach my $b ({ $item->{'http://search.yahoo.com/mrss/'}->{'content'} })
    {
        print "here\n";
        if($b->{'http://search.yahoo.com/mrss/'}->{'thumbnail'}->{'url'})
        {
            $thumb = $thumb . ' ' . $b->{'http://search.yahoo.com/mrss/'}->{'thumbnail'}->{'url'};
        }
    }
    print $title, "\n", $link, "\nthumb=", $thumb, "\n\n\n";
}
print $test->end_html;
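The loop visits every channel item and finds the media:content element, but I can't seem to get at its child elements. I think my syntax is close. Any ideas?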
The items get parsed into a structure like this:
items => [
    {
        dc => {
            creator => "BlahBlah"
        },
        description => "<p><a href=\"http://s1100.photobucket.com/albums/g409/BlahBlah/\">BlahBlah</a> posted a photo</a></p><p><a href=\"http://s1100.photobucket.com/albums/g409/BlahBlah/websitePic/?action=view&current=IMG_0673.jpg&sort=ascending\" title=\"IMG_0673.jpg\"><img src=\"http://i1100.photobucket.com/albums/g409/BlahBlah/websitePic/th_IMG_0673.jpg\" alt=\"IMG_0673.jpg\" /></a><br>F1 sidecar - IMG_0673.jpg</p>",
        enclosure => {
            type => "image/jpeg",
            url => "http://i1100.photobucket.com/albums/g409/BlahBlah/websitePic/IMG_0673.jpg"
        },
        guid => "http://i1100.photobucket.com/albums/g409/BlahBlah/websitePic/IMG_0673.jpg",
        "http://purl.org/dc/elements/1.1/" => {
            creator => "BlahBlah"
        },
        "http://search.yahoo.com/mrss/" => {
            content => "\n ...",
            title => "F1 car"
        },
        isPermaLink => "",
        item => "\n \n\n ...",
        link => "http://s1100.photobucket.com/albums/g409/BlahBlah/websitePic/?action=view&current=IMG_0673.jpg&sort=ascending",
        pubDate => "Sun, 7 Aug 2011 20:11:31 MDT",
        title => "F1 sidecar"
    }
],
For example, I don't see a "thumbnail" key anywhere under "http://search.yahoo.com/mrss/". Dumping the data to see what it actually looks like is a good idea; see a module such as Data::Dump.
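As a minimal sketch of that advice, the snippet below dumps a hand-built, hypothetical $item hash that copies a few keys from the structure above. It uses the core Data::Dumper module in place of Data::Dump only so that nothing has to be installed; the variable names are illustrative and not part of XML::RSS.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for one parsed item; keys and values are copied
# (abbreviated) from the dumped structure, not fetched from a real feed.
my $item = {
    title => 'F1 sidecar',
    'http://search.yahoo.com/mrss/' => {
        content => "\n ...",
        title   => 'F1 car',
    },
};

$Data::Dumper::Sortkeys = 1;   # stable key order makes dumps easier to compare
$Data::Dumper::Indent   = 1;   # compact, fixed-width indentation

my $dump = Dumper($item);
print $dump;

# ref() returns an empty string for a plain scalar, so this reports that
# 'content' is a string rather than a hash that could hold a 'thumbnail' key.
my $content = $item->{'http://search.yahoo.com/mrss/'}{content};
my $kind    = ref($content) || 'plain string';
print "content is a: $kind\n";
```

Checking ref() before dereferencing is a cheap way to confirm what a dump suggests, here that there is nothing to descend into under 'content'.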
It looks like there is a typo on this line:
foreach my $b ({ $item->{'http://search.yahoo.com/mrss/'}->{'content'} })
Judging from your example, I think you are missing an "@" in front of the first "{" there.
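The difference the "@" makes can be sketched with a small, self-contained script. The $list and $string variables are hypothetical stand-ins, not values taken from XML::RSS. The last part shows that the "@" only helps when the value really is an array reference.

```perl
use strict;
use warnings;

my $list = [ 'a.jpg', 'b.jpg' ];   # hypothetical array reference

# With "{ ... }" Perl builds an anonymous hash, so the loop body runs
# exactly once and $elem is a hash reference, not an element of @$list.
my @seen_without_at;
{
    no warnings 'misc';   # silence "Odd number of elements in anonymous hash"
    foreach my $elem ({ $list }) {
        push @seen_without_at, ref $elem;
    }
}
print "without the sigil: @seen_without_at\n";

# With "@{ ... }" the array reference is dereferenced and every element
# is visited in turn.
my @seen_with_at;
foreach my $elem (@{ $list }) {
    push @seen_with_at, $elem;
}
print "with the sigil: @seen_with_at\n";

# Dereferencing a plain string as an array dies under "use strict",
# so the error is trapped with eval to keep the script running.
my $string = "\n ...";
my $ok     = eval { my @copy = @{ $string }; 1 };
my $error  = $ok ? '' : $@;
print "error: $error";
```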
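If I do that, I get the following error: Can't use string (" " ...) as an ARRAY ref while "strict refs" in use – MonkeyWrench 2012-02-14 04:49:21
I don't know, then. I don't think I have ever had the "pleasure" of dealing with the namespace syntax that XML::RSS uses. But the error does mean that $item->{'http://search.yahoo.com/mrss/'}->{'content'} is not something you can iterate over, so you will need to find a different way. I suggest inserting "use Data::Dumper; print Dumper($item);" right above your foreach to show you what is actually there. It is also possible that not every post has a media:content tag, in which case you will need to check for it first. – zostay 2012-02-14 05:00:38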
That is the element I want; I need the element's 'url' attribute. But thanks, I will try the data dump. – MonkeyWrench 2012-02-14 16:02:15
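One possible "different way", offered only as a sketch and not as what anyone in this thread ended up using, is to skip XML::RSS and query the feed with XPath. The example assumes the XML::LibXML module is installed, and it parses a trimmed, hypothetical copy of the sample feed from a string instead of fetching the real URL. A second item without media:content is included to show that a missing tag simply yields no URLs.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed, hypothetical copy of the feed from the question.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
<channel>
<item>
<title>F1 sidecar</title>
<media:content medium="image" type="image/jpeg" url="http://i1100.photobucket.com/albums/g409/BlahBlah/websitePic/IMG_0673.jpg">
<media:thumbnail url="http://i1100.photobucket.com/albums/g409/BlahBlah/websitePic/th_IMG_0673.jpg" />
</media:content>
</item>
<item>
<title>No media here</title>
</item>
</channel>
</rss>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# XPath needs a prefix bound to the namespace URI before "media:" can be used.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(media => 'http://search.yahoo.com/mrss/');

my %thumb_for;   # item title => array ref of thumbnail URLs
foreach my $item ($xpc->findnodes('/rss/channel/item')) {
    my $title = $xpc->findvalue('title', $item);

    # Selects the url attribute of each thumbnail nested in media:content;
    # the list is empty when the item has no such element.
    my @urls = map { $_->value }
               $xpc->findnodes('media:content/media:thumbnail/@url', $item);

    $thumb_for{$title} = \@urls;
    print "$title\nthumb=@urls\n\n";
}
```

Because the XPath expression is evaluated relative to each item node, no explicit existence check for media:content is needed.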