How to retrieve images from namespace, enclosure and description nodes in RSS, using ActionScript 3
One of the things I really wanted Telexer to be good at was showing thumbnail images from as many feeds as possible. With so many flavors of RSS around, you’ll need to try several options to retrieve images from a feed. In this three-part article I’ll describe how I tackled this:
- a quick look at the types of RSS nodes containing images that you could encounter
- how to retrieve images from different types of XML nodes (namespace, enclosure and description nodes), using ActionScript 3
- two ActionScript 3 ways to load those images in the AIR app
1. RSS nodes that could contain images
There are roughly three ways to reference images in an RSS feed: in the description, in a media node, and as an enclosure. Here are a few examples of the different flavors of RSS that I looked at.
description node
The older RSS 0.9 variants, and some newer RSS 2.0 feeds as well, can have images as HTML in the description node:
<!-- Slate video: RSS 2.0, img in description -->
<!-- also in RSS 0.9 variants; src could be a gif as well -->
<item>
<description><img src="http://snip/ObamaGM-thumb.jpg?pubId=78144477"/>
President Barack Obama said Monday that neither General Motors nor Chrysler has
proposed sweeping enough changes to justify further large federal bailouts.
</description>
</item>
Because the description contains HTML, some feeds wrap it in a CDATA section. Also note that “img” and “src” are not necessarily separated by a single space; other attributes can sit in between:
<!-- eBay search result: img in description, CDATA -->
<item>
<description><![CDATA[<table border="0" cellpadding="8"><tr><td><a href="http://snip">
<img border="0" src="http://thumbs.ebaystatic.com/pict/260465871236_0.jpg"></a></td><td>
<strong>US $10.00</strong> (1 Bid)<br /> End Date: Friday Aug-28-2009 5:10:56 PDT<br />
<a href="http://snip">Bid now</a> | <a href="http://snip">Add to watch list</a>
</td></tr></table>]]>
</description>
</item>
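Both cases can be caught by taking the text of the description node and searching it for img tags with a pattern that allows other attributes before src. The ActionScript 3 for this is the subject of part 2. Below is only a minimal sketch of the same idea in Python, using the standard library’s `xml.etree.ElementTree` and `re`. The function name `images_in_description` and the regex are my own illustration, and a regex is a pragmatic shortcut here, not a real HTML parser.

```python
import re
import xml.etree.ElementTree as ET

# An <img ...> tag and its src value. Other attributes (border, width, ...)
# may sit between "img" and "src", and the value may use ' or " quotes.
IMG_SRC = re.compile(r"""<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']""",
                     re.IGNORECASE)


def images_in_description(item_xml):
    """Return every img src found in an RSS <item>'s <description> (hypothetical helper)."""
    item = ET.fromstring(item_xml)
    description = item.find("description")
    if description is None:
        return []
    # HTML inside a CDATA section, or entity-escaped HTML (&lt;img ...&gt;),
    # reaches us as plain text, so one regex over the text covers both.
    html = "".join(description.itertext())
    found = IMG_SRC.findall(html)
    # HTML pasted into the feed unescaped parses as real child elements.
    found += [img.get("src") for img in description.iter("img")
              if img.get("src")]
    return found
```

For the eBay item above this returns the single ebaystatic.com thumbnail URL, even though border="0" comes before src.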
the media node: using a namespace
Some RSS 2.0 feeds put images in media nodes, which live in their own namespace (hence the media: prefix). You can encounter several media nodes in one item:
<!-- NYTimes: RSS 2.0, using namespace media:content -->
<item>
<media:content url="http://graphics8.nytimes.com/snip/ts-krugman-75.jpg"
medium="image" height="75" width="75"/>
<media:description>Paul Krugman</media:description>
</item>
The thing to look for in the above example is “media:content”. Colin Moock has a wonderful way of explaining the use of namespaces: your “orange” could be a color, or it could be a fruit, so you’d need a namespace to make clear which one you mean.
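In code, the orange problem is solved by matching on the namespace URI rather than on the prefix. Here is a minimal Python sketch, assuming the feed binds media: to the Media RSS namespace `http://search.yahoo.com/mrss/` (the NYTimes snippet above leaves out its xmlns declarations). `media_content_images` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: media: is bound to the Media RSS namespace URI below.
NS = {"media": "http://search.yahoo.com/mrss/"}


def media_content_images(item_xml):
    """Return the url of every media:content node that describes an image."""
    item = ET.fromstring(item_xml)
    urls = []
    # ElementTree resolves "media:" through NS and compares namespace URIs,
    # so a feed that binds the same URI to another prefix still matches.
    # ".//" searches all descendants, including children of media:group.
    for node in item.iterfind(".//media:content", NS):
        is_image = (node.get("medium") == "image"
                    or node.get("type", "").startswith("image/"))
        if is_image and node.get("url"):
            urls.append(node.get("url"))
    return urls
```

Because the lookup goes through the URI, a feed that writes `m:content` instead of `media:content` is still found, while a `media:` prefix bound to some other URI is not.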
<!-- BBC: RSS 2.0, namespace media:thumbnail -->
<item>
<media:thumbnail width="66" height="49"
url="http://newsimg.bbc.co.uk/snip/pic-1.jpg" />
</item>
<!-- digg: namespace thumbnail and content -->
<item>
<media:thumbnail url="http://digg.com/snip/t.jpg" width="80" height="80" />
<media:group>
<media:content url="http://digg.com/snip/t.jpg" medium="image"
type="image/jpeg" width="80" height="80" />
<media:content url="http://digg.com/snip/a.jpg" medium="image"
type="image/jpeg" width="30" height="30" />
<media:content url="http://digg.com/snip/s.jpg" medium="image"
type="image/jpeg" isDefault="true" width="48" height="48" />
</media:group>
</item>
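With several media nodes per item, as in the digg example, you also have to choose one. The sketch below is one hypothetical policy, in Python and with the same assumption that media: maps to `http://search.yahoo.com/mrss/`. It takes the largest media:thumbnail, otherwise a media:content marked isDefault="true", otherwise the largest image media:content. That order is my own choice for illustration, not a rule the feeds themselves state.

```python
import xml.etree.ElementTree as ET

# Assumption: media: is bound to the Media RSS namespace URI.
NS = {"media": "http://search.yahoo.com/mrss/"}


def _area(node):
    """width * height from the node's attributes; 0 if missing or not an integer."""
    try:
        return int(node.get("width", "0")) * int(node.get("height", "0"))
    except ValueError:
        return 0


def _is_image(node):
    return (node.get("medium") == "image"
            or node.get("type", "").startswith("image/"))


def pick_media_image(item_xml):
    """Pick one image url from an item's media nodes, or return None.

    Hypothetical preference order:
    1. the largest media:thumbnail,
    2. a media:content marked isDefault="true",
    3. the largest remaining image media:content.
    """
    item = ET.fromstring(item_xml)
    thumbs = [t for t in item.findall(".//media:thumbnail", NS) if t.get("url")]
    if thumbs:
        return max(thumbs, key=_area).get("url")
    images = [c for c in item.findall(".//media:content", NS)
              if _is_image(c) and c.get("url")]
    for node in images:
        if node.get("isDefault") == "true":
            return node.get("url")
    if images:
        return max(images, key=_area).get("url")
    return None
```

For the digg item above this returns the media:thumbnail URL (t.jpg); drop the thumbnail and it would return s.jpg, the isDefault entry.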
enclosures
In Atom feeds, for example Flickr or Feedburner feeds, you can come across yet another construction: a link node with its “rel” attribute set to “enclosure”.
<!-- Flickr / Feedburner feeds: Atom, using enclosure -->
<entry>
<link rel="alternate" type="text/html" href="http://www.flickr.com/snip/83663/"/>
<link rel="enclosure" type="image/jpeg"
href="http://farm4.static.flickr.com/snip/5f36d83332_o.jpg" />
</entry>
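Finding these also takes a namespace-aware lookup, because Atom elements live in the `http://www.w3.org/2005/Atom` namespace (real feeds declare it on the feed element, which the snippet above leaves out). Here is a minimal Python sketch. The helper name is hypothetical, and it assumes an enclosure only counts as an image when its type attribute starts with image/.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace; a real feed declares it on <feed>.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def atom_enclosure_images(entry_xml):
    """Return the href of every image enclosure link in an Atom <entry>."""
    entry = ET.fromstring(entry_xml)
    return [link.get("href")
            for link in entry.findall("atom:link", ATOM)
            if link.get("rel") == "enclosure"
            # Assumption: only trust links whose declared type is an image.
            and link.get("type", "").startswith("image/")
            and link.get("href")]
```

The rel="alternate" link in the Flickr entry is skipped, so only the static.flickr.com JPEG comes back.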
problem cases
These are basically the result of a poor understanding of RSS on the feed provider’s side, IMHO. The feed provider should deliver “really simple”, plain structured content, and the feed reader should render it. That’s what “syndication” means: someone else renders the raw content you provide.
Because the description tag can and will contain HTML, some feed providers put lots of stuff in there that belongs on the rendering side: table markup, borders around images, or even print icons. Those images don’t describe the content and shouldn’t be in that node, yet they could show up as thumbnails if you’re not careful:
<!-- Volkskrant: custom or old-style Feedburner? -->
<!-- description has gif images for "print", "tell a friend", etc. -->
<item>
<description>AMSTERDAM - [txt snipped].
(16:30, 24-03-09)<div class="feedflare">
<a href="http://feeds.volkskrant.nl/~ff/laatstenieuws?a=5dhlMTEQfiA:KX8OSGlCKHU:yIl2AUoC8zA">
<img src="http://feeds2.feedburner.com/~ff/laatstenieuws?d=yIl2AUoC8zA" border="0"></img></a>
<a href="http://feeds.volkskrant.nl/~ff/laatstenieuws?a=5dhlMTEQfiA:KX8OSGlCKHU:F7zBnMyn0Lo">
<img src="http://feeds2.feedburner.com/~ff/laatstenieuws?i=5dhlMTEQfiA:KX8OSGlCKHU:F7zBnMyn0Lo" border="0"></img></a>
</div><img src="http://feeds2.feedburner.com/~r/laatstenieuws/~4/5dhlMTEQfiA" height="1" width="1"/></description>
</item>
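One way to keep these out is to filter candidate images before picking a thumbnail. The Python sketch below is a hypothetical filter whose rules are drawn only from the Volkskrant snippet. It skips images declared as 1x1 (like the last img above) and feedburner.com URLs whose path contains `/~ff/` (the feedflare links) or `/~r/`. These are heuristics for this one example, not a complete list of junk images.

```python
import re
from urllib.parse import urlparse

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
ATTR = re.compile(r"""\b(src|width|height)\s*=\s*["']([^"']*)["']""",
                  re.IGNORECASE)


def content_images(html):
    """Return img srcs from description HTML, minus likely non-content images.

    Hypothetical heuristics, based only on the Volkskrant example:
    - images declared as 1x1 are treated as tracking pixels;
    - feedburner.com URLs whose path contains /~ff/ or /~r/ are skipped.
    """
    keep = []
    for tag in IMG_TAG.findall(html):
        attrs = {name.lower(): value for name, value in ATTR.findall(tag)}
        src = attrs.get("src")
        if not src:
            continue
        if attrs.get("width") == "1" and attrs.get("height") == "1":
            continue
        url = urlparse(src)
        host = url.hostname or ""
        if host.endswith("feedburner.com") and (
                "/~ff/" in url.path or "/~r/" in url.path):
            continue
        keep.append(src)
    return keep
```

For the Volkskrant description above it returns an empty list, so the reader can show no thumbnail rather than a “print” icon.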
Some feeds go even further and prevent easy retrieval of their images by using a custom XML tag. Telexer is not going to find the image in this one:
<!-- bgnewsroom.com, a Russian feed; RSS 2.0, custom tag -->
<item>
<pictire>http://bgnewsroom.com/uploads/news/p1/00121038.jpg</pictire>
</item>
Next up, the juicy bits.
In part 2 I’ll take a look at the ActionScript 3 code needed to retrieve images from these sources.