Grabbing del.icio.us posts with Python and RSS
Fredrik Lundh | July 2007
In Grabbing del.icio.us posts with Python, I used JSON to fetch recent posts from the del.icio.us link management site.
Here’s another approach, which uses their RSS interface instead, and a simple RSS 1.0 parser built on ElementTree:
    import urllib
    import xml.etree.ElementTree as ET # Python 2.5
    # import elementtree.ElementTree as ET

    def RSS(tag):
        return "{http://purl.org/rss/1.0/}" + tag

    def DC(tag):
        return "{http://purl.org/dc/elements/1.1/}" + tag

    class Post(object):
        def __init__(self, item):
            self.link = item.findtext(RSS("link"))
            self.title = item.findtext(RSS("title"))
            self.description = item.findtext(RSS("description"))
            self.pubdate = item.findtext(DC("date"))
            self.tags = item.findtext(DC("subject"), "").split()

    def getposts(user, tag=""):
        if isinstance(tag, tuple):
            tag = "+".join(tag)
        uri = "http://del.icio.us/rss/%s/%s" % (user, tag)
        tree = ET.parse(urllib.urlopen(uri))
        return map(Post, tree.getiterator(RSS("item")))
Note that the parser has a limited understanding of the RSS format; it just locates all RSS 1.0 item elements in the document, and pulls out the relevant subelements using findtext.
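To see what those findtext lookups do without touching the network, here is a minimal sketch that runs the same namespace-qualified lookups against a small inline document. It is written in Python 3 syntax, the SAMPLE feed and its example.com URLs are made up for illustration, and iter() stands in for getiterator, which is no longer available in recent Python versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 1.0 fragment, invented for this example.
SAMPLE = """\
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.com/rss">
    <title>example feed</title>
  </channel>
  <item rdf:about="http://example.com/a">
    <title>First</title>
    <link>http://example.com/a</link>
    <dc:date>2007-07-22T10:00:00Z</dc:date>
    <dc:subject>python xml</dc:subject>
  </item>
  <item rdf:about="http://example.com/b">
    <title>Second</title>
    <link>http://example.com/b</link>
  </item>
</rdf:RDF>
"""

def RSS(tag):
    return "{http://purl.org/rss/1.0/}" + tag

def DC(tag):
    return "{http://purl.org/dc/elements/1.1/}" + tag

root = ET.fromstring(SAMPLE)

# iter() walks the whole tree, so item elements are found at any depth.
items = list(root.iter(RSS("item")))

# findtext returns the text of the first matching child, or the
# default (None unless one is given) when there is no such child.
links = [item.findtext(RSS("link")) for item in items]
tags = [item.findtext(DC("subject"), "").split() for item in items]
descriptions = [item.findtext(RSS("description")) for item in items]

print(links)
print(tags)
print(descriptions)
```

The second item has no dc:subject or description element, which is why the "" default matters for tags: without it, findtext would return None and the split call would fail.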
To try it out, you can do something like:
    for post in getposts("effbot"):
        print post.link, post.tags
At the time of writing, this prints something like:
    http://shovelglove.com/ ['humor', 'training']
    http://www.kanyewest.com/?content=video_cant_tell_alt ['fun', 'music']
    http://www.svd.se/images/berglin/berglin070722.gif ['berglin']
    http://www.svd.se/images/berglin/berglin_20070708.gif ['berglin']
    http://shockingcats.ytmnd.com/ ['animals']
    ...
See the previous article for more tips and tricks.
Lazy Parsing
The version of getposts used above pulls in the entire RSS document, and then uses getiterator to locate all item elements. Another, somewhat more elegant approach is to use ET’s iterparse interface to parse the document as it arrives, and yield populated Post objects as they’re being created:
    def getposts(user, tag=""):
        if isinstance(tag, tuple):
            tag = "+".join(tag)
        uri = "http://del.icio.us/rss/%s/%s" % (user, tag)
        for event, elem in ET.iterparse(urllib.urlopen(uri)):
            if elem.tag == RSS("item"):
                yield Post(elem)
                elem.clear()
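This version has lower latency and uses less memory than the first version: each Post is yielded as soon as its item element has been parsed, and clear then discards that element's contents instead of keeping the fully populated tree around.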
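The streaming pattern can be tried offline with a small sketch like the one below. It is written in Python 3 syntax and feeds iterparse from an in-memory buffer instead of a network connection. The SAMPLE data and the iterlinks helper are hypothetical names introduced for this example, and only the link text is extracted to keep it short.

```python
import io
import xml.etree.ElementTree as ET

RSS_NS = "{http://purl.org/rss/1.0/}"

# Hypothetical feed data, invented for this example.
SAMPLE = b"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/">
  <item><link>http://example.com/a</link></item>
  <item><link>http://example.com/b</link></item>
  <item><link>http://example.com/c</link></item>
</rdf:RDF>"""

def iterlinks(source):
    # iterparse reports "end" events by default, so each item is
    # complete (children included) by the time it shows up here.
    for event, elem in ET.iterparse(source):
        if elem.tag == RSS_NS + "item":
            yield elem.findtext(RSS_NS + "link")
            elem.clear()  # drop the item's children and text

links = iterlinks(io.BytesIO(SAMPLE))
first = next(links)  # only the first item has been handed out so far
rest = list(links)   # resume the generator and drain the remaining items
print(first)
print(rest)
```

Because the result is a generator, a caller that only needs the first few posts can stop iterating early, whereas the first version always builds the complete list.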