
Grabbing del.icio.us posts with Python

Fredrik Lundh | September 24, 2006 | Originally posted to online.effbot.org

The del.icio.us link management site offers a convenient JSON interface for fetching the last few posts as a JSON object. While JSON is designed for use in JavaScript environments, it turns out that the JSON produced by del.icio.us is really easy to use from Python.

For example, fetching the http://del.icio.us/feeds/json/effbot?raw URL gives you the 15 most recent additions to my del.icio.us feed as a single JSON object. With some extra linefeeds added for clarity, the object might look something like this:

[{"u":"http://faassen.n--tree.net/blog/view/weblog/2006/02/24/0",
  "d":"Martijn Faassen: lxml and (c)ElementTree",
  "t":["python","xml","elementtree","effbot:link","date:20060224"]},
 {"u":"http://article.gmane.org/gmane.comp.python.tutor/24986",
  "d":"Danny Yoo: elementtree mini-tutorial",
  "t":["python","xml","elementtree","effbot:link","date:20050524"]},
  ...
]

This looks a lot like Python, of course. In fact, it’s perfectly compatible with Python’s syntax for dictionaries, lists, and ordinary strings. To convert it into a Python object, you can simply pass it to eval:

>>> import urllib, pprint
>>> url = "http://del.icio.us/feeds/json/effbot?raw"
>>> pprint.pprint(eval(urllib.urlopen(url).read()))
[{'d': 'Martijn Faassen: lxml and (c)ElementTree',
  't': ['python', 'xml', 'elementtree', 'effbot:link', 'date:20060224'],
  'u': 'http://faassen.n--tree.net/blog/view/weblog/2006/02/24/0'},
 {'d': 'Danny Yoo: elementtree mini-tutorial',
  't': ['python', 'xml', 'elementtree', 'effbot:link', 'date:20050524'],
  'u': 'http://article.gmane.org/gmane.comp.python.tutor/24986'},
  ...
]

Not bad. A complete del.icio.us post grabber in what’s basically one line of Python.
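
Spelled out as a single statement (no pretty-printing, no error handling), the grabber boils down to something like this:

import urllib

# fetch the raw JSON feed and evaluate it straight into Python lists and dicts
posts = eval(urllib.urlopen("http://del.icio.us/feeds/json/effbot?raw").read())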

Well, almost complete, at least. The JSON object uses UTF-8 encoding for non-ASCII text, so to be on the safe side, you should decode the strings before using them. To deal with this, and make the data a little easier to use, you can use a wrapper to represent the individual posts:

import urllib

def utf8(s):
    return unicode(s, "utf-8")

class Post(object):
    def __init__(self, item):
        self.link = utf8(item["u"])
        self.title = utf8(item["d"])
        self.description = utf8(item.get("n", ""))
        self.tags = map(utf8, item["t"])

def getposts(user):
    url = "http://del.icio.us/feeds/json/%s/?raw" % user
    return map(Post, eval(urllib.urlopen(url).read()))

for post in getposts("effbot"):
    print post.link, post.tags
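
The wrapper also picks up the optional "n" field, which holds the extended description for a post, so you can do things like this (a small usage sketch):

for post in getposts("effbot"):
    if post.description:
        # show the extended description where one has been entered
        print post.title, "-", post.description
    else:
        print post.title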

The del.icio.us JSON interface provides two additional features: you can fetch up to 100 posts in each request, and you can filter on individual tags or tag combinations. Here’s an enhanced version of the getposts function that takes optional tag and count arguments:

def getposts(user, tag="", count=15):
    if isinstance(tag, tuple):
        tag = "+".join(tag)
    url = "http://del.icio.us/feeds/json/%s/%s?raw&count=%d" % (
        user, tag, count
    )
    return map(Post, eval(urllib.urlopen(url).read()))
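
For example, to grab the full 100 posts that the feed allows in a single request, you can leave the tag empty and just pass a count:

# fetch up to 100 of the most recent posts (the maximum per request)
posts = getposts("effbot", count=100)
print len(posts)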

The tag argument can be either a single string or a tuple of strings. For example, to get all my pil-related links, you can use:

>>> for post in getposts("effbot", "pil"):
>>>     print post.link
http://louhi.kempele.fi/~skyostil/uv/fretsonfire/
http://effbot.python-hosting.com/milestone/pil-1.1.6-beta
...

To get an “official” elementtree bibliography, you can specify both effbot:link (which I’m using for bibliographic entries) and elementtree:

>>> for post in getposts("effbot", ("effbot:link", "elementtree"), 100):
>>>     print post.link
http://faassen.n--tree.net/blog/view/weblog/2006/02/24/0
http://article.gmane.org/gmane.comp.python.tutor/24986
...

As can be seen in the raw dumps above, bibliography links also include date: tags. Since the dates use the YYYYMMDD form, sorting on the tag text also sorts the posts chronologically. The following snippet sorts the list by publication date:

posts = getposts("effbot", ("effbot:link", "elementtree"), 100)

def getdate(post):
    for tag in post.tags:
        if tag.startswith("date:"):
            return tag[5:]
    return None

posts.sort(key=getdate)

for post in posts:
    print post.link

Running this gives us:

http://www.xml.com/pub/a/2003/02/12/py-xml.html
http://www-128.ibm.com/developerworks/library/x-matters28/
http://www.idealliance.org/papers/dx_xml03/papers/06-02-03/06-02-03.html
http://www.xml.com/pub/a/2004/06/30/py-xml.html
...
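
Since getdate returns the dates as plain YYYYMMDD strings, it is just as easy to show the dates next to the links, or to flip the order to get the newest entries first; for example:

# print the publication date next to each link, newest entries first
posts.sort(key=getdate, reverse=True)
for post in posts:
    print getdate(post), post.link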

Generating HTML instead is straightforward; running:

import cgi

print "<ul>"
for post in posts:
    print "<li><a href='%s'>%s</a>" % (post.link, cgi.escape(post.title))
print "</ul>"

gives us the same links as an HTML bulleted list.

To simplify even more, you can move the HTML anchor code into the Post class; by adding a __str__ method, you can simply print the post object to get a link:

class Post(object):
    def __init__(self, item):
        self.link = utf8(item["u"])
        self.title = utf8(item["d"])
        self.description = utf8(item.get("n", ""))
        self.tags = map(utf8, item["t"])
    def __str__(self):
        return "<a href='%s'>%s</a>" % (self.link, cgi.escape(self.title))

...

print "<ul>"
for post in posts:
    print "<li>", post
print "</ul>"

More on this later.