The robotparser module

(New in 2.0) This module reads robots.txt files, which sites use to implement the Robot Exclusion Protocol: the file lists the parts of a site that robots are asked not to visit.

If you’re implementing an HTTP robot that will visit arbitrary sites on the net (not just your own sites), it’s a good idea to use this module to check that you really are welcome.

Example: Using the robotparser module
# File: robotparser-example-1.py

import robotparser

r = robotparser.RobotFileParser()
r.set_url("http://www.python.org/robots.txt")
r.read() # download and parse the robots.txt file

# can_fetch(useragent, url) is true if that user agent may fetch the URL
if r.can_fetch("*", "/index.html"):
    print "may fetch the home page"

if r.can_fetch("*", "/tim_one/index.html"):
    print "may fetch the tim peters archive"

$ python robotparser-example-1.py
may fetch the home page
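
The example above needs network access, and its output depends on whatever robots.txt the site serves at the time. To try the same check offline, the rules can probably be passed straight to the parse method as a list of lines. The following is only a sketch: the rules and the describe helper are made up for illustration, and the fallback import assumes that newer Python versions provide the module as urllib.robotparser.

```python
# File: robotparser-offline-sketch.py (hypothetical file name)

try:
    import robotparser                  # Python 2, as used in this article
except ImportError:
    from urllib import robotparser      # assumed location in Python 3

# Made-up rules for illustration; this is not python.org's real robots.txt.
RULES = [
    "User-agent: *",
    "Disallow: /private/",
]

r = robotparser.RobotFileParser()
r.parse(RULES)  # parse the lines directly instead of calling set_url/read

def describe(path):
    # hypothetical helper: report whether any robot ("*") may fetch path
    if r.can_fetch("*", path):
        return "may fetch " + path
    return "may not fetch " + path

print(describe("/index.html"))
print(describe("/private/index.html"))
```

With these rules, only paths under /private/ are refused. A real robot would pass its own user agent name instead of "*", so that rules aimed at it specifically are honored.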