The robotparser module
(New in 2.0). This module reads robots.txt files, which are used to implement the Robot Exclusion Protocol.
If you’re implementing an HTTP robot that will visit arbitrary sites on the net (not just your own), it’s a good idea to use this module to check that you really are welcome.
Example: Using the robotparser module
# File: robotparser-example-1.py

import robotparser

r = robotparser.RobotFileParser()
r.set_url("http://www.python.org/robots.txt")
r.read()

if r.can_fetch("*", "/index.html"):
    print "may fetch the home page"

if r.can_fetch("*", "/tim_one/index.html"):
    print "may fetch the tim peters archive"
$ python robotparser-example-1.py
may fetch the home page
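If you already have the rule text at hand, you can feed it to the parser with the parse method instead of fetching a file over the network. Here's a minimal sketch; the rules shown are invented for illustration, not taken from any real site:

# File: robotparser-example-2.py (a sketch; the rules below are
# made up for illustration)

import robotparser

r = robotparser.RobotFileParser()

# parse takes a sequence of lines, just as read would get them
# from a robots.txt file
r.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print r.can_fetch("*", "/private/secret.html")  # a false value
print r.can_fetch("*", "/index.html")           # a true value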
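In a real robot, you would typically call can_fetch before each request, using your robot's own user agent name rather than "*". The following sketch combines robotparser with urllib; the agent name "mybot" is an arbitrary placeholder:

# File: robotparser-example-3.py (a sketch; "mybot" is a
# placeholder user agent name)

import robotparser
import urllib

AGENT = "mybot"

r = robotparser.RobotFileParser()
r.set_url("http://www.python.org/robots.txt")
r.read()

# can_fetch also accepts a full URL; the parser extracts the path
url = "http://www.python.org/index.html"
if r.can_fetch(AGENT, url):
    data = urllib.urlopen(url).read()
    print "fetched", len(data), "bytes"
else:
    print "robots.txt forbids fetching", url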