Step 1: Using Django as Front-End Cache
Fredrik Lundh | February 2006
This note describes the first step in the proposed deployment plan. In this step, a Django server is placed in front of the current site, and content is then migrated over to the content wiki, page by page. Throughout this step, the python.org site will still look pretty much as it looks today.
The initial setup uses a page-mapping Django application running under mod_python, and a memcached server for page caching.
During simulations on a low-end test server on a local network, this configuration was able to serve well over 1,000,000 pages per hour, or roughly 280 pages per second (whether this would apply to python.org is more than I can tell, though).
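To make the caching idea a bit more concrete, here is a minimal sketch of the "look in the cache first, render on a miss" pattern. This is not the actual application code: the `PageCache` class and the `render` callable are hypothetical, and a plain dictionary stands in for the memcached client so that the example runs without a server.

```python
class PageCache:
    """Cache rendered pages by path (a dict stands in for memcached here)."""

    def __init__(self, render, store=None):
        self.render = render                      # callable: path -> page body
        self.store = {} if store is None else store

    def get_page(self, path):
        body = self.store.get(path)
        if body is None:
            # Cache miss: render the page once, then remember the result.
            body = self.render(path)
            self.store[path] = body
        return body


calls = []

def render(path):
    # Hypothetical renderer; records each call so cache hits are visible.
    calls.append(path)
    return "<html>%s</html>" % path

cache = PageCache(render)
first = cache.get_page("/index.html")
second = cache.get_page("/index.html")   # served from the cache
```

A real setup would also need some way to expire or invalidate entries when a page changes; that is left out here.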
Page Mapping
The Apache setup is modified so that the current content is made available via the /data prefix. All other requests are passed to a simple Django application that uses a page table to map public URLs to source resources. This table has the following layout:
- path
- Document path. This is the original URL, minus the site part (e.g. /index.html for the front page).
- status
- Document status. This is the HTTP code to return for this document. For documents that are not found in this table, the renderer searches the file storage, adds the document if found, or returns 404 if not found.
- source
- The internal document source. If the status is 200, this can be either a wiki pseudo-URL or a file URL (see below). If the status is 301 or 302, this should be an http URL. If the status is 404 or 410, this field is ignored, and can be left blank.
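As a rough illustration of the lookup rules above, here is a minimal sketch in plain Python (no Django involved). The `PageEntry` class, the `resolve` function, and the sample entries, including the `wiki:FrontPage` source syntax, are made up for this example. The file-storage fallback for unknown paths is left out, so a missing entry simply yields 404.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PageEntry:
    status: int          # HTTP code to return for this document
    source: str = ""     # wiki/file pseudo-URL, or redirect target


def resolve(path: str, table: Dict[str, PageEntry]) -> Tuple[int, Optional[str]]:
    """Return (status, source) for a public path."""
    entry = table.get(path)
    if entry is None:
        # The real renderer would search the file storage first (omitted).
        return 404, None
    if entry.status in (404, 410):
        # The source field is ignored for these status codes.
        return entry.status, None
    return entry.status, entry.source


# Hypothetical page table contents.
PAGES = {
    "/index.html": PageEntry(200, "wiki:FrontPage"),
    "/old.html": PageEntry(301, "http://www.python.org/new.html"),
    "/gone.html": PageEntry(410, "ignored"),
}
```

For instance, `resolve("/old.html", PAGES)` gives the status 301 together with the redirect target, while `resolve("/gone.html", PAGES)` gives 410 and no source.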
Document Sources
Documents can be stored in two locations: in the content wiki, or as plain files on disk (relative to a specified directory).
- file:
- File resources refer to static resources on disk. In the initial phase, such resources will be returned to the client as is.
- wiki:
- Wiki resources live in the content wiki. In the initial phase, such resources will receive basic styling by the renderer, to make them look like existing python.org pages.
(In addition, resources can also be stored elsewhere in the http: space, via redirects).
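The source kinds described above could be told apart by their prefix, along these lines. This is only a sketch: the `plan_response` function and the action names it returns are hypothetical, and the exact syntax of the part after `file:` or `wiki:` is an assumption.

```python
def plan_response(source):
    """Decide how a source should be handled, based on its prefix."""
    scheme, sep, rest = source.partition(":")
    if sep and scheme == "file":
        return "serve_file", rest        # returned to the client as is
    if sep and scheme == "wiki":
        return "render_wiki", rest       # styled by the renderer
    if sep and scheme == "http":
        return "redirect", source        # the full URL is the redirect target
    raise ValueError("unsupported source: %r" % source)
```

Unknown prefixes raise an error rather than being guessed at, so a bad entry in the page table shows up early.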
Data Resources
Requests for data resources (downloads, etc.) are redirected to the /data tree (using temporary redirects), where they are handled by Apache.
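A minimal sketch of such a redirect rule follows. How data resources are actually recognised is not specified in this note, so the suffix list below is purely an assumption, as is the `data_redirect` function name.

```python
import posixpath

# Assumption: data resources are recognised by file suffix.
DATA_SUFFIXES = (".tgz", ".zip", ".exe", ".msi")


def data_redirect(path):
    """Return (302, location) for data resources, or None for other paths."""
    suffix = posixpath.splitext(path)[1].lower()
    if suffix in DATA_SUFFIXES:
        # Temporary redirect into the /data tree, which Apache serves.
        return 302, "/data" + path
    return None
```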
To Do
(Preliminary)
- Set up mod_python on the www.python.org server.
- Install Django 0.91 under e.g. /django.
- Install memcached (a 64 or 128 megabyte cache is probably enough). Don’t forget the client library!
  - server (or install from RPM repository)
  - client
- Clean up the pydotorg.dyndns.org mapping application and install under /django.
- Populate the page map (this can be done automatically by the mapping application, simply by letting it look in the file tree for any resource that’s not found in the mapping table).
- When everything feels right, update the Apache configuration so that /data requests are handled by Apache, and everything else goes through Django.
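The "populate the page map" step in the list above could be sketched roughly like this. The mapping application itself is not shown in this note, so the `scan_tree` function and the shape of the table it returns (path mapped to a status/source pair) are assumptions made for illustration.

```python
import os


def scan_tree(root):
    """Build page-table entries for every file found under root."""
    table = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Path relative to the storage root, using URL-style slashes.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            table["/" + rel] = (200, "file:" + rel)
    return table
```

Running this once over the whole tree gives the same result as letting the application add entries lazily, one request at a time, for paths that are not yet in the table.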
In the next step, we’ll configure the content wiki, and start serving portions of the site from the wiki. More on that later.
Notes
The Django Performance Tips page warns against serving “media” (i.e. data and images) from the same Apache instance as the main site.
Technical Note: Django Model
(Preliminary)
from django.core import meta

# HTTP status codes that a page table entry can carry.
HTTP_STATUS = (
    (200, '200 OK'),
    (301, '301 Permanent Redirect'),
    (302, '302 Temporary Redirect'),
    (404, '404 Not Found'),
    (410, '410 Gone'),
)

class Page(meta.Model):
    path = meta.CharField("URI", maxlength=200)  # FIXME: index? primary key?
    status = meta.IntegerField(choices=HTTP_STATUS)
    source = meta.CharField("File (or redirect URI)", maxlength=200, blank=True)
    # TODO: add timestamps? auto-flags?

    def __repr__(self):
        return self.path

    class META:
        admin = meta.Admin(
            list_display=["path", "status", "source"],
            search_fields=["path"]
        )