Common Log Format
March 2004 | Fredrik Lundh
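Here’s a simple regular expression that can be used to parse web server log files in the Common Log Format. Its seven groups capture, in order, the remote host, the identity field (usually ignored), the user name, the timestamp, the request line, the status code, and the response size.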
    import re

    p = re.compile(
        r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
    )

    # file is an already opened log file object
    for line in file.readlines():
        m = p.match(line)
        if not m:
            continue  # skip lines that don't look like log entries
        host, ignore, user, date, request, status, size = m.groups()
        ...
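To see what the groups contain, here is a minimal sketch that runs the same pattern over one sample line and converts a few fields to Python values. The sample line (in the style of the Apache documentation), the helper name parse_clf, and the choice of field conversions are assumptions made for illustration; they are not part of the original recipe. Note that `[^ ]*` also matches a newline, so the sketch strips the line before matching to keep the trailing newline out of the size field.

```python
import re
from datetime import datetime

p = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
)

# Hypothetical sample line, used only for illustration.
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326\n')

def parse_clf(line):
    """Return a dict of converted fields, or None if the line doesn't match."""
    m = p.match(line.rstrip())  # drop the trailing newline first
    if not m:
        return None
    host, ignore, user, date, request, status, size = m.groups()
    return {
        "host": host,
        "user": None if user == "-" else user,  # "-" marks a missing value
        # %b is locale dependent; this assumes English month abbreviations.
        "time": datetime.strptime(date, "%d/%b/%Y:%H:%M:%S %z"),
        "request": request,
        "status": int(status),
        "size": 0 if size == "-" else int(size),
    }

record = parse_clf(sample)
print(record["host"], record["status"], record["size"])
```

Returning None for lines that don't match mirrors the `continue` in the loop above, so the caller decides whether to skip or report malformed lines.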
Here’s a variation that parses the Extended Common Log Format (often called the Combined Log Format), which adds quoted referrer and user-agent fields at the end of each line.
    import re

    p = re.compile(
        r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
        r' "([^"]*)" "([^"]*)"'  # extensions
    )

    # file is an already opened log file object
    for line in file.readlines():
        m = p.match(line)
        if not m:
            continue  # skip lines that don't look like log entries
        host, ignore, user, date, request, status, size, referer, agent = m.groups()
        ...
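As a quick check of the extended pattern, the sketch below matches it against two sample lines: one with the extra fields and one in the plain Common Log Format. Both sample lines and the variable names extended, plain, fields, and no_match are assumptions for illustration. It suggests that the extended pattern rejects plain lines, so a log that mixes both formats would need both patterns.

```python
import re

p = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
    r' "([^"]*)" "([^"]*)"'  # extensions
)

# Hypothetical sample lines, used only for illustration.
extended = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
            '"http://www.example.com/start.html" '
            '"Mozilla/4.08 [en] (Win98; I ;Nav)"')
plain = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
         '"GET /apache_pb.gif HTTP/1.0" 200 2326')

fields = p.match(extended).groups()
referer, agent = fields[-2:]  # the two extension fields come last

# The plain line lacks the two quoted fields, so match() returns None.
no_match = p.match(plain)

print(referer)
print(agent)
print(no_match)
```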