The urlparse module
This module contains functions to process uniform resource locators (URLs): it splits a URL into its components, puts the components back together, and combines a base URL with a relative one.
Example: Using the urlparse module
# File: urlparse-example-1.py

import urlparse

print urlparse.urlparse("http://host/path;params?query#fragment")

('http', 'host', '/path', 'params', 'query', 'fragment')
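The six elements always come back in the same order, so they can be unpacked and named. What follows is a minimal sketch, not part of the original examples: the helper name split_url is hypothetical, and the try/except import is an assumption for code that must also run on Python 3, where the same functions live in urllib.parse.

```python
try:
    import urlparse  # Python 2
except ImportError:
    # Assumption: on Python 3 the same functions are in urllib.parse.
    import urllib.parse as urlparse


def split_url(url):
    """Return the six URL components as a dict (hypothetical helper)."""
    scheme, host, path, params, query, fragment = urlparse.urlparse(url)
    return {
        "scheme": scheme,
        "host": host,
        "path": path,
        "params": params,
        "query": query,
        "fragment": fragment,
    }


parts = split_url("http://host/path;params?query#fragment")
for name in ("scheme", "host", "path", "params", "query", "fragment"):
    print(name + " => " + parts[name])
```

A dict is used here only to make the component names visible. Code that needs nothing but the values can unpack the result directly.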
A common use is to split an HTTP URL into host and path components (an HTTP request asks the host to return the data identified by the path):
Example: Using the urlparse module to parse HTTP locators
# File: urlparse-example-2.py

import urlparse

scheme, host, path, params, query, fragment =\
        urlparse.urlparse("http://host/path;params?query#fragment")

if scheme == "http":
    print "host", "=>", host
    if params:
        path = path + ";" + params
    if query:
        path = path + "?" + query
    print "path", "=>", path

host => host
path => /path;params?query
Alternatively, you can use the urlunparse function to put the URL back together again:
Example: Using the urlparse module to parse and rebuild HTTP locators
# File: urlparse-example-3.py

import urlparse

scheme, host, path, params, query, fragment =\
        urlparse.urlparse("http://host/path;params?query#fragment")

if scheme == "http":
    print "host", "=>", host
    print "path", "=>", urlparse.urlunparse((None, None, path, params, query, None))

host => host
path => /path;params?query
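The host/path split can be wrapped in a small function. This is a sketch under stated assumptions rather than a definitive implementation: http_target is a hypothetical name, the import fallback assumes Python 3 keeps these functions in urllib.parse, and empty strings stand in for None so that the same call behaves the same way on both versions.

```python
try:
    import urlparse  # Python 2
except ImportError:
    # Assumption: on Python 3 the same functions are in urllib.parse.
    import urllib.parse as urlparse


def http_target(url):
    """Return (host, request path) for an HTTP URL, or None otherwise.

    Hypothetical helper, not part of the urlparse module.
    """
    scheme, host, path, params, query, fragment = urlparse.urlparse(url)
    if scheme != "http":
        return None
    # Empty strings leave the scheme, host, and fragment out of the result.
    request_path = urlparse.urlunparse(("", "", path, params, query, ""))
    return host, request_path


target = http_target("http://host/path;params?query#fragment")
print(target)
```

Returning None for other schemes keeps the caller's check explicit. Raising an exception would be an equally reasonable choice.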
The urljoin function is used to combine an absolute URL with a second, possibly relative URL:
Example: Using the urlparse module to combine relative locators
# File: urlparse-example-4.py

import urlparse

base = "http://spam.egg/my/little/pony"

for path in "/index", "goldfish", "../black/cat":
    print path, "=>", urlparse.urljoin(base, path)

/index => http://spam.egg/index
goldfish => http://spam.egg/my/little/goldfish
../black/cat => http://spam.egg/my/black/cat
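The same call can resolve a whole list of links, for example the references found on a page. The sketch below is illustrative only: resolve_links and the http://other.host/page URL are hypothetical, and the import fallback assumes Python 3, where urljoin lives in urllib.parse. It also shows that a second argument that is already absolute is returned unchanged.

```python
try:
    import urlparse  # Python 2
except ImportError:
    # Assumption: on Python 3 the same functions are in urllib.parse.
    import urllib.parse as urlparse


def resolve_links(base, links):
    """Resolve each link against base (hypothetical helper)."""
    return [urlparse.urljoin(base, link) for link in links]


base = "http://spam.egg/my/little/pony"
links = ["/index", "goldfish", "../black/cat", "http://other.host/page"]

resolved = resolve_links(base, links)
for url in resolved:
    print(url)
```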