November 17, 2003 | Fredrik Lundh
Note: A distribution kit containing the source code for this article is available from the effbot.org downloads site (look for ElementSOAP 0.1 or later).
Talking to Google #
The Google Web API is a SOAP interface to Google’s search service, and a few other services.
Note: To run the examples in this article, you need ElementTree 1.2a5 or later. Earlier versions may create invalid XML for request structures that contain more than one typed value.
The search service is implemented by the doGoogleSearch SOAP request. The request takes a structure containing the query, the number of results, filter settings, and a few more options, and returns a list of hits. Here’s a typical request:

    <soap:Envelope>
      <soap:Body>
        <google:doGoogleSearch soap:encodingStyle="...">
          <key xsi:type="xsd:string">... license key ...</key>
          <q xsi:type="xsd:string">query</q>
          <start xsi:type="xsd:int">0</start>
          <maxResults xsi:type="xsd:int">10</maxResults>
          ... more fields ...
        </google:doGoogleSearch>
      </soap:Body>
    </soap:Envelope>

The key field is a personal Google license key. You can get your own key from Google.
The q field contains the query. You can download up to ten results per request; to get more results, you have to issue the same request again, with a different start setting.
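To make the request structure more concrete, here is a minimal sketch that builds a similar envelope with the standard library's xml.etree.ElementTree (Python 3) instead of ElementSOAP. The helper names typed_field and build_search_envelope are made up for this example, and the choice of the 2001 XML Schema namespaces is an assumption; the service may expect a different schema version.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace, plus the 2001 XML Schema namespaces.
# Assumption: the schema version a given service expects may differ.
NS_SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
NS_XSI = "http://www.w3.org/2001/XMLSchema-instance"
NS_XSD = "http://www.w3.org/2001/XMLSchema"

# Pick readable prefixes for the serialized output.
ET.register_namespace("soap", NS_SOAP_ENV)
ET.register_namespace("xsi", NS_XSI)
ET.register_namespace("xsd", NS_XSD)

def typed_field(parent, name, type_name, value):
    """Append <name xsi:type="xsd:TYPE">value</name> (hypothetical helper)."""
    field = ET.SubElement(parent, name)
    # Using a QName as the value makes the serializer declare the xsd namespace.
    field.set("{%s}type" % NS_XSI, ET.QName(NS_XSD, type_name))
    field.text = value
    return field

def build_search_envelope(key, query, start=0, max_results=10):
    """Build an envelope holding the first four doGoogleSearch fields only."""
    envelope = ET.Element("{%s}Envelope" % NS_SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % NS_SOAP_ENV)
    request = ET.SubElement(body, "{urn:GoogleSearch}doGoogleSearch")
    typed_field(request, "key", "string", key)
    typed_field(request, "q", "string", query)
    typed_field(request, "start", "int", str(start))
    typed_field(request, "maxResults", "int", str(max_results))
    return envelope

envelope = build_search_envelope("... license key ...", "python", start=10)
xml_text = ET.tostring(envelope, encoding="unicode")
print(xml_text)
```

Asking for the second page of results only changes the start field; everything else in the envelope stays the same.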
The request structure must contain a few more mandatory fields; see the sample code below for a list.
We can use the SoapService class to create a simple service wrapper for the Google API; here’s a first version:

    from ElementSOAP import SoapService, SoapRequest, SoapElement

    class GoogleService(SoapService):

        url = "http://api.google.com/search/beta2"

        def __init__(self, key, url=None):
            self.__key = key
            SoapService.__init__(self, url)

        def doGoogleSearch(self, query, start=0, maxResults=10):
            action = "urn:GoogleSearchAction"
            request = SoapRequest("{urn:GoogleSearch}doGoogleSearch")
            SoapElement(request, "key", "string", self.__key)
            SoapElement(request, "q", "string", query)
            SoapElement(request, "start", "int", str(start))
            SoapElement(request, "maxResults", "int", str(maxResults))
            SoapElement(request, "filter", "boolean", "true")
            SoapElement(request, "restrict", "string", "")
            SoapElement(request, "safeSearch", "boolean", "false")
            SoapElement(request, "lr", "string", "")
            SoapElement(request, "ie", "string", "utf-8")
            SoapElement(request, "oe", "string", "utf-8")
            return self.call(action, request).find("return")

The doGoogleSearch method takes a query string, and optional start and maxResults arguments. It returns a complete response structure, which contains information about the search, as well as elements describing each individual hit. To get a list of hit descriptors, use findall(".//item").
Here’s an example:

    g = GoogleService("... license key ...")
    response = g.doGoogleSearch("... query ...")
    print response.findtext("estimatedTotalResultsCount")
    for item in response.findall(".//item"):
        print item.findtext("URL"), repr(item.findtext("title"))

Item subelements include URL, title, cachedSize, hostName, snippet (an HTML fragment), and summary.
Response subelements include estimatedTotalResultsCount, searchTime, startIndex, endIndex, and searchQuery (the original query string).
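Since each request returns at most ten hits, collecting more means repeating the request with an increasing start value. The following Python 3 sketch shows one way to do that. Both iter_results and the stand-in fake_search function are hypothetical; in real code, the search callable would wrap doGoogleSearch and return the list produced by findall(".//item").

```python
def iter_results(search, query, total, page_size=10):
    """Yield up to `total` hits by calling search(query, start, count) repeatedly."""
    start = 0
    while start < total:
        count = min(page_size, total - start)
        hits = search(query, start, count)
        if not hits:
            break  # the service has no more results
        for hit in hits:
            yield hit
        start += len(hits)

def fake_search(query, start, max_results):
    """Stand-in for the real service: pretends there are 25 hits in total."""
    corpus = ["%s-%d" % (query, i) for i in range(25)]
    return corpus[start:start + max_results]

results = list(iter_results(fake_search, "hello", total=23))
print(len(results), results[0], results[-1])
```

The generator stops early if the service runs out of hits, so asking for more results than exist is harmless.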
Handling Server Errors #
If you pass in invalid argument values to the Google search service, you’re likely to get server errors from the HTTP client layer. For example, let’s play with the maxResults setting:

    >>> g = GoogleService("... license key ...")
    >>> g.doGoogleSearch("hello", maxResults=10)
    <Element return at 9dd83c>

Getting 10 results in one operation worked just fine. What about 10 times as many?

    >>> g.doGoogleSearch("hello", maxResults=100)
    Traceback (most recent call last):
      File "myprogram.py", line 24, in doGoogleSearch
      File "ElementSOAP.py", line 64, in call
        extra_headers=[("SOAPAction", action)]
      File "HTTPClient.py", line 114, in do_request
        raise HTTPError(errcode, errmsg, headers, h.getfile())
    HTTPClient.HTTPError: (500, 'Internal Server Error')

Oops. Looks like someone didn’t like that request.
This error isn’t really an HTTP server error in the usual sense; rather, it’s the error code used by SOAP 1.1 to signal that a request couldn’t be processed. When this happens, the HTTP response body may contain a SOAP response with a special SOAP Fault element.
If you disable the errcode check in HTTPClient, and print out the response structure, you’ll get something like:

    <soap:Envelope>
      <soap:Body>
        <soap:Fault>
          <faultcode>SOAP-ENV:Server</faultcode>
          <faultstring>Exception from service object: maxResults must be 10 or less.</faultstring>
          <faultactor>/search/beta2</faultactor>
          <detail>
            ... additional elements ...
          </detail>
          ...
        </soap:Fault>
      </soap:Body>
    </soap:Envelope>

This looks like an ordinary response, but instead of the return element, it contains a soap:Fault element, with more information about what really happened.
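As an illustration, here is a small Python 3 sketch that picks the fault information out of such a response using the standard library's xml.etree.ElementTree. The function name extract_fault and the trimmed-down sample document are assumptions made for this example; the namespace is the SOAP 1.1 envelope namespace.

```python
import xml.etree.ElementTree as ET

NS_SOAP_ENV = "{http://schemas.xmlsoap.org/soap/envelope/}"

# Trimmed-down fault response, modelled on the one shown above.
FAULT_XML = """\
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
      <faultcode>SOAP-ENV:Server</faultcode>
      <faultstring>maxResults must be 10 or less.</faultstring>
      <faultactor>/search/beta2</faultactor>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
"""

def extract_fault(xml_text):
    """Return (faultcode, faultstring) if the body holds a Fault, else None."""
    root = ET.fromstring(xml_text)
    body = root.find(NS_SOAP_ENV + "Body")
    if body is None or len(body) == 0:
        return None
    first = body[0]  # the first child of Body is either the result or a Fault
    if first.tag != NS_SOAP_ENV + "Fault":
        return None
    # The fault subelements are not namespace-qualified in SOAP 1.1.
    return first.findtext("faultcode"), first.findtext("faultstring")

print(extract_fault(FAULT_XML))
```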
It would be nice if our SOAP code raised some kind of SOAP-specific exception, instead of hiding the error message down in the HTTP client layer.
To deal with this, I’ve modified the HTTPClient slightly; if the status code isn’t 200, the client will still raise an exception, but the exception object contains not only the HTTP status code, but also the HTTP header dictionary and the file handle. This allows the exception handler to look at the header, and if necessary, parse the response body.

    class HTTPError(Exception):
        pass

    class HTTPClient:

        def do_request(...):
            ...
            if errcode != 200:
                raise HTTPError(errcode, errmsg, headers, h.getfile())
            ...

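For comparison, the Python 3 standard library takes a similar approach: urllib.error.HTTPError carries the status code and the headers, and can be read like a file to get the response body. The sketch below builds such an exception by hand, so it runs without a network connection; fake_request, the placeholder URL, and the sample body are made up for this example.

```python
import io
import urllib.error

def fake_request():
    """Stand-in for a request that the server answers with status 500."""
    body = io.BytesIO(b"<fault>maxResults must be 10 or less.</fault>")
    raise urllib.error.HTTPError(
        "http://example.invalid/soap",  # placeholder URL
        500, "Internal Server Error",
        {"Content-Type": "text/xml"}, body)

try:
    fake_request()
except urllib.error.HTTPError as err:
    status = err.code                               # HTTP status code
    content_type = err.headers.get("Content-Type")  # response headers
    payload = err.read()                            # response body, as bytes

print(status, content_type, payload)
```

The HTTPClient module used in this article keeps its own HTTPError class, as shown above.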
With this in place, it’s time to teach the SoapService class to look for SOAP fault responses. Here’s the updated service class:

    from HTTPClient import HTTPClient, HTTPError

    class SoapService:

        def __init__(self, url=None):
            self.__client = HTTPClient(url or self.url)

        def call(self, action, request):
            # build SOAP envelope
            envelope = Element(NS_SOAP_ENV + "Envelope")
            body = SubElement(envelope, NS_SOAP_ENV + "Body")
            body.append(request)
            # call the server
            try:
                response = self.__client.do_request(
                    tostring(envelope),
                    extra_headers=[("SOAPAction", action)]
                    )
            except HTTPError, v:
                if v[0] == 500:
                    # might be a SOAP fault
                    response = ElementTree.parse(v[3])
                else:
                    raise # some other HTTP error; pass it on
            # the first child of the body is either the result or a fault
            response = response.getroot().find(body.tag)[0]
            if response.tag == NS_SOAP_ENV + "Fault":
                raise SoapFault(
                    response.findtext("faultcode"),
                    response.findtext("faultstring"),
                    response.findtext("faultactor"),
                    response.find("detail")
                    )
            return response

And here’s the exception class used for SOAP fault responses:

    class SoapFault(Exception):
        def __init__(self, faultcode, faultstring, faultactor, detail):
            Exception.__init__(self, faultcode, faultstring,
                               faultactor, detail)
            self.faultcode = faultcode
            self.faultstring = faultstring
            self.faultactor = faultactor
            self.detail = detail

After these modifications, the error message is a bit more informative:

    >>> g.doGoogleSearch("hello", maxResults=100)
    Traceback (most recent call last):
      File "myprogram.py", line 24, in doGoogleSearch
      File "ElementSOAP.py", line 77, in call
        response.find("detail")
    ElementSOAP.SoapFault: ('SOAP-ENV:Server', 'Exception from service object: maxResults must be 10 or less.', '/search/beta2', <Element detail at 9dd6e4>)

(If you know your SOAP, you’ll notice a slight problem hidden in that error message. I’ll get back to that later on.)
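Because the fault fields are stored as attributes, calling code can catch the exception and act on the details. Here is a minimal Python 3 sketch; the SoapFault class mirrors the one above, while search_or_none and the stand-in limited_search function are hypothetical names used only for illustration.

```python
class SoapFault(Exception):
    """Same shape as the exception class shown above."""
    def __init__(self, faultcode, faultstring, faultactor, detail):
        Exception.__init__(self, faultcode, faultstring, faultactor, detail)
        self.faultcode = faultcode
        self.faultstring = faultstring
        self.faultactor = faultactor
        self.detail = detail

def limited_search(query, max_results):
    """Stand-in for the real service: rejects requests for more than 10 hits."""
    if max_results > 10:
        raise SoapFault("SOAP-ENV:Server", "maxResults must be 10 or less.",
                        "/search/beta2", None)
    return ["hit"] * max_results

def search_or_none(search, query, max_results):
    """Run a search; report the fault and return None if the server rejects it."""
    try:
        return search(query, max_results)
    except SoapFault as fault:
        print("search failed:", fault.faultcode, fault.faultstring)
        return None

print(search_or_none(limited_search, "hello", 10))
print(search_or_none(limited_search, "hello", 100))
```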
More Google Services #
By the way, the current version of the Google service provides two more methods. The doGetCachedPage method returns the cached version of a given document, if available:

    class GoogleService:
        ...
        def doGetCachedPage(self, url):
            action = "urn:GoogleSearchAction"
            request = SoapRequest("{urn:GoogleSearch}doGetCachedPage")
            SoapElement(request, "key", "string", self.__key)
            SoapElement(request, "url", "string", url)
            response = self.call(action, request).findtext("return")
            if response:
                import base64
                response = base64.decodestring(response)
            return response

    >>> g = GoogleService(key)
    >>> g.doGetCachedPage("online.effbot.org")[:40]
    '<meta http-equiv="Content-Type" content= ...'
    >>> len(g.doGetCachedPage("online.effbot.org"))
    10154

The doSpellingSuggestion method suggests an alternative spelling for a given phrase:

    class GoogleService:
        ...
        def doSpellingSuggestion(self, phrase):
            action = "urn:GoogleSearchAction"
            request = SoapRequest("{urn:GoogleSearch}doSpellingSuggestion")
            SoapElement(request, "key", "string", self.__key)
            SoapElement(request, "phrase", "string", phrase)
            return self.call(action, request).findtext("return")

    >>> g = GoogleService(key)
    >>> g.doSpellingSuggestion("pyhton")
    'python'

(You’re beginning to see a pattern here, right?)
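One way to capture that pattern is to describe each request as a list of (name, type, value) triples and build the element in a loop. The following Python 3 sketch does this with the standard library's xml.etree.ElementTree rather than ElementSOAP. The build_request helper is hypothetical, and it stores the type in a plain type attribute for brevity, where a real request would use a namespace-qualified xsi:type.

```python
import xml.etree.ElementTree as ET

def build_request(method, fields):
    """Build a {urn:GoogleSearch}method element from (name, type, value) triples."""
    request = ET.Element("{urn:GoogleSearch}" + method)
    for name, type_name, value in fields:
        field = ET.SubElement(request, name)
        # Simplified: a real request uses a namespace-qualified xsi:type.
        field.set("type", type_name)
        field.text = value
    return request

KEY = "... license key ..."

spelling = build_request("doSpellingSuggestion", [
    ("key", "string", KEY),
    ("phrase", "string", "pyhton"),
])
cached = build_request("doGetCachedPage", [
    ("key", "string", KEY),
    ("url", "string", "online.effbot.org"),
])

for request in (spelling, cached):
    print(request.tag, [child.tag for child in request])
```

With a helper like this, each service method reduces to a method name, a field list, and a line that extracts the return value.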