Getting the Size of an Image on a Web Page

Q. I need to dynamically access a webpage and find the largest graphic used on that page, so I tried the following…

Largest as in “number of pixels”, or “number of bytes”?

If you want to get the size in bytes, you don’t need to decode the image at all; just grab the data, and use the len() function to get the number of bytes.

import urllib

def getsize(uri):
    file = urllib.urlopen(uri)
    return len(file.read())

print getsize("http://www.pythonware.com/images/small-yoyo.gif")
# 10965

If you don’t want to load the entire image, check the header instead:

import urllib

def getsize(uri):
    file = urllib.urlopen(uri)
    size = file.headers.get("content-length")
    file.close()
    return int(size)

print getsize("http://www.pythonware.com/images/small-yoyo.gif")
# 10965

(Some servers do not send a content-length header, for example when the response uses chunked transfer encoding. In that case the get method returns None and int(size) raises a TypeError, so it is a good idea to fall back on len(file.read()).)
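
The fallback could look something like the following. This is a minimal sketch rather than a definitive implementation: size_from_response is a made-up helper name, and the try/except import is an assumption added so that the same code runs on both Python 2 and Python 3.

```python
try:
    from urllib import urlopen            # Python 2
except ImportError:
    from urllib.request import urlopen    # Python 3 (assumed equivalent)

def size_from_response(response):
    # hypothetical helper: prefer the header, fall back on reading the body
    size = response.headers.get("content-length")
    if size is not None:
        return int(size)
    return len(response.read())

def getsize(uri):
    response = urlopen(uri)
    try:
        return size_from_response(response)
    finally:
        # close the connection even if reading fails
        response.close()
```

Splitting the header check from the network call keeps the fallback logic easy to try out with a stand-in response object.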

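Note that urllib.urlopen uses the HTTP GET method for requests like these, so the server starts sending the whole image even if you only look at the headers. To be a bit nicer to the server, you can use the HTTP HEAD method instead, which asks for the headers only. This helper shows you how to do that: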

import httplib, urlparse

def getsize(uri):

    # check the uri
    scheme, host, path, params, query, fragment = urlparse.urlparse(uri)
    if scheme != "http":
        raise ValueError("only supports HTTP requests")
    if not path:
        path = "/"
    if params:
        path = path + ";" + params
    if query:
        path = path + "?" + query

    # make a http HEAD request
    h = httplib.HTTP(host)
    h.putrequest("HEAD", path)
    h.putheader("Host", host)
    h.endheaders()

    status, reason, headers = h.getreply()

    h.close()

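    # return an integer, like the other helpers (None if not known)
    size = headers.get("content-length")
    if size is None:
        return None
    return int(size)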

print getsize("http://www.pythonware.com/images/small-yoyo.gif")
# 10965

Note: The urllib2 library provides an easier way to do HEAD requests. See this page for an example.
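
One common way to do this with urllib2 is to subclass Request and override its get_method method. The sketch below is not necessarily the example the note refers to. HeadRequest is an illustrative name, and the import fallback is an assumption so that the code also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2 as request_lib          # Python 2
except ImportError:
    import urllib.request as request_lib   # Python 3 (assumed equivalent)

class HeadRequest(request_lib.Request):
    # illustrative subclass: make urlopen send HEAD instead of GET
    def get_method(self):
        return "HEAD"

def getsize(uri):
    response = request_lib.urlopen(HeadRequest(uri))
    try:
        size = response.headers.get("content-length")
    finally:
        response.close()
    if size is None:
        return None  # the server did not report a size
    return int(size)
```

Unlike the httplib version, this leaves URL parsing and the Host header to the library.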

If you want to get the size in pixels, use the ImageFile.Parser class or the ImageFileIO module (or read the whole thing, and wrap it in a StringIO object).

The following helper returns both the size of the file, and the size of the image, usually without loading more than 1k or so.

import urllib
import ImageFile # with Pillow, use "from PIL import ImageFile" instead

def getsizes(uri):
    # get file size *and* image size (None if not known)
    file = urllib.urlopen(uri)
    size = file.headers.get("content-length")
    if size: size = int(size)
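    p = ImageFile.Parser()
    try:
        while 1:
            data = file.read(1024)
            if not data:
                break
            # feed the parser until it has seen enough to identify the image
            p.feed(data)
            if p.image:
                return size, p.image.size
    finally:
        # close the connection whether or not the image size was found
        file.close()
    return size, None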

print getsizes("http://www.pythonware.com/images/small-yoyo.gif")
# (10965, (179, 188))
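
To get back to the original question: once you have the image URLs from the page, either kind of size can be used to pick the largest graphic. Here is a minimal sketch, assuming the list of URLs has already been extracted. The names largest, measure, and pixel_count are illustrative and not part of any library.

```python
def largest(uris, measure):
    # measure is any function that maps a URI to a number, or None if unknown
    best_uri, best_size = None, None
    for uri in uris:
        size = measure(uri)
        if size is None:
            continue  # skip images whose size could not be determined
        if best_size is None or size > best_size:
            best_uri, best_size = uri, size
    return best_uri, best_size

def pixel_count(sizes):
    # turn a (file size, (width, height)) pair into a number of pixels
    file_size, image_size = sizes
    if image_size is None:
        return None
    width, height = image_size
    return width * height
```

For the largest file in bytes, pass getsize as measure. For the largest image in pixels, pass a function that calls pixel_count on the result of getsizes.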