The Consumer Interface in Python

The consumer interface is a simple “data sink” interface, used by standard Python modules such as xmllib and sgmllib.

Other examples include the GZIP consumer and PIL’s ImageParser class.

The consumer will typically convert incoming raw data in some way, and pass it on to another layer. For example, XML parsers implementing this protocol usually parse the data stream into a stream of XML tokens (that is, start tags, character data, end tags, etc.).
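As a rough illustration of the "convert and pass on" idea, the sketch below chains two consumers: one inflates zlib-compressed data and forwards the result to a second one that simply collects it. The class names InflateConsumer and CollectingConsumer are hypothetical and chosen for this example only; it uses plain zlib streams, not the GZIP file format.

```python
import zlib


class CollectingConsumer:
    """Hypothetical final sink: stores whatever it is fed."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def feed(self, data):
        self.chunks.append(data)

    def close(self):
        self.closed = True


class InflateConsumer:
    """Hypothetical converting consumer: inflates zlib data and
    forwards the decompressed bytes to another consumer."""

    def __init__(self, target):
        self.target = target
        self.decoder = zlib.decompressobj()

    def feed(self, data):
        output = self.decoder.decompress(data)
        if output:
            self.target.feed(output)

    def close(self):
        # Flush anything still buffered, then close the next layer.
        output = self.decoder.flush()
        if output:
            self.target.feed(output)
        self.target.close()


sink = CollectingConsumer()
inflater = InflateConsumer(sink)

packed = zlib.compress(b"hello, consumer" * 10)
for i in range(0, len(packed), 7):  # feed in small, arbitrary pieces
    inflater.feed(packed[i:i + 7])
inflater.close()

result = b"".join(sink.chunks)
```

Because both classes expose the same feed/close pair, the application only needs to know about the outermost consumer.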

Interface

feed(data)

Process incoming data. The data argument should be a byte string. The application can call this method as many times as it wants (or not at all, if the source is empty). The data buffer may contain zero or more bytes of data.

close()

No more data is available. The application should call this method when it has reached the end of the source stream.
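A minimal sketch of a class providing these two methods might look as follows. LineCounter is a hypothetical name; the example assumes lines are separated by "\n" and counts a trailing unterminated line when close is called.

```python
class LineCounter:
    """Hypothetical consumer that counts lines in a byte stream."""

    def __init__(self):
        self.lines = 0
        self.pending = b""  # bytes of a line that is not yet complete

    def feed(self, data):
        # Lines may be split across calls, so keep the unfinished tail.
        buffer = self.pending + data
        *complete, self.pending = buffer.split(b"\n")
        self.lines += len(complete)

    def close(self):
        # A final line without a trailing newline still counts.
        if self.pending:
            self.lines += 1
            self.pending = b""


c = LineCounter()
c.feed(b"one\ntw")
c.feed(b"")           # an empty buffer is allowed
c.feed(b"o\nthree")
c.close()

empty = LineCounter()
empty.close()         # feed may never be called at all
```

Note that the consumer cannot know whether the last line is complete until close is called, which is why the application should always call it.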

reset() (optional)

Reset the consumer. Note that this method isn’t part of the core consumer protocol, and applications should be prepared to deal with consumers that don’t provide this method.

Examples:

try:
    reset = consumer.reset
except AttributeError:
    pass
else:
    reset()

or:

if hasattr(consumer, "reset"):
    consumer.reset()

Patterns 

Read a file piece by piece:

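c = consumer(...)

# the with statement closes the file even if feed() raises an exception
with open(filename, "rb") as f:
    while True:
        s = f.read(8192)  # read up to 8192 bytes at a time
        if not s:
            break  # end of file
        c.feed(s)

c.close()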
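The loop above can be wrapped in a small helper that works with any binary file-like object. This is only a sketch: feed_stream and ByteCounter are hypothetical names, and io.BytesIO stands in for a real file so the example is self-contained.

```python
import io


def feed_stream(consumer, stream, chunk_size=8192):
    """Feed a binary stream to a consumer piece by piece, then close it."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        consumer.feed(chunk)
    consumer.close()


class ByteCounter:
    """Hypothetical consumer that records how much data it was given."""

    def __init__(self):
        self.calls = 0
        self.total = 0
        self.closed = False

    def feed(self, data):
        self.calls += 1
        self.total += len(data)

    def close(self):
        self.closed = True


counter = ByteCounter()
feed_stream(counter, io.BytesIO(b"x" * 20000))
```

With 20000 bytes and the default chunk size, the consumer receives three buffers (8192, 8192, and 3616 bytes) followed by a single close call.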

Read and parse a file in a single operation:

c = consumer(...)
with open(filename, "rb") as f:
    c.feed(f.read())  # pass the whole file in a single call
c.close()

Read and parse a file as it arrives over a network (this example uses the asyncore library, which was deprecated in Python 3.6 and removed in Python 3.12):

class protocol_client(asyncore.dispatcher):

    ...

    def handle_connect(self):
        self.consumer = consumer(...)
        ...

    def handle_read(self):
        # asyncore calls handle_read() without arguments; fetch the data with recv()
        data = self.recv(8192)
        self.consumer.feed(data)

    def handle_close(self):
        self.consumer.close()
        self.close()

    ...
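On Python versions where asyncore is not available, the same pattern can be expressed with asyncio's Protocol class. The sketch below is an alternative and not part of the original example; ConsumerProtocol and RecordingConsumer are hypothetical names, and the callbacks are invoked by hand here instead of by a running event loop.

```python
import asyncio


class RecordingConsumer:
    """Hypothetical consumer used only to show what the protocol forwards."""

    def __init__(self):
        self.received = []
        self.closed = False

    def feed(self, data):
        self.received.append(data)

    def close(self):
        self.closed = True


class ConsumerProtocol(asyncio.Protocol):
    """Forwards incoming network data to a consumer."""

    def __init__(self, consumer):
        self.consumer = consumer

    def data_received(self, data):
        # Called by the event loop whenever data arrives.
        self.consumer.feed(data)

    def connection_lost(self, exc):
        # Called once when the connection is closed or fails.
        self.consumer.close()


# Simulate what the event loop would do for a short connection.
consumer = RecordingConsumer()
protocol = ConsumerProtocol(consumer)
protocol.connection_made(None)
protocol.data_received(b"<doc>")
protocol.data_received(b"</doc>")
protocol.connection_lost(None)
```

In a real program the protocol would be passed to the event loop through a factory, for example with loop.create_connection(lambda: ConsumerProtocol(consumer), host, port).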

Incremental Parsing Using the Consumer API