Element Library Functions

March 22, 2004 | Fredrik Lundh

ElementTree 1.3 may include a new module, ElementLib, with a number of convenient helper functions.

The exact contents are yet to be determined; here are some of the current proposals (from various sources, in no particular order):

Helpers to add subelements, with a nicer syntax.

Wrappers to access elements via attributes (templating).
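Neither of these helpers has a settled design yet. As a rough illustration only, here is a sketch for Python 3's xml.etree.ElementTree. `sub` and `Wrapper` are hypothetical names, not part of any ElementTree release: `sub` creates a child and sets its text in one call, and `Wrapper` exposes the text of child elements as attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper (not an ElementTree API): create a subelement,
# optionally set its text, and return the new child.
def sub(parent, tag, text=None, **attrib):
    child = ET.SubElement(parent, tag, attrib)
    if text is not None:
        child.text = text
    return child

# Hypothetical wrapper: Wrapper(elem).title returns the text of the
# first <title> child of elem.
class Wrapper:
    def __init__(self, elem):
        self._elem = elem

    def __getattr__(self, tag):
        child = self._elem.find(tag)
        if child is None:
            raise AttributeError(tag)
        return child.text

page = ET.Element("page")
sub(page, "title", "Hello")
sub(page, "link", href="http://example.com/")
print(Wrapper(page).title)  # Hello
```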

copy

copy/deepcopy: Copy element structures. [the standard copy module is supposed to work for this /F]
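A quick check that the standard copy module handles elements, assuming Python 3's xml.etree.ElementTree rather than the original ElementTree package. deepcopy duplicates the whole subtree, while copy.copy creates a new element that shares its children with the original:

```python
import copy
import xml.etree.ElementTree as ET

root = ET.fromstring("<doc><p>one</p><p>two</p></doc>")

# deep copy: the clone can be modified without touching the original
clone = copy.deepcopy(root)
clone[0].text = "changed"
print(root[0].text)  # one

# shallow copy: a new <doc> element whose children are the same objects
shallow = copy.copy(root)
print(shallow[0] is root[0])  # True
```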

flatten

flatten: Recursively extract text content.

def flatten(elem, include_tail=0):
    # elem's own text, then each subelement's text and tail, recursively
    text = elem.text or ""
    for e in elem:
        text += flatten(e, 1)
    if include_tail and elem.tail: text += elem.tail
    return text

To remove all subelements from a given element and keep just its text, you can do:

elem.text = flatten(elem); del elem[:]
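Later versions of ElementTree (including Python 3's xml.etree.ElementTree) provide Element.itertext(), which yields the same pieces that flatten collects when include_tail is false. A sketch of the equivalent, together with the collapse-to-text idiom above:

```python
import xml.etree.ElementTree as ET

def flatten(elem):
    # itertext() yields elem.text plus every subelement's text and tail,
    # in document order, but not elem's own tail
    return "".join(elem.itertext())

p = ET.fromstring("<p>Some <b>bold <i>and italic</i></b> text.</p>")
print(flatten(p))  # Some bold and italic text.

# collapse <p> to plain text, dropping all subelements
p.text = flatten(p)
del p[:]
print(ET.tostring(p, encoding="unicode"))  # <p>Some bold and italic text.</p>
```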

append

append: Like elem.append, but accepts either an element or a string. A string is added to the tail of the last subelement, or to the element's text if there are no subelements yet.

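def append(elem, item):
    if isinstance(item, basestring):
        # text goes after the last subelement (in its tail), or into
        # elem.text if there are no subelements yet
        if len(elem):
            elem[-1].tail = (elem[-1].tail or "") + item
        else:
            elem.text = (elem.text or "") + item
    else:
        elem.append(item)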
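The helper above uses basestring, which only exists in Python 2. A Python 3 rendering (str instead of basestring), with a short usage example:

```python
import xml.etree.ElementTree as ET

def append(elem, item):
    # Python 3 port of the append helper: str replaces basestring
    if isinstance(item, str):
        if len(elem):
            # text after the last subelement lives in that subelement's tail
            elem[-1].tail = (elem[-1].tail or "") + item
        else:
            elem.text = (elem.text or "") + item
    else:
        elem.append(item)

p = ET.Element("p")
append(p, "Hello, ")
append(p, ET.Element("br"))
append(p, "world")
print(ET.tostring(p, encoding="unicode"))  # <p>Hello, <br />world</p>
```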

walk

walk: A generator that walks a tree in depth-first order. I think this is the same as “getiterator”, but the docs are confusing. [the docs say “document order”, which is the order in which elements appear in an XML document; in other words, the same as depth-first (pre-order) /F]
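A recursive generator sketch of walk, assuming Python 3's xml.etree.ElementTree. In that library, getiterator was deprecated in favor of Element.iter() and removed in Python 3.9; iter() returns elements in the same order as this generator.

```python
import xml.etree.ElementTree as ET

def walk(elem):
    # yield the element first, then each subtree in turn:
    # pre-order depth-first, i.e. document order
    yield elem
    for child in elem:
        yield from walk(child)

root = ET.fromstring("<a><b><c/></b><d/></a>")
print([e.tag for e in walk(root)])   # ['a', 'b', 'c', 'd']
print([e.tag for e in root.iter()])  # ['a', 'b', 'c', 'd']
```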

reverse_walk: Like walk, but in reverse order.
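The proposal does not say exactly what "reverse order" means. Assuming it means the exact reverse of document order, a generator can produce it lazily, without building a list first, by visiting the children last-to-first and yielding each element after its subtree (Python 3 sketch):

```python
import xml.etree.ElementTree as ET

def reverse_walk(elem):
    # children last-to-first, then the element itself: this is exactly
    # the reverse of document order
    for child in reversed(list(elem)):
        yield from reverse_walk(child)
    yield elem

root = ET.fromstring("<a><b><c/></b><d/></a>")
print([e.tag for e in reverse_walk(root)])  # ['d', 'c', 'b', 'a']
```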

walkaround: Walks around the outside of a tree. Each non-terminal node is visited twice. Each node should have an attribute whose values can be NONE, DONE, FIRST, SECOND, and LEAF.
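The state names suggest an explicit traversal state machine. A generator can express the same walk more simply. In this sketch (one possible reading of the proposal, for Python 3), each non-terminal node is reported twice, as ("first", elem) on the way down and ("second", elem) on the way back up, and each leaf once as ("leaf", elem). The string labels are assumptions, and the NONE and DONE bookkeeping states are not needed here.

```python
import xml.etree.ElementTree as ET

# hypothetical state labels for this sketch
FIRST, SECOND, LEAF = "first", "second", "leaf"

def walkaround(elem):
    if len(elem):
        yield FIRST, elem    # on the way down
        for child in elem:
            yield from walkaround(child)
        yield SECOND, elem   # on the way back up
    else:
        yield LEAF, elem     # terminal nodes are visited once

root = ET.fromstring("<a><b><c/></b><d/></a>")
events = [(state, e.tag) for state, e in walkaround(root)]
print(events)
```

This is close to what ET.iterparse reports with events=("start", "end"), except that iterparse reports both events for leaves as well.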

kill

kill/hoist: Removes a node from a tree. It is replaced by its children.
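Elements have no parent pointer, so a kill/hoist helper needs the parent as well. A sketch, assuming Python 3; the name hoist and its (parent, elem) signature are hypothetical. The fiddly part is the text: elem.text must be joined to whatever text precedes elem, and elem.tail to whatever ends up last in its place.

```python
import xml.etree.ElementTree as ET

def hoist(parent, elem):
    # hypothetical helper: replace elem (a direct child of parent) with
    # its own children, keeping all text content in place
    index = list(parent).index(elem)
    children = list(elem)
    lead = elem.text or ""
    if children:
        # elem's tail now follows its last child
        last = children[-1]
        last.tail = (last.tail or "") + (elem.tail or "")
    else:
        # no children: elem's text and tail simply merge
        lead += elem.tail or ""
    if lead:
        # attach elem's leading text to whatever precedes elem
        if index == 0:
            parent.text = (parent.text or "") + lead
        else:
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + lead
    # splice the children into elem's position
    parent[index:index + 1] = children

p = ET.fromstring("<p>a <b>bold <i>it</i> x</b> tail</p>")
hoist(p, p[0])
print(ET.tostring(p, encoding="unicode"))  # <p>a bold <i>it</i> x tail</p>
```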

prettyprint

prettyprint: Prints a tree with each node indented according to its depth. This is done by first indenting the tree (see below), and then serializing it as usual.

indent: Adds whitespace to the tree, so that saving it as usual results in a prettyprinted tree.

# in-place prettyprint formatter

def indent(elem, level=0):
    i = "\n" + level*"  "
    if len(elem):
        # put the first child on its own line, one level deeper
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            indent(child, level+1)
        # the last child's tail precedes this element's end tag,
        # so it gets this element's indentation level
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
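Python 3.9 added a built-in counterpart, xml.etree.ElementTree.indent(tree, space="  ", level=0), which adds whitespace in place much like the recipe above. A prettyprint sketch built on it; the function name, and the choice to indent a copy so the caller's tree keeps its original whitespace, are assumptions:

```python
import copy
import io
import sys
import xml.etree.ElementTree as ET

def prettyprint(elem, file=sys.stdout):
    # hypothetical helper: work on a deep copy, since indent() rewrites
    # text and tail attributes in place
    elem = copy.deepcopy(elem)
    ET.indent(elem, space="  ")  # Python 3.9+
    file.write(ET.tostring(elem, encoding="unicode") + "\n")

root = ET.fromstring("<root><a>1</a><b><c/></b></root>")
buf = io.StringIO()
prettyprint(root, buf)
print(buf.getvalue(), end="")
```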

tostringlist, fromstringlist

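tostringlist and fromstringlist: Serialize to and from lists of string fragments. This can improve performance considerably when you do not need the serialized document as one single string. Instead of:

out.write(tostring(elem))

you can write:

out.writelines(tostringlist(elem))

or hand the fragments out one at a time, for example from a producer object: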

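class XMLGenerator:
    # hands out serialized fragments one at a time via more()
    def __init__(self, elem):
        self.iter = iter(tostringlist(elem))
    def more(self):
        # next fragment, or None when the whole tree has been written
        try:
            return self.iter.next()
        except StopIteration:
            return None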
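Both functions exist in current versions of Python's xml.etree.ElementTree. A Python 3 sketch (encoding="unicode" makes tostringlist return str fragments; next() replaces the Python 2 .next() method in the producer):

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring("<doc><p>one</p><p>two</p></doc>")

# the fragments concatenate to exactly what tostring() returns
out = io.StringIO()
out.writelines(ET.tostringlist(root, encoding="unicode"))

# fromstringlist feeds fragments to the parser; they may split markup anywhere
doc = ET.fromstringlist(["<doc><p>o", "ne</p><p>two</", "p></doc>"])

class XMLGenerator:
    # Python 3 version of the producer above
    def __init__(self, elem):
        self.iter = iter(ET.tostringlist(elem, encoding="unicode"))

    def more(self):
        # next fragment, or None when the output is exhausted
        return next(self.iter, None)

# iter(callable, sentinel) calls more() until it returns None
producer = XMLGenerator(root)
chunks = list(iter(producer.more, None))
```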

Namespace helpers.

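class NS:
    # NS("{uri}").tag returns "{uri}tag"; NS("{uri}")("a/b") qualifies
    # every step of a simple path, giving "{uri}a/{uri}b"
    def __init__(self, uri):
        self.uri = uri
    def __getattr__(self, tag):
        return self.uri + tag
    def __call__(self, path):
        return "/".join(getattr(self, tag) for tag in path.split("/"))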

XHTML = NS("{http://www.w3.org/1999/xhtml}")

for elem in tree.findall(XHTML("ul/li")):