Element Library Functions
March 22, 2004 | Fredrik Lundh
ElementTree 1.3 may include a new module, ElementLib, with a number of convenient helper functions.
The exact contents are yet to be determined; here are some of the current proposals (from various sources, in no specific order):
Helpers to add subelements, with a nicer syntax.
Wrappers to access elements via attributes (templating).
copy
copy/deepcopy: Copy element structures. [the copy module is supposed to work /F]
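To illustrate the note in brackets, here is a minimal sketch that applies the standard copy module to an element. It assumes the xml.etree.ElementTree module from the Python standard library; the sample document is made up for the example.

```python
import copy
import xml.etree.ElementTree as ET

root = ET.fromstring("<doc><item id='1'>one</item></doc>")

# Shallow copy: a new top-level element that shares its subelements
# with the original.
shallow = copy.copy(root)

# Deep copy: the whole subtree is duplicated.
deep = copy.deepcopy(root)

# A change to a subelement of the original shows through the shallow
# copy, but not through the deep copy.
root[0].text = "changed"
print(shallow[0].text)
print(deep[0].text)
```

The shallow copy is cheap but aliases the subelements; use deepcopy when the copy is going to be modified independently.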
flatten
flatten: Recursively extract text content.
    def flatten(elem, include_tail=0):
        text = elem.text or ""
        for e in elem:
            text += flatten(e, 1)
        if include_tail and elem.tail:
            text += elem.tail
        return text
To get rid of all subelements of a given element, and keep just the text, you can do:
elem.text = flatten(elem); del elem[:]
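A small worked example may help here. The sketch below repeats the flatten recipe so that it runs on its own, and assumes the standard library's xml.etree.ElementTree; the sample paragraph is made up.

```python
import xml.etree.ElementTree as ET

def flatten(elem, include_tail=0):
    # The element's own text, then each subelement's text and tail.
    text = elem.text or ""
    for e in elem:
        text += flatten(e, 1)
    if include_tail and elem.tail:
        text += elem.tail
    return text

para = ET.fromstring("<p>Hello <b>big <i>wide</i></b> world</p>")
plain = flatten(para)
print(plain)

# Replace the mixed content with the plain text only.
para.text = plain
del para[:]
result = ET.tostring(para, encoding="unicode")
print(result)
```

Note that the top-level call leaves out the tail of the element itself; only the tails of subelements are part of its content.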
append
append: Like elem.append, but accepts either an element or a string. A string is added to the tail of the last subelement, or to the element's own text if there are no subelements yet.
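    def append(elem, item):
        # str is the Python 3 string type; on Python 2, test for basestring
        if isinstance(item, str):
            if len(elem):
                # text after the last subelement is stored in its tail
                elem[-1].tail = (elem[-1].tail or "") + item
            else:
                elem.text = (elem.text or "") + item
        else:
            elem.append(item)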
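As a usage sketch, the helper can build mixed content piece by piece. The block repeats the function, with str standing in for basestring on the assumption of Python 3, and uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

def append(elem, item):
    if isinstance(item, str):
        if len(elem):
            # text after the last subelement lives in that subelement's tail
            elem[-1].tail = (elem[-1].tail or "") + item
        else:
            elem.text = (elem.text or "") + item
    else:
        elem.append(item)

p = ET.Element("p")
b = ET.Element("b")
b.text = "big"

append(p, "Hello ")   # no subelements yet, so this goes into p.text
append(p, b)          # an element is appended as usual
append(p, " world")   # goes into the tail of the last subelement
result = ET.tostring(p, encoding="unicode")
print(result)
```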
walk
walk: A generator that walks a tree in depth-first order. I think this is the same as “getiterator” but the docs are confusing. [the docs say “document order”, which is the order elements are stored in an XML document. same as depth-first, in other words /F]
reverse_walk: Like walk but in the reverse order.
walkaround: Walks around the outside of a tree. Each non-terminal node is visited twice. Each node should have an attribute whose values can be NONE, DONE, FIRST, SECOND, and LEAF.
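The three walkers could be sketched roughly as follows. This is one possible reading of the proposals, not a settled design: walk simply delegates to the iter method of the standard library's xml.etree.ElementTree, reverse_walk is taken to mean document order backwards, and walkaround yields (state, element) pairs instead of setting an attribute on the nodes. The NONE and DONE states are left out, since the proposal doesn't say when they apply.

```python
import xml.etree.ElementTree as ET

FIRST, SECOND, LEAF = "FIRST", "SECOND", "LEAF"

def walk(elem):
    # Document order: depth-first, each element before its subelements.
    return elem.iter()

def reverse_walk(elem):
    # Assumption: "reverse order" means document order, backwards.
    return reversed(list(elem.iter()))

def walkaround(elem):
    # Leaves are visited once; elements with subelements are visited
    # on the way down (FIRST) and again on the way back up (SECOND).
    if len(elem) == 0:
        yield LEAF, elem
        return
    yield FIRST, elem
    for child in elem:
        yield from walkaround(child)
    yield SECOND, elem

tree = ET.fromstring("<a><b><c/></b><d/></a>")
print([e.tag for e in walk(tree)])
print([e.tag for e in reverse_walk(tree)])
print([(state, e.tag) for state, e in walkaround(tree)])
```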
kill
kill/hoist: Removes a node from a tree. It is replaced by its children.
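A minimal sketch of what this could look like, assuming the standard library's xml.etree.ElementTree. Elements don't know their parent, so the hypothetical hoist function below takes the parent as an explicit argument; the way it moves text and tail around is one reasonable choice, not part of the proposal.

```python
import xml.etree.ElementTree as ET

def _add_text_before(parent, index, text):
    # Attach text just before position index in parent: to the tail of
    # the previous sibling, or to parent.text if there is none.
    if not text:
        return
    if index > 0:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text
    else:
        parent.text = (parent.text or "") + text

def hoist(parent, node):
    # Remove node from parent and put node's subelements in its place.
    index = list(parent).index(node)
    children = list(node)
    # The node's text now follows whatever came before the node.
    _add_text_before(parent, index, node.text)
    if children:
        # The node's tail now follows its last subelement.
        if node.tail:
            children[-1].tail = (children[-1].tail or "") + node.tail
    else:
        _add_text_before(parent, index, node.tail)
    parent.remove(node)
    for offset, child in enumerate(children):
        parent.insert(index + offset, child)

p = ET.fromstring("<p>one <b>two <i>three</i> four</b> five</p>")
hoist(p, p[0])
result = ET.tostring(p, encoding="unicode")
print(result)
```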
prettyprint
prettyprint: Prints a tree with each node indented according to its depth. This is done by first indenting the tree (see below), and then serializing it as usual.
indent: Adds whitespace to the tree, so that saving it as usual results in a prettyprinted tree.
    # in-place prettyprint formatter
    def indent(elem, level=0):
        i = "\n" + level*" "
        if len(elem):
            if not elem.text or not elem.text.strip():
                elem.text = i + " "
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
            # note: the loop rebinds elem to each subelement in turn
            for elem in elem:
                indent(elem, level+1)
            # elem is now the last subelement; its tail closes the parent
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
        else:
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = i
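The prettyprint function described above can then be sketched as a thin wrapper around indent. The recipe is repeated so that the block runs on its own; the prettyprint signature with an optional output file is an assumption made for the example, and the code uses the standard library's xml.etree.ElementTree.

```python
import sys
import xml.etree.ElementTree as ET

def indent(elem, level=0):
    i = "\n" + level*" "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + " "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for elem in elem:
            indent(elem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

def prettyprint(elem, out=None):
    # Hypothetical wrapper: indent in place, then serialize as usual.
    if out is None:
        out = sys.stdout
    indent(elem)
    out.write(ET.tostring(elem, encoding="unicode"))

root = ET.fromstring("<root><a><b/></a><c/></root>")
prettyprint(root)
text = ET.tostring(root, encoding="unicode")
```

Keep in mind that indent modifies the tree, so the added whitespace stays in the element structure after printing. Python 3.9 and later also ship an indent function in xml.etree.ElementTree.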
tostringlist, fromstringlist
tostringlist, fromstringlist: Serialize to and from lists of string fragments. This can improve performance a lot when you’re not really interested in the entire string:
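    # builds the entire string before writing it
    out.write(tostring(elem))

    # writes the fragments as they are
    out.writelines(tostringlist(elem))

    # hands out one fragment at a time
    class XMLGenerator:
        def __init__(self, elem):
            self.iter = iter(tostringlist(elem))
        def more(self):
            try:
                return next(self.iter)
            except StopIteration:
                return None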
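Functions with these names are available in the standard library's xml.etree.ElementTree, so the idea can be tried out directly. The sketch below assumes that module; passing encoding="unicode" makes the serializers produce text rather than bytes, and the sample document is made up.

```python
import io
import xml.etree.ElementTree as ET

elem = ET.fromstring("<doc><item>one</item><item>two</item></doc>")

# One big string versus a list of fragments.
whole = ET.tostring(elem, encoding="unicode")
fragments = ET.tostringlist(elem, encoding="unicode")

# The fragments can be written as they are, without joining them first.
out = io.StringIO()
out.writelines(fragments)
print(out.getvalue() == whole)

# fromstringlist goes the other way: it parses a sequence of fragments.
rebuilt = ET.fromstringlist(["<doc><item>", "one", "</item></doc>"])
print(rebuilt[0].text)
```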
Namespace helpers.
    class NS:
        def __init__(self, uri):
            self.uri = uri
        def __getattr__(self, tag):
            # NS.tag gives the fully qualified tag name
            return self.uri + tag
        def __call__(self, path):
            # qualify every step in a path
            return "/".join(getattr(self, tag) for tag in path.split("/"))

    XHTML = NS("{http://www.w3.org/1999/xhtml}")

    for elem in tree.findall(XHTML("ul/li")):
        ...
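To see what the helper produces, here is a small usage sketch against a made-up XHTML fragment. It repeats the NS class so that the block runs on its own, and assumes the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

class NS:
    def __init__(self, uri):
        self.uri = uri
    def __getattr__(self, tag):
        # Only called for names that aren't regular attributes.
        return self.uri + tag
    def __call__(self, path):
        return "/".join(getattr(self, tag) for tag in path.split("/"))

XHTML = NS("{http://www.w3.org/1999/xhtml}")

tree = ET.fromstring(
    '<body xmlns="http://www.w3.org/1999/xhtml">'
    "<ul><li>one</li><li>two</li></ul>"
    "</body>"
)

print(XHTML.li)          # a single qualified tag name
print(XHTML("ul/li"))    # every step in the path gets qualified
items = [li.text for li in tree.findall(XHTML("ul/li"))]
print(items)
```

One limitation of the attribute syntax: a tag that happens to be called uri can't be looked up this way, since that name is taken by the instance attribute.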