ElementTree: Bits and Pieces
Code samples that don’t fit anywhere else (yet).
Getting all text from inside an element
The text attribute contains the text immediately inside an element, but it does not include text inside subelements. To get all text, you can use something like:
    def gettext(elem):
        text = elem.text or ""
        for e in elem:
            text += gettext(e)
            if e.tail:
                text += e.tail
        return text
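As a quick check, the sketch below runs gettext on a small, made-up fragment and compares the result with the text attribute alone. It assumes the standard library's xml.etree.ElementTree module; the sample markup and the variable names are illustrative only. The itertext method (available in ElementTree 1.3 and the standard library) should give the same result when its pieces are joined.

```python
import xml.etree.ElementTree as ET

def gettext(elem):
    # text directly inside the element, before the first subelement
    text = elem.text or ""
    for e in elem:
        # text inside the subelement, then the text that follows it
        text += gettext(e)
        if e.tail:
            text += e.tail
    return text

# hypothetical sample fragment, for illustration only
para = ET.fromstring("<p>Hello <b>big <i>wide</i></b> world</p>")

direct_text = para.text                  # "Hello " (stops at the first subelement)
all_text = gettext(para)                 # "Hello big wide world"
joined_text = "".join(para.itertext())   # same text, using the built-in iterator
```

Note that the tail of the top element itself is not included, since it lies outside that element.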
Removing elements
To remove an element from a tree without losing its contents, you have to replace the element with those contents. This includes not only the subelements, but also the text and tail attributes.
The following function takes a tree and a filter function, and removes all subelements for which the filter returns false.
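    def cleanup(elem, filter):
        out = []
        for e in elem:
            cleanup(e, filter)
            if not filter(e):
                # splice the element's text, subelements, and tail into the parent;
                # text and tail may be None, so treat a missing value as ""
                if e.text:
                    if out:
                        out[-1].tail = (out[-1].tail or "") + e.text
                    else:
                        elem.text = (elem.text or "") + e.text
                out.extend(e)
                if e.tail:
                    if out:
                        out[-1].tail = (out[-1].tail or "") + e.tail
                    else:
                        elem.text = (elem.text or "") + e.tail
            else:
                out.append(e)
        elem[:] = out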
Note that the top element itself isn’t checked; if you need to remove that, you have to do that at the application level.
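The following sketch shows one way the function might be used, assuming the standard library's xml.etree.ElementTree module. The sample markup and the choice to drop span elements are made up for illustration. The copy of cleanup included here treats a missing text or tail as an empty string, since either attribute can be None.

```python
import xml.etree.ElementTree as ET

def cleanup(elem, filter):
    out = []
    for e in elem:
        cleanup(e, filter)
        if not filter(e):
            # merge the removed element's text into the preceding tail,
            # or into the parent's text if nothing has been kept yet
            if e.text:
                if out:
                    out[-1].tail = (out[-1].tail or "") + e.text
                else:
                    elem.text = (elem.text or "") + e.text
            # keep the removed element's subelements
            out.extend(e)
            if e.tail:
                if out:
                    out[-1].tail = (out[-1].tail or "") + e.tail
                else:
                    elem.text = (elem.text or "") + e.tail
        else:
            out.append(e)
    elem[:] = out

# hypothetical sample: drop every <span>, but keep what it contains
doc = ET.fromstring(
    "<p>one <span>two <b>three</b> four</span> five <span>six</span></p>"
)
cleanup(doc, lambda e: e.tag != "span")

result = ET.tostring(doc, encoding="unicode")
# result: <p>one two <b>three</b> four five six</p>
```

The b element survives because the filter returns true for it; only the span wrappers are dissolved into their parent.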
Instead of writing a filter function, you can iterate over the tree and set the tag to None for the elements you want to remove. When you’ve checked all elements, call the cleanup function as follows:
    cleanup(elem, lambda e: e.tag)
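A minimal sketch of the mark-then-clean approach, again assuming the standard library's xml.etree.ElementTree module. The sample markup and the decision to strip font elements are illustrative assumptions. The cleanup copy below guards against text or tail being None.

```python
import xml.etree.ElementTree as ET

def cleanup(elem, filter):
    out = []
    for e in elem:
        cleanup(e, filter)
        if not filter(e):
            if e.text:
                if out:
                    out[-1].tail = (out[-1].tail or "") + e.text
                else:
                    elem.text = (elem.text or "") + e.text
            out.extend(e)
            if e.tail:
                if out:
                    out[-1].tail = (out[-1].tail or "") + e.tail
                else:
                    elem.text = (elem.text or "") + e.tail
        else:
            out.append(e)
    elem[:] = out

# hypothetical sample document
doc = ET.fromstring(
    "<div><p>Keep <font>this</font> text.</p><font>And this.</font></div>"
)

# first pass: mark unwanted elements by clearing their tag
for e in doc.iter():
    if e.tag == "font":
        e.tag = None

# second pass: a tag of None is false, so marked elements are removed
cleanup(doc, lambda e: e.tag)

result = ET.tostring(doc, encoding="unicode")
# result: <div><p>Keep this text.</p>And this.</div>
```

Separating the marking pass from the cleanup pass keeps the selection logic out of the filter, which can be handy when the decision depends on context gathered while walking the tree.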
In ElementTree 1.3, the serialization code will leave out the tags for elements that have their tag attribute set to None.