ElementTree Overview
“But I have found that sitting under the ElementTree, one can feel the Zen of XML.”
— Essien Ita Essien
Update 2007-09-12: ElementTree 1.3 alpha 3 is now available. For more information, see Introducing ElementTree 1.3.
Update 2007-08-27: ElementTree 1.2.7 preview is now available. This is 1.2.6 plus support for IronPython. The serializer is ~20% faster, and now supports newlines in attribute values.
The Element type is a simple but flexible container object, designed to store hierarchical data structures, such as simplified XML infosets, in memory. It can be described as a cross between a Python list and a Python dictionary.
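The list/dictionary comparison is easier to see in code. The following is a minimal sketch. It assumes the xml.etree.ElementTree module that ships with Python 2.5 and later, run under Python 3, and the tag and attribute names are made up for illustration.

```python
# Assumption: the standard-library copy of ElementTree (Python 2.5+), run under Python 3.
import xml.etree.ElementTree as ET

# Keyword arguments to the Element factory become attributes.
elem = ET.Element("item", id="42")  # "item" and "id" are illustrative names

# Dictionary-like side: attributes are read and written by key.
elem.set("class", "book")           # "class" is a Python keyword, so set() it explicitly
print(elem.get("id"))               # 42
print(elem.get("missing", "n/a"))   # n/a (default returned for an absent attribute)
print(sorted(elem.keys()))          # ['class', 'id']

# List-like side: children are appended, inserted, indexed, and deleted like list items.
elem.append(ET.Element("a"))
elem.append(ET.Element("b"))
elem.insert(0, ET.Element("first"))
print(len(elem))                    # 3
print(elem[1].tag)                  # a
del elem[0]
print([child.tag for child in elem])  # ['a', 'b']
```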
The ElementTree wrapper type adds code to load XML files as trees of Element objects, and save them back again.
The Element type is available as a pure-Python implementation (the elementtree package) for Python 1.5.2 and later. A C implementation, cElementTree, is also available for use with CPython 2.1 and later. The core components of both libraries also ship with Python 2.5 and later, as the xml.etree package.
There’s also an independent implementation, lxml.etree, based on the well-known libxml2/libxslt libraries. This adds full support for XSLT, XPath, and more.
For more implementations and add-ons, see the Interesting Stuff section below.
Installation
Binary installers are available for many platforms, including Windows, Mac OS X, and most Linux distributions. Look for packages named “python-elementtree” or similar.
To install from source, simply unpack the distribution archive, change to the distribution directory, and run the setup.py script as follows:
$ python setup.py install
When you’ve done this, you should be able to import the ElementTree module, and other modules from the elementtree package:
$ python
>>> from elementtree import ElementTree
It’s common practice to import ElementTree under an alias, both to minimize typing, and to make it easier to switch between different implementations:
$ python
>>> import elementtree.ElementTree as ET
>>> import cElementTree as ET
>>> import lxml.etree as ET
>>> import xml.etree.ElementTree as ET # Python 2.5
Note that if you only need the core functionality, you can include the ElementTree.py file in your own project. To get path support, you also need ElementPath.py. All other modules are optional.
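Because the implementations share the same basic API, one common way to use the alias is to try the faster modules first and fall back to the pure-Python ones. Here is a minimal sketch of that pattern. The preference order is a choice, not something the libraries require, and lxml.etree is left out because it is a separate, independent implementation with its own extensions.

```python
# Prefer a C implementation, then fall back to the pure-Python ones.
try:
    import xml.etree.cElementTree as ET  # C version bundled with Python 2.5+ (removed in Python 3.9)
except ImportError:
    try:
        import cElementTree as ET  # standalone C implementation
    except ImportError:
        try:
            import xml.etree.ElementTree as ET  # bundled with Python 2.5+
        except ImportError:
            import elementtree.ElementTree as ET  # standalone pure-Python package

root = ET.fromstring("<a><b/></a>")
print(root.tag, len(root))  # a 1
```

On Python 3.9 and later, where xml.etree.cElementTree no longer exists, the chain stops at xml.etree.ElementTree, which already uses its C accelerator when one is available.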
Basic Usage
Each Element instance can have an identifying tag, any number of attributes, any number of child element instances, and an associated object (usually a string). To create elements, you can use the Element or SubElement factories:
import elementtree.ElementTree as ET

# build a tree structure
root = ET.Element("html")

head = ET.SubElement(root, "head")

title = ET.SubElement(head, "title")
title.text = "Page Title"

body = ET.SubElement(root, "body")
body.set("bgcolor", "#ffffff")

body.text = "Hello, World!"

# wrap it in an ElementTree instance, and save as XML
tree = ET.ElementTree(root)

tree.write("page.xhtml")
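If you'd rather inspect the result than write it to disk, the tostring function serializes an element and its children to a string. The sketch below rebuilds the same tree with the standard-library module under Python 3, where passing encoding="unicode" makes tostring return a str instead of bytes.

```python
import xml.etree.ElementTree as ET

# same structure as the example above
root = ET.Element("html")
head = ET.SubElement(root, "head")
title = ET.SubElement(head, "title")
title.text = "Page Title"
body = ET.SubElement(root, "body")
body.set("bgcolor", "#ffffff")
body.text = "Hello, World!"

# serialize to a str (encoding="unicode" is Python 3 specific)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
# <html><head><title>Page Title</title></head><body bgcolor="#ffffff">Hello, World!</body></html>
```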
To load an existing document, use the parse function, which reads an entire XML file into an ElementTree instance:
import elementtree.ElementTree as ET

tree = ET.parse("page.xhtml")

# the tree root is the top-level html element
print(tree.findtext("head/title"))

# if you need the root element, use getroot
root = tree.getroot()

# ...manipulate tree...

tree.write("out.xml")
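The same round trip can be tried without touching the file system, since parse and write also accept file objects. The following sketch assumes the standard-library module under Python 3 and uses an in-memory buffer as a stand-in for page.xhtml. It also shows a few simple path expressions of the kind ElementPath supports.

```python
import io
import xml.etree.ElementTree as ET

# In-memory stand-in for page.xhtml, so the example needs no file on disk.
source = io.BytesIO(
    b'<html><head><title>Page Title</title></head>'
    b'<body bgcolor="#ffffff">Hello, World!</body></html>'
)

tree = ET.parse(source)   # parse() accepts a filename or an open file object
root = tree.getroot()

# Simple path expressions, resolved relative to the root element
print(tree.findtext("head/title"))          # Page Title
print([e.tag for e in root.findall("*")])   # ['head', 'body']
print(len(root.findall(".//title")))        # 1

# Manipulate the tree, then serialize it to another in-memory file
body = root.find("body")
body.text = "Goodbye, World!"
out = io.BytesIO()
tree.write(out)
```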
For more details, see Elements and Element Trees.
Documentation
Zone articles:
- Elements and Element Trees (brief tutorial)
- The elementtree.ElementTree Module (reference page)
- Element Tree Infosets
- The ElementTree iterparse Function
- Incremental Parsing Using the Consumer API
- Element Library Functions, ElementTree: Bits and Pieces (useful helpers)
- SimpleXMLWriter
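Two of the articles above deal with parsing a document piece by piece instead of loading it all at once. As a rough taste of both styles, here is a sketch using the standard-library module under Python 3. iterparse yields (event, element) pairs as the parser reaches them, and an XMLParser instance can be fed data in chunks and asked for the root element when you call close. The log/entry markup is invented for the example, and the feed/close loop only illustrates the general idea; the Consumer API article may cover more.

```python
import io
import xml.etree.ElementTree as ET

# iterparse: handle each element as soon as its end tag has been seen.
data = io.BytesIO(b"<log><entry>one</entry><entry>two</entry><entry>three</entry></log>")
texts = []
for event, elem in ET.iterparse(data, events=("end",)):
    if elem.tag == "entry":
        texts.append(elem.text)
        elem.clear()  # drop the element's text, attributes, and children once processed
print(texts)  # ['one', 'two', 'three']

# feed/close: push data into the parser in arbitrary chunks.
parser = ET.XMLParser()
for chunk in ("<root><a>te", "xt</a>", "</root>"):
    parser.feed(chunk)
root = parser.close()  # close() returns the top-level element
print(root.findtext("a"))  # text
```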
Elsewhere:
Andrew Dalke: IterParseFilter: XPath-like filtering of ElementTree’s iterparse event stream
Andrew Dalke: PyProtocols for output generation
Martijn Faassen: lxml and (c)ElementTree
Andrew Kuchling: Processing XML with ElementTree [slides from a talk]
Danny Yoo: ElementTree mini-tutorial [“Let’s work through a small example with it; that may help to clear some confusion.”]
Joseph Reagle: XML ElementTree Data Model
Uche Ogbuji: Simple XML Processing With elementtree [xml.com]
David Mertz: Process XML in Python with ElementTree: How does the API stack up against similar libraries? [IBM developerWorks]
Uche Ogbuji: Python Paradigms for XML
Uche Ogbuji: XML Namespaces Support in Python Tools, Part Three [xml.com]
Uche Ogbuji: Practical SAX Notes: ElementTree, Namespaces and Techniques for Large Documents [xml.com]
Interesting stuff built with (or for) ElementTree (selection):
L. C. Rees: webstring (webstring is a web templating engine that allows programs to manipulate XML and HTML documents with standard Python sequence and string operators. It is designed for those whose preferred web template languages are Python and HTML (and XML for people who swing that way).)
Chris McDonough: meld3 (an XML templating system for Python 2.3+ which keeps template markup and dynamic rendering logic separate from one another, based on PyMeld)
Peter Hunt: pymeld4 (another ET-based implementation of the PyMeld templating language)
Seo Sanghyeon: pyexpat/ElementTree for IronPython (a pyexpat emulation for IronPython which lets you use the standard ElementTree module on that platform)
Oren Tirosh: ElementBuilder (friendly syntax for constructing ElementTree:s)
Staffan Malmgren: lagen.nu (a nicely formatted, hyperlinked, linkable, and taggable version of the entire body of Swedish law)
Ralf Schlatterbeck: OOoPy (a tool to inspect, create, and modify OpenOffice.org documents in Python)
Martijn Faassen: lxml (ElementTree-compatible bindings for libxml2 and libxslt)
Martin Pool, et al.: Bazaar-NG (version management system)
Seth Vidal, Konstantin Ryabitsev, et al.: Yellow dog Updater, Modified (an automatic updater and package installer/remover for rpm systems)
Michael Droettboom: pyScore (a set of Python-based tools for working with symbolic music notation)
Ryan Tomayko: Kid (a template language)
Ken Rimey: PDIS XPath (a more complete XPath implementation)
Roland Leuthe: minixsv (a lightweight XML schema validator written in pure Python)
Bruno da Silva de Oliveira, Joel de Guzman: Pyste (a Python binding generator for C++)
Works in progress:
- ElementTree: Working with Qualified Names
- Using the ElementTree Module to Generate Google Requests
- A Simple Technorati Client
- Using Element Trees to Parse WSDL Files
- Using Element Trees to Parse XBEL Files
- Using ElementTrees to Generate XML-RPC Messages
- Generating Tkinter User Interfaces from XML
- A Simple XML-Over-HTTP Class
- You Can Never Have Too Many Stock Tickers!