Elements and Element Trees
Fredrik Lundh | Last updated July 2007
This note introduces the Element, SubElement and ElementTree types available in the effbot.org elementtree library.
For an overview, with links to articles and more documentation, see the ElementTree Overview page.
For an API reference, see The elementtree.ElementTree Module.
You can download the library from the effbot.org downloads page.
In this article:
- The Element Type
- Attributes
- Text Content
- Searching for Subelements
- Reading and Writing XML Files
- XML Namespaces
The Element Type
The Element type is a flexible container object, designed to store hierarchical data structures in memory. The type can be described as a cross between a list and a dictionary.
Each element has a number of properties associated with it:
- a tag. This is a string identifying what kind of data this element represents (the element type, in other words).
- a number of attributes, stored in a Python dictionary.
- a text string to hold text content, and a tail string to hold trailing text.
- a number of child elements, stored in a Python sequence.
All elements must have a tag, but all other properties are optional. All strings can be either Unicode strings or 8-bit strings containing only US-ASCII.
To create an element, call the Element constructor, and pass the tag string as the first argument:
from elementtree.ElementTree import Element
root = Element("root")
You can access the tag string via the tag attribute:
print root.tag
To build a tree, create more elements, and append them to the parent element:
root = Element("root")
root.append(Element("one"))
root.append(Element("two"))
root.append(Element("three"))
Since this is a very common operation, the library provides a helper function called SubElement that creates a new element and adds it to its parent, in one step:
from elementtree.ElementTree import Element, SubElement
root = Element("root")
SubElement(root, "one")
SubElement(root, "two")
SubElement(root, "three")
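The same tree can be built and inspected in a few lines. This is a minimal sketch assuming Python 3 and the standard library's xml.etree.ElementTree module (which grew out of this library) in place of the effbot.org package; it relies only on the sequence behaviour described in this section.

```python
# Sketch: Python 3, standard library xml.etree.ElementTree
# (assumed here instead of the effbot.org elementtree package).
import xml.etree.ElementTree as ET

root = ET.Element("root")
for name in ("one", "two", "three"):
    # SubElement creates the child and appends it to root in one step
    ET.SubElement(root, name)

# The element behaves like a sequence of its children
print(len(root))                       # 3
print(root[0].tag)                     # one
print([child.tag for child in root])   # ['one', 'two', 'three']
print(ET.tostring(root, encoding="unicode"))
# <root><one /><two /><three /></root>
```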
To access the subelements, you can use ordinary list (sequence) operations. This includes len(element) to get the number of subelements, element[i] to fetch the i’th subelement, and using the for-in statement to loop over the subelements:
for node in root:
    print node
The element type also supports slicing (including slice assignment), and the standard append, insert and remove methods:
nodes = node[1:5]
node.append(subnode)
node.insert(0, subnode)
node.remove(subnode)
Note that remove takes an element, not a tag. To find the element to remove, you can either loop over the parent, or use one of the find methods described below.
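Because remove needs the element object, removing children by tag means finding them first. The sketch below assumes Python 3 and xml.etree.ElementTree; remove_children_by_tag is a hypothetical helper, not a library function. It loops over a snapshot made with list(parent), so removing children does not disturb the iteration.

```python
import xml.etree.ElementTree as ET

def remove_children_by_tag(parent, tag):
    """Remove every direct child of parent whose tag equals tag.

    Hypothetical helper. remove() wants the element itself, so we
    look each child up and iterate over a copy of the child list.
    """
    removed = 0
    for child in list(parent):      # snapshot: safe to mutate parent
        if child.tag == tag:
            parent.remove(child)    # pass the element, not the tag
            removed += 1
    return removed

root = ET.fromstring("<root><a/><b/><a/><c/></root>")
print(remove_children_by_tag(root, "a"))   # 2
print([child.tag for child in root])       # ['b', 'c']
```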
Truth Testing
In ElementTree 1.2 and earlier, the sequence behaviour means that an element without any subelements tests as false (since it’s an empty sequence), even if it contains text or attributes. To check the return value from a function or method that may return None instead of a node, you must use an explicit test.
def fetchnode():
    ...

node = fetchnode()

if not node: # careful!
    print "node not found, or node has no subnodes"

if node is None:
    print "node not found"
Note: This behaviour is likely to change somewhat in ElementTree 1.3. To write code that is compatible in both directions, use “element is None” to test for a missing element, and “len(element)” to test for non-empty elements.
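A sketch of the two recommended tests, assuming Python 3 and xml.etree.ElementTree; describe is a hypothetical helper. It never uses an element's truth value directly, which also keeps it clear of the deprecation that recent Python 3 releases attach to that usage.

```python
import xml.etree.ElementTree as ET

def describe(node):
    """Classify a lookup result without relying on the element's truth value."""
    if node is None:           # the lookup found nothing
        return "missing"
    if len(node):              # the element has at least one subelement
        return "has children"
    return "leaf"              # present, but no subelements (may still have text)

root = ET.fromstring("<root><item>text only</item><group><x/></group></root>")
print(describe(root.find("item")))     # leaf
print(describe(root.find("group")))    # has children
print(describe(root.find("nothing")))  # missing
```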
Accessing Parents
The element structure has no parent pointers. If you need to keep track of child/parent relations, you can structure your program to work on the parents rather than the children:
for parent in tree.getiterator():
    for child in parent:
        ... work on parent/child tuple
The getiterator method is described in more detail below.
If you do this a lot, you can wrap the iterator code in a generator function:
def iterparent(tree):
    for parent in tree.getiterator():
        for child in parent:
            yield parent, child

for parent, child in iterparent(tree):
    ... work on parent/child tuple
Another approach is to use a separate data structure to map from child elements to their parents. In Python 2.4 and later, the following one-liner creates a child/parent map for an entire tree:
parent_map = dict((c, p) for p in tree.getiterator() for c in p)
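A Python 3 version of the parent map, assuming the standard library's xml.etree.ElementTree. getiterator() was removed from that module in Python 3.9, so the sketch uses iter(), which likewise includes the starting element itself. The ancestors generator is a hypothetical helper built on the map.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c/></b><d/></a>")

# iter() visits root too, so every child in the tree gets an entry.
parent_map = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    """Yield elem's parent, grandparent, ... up to the root (hypothetical helper)."""
    while elem in parent_map:
        elem = parent_map[elem]
        yield elem

c = root.find("b/c")
print(parent_map[c].tag)                 # b
print([e.tag for e in ancestors(c)])     # ['b', 'a']
print(root in parent_map)                # False: the root has no parent
```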
Attributes
In addition to the tag and the list of subelements, each element can have one or more attributes. Each element attribute consists of a string key and a corresponding value. As with ordinary Python dictionaries, all keys must be unique.
Element attributes are in fact stored in a standard Python dictionary, which can be accessed via the attrib attribute. To set attributes, you can simply assign to attrib members:
from elementtree.ElementTree import Element
elem = Element("tag")
elem.attrib["first"] = "1"
elem.attrib["second"] = "2"
When creating a new element, you can pass in element attributes using keyword arguments. The previous example is better written as:
from elementtree.ElementTree import Element
elem = Element("tag", first="1", second="2")
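Keyword arguments only work for attribute names that are valid Python identifiers. The sketch below assumes Python 3 and xml.etree.ElementTree, whose constructor is Element(tag, attrib={}, **extra), and passes a dictionary as the second positional argument for names such as class (a reserved word) and data-id (which contains a hyphen).

```python
import xml.etree.ElementTree as ET

# Keyword arguments work for names that are valid Python identifiers...
elem = ET.Element("tag", first="1", second="2")

# ...but "class" and "data-id" cannot be written as keywords, so pass
# them in the attrib dictionary; extra keywords are merged into it.
div = ET.Element("div", {"class": "note", "data-id": "7"}, lang="en")

print(elem.attrib)                                            # {'first': '1', 'second': '2'}
print(div.get("class"), div.get("data-id"), div.get("lang"))  # note 7 en
```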
The Element type provides shortcuts for attrib.get, attrib.keys, and attrib.items. There’s also a set method, to set the value of an element attribute:
from elementtree.ElementTree import Element

elem = Element("tag", first="1", second="2")

# print 'first' attribute
print elem.attrib.get("first")

# same, using shortcut
print elem.get("first")

# print list of keys (using shortcuts)
print elem.keys()
print elem.items()

# the 'third' attribute doesn't exist
print elem.get("third")
print elem.get("third", "default")

# add the attribute and try again
elem.set("third", "3")
print elem.get("third", "default")

1
1
['first', 'second']
[('first', '1'), ('second', '2')]
None
default
3
Note that while the attrib value is required to be a real mutable Python dictionary, an ElementTree implementation may choose to use another internal representation, and create the dictionary only if someone asks for it. To take advantage of such implementations, stick to the shortcut methods whenever possible.
Text Content
The element type also provides a text attribute, which can be used to hold additional data associated with the element. As the name implies, this attribute is usually used to hold a text string, but it can be used for other, application-specific purposes.
from elementtree.ElementTree import Element
elem = Element("tag")
elem.text = "this element also contains text"
If there is no additional data, this attribute is set to an empty string, or None.
The element type actually provides two attributes that can be used in this way; in addition to text, there's a similar attribute called tail. It too can contain a text string, an application-specific object, or None. The tail attribute is used to store trailing text nodes when reading mixed-content XML files; text that follows directly after an element is stored in the tail attribute for that element:
<tag><elem>this goes into elem's text attribute</elem>this goes into elem's tail attribute</tag>
See the Mixed Content section for more information.
Note that some implementations may only support string objects as text or tail values.
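To see where the parser puts each piece of mixed content, here is a sketch that parses the snippet above, assuming Python 3 and xml.etree.ElementTree. The itertext() method used on the last line belongs to that module's Element type; it yields all text in document order, including the tails of children.

```python
import xml.etree.ElementTree as ET

parent = ET.fromstring(
    "<tag><elem>this goes into elem's text attribute</elem>"
    "this goes into elem's tail attribute</tag>"
)
elem = parent[0]
print(repr(parent.text))  # None: nothing precedes <elem> inside <tag>
print(repr(elem.text))    # "this goes into elem's text attribute"
print(repr(elem.tail))    # "this goes into elem's tail attribute"

# The tail is part of the parent's content, so it shows up here too
print("".join(parent.itertext()))
```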
Example
# elementtree-example-1.py

from elementtree.ElementTree import Element, SubElement, dump

window = Element("window")

title = SubElement(window, "title", font="large")
title.text = "A sample text window"

text = SubElement(window, "text", wrap="word")

box = SubElement(window, "buttonbox")
SubElement(box, "button").text = "OK"
SubElement(box, "button").text = "Cancel"

dump(window)
$ python elementtree-example-1.py
<window><title font="large">A sample text window</title><text wrap="word" /><buttonbox><button>OK</button><button>Cancel</button></buttonbox></window>
Searching for Subelements
The Element type provides a number of methods that can be used to search for subelements:
find(pattern) returns the first subelement that matches the given pattern, or None if there is no matching element.
findtext(pattern) returns the value of the text attribute for the first subelement that matches the given pattern. If there is no matching element, this method returns None.
findall(pattern) returns a list (or another iterable object) of all subelements that match the given pattern.
In ElementTree 1.2 and later, the pattern argument can either be a tag name, or a path expression. If a tag name is given, only direct subelements are checked. Path expressions can be used to search the entire subtree.
ElementTree 1.1 and earlier only support plain tag names.
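A sketch of the three search methods on a small document, assuming Python 3 and xml.etree.ElementTree, whose limited path syntax accepts forms such as shelf/book/title and .//title.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book><title>Dune</title></book>"
    "<book><title>Emma</title></book>"
    "<shelf><book><title>Ulysses</title></book></shelf>"
    "</library>"
)

# A plain tag name only checks the direct children of root
print(len(root.findall("book")))            # 2
print(root.find("book").findtext("title"))  # Dune
print(root.find("magazine"))                # None
print(root.findtext("magazine"))            # None

# Path expressions look further down; ".//" searches the whole subtree
print([t.text for t in root.findall(".//title")])  # ['Dune', 'Emma', 'Ulysses']
print(root.findtext("shelf/book/title"))    # Ulysses
```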
In addition, the getiterator method can be used to loop over the tree in depth-first order:
getiterator(tag) returns a list (or another iterable object) which contains all elements in the subtree that have the given tag, on all levels, including the element itself if it matches. The elements are returned in document order (that is, in the same order as they would appear if you saved the tree as an XML file).
getiterator() (without argument) returns a list (or another iterable object) of all elements in the subtree, including the element itself.
getchildren() returns a list (or another iterable object) of all direct child elements. This method is deprecated; new code should use indexing or slicing to access the children, or list(elem) to get a list.
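A sketch of tree iteration, assuming Python 3 and xml.etree.ElementTree, where getiterator() has been replaced by iter(). The element you start from is part of the result, while looping over the element itself only visits its direct children.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c/></b><c/><d><c/></d></a>")

# Depth-first, document order, starting with root itself
print([e.tag for e in root.iter()])       # ['a', 'b', 'c', 'c', 'd', 'c']

# With a tag, matches are collected from every level of the subtree
print(sum(1 for _ in root.iter("c")))     # 3

# Plain iteration (like list(elem)) only sees direct children
print([child.tag for child in root])      # ['b', 'c', 'd']
```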
Reading and Writing XML Files
The Element type can be used to represent XML files in memory. The ElementTree wrapper class is used to read and write XML files.
To load an XML file into an Element structure, use the parse function:
from elementtree.ElementTree import parse
tree = parse(filename)
elem = tree.getroot()
You can also pass in a file handle (or any object with a read method):
from elementtree.ElementTree import parse
file = open(filename, "r")
tree = parse(file)
elem = tree.getroot()
The parse function returns an ElementTree object. To get the topmost element object, use the getroot method.
In recent versions of the ElementTree module, you can also use the file keyword argument to create a tree, and fill it with contents from a file in one operation:
from elementtree.ElementTree import ElementTree
tree = ElementTree(file=filename)
elem = tree.getroot()
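Since parse accepts any object with a read method, an in-memory buffer works as well as a real file. This sketch assumes Python 3 and xml.etree.ElementTree, with io.BytesIO standing in for an open file.

```python
import io
import xml.etree.ElementTree as ET

# Any object with a read method works; BytesIO stands in for a file.
source = io.BytesIO(b"<config><option name='debug'>on</option></config>")

tree = ET.parse(source)        # returns an ElementTree instance
root = tree.getroot()          # the topmost Element
print(type(tree).__name__)     # ElementTree
print(root.tag)                # config
print(root.find("option").get("name"), root.findtext("option"))  # debug on
```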
To save an element tree back to disk, use the write method on the ElementTree class. Like the parse function, it takes either a filename or a file object (or any object with a write method):
from elementtree.ElementTree import ElementTree
tree = ElementTree(file=infile)
tree.write(outfile)
If you want to write an Element object hierarchy to disk, wrap it in an ElementTree instance:
from elementtree.ElementTree import Element, SubElement, ElementTree
html = Element("html")
body = SubElement(html, "body")
ElementTree(html).write(outfile)
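Writing works the same way with any object that has a write method. A sketch assuming Python 3 and xml.etree.ElementTree; with that module's default US-ASCII encoding, write() emits bytes and no XML declaration.

```python
import io
import xml.etree.ElementTree as ET

html = ET.Element("html")
body = ET.SubElement(html, "body")
body.text = "hello"

# Wrap the element in an ElementTree, then write to any object with a write method
out = io.BytesIO()
ET.ElementTree(html).write(out)
print(out.getvalue())             # b'<html><body>hello</body></html>'

# Reading it back gives an equivalent tree
print(ET.fromstring(out.getvalue()).findtext("body"))  # hello
```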
Note that the standard element writer creates a compact output. There is no built-in support for pretty printing or user-defined namespace prefixes in the current version, so the output may not always be suitable for human consumption (to the extent XML is suitable for human consumption, that is).
One way to produce nicer output is to add whitespace to the tree before saving it; see the indent function on the Element Library Functions page for an example.
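The idea is to fill the text and tail attributes with newlines and indentation. The following is a minimal recursive sketch written for this note, not the function from the Element Library Functions page. It assumes Python 3 and xml.etree.ElementTree, and indent here is a hypothetical helper that only replaces empty or whitespace-only text and tail values. (Python 3.9 and later also ship xml.etree.ElementTree.indent.)

```python
import xml.etree.ElementTree as ET

def indent(elem, level=0, space="  "):
    """Add newline-plus-indentation whitespace in place (hypothetical helper).

    Only empty or whitespace-only text/tail values are replaced, so
    real character data is left untouched.
    """
    pad = "\n" + level * space
    children = list(elem)
    if children:
        # Whitespace before the first child: one level deeper than elem
        if not elem.text or not elem.text.strip():
            elem.text = pad + space
        for child in children:
            indent(child, level + 1, space)
        # Whitespace after the last child: back at elem's own level
        last = children[-1]
        if not last.tail or not last.tail.strip():
            last.tail = pad
    # Whitespace after elem itself (the root gets none)
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad

root = ET.Element("a")
b = ET.SubElement(root, "b")
ET.SubElement(b, "c")
indent(root)
print(ET.tostring(root, encoding="unicode"))
```

Running it twice gives the same result, since the second pass only overwrites whitespace.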
To convert between XML and strings, you can use the XML, fromstring, and tostring helpers:
from elementtree.ElementTree import XML, fromstring, tostring
elem = XML(text)
elem = fromstring(text) # same as XML(text)
text = tostring(elem)
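A round-trip sketch of the string helpers, assuming Python 3 and xml.etree.ElementTree. In that version, tostring returns bytes unless encoding="unicode" is passed.

```python
import xml.etree.ElementTree as ET

text = "<note><to>Tove</to></note>"

elem = ET.XML(text)            # parse a string into an Element
same = ET.fromstring(text)     # fromstring is another name for XML
print(elem.find("to").text, same.find("to").text)   # Tove Tove

# tostring returns bytes by default in Python 3
print(ET.tostring(elem))                       # b'<note><to>Tove</to></note>'
print(ET.tostring(elem, encoding="unicode"))   # <note><to>Tove</to></note>
```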
XML Namespaces
The elementtree module supports qualified names (QNames) for element tags and attribute names. A qualified name consists of a (uri, local name) pair.
Qualified names were introduced with the XML Namespace specification.
The element type represents a qualified name pair, also called a universal name, as a string of the form “{uri}local”. This syntax can be used both for tag names and for attribute keys.
The following example creates an element where the tag is the qualified name pair (http://spam.effbot.org, egg).
from elementtree.ElementTree import Element
elem = Element("{http://spam.effbot.org}egg")
If you save this to an XML file, the writer will automatically generate proper XML namespace declarations, and pick a suitable prefix. When you load an XML file, the parser converts qualified tag and attribute names to the element syntax.
Note that the standard parser discards namespace prefixes and declarations, so if you need to access the prefixes later on (e.g. to handle qualified names in attribute values or character data), you must use an alternate parser. For more information on this topic, see the articles The ElementTree iterparse Function and Using the ElementTree Module to Generate SOAP Messages, Part 3: Dealing with Qualified Names.
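A sketch of both directions, assuming Python 3 and xml.etree.ElementTree. The exact prefix the writer picks is an implementation detail, so the code only prints it. The parsed element shows that the s prefix and the xmlns declaration are gone, leaving just the universal name.

```python
import xml.etree.ElementTree as ET

# A qualified tag written in the "{uri}local" universal-name form
elem = ET.Element("{http://spam.effbot.org}egg")
out = ET.tostring(elem, encoding="unicode")
print(out)   # the writer invents a prefix (e.g. ns0) and declares it

# Parsing goes the other way: the prefix "s" is resolved to its URI,
# then both the prefix and the declaration are dropped.
parsed = ET.fromstring('<s:egg xmlns:s="http://spam.effbot.org"/>')
print(parsed.tag)      # {http://spam.effbot.org}egg
print(parsed.attrib)   # {}
```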