XPath Support in ElementTree
Updated Sep 18, 2007 | Fredrik Lundh
ElementTree provides limited support for XPath expressions. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the core library.
The 1.2 release supports simple element location paths. In its simplest form, a location path is one or more tag names, separated by slashes (/).
You can also use an asterisk (*) instead of a tag name, to match all elements at that level. For example, */subtag returns all subtag grandchildren.
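As a minimal sketch of simple location paths and the asterisk wildcard, the following uses the `xml.etree.ElementTree` module from the Python standard library. The sample document and its tag names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
root = ET.fromstring(
    "<menu>"
    "<spam><egg>1</egg><bacon>2</bacon></spam>"
    "<ham><egg>3</egg></ham>"
    "</menu>"
)

# "spam" matches child elements of root named "spam".
spam_children = root.findall("spam")

# "spam/egg" matches "egg" grandchildren located inside a "spam" child.
spam_eggs = root.findall("spam/egg")

# "*/egg" matches "egg" grandchildren under any child element.
all_eggs = root.findall("*/egg")

print([e.tag for e in spam_children])  # ['spam']
print([e.text for e in spam_eggs])     # ['1']
print([e.text for e in all_eggs])      # ['1', '3']
```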
An empty tag (//) is used to search on all levels of the tree, beneath the current level. The empty tag must always be followed by a tag name or an asterisk. For example, .//tag returns all tag elements in the entire tree.
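The difference between a child search and a subtree search can be sketched as follows, again using the standard library's `xml.etree.ElementTree`. The document structure is an assumption chosen to put the same tag at several depths.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with "tag" elements at three different depths.
root = ET.fromstring(
    "<doc>"
    "<tag>a</tag>"
    "<section><tag>b</tag><part><tag>c</tag></part></section>"
    "</doc>"
)

direct = root.findall("tag")                  # direct children only
anywhere = root.findall(".//tag")             # all levels beneath root
under_section = root.findall("section//tag")  # all levels beneath each "section" child
everything = root.findall(".//*")             # every element beneath root

print([e.text for e in direct])         # ['a']
print([e.text for e in anywhere])       # ['a', 'b', 'c']
print([e.text for e in under_section])  # ['b', 'c']
print(len(everything))                  # 5
```

Note that the element the search starts from is not itself included in the result of a `//` search.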
When searching on individual elements, the path must not start with a slash. You can add a leading period (.), if necessary.
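A small sketch of the rule for individual elements, assuming the standard library's `xml.etree.ElementTree`. In that implementation, a path with a leading slash is expected to raise `SyntaxError` when used on an element; other implementations may behave differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring("<doc><section><title>Intro</title></section></doc>")
section = root.find("section")

# Relative paths: the leading "./" is optional here.
plain = section.find("title")
dotted = section.find("./title")

# A path starting with a slash is not allowed on an element.
try:
    section.findall("/title")
except SyntaxError as exc:
    error = str(exc)
else:
    error = None

print(plain is dotted)  # True
```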
The 1.3 release adds basic predicates. You can search on attributes, attribute values, child elements, and on position.
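The four kinds of predicate can be sketched like this, using the standard library's `xml.etree.ElementTree` (which includes the 1.3 predicate support). The HTML-like sample document is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    "<body>"
    "<div class='sidebar'><a href='/home'>Home</a><a>Plain</a></div>"
    "<div class='main'><p>one</p><p>two</p><p>three</p></div>"
    "</body>"
)

links = root.findall(".//a[@href]")                  # attribute is present
sidebars = root.findall(".//div[@class='sidebar']")  # attribute has a given value
with_p = root.findall("div[p]")                      # has a child element named "p"
first_p = root.findall(".//p[1]")                    # position (1 is the first)
last_p = root.findall(".//p[last()]")                # last position
second_last = root.findall(".//p[last()-1]")         # relative to last()

print([e.text for e in links])             # ['Home']
print([e.get("class") for e in sidebars])  # ['sidebar']
print([e.get("class") for e in with_p])    # ['main']
print([e.text for e in first_p])           # ['one']
print([e.text for e in last_p])            # ['three']
print([e.text for e in second_last])       # ['two']
```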
Path element summary:
syntax | meaning |
---|---|
tag | Selects all child elements with the given tag. For example, “spam” selects all child elements named “spam”, “spam/egg” selects all grandchildren named “egg” in all child elements named “spam”. You can use universal names (“{url}local”) as tags. |
* | Selects all child elements. For example, “*/egg” selects all grandchildren named “egg”. |
. | Selects the current node. This is mostly useful at the beginning of a path, to indicate that it’s a relative path. |
// | Selects all subelements, on all levels beneath the current element (search the entire subtree). For example, “.//egg” selects all “egg” elements in the entire tree. |
.. | (New in 1.3) Selects the parent element. |
[@attrib] | (New in 1.3) Selects all elements that have the given attribute. For example, “.//a[@href]” selects all “a” elements in the tree that have an “href” attribute. |
[@attrib='value'] | (New in 1.3) Selects all elements for which the given attribute has the given value. For example, “.//div[@class='sidebar']” selects all “div” elements in the tree that have the class “sidebar”. In the current release, the value cannot contain quotes. |
[tag] | (New in 1.3) Selects all elements that have a child element named tag. In the current version, only a single tag can be used (i.e. only immediate children are supported). |
[position] | (New in 1.3) Selects all elements that are located at the given position. The position can be either an integer (1 is the first position), the expression “last()” (for the last position), or a position relative to last() (e.g. “last()-1” for the second-to-last position). This predicate must be preceded by a tag name. |
Predicates must be preceded by a tag name, an asterisk, or another predicate. Note that position predicates must be immediately preceded by a tag name in the current 1.3 release. All other predicates can be stacked.
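Stacked predicates, a predicate after an asterisk, and the parent step can be sketched as follows. This assumes the standard library's `xml.etree.ElementTree`; the sample document and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    "<body>"
    "<div class='sidebar' id='nav'><a href='/home'>Home</a></div>"
    "<div class='sidebar'><span>no link</span></div>"
    "<div class='main' id='content'><a href='/more'>More</a></div>"
    "</body>"
)

# Stacked predicates: each one narrows the previous selection.
nav = root.findall("div[@class='sidebar'][@id]")
linked_sidebars = root.findall("div[@class='sidebar'][a]")

# A predicate can follow an asterisk as well as a tag name.
with_id = root.findall("*[@id]")

# ".." steps up to the parent: elements that contain an "a" with an href.
link_parents = root.findall(".//a[@href]/..")

print([e.get("id") for e in nav])              # ['nav']
print([e.get("id") for e in linked_sidebars])  # ['nav']
print([e.get("id") for e in with_id])          # ['nav', 'content']
print([e.get("id") for e in link_parents])     # ['nav', 'content']
```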
Note that find and findtext still return information about the first matching element only (first in document order, that is). To get more than one element, use the findall method.
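The difference between the three methods can be sketched like this, using the standard library's `xml.etree.ElementTree`. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    "<list><item>first</item><item>second</item><item>third</item></list>"
)

first = root.find("item")           # first matching element in document order
first_text = root.findtext("item")  # text of the first matching element
all_items = root.findall("item")    # list of all matching elements

# When nothing matches, find returns None and findtext returns the default.
missing = root.find("nothing")
missing_text = root.findtext("nothing", default="n/a")

print(first.text)                     # first
print(first_text)                     # first
print([e.text for e in all_items])    # ['first', 'second', 'third']
print(missing, missing_text)          # None n/a
```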