The ElementRXP module
Fredrik Lundh | February 2005 | Originally posted to online.effbot.org
Here’s a simple module that uses the PyRXP parser to build an element tree:
    # File: ElementRXP.py

    try:
        from cElementTree import Element
    except ImportError:
        from elementtree.ElementTree import Element

    try:
        from pyRXPU import Parser
    except ImportError:
        # fall back on ASCII-only parser
        from pyRXP import Parser

    def fixelement(node):
        # each PyRXP node is a (tag, attrib, children, spare) tuple
        tag, attrib, children, spare = node
        elem = this = Element(tag, attrib)
        for child in children:
            if isinstance(child, tuple):
                this = fixelement(child)
                elem.append(this)
            else:
                # add text fragments to the right place
                if this is elem:
                    this.text = child
                else:
                    this.tail = child
        return elem

    def parse(file):
        if not hasattr(file, "read"):
            file = open(file)
        p = Parser(ExpandEmpty=1)
        return fixelement(p.parse(file.read()))
This is faster than the Python version of ElementTree, but a lot slower than plain cElementTree. However, the PyRXP(U) library supports DTD validation, which can come in handy in some applications.
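To see how the conversion places text fragments, the sketch below feeds a hand-written tuple tree to a variant of fixelement, so it runs without PyRXP installed. It assumes the standard library's xml.etree.ElementTree in place of the elementtree package. The `or {}` and `or []` guards are a defensive addition, not part of the module above; they cover the case where the parser reports empty attributes or children as None. The sample tree is made up for illustration.

```python
from xml.etree.ElementTree import Element, tostring

def fixelement(node):
    # each node is a (tag, attrib, children, spare) tuple
    tag, attrib, children, spare = node
    # assumption: tolerate None for empty attributes/children
    elem = this = Element(tag, attrib or {})
    for child in children or []:
        if isinstance(child, tuple):
            # subelement: convert it recursively and remember it
            this = fixelement(child)
            elem.append(this)
        elif this is elem:
            # text before the first subelement belongs to the parent
            this.text = child
        else:
            # text after a subelement becomes that subelement's tail
            this.tail = child
    return elem

# hand-written tuple tree for: <p class="x">Hello <b>big</b> world</p>
tree = ("p", {"class": "x"},
        ["Hello ", ("b", None, ["big"], None), " world"],
        None)

root = fixelement(tree)
print(tostring(root, encoding="unicode"))
```

The string before the b element ends up in the parent's text attribute, and the string after it ends up in the b element's tail, which is where ElementTree expects mixed content to live.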