The ElementTree Module (Work In Progress)
Fredrik Lundh | December 2007
Classes #
Element and SubElement #
The core Element type, which represents an XML element with associated attribute- and character data. The SubElement factory creates an Element, and adds it to a parent element.
See The Element API.
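As a minimal sketch of how the two factories fit together (assuming the standard-library xml.etree.ElementTree packaging of the module; the tag and attribute names are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Build <catalog><book id="b1">Title</book></catalog> by hand.
root = ET.Element("catalog")
book = ET.SubElement(root, "book", id="b1")  # created and appended to root
book.text = "Title"

print(root[0] is book)   # the new element is already attached to its parent
print(book.get("id"))    # attribute data
print(book.text)         # character data
```

The only difference from calling Element directly is that SubElement also appends the new element to the given parent.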
ElementTree #
A wrapper class that represents an XML document. This is mainly used to read XML documents from files and to write them back to files.
See The ElementTree API.
XMLParser #
A low-level, event-generating XML parser.
See The XMLParser API.
TreeBuilder #
A tree builder, which translates an XMLParser-style event stream to an Element tree.
See The TreeBuilder API.
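The relationship between the parser and the tree builder can be sketched as follows. This assumes the standard-library xml.etree.ElementTree version of the module, where XMLParser takes a target keyword argument; the document content is invented for the example.

```python
import xml.etree.ElementTree as ET

# The parser turns raw XML into start/data/end events and forwards
# them to its target; here the target is a TreeBuilder.
builder = ET.TreeBuilder()
parser = ET.XMLParser(target=builder)
for chunk in ("<doc><item>fi", "rst</item>", "<item/></doc>"):
    parser.feed(chunk)          # chunks may split tags and text anywhere
root = parser.close()           # returns the result of the target's close()

# The same event stream can be sent to a TreeBuilder by hand.
manual = ET.TreeBuilder()
manual.start("doc", {})
manual.start("item", {})
manual.data("first")
manual.end("item")
manual.end("doc")
manual_root = manual.close()    # root element of the tree that was built

print(root.tag, len(root), manual_root[0].text)
```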
Functions #
The ElementTree module contains the following functions:
iselement #
iselement(obj) ⇒ bool
Checks if an object appears to be a valid element object.
The current ElementTree implementation checks if the object is of a known element type, or if it has a tag attribute. Other implementations may use stricter tests.
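A small sketch of the check, assuming the standard-library xml.etree.ElementTree version. The TagOnly class is a hypothetical stand-in that merely carries a tag attribute; whether such an object passes depends on the implementation, as noted above.

```python
import xml.etree.ElementTree as ET

class TagOnly:
    """Hypothetical object: not an Element, but it has a tag attribute."""
    tag = "fake"

results = {
    "element": ET.iselement(ET.Element("a")),   # a real element
    "string": ET.iselement("a"),                # a plain string has no tag
    "tag_only": ET.iselement(TagOnly()),        # implementation dependent
}
print(results)
```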
parse #
parse(source) ⇒ an ElementTree object
Parses XML data from a file or file-like object, and returns an ElementTree object.
tree = ET.parse("sample.xml")
tree = ET.parse(urllib.urlopen("http://site/file.xml"))
To parse XML data from a string, use the XML helper instead.
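To try parse without a file on disk, any file-like object works. The following sketch assumes the standard-library xml.etree.ElementTree version and uses io.BytesIO as a stand-in for an open file; the XML content is made up.

```python
import io
import xml.etree.ElementTree as ET

# An in-memory binary stream behaves like a file opened for reading.
source = io.BytesIO(b"<root><child name='a'/></root>")

tree = ET.parse(source)     # returns an ElementTree wrapper
root = tree.getroot()       # the top-level Element

print(root.tag, root[0].get("name"))
```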
parse(source, parser) ⇒ an ElementTree object
Same, but uses the given parser instance. You can use this to plug in an alternate parser, or to override the document encoding when using the standard parser:
parser = ET.XMLParser(encoding="iso-8859-1")
tree = ET.parse(source, parser)
Exception Handling
In 1.3 and later, this method raises a ParseError exception (a subclass of SyntaxError) if the source data is malformed.
In earlier versions, the exception used is implementation dependent; cElementTree 1.0 uses a SyntaxError exception, other versions usually propagate the exception raised by the internal parser implementation (e.g. pyexpat.error for pyexpat-based parsers).
To emulate 1.3 in earlier versions, you can use something like:
try:
    ParseError = ET.ParseError
except AttributeError:
    # no ParseError here; provoke a parse failure and reuse its exception type
    try:
        ET.XML("<foo>")
    except:
        from sys import exc_type as ParseError # (!)
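On versions that do provide ParseError, handling malformed input might look like the sketch below. It assumes the standard-library xml.etree.ElementTree version; try_parse is a hypothetical helper name, not part of the module.

```python
import io
import xml.etree.ElementTree as ET

def try_parse(data):
    """Return the root element, or None if the data is malformed."""
    try:
        return ET.parse(io.BytesIO(data)).getroot()
    except ET.ParseError as exc:
        # ParseError is a SyntaxError subclass, so "except SyntaxError"
        # would catch it as well.
        print("malformed XML:", exc)
        return None

good = try_parse(b"<foo/>")
bad = try_parse(b"<foo>")    # unclosed element
```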
iterparse #
iterparse(source, options) ⇒ a generated (event, element) stream
Parses data from a file or file-like object, and generates a sequence of (event, element)-tuples.
The following options can be given as keyword arguments:
events= A list of events to include in the event stream. If omitted, only “end” events are reported. Note that the parser will use the string objects you pass in as events, so you can use the is operator for comparison in the event handler.
parser= A parser instance. If omitted, the standard XMLParser is used.
Supported Events
By default, the parser returns control to the caller when it sees an end tag (which means that the corresponding element has been fully populated, except for the tail attribute), but you can use an option to tell it to return more events. The following events are available:
end: Indicates that an element is complete, including attributes, text content, and subelements, but not including the tail attribute.
start: Indicates that an element has just been created. The attributes are properly set up, but the text content and the subelements are not available yet.
start-ns: Indicates that a new namespace scope has been opened. In this case, the second tuple item is a (prefix, uri) tuple.
end-ns: Indicates that the most recent namespace scope has been closed. In this case, the second tuple item is None.
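The four event types can be observed with a short sketch like the following. It assumes the standard-library xml.etree.ElementTree version; the namespace URI and element names are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

data = b'<root xmlns:x="http://example.com/ns"><x:item>text</x:item></root>'

seen = []
events = ("start", "end", "start-ns", "end-ns")
for event, payload in ET.iterparse(io.BytesIO(data), events=events):
    if event == "start-ns":
        seen.append((event, payload))       # payload is a (prefix, uri) tuple
    elif event == "end-ns":
        seen.append((event, payload))       # payload is None
    else:
        seen.append((event, payload.tag))   # payload is an Element

for entry in seen:
    print(entry)
```

Element tags in the stream use the expanded {uri}local form, so the x:item element is reported with the full namespace URI rather than the prefix.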
Notes
You can modify the tree during parsing, for example to remove subtrees that you have already processed.
for event, elem in ET.iterparse(file):
    if elem.tag == "record":
        # process record
        elem.clear() # won't need this again
The parsing process is asynchronous; the parser reads data from the source in blocks, and processes all XML tags and data sections in that block before it returns them to the caller.
This means that the tree will often be more complete than the events indicate; for example, the entire element may already be processed when the “start” event arrives, or one or more sibling elements may be present when you see an “end” event. For small files, the entire tree might have been built before you see the first event. Make sure your code doesn’t rely on this.
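One way to stay independent of how far ahead the parser has read is to touch an element's content only in its own “end” event, and to clear it afterwards. A minimal sketch, assuming the standard-library xml.etree.ElementTree version and an invented log format:

```python
import io
import xml.etree.ElementTree as ET

data = (b"<log>"
        b"<record id='1'><msg>a</msg></record>"
        b"<record id='2'><msg>b</msg></record>"
        b"</log>")

messages = []
for event, elem in ET.iterparse(io.BytesIO(data)):  # default: "end" events only
    if elem.tag == "record":
        # At "end", the attributes, text, and subelements are guaranteed
        # to be complete; the tail attribute is not.
        messages.append((elem.get("id"), elem.findtext("msg")))
        elem.clear()  # drop the processed subtree to keep memory use low

print(messages)
```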
XML #
XML(data) ⇒ element
Parses XML data from a string buffer, and returns the root element.
Note that the string must contain encoded data. If you want to parse data from a Unicode string, you need to encode it first.
XML(data, parser) ⇒ element
Same, but allows you to pass in a custom parser.
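The encoding step can be sketched like this, assuming the standard-library xml.etree.ElementTree version (which, in Python 3, also accepts an unencoded str). The element name and text are made up.

```python
import xml.etree.ElementTree as ET

text = "<greeting lang='fr'>caf\u00e9</greeting>"   # a Unicode string

# Encode first; UTF-8 is the default XML encoding, so no declaration is needed.
root = ET.XML(text.encode("utf-8"))

print(root.tag, root.get("lang"), root.text == "caf\u00e9")
```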
XMLID #
XMLID(data) ⇒ element, dictionary
XMLID(data, parser) ⇒ element, dictionary
Same as XML, but returns both the root element and a dictionary mapping “id” attributes to elements.
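A brief sketch of the returned pair, assuming the standard-library xml.etree.ElementTree version; the document and id values are invented.

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID(
    "<doc><p id='intro'>Hi</p><p id='body'>There</p><p>No id</p></doc>"
)

# Elements without an id attribute are simply left out of the dictionary.
print(sorted(ids))
print(ids["intro"].text)
```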
tostring #
tostring(elem) ⇒ string
Generates an XML representation of the given element, including all subelements. The output is encoded as US-ASCII, and non-ASCII characters are written as character references.
tostring(elem, encoding) ⇒ string
Same, but encodes the output in the given encoding.
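The difference between the default and an explicit encoding can be seen in a sketch like this one, assuming the standard-library xml.etree.ElementTree version, where the result is a bytes object:

```python
import xml.etree.ElementTree as ET

elem = ET.Element("name")
elem.text = "caf\u00e9"

ascii_bytes = ET.tostring(elem)                   # default: US-ASCII
utf8_bytes = ET.tostring(elem, encoding="utf-8")  # explicit encoding

print(ascii_bytes)   # the non-ASCII character appears as a character reference
print(utf8_bytes)    # the same character appears as encoded bytes
```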
tostringlist #
tostringlist(elem) ⇒ list of string fragments
tostringlist(elem, encoding) ⇒ list of string fragments
(New in 1.3) Same as tostring, but returns a list of string fragments instead of a single string. This can sometimes be more efficient, since ET doesn’t have to allocate a single large string.
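A short sketch, assuming the standard-library xml.etree.ElementTree version, showing that the fragments join back into the same output that tostring produces (how the output is split into fragments is an implementation detail):

```python
import xml.etree.ElementTree as ET

root = ET.XML("<doc><item>one</item><item>two</item></doc>")

fragments = ET.tostringlist(root)   # list of byte-string fragments
joined = b"".join(fragments)        # join only if a single string is needed

print(len(fragments), joined == ET.tostring(root))
```

Fragments like these can be handed directly to a consumer that accepts an iterable of strings, such as a file object's writelines method, without building the full string first.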
fromstring #
fromstring(data) ⇒ element
fromstring(data, parser) ⇒ element
Same as XML.
fromstringlist #
fromstringlist(list) ⇒ element
fromstringlist(list, parser) ⇒ element
(New in 1.3) Same as XML and fromstring, but takes a list of string fragments instead of a single string.
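For example, a document that arrives in pieces can be parsed without joining it first. A minimal sketch, assuming the standard-library xml.etree.ElementTree version and made-up fragments:

```python
import xml.etree.ElementTree as ET

# The fragments need not line up with element boundaries.
fragments = ["<doc>", "<item>one</item><it", "em>two</item>", "</doc>"]

root = ET.fromstringlist(fragments)
texts = [item.text for item in root]

print(root.tag, texts)
```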
