ElementTree: Bits and Pieces

Code samples that don’t fit anywhere else (yet).

Getting all text from inside an element

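The text attribute contains only the text that comes before an element's first subelement. It does not include text inside subelements, or text that follows a subelement (which is stored in that subelement's tail attribute). To get all text, you can use something like: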

def gettext(elem):
    text = elem.text or ""
    for e in elem:
        text += gettext(e)
        if e.tail:
            text += e.tail
    return text
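Here is the function applied to a small document. The sample markup is made up for illustration. The example also uses Element.itertext(), which ElementTree 1.3 and the xml.etree.ElementTree module in current Python versions provide. It yields the same text and tail pieces in document order, so joining them should give the same string for ordinary elements.

```python
import xml.etree.ElementTree as ET

def gettext(elem):
    # the element's own text, then each child's text and tail, in document order
    text = elem.text or ""
    for e in elem:
        text += gettext(e)
        if e.tail:
            text += e.tail
    return text

# made-up sample document
root = ET.fromstring("<p>Hello <b>big <i>bold</i></b> world</p>")

print(gettext(root))                # Hello big bold world
print("".join(root.itertext()))     # Hello big bold world
```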

Removing elements

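To remove an element from a tree without losing its content, you have to replace the element with that content. This includes not only the subelements, but also the element's text and tail attributes. These have to be appended to the tail of whatever element ends up just before them, or to the parent's text if there is no such element.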

The following function takes an element and a filter function, and removes every subelement, at any depth, for which the filter returns a false value.

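def cleanup(elem, filter):
    out = []

    def add_text(text):
        # text and tail may be None, so don't use += on them directly
        if not text:
            return
        if out:
            out[-1].tail = (out[-1].tail or "") + text
        else:
            elem.text = (elem.text or "") + text

    for e in elem:
        cleanup(e, filter)
        if not filter(e):
            # splice e's text, children, and tail into its place
            add_text(e.text)
            out.extend(e)
            add_text(e.tail)
        else:
            out.append(e)
    elem[:] = out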
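As a usage sketch, this removes all b elements from a made-up paragraph while keeping their content. The block repeats the cleanup function so that it runs on its own, with text and tail treated as possibly None. Without that, appending to an element whose tail is None (the i element here) raises a TypeError.

```python
import xml.etree.ElementTree as ET

def cleanup(elem, filter):
    out = []

    def add_text(text):
        # text and tail may be None, so don't use += on them directly
        if not text:
            return
        if out:
            out[-1].tail = (out[-1].tail or "") + text
        else:
            elem.text = (elem.text or "") + text

    for e in elem:
        cleanup(e, filter)
        if not filter(e):
            # splice e's text, children, and tail into its place
            add_text(e.text)
            out.extend(e)
            add_text(e.tail)
        else:
            out.append(e)
    elem[:] = out

# made-up sample document
root = ET.fromstring("<p>Hello <b>big <i>bold</i></b> world</p>")

# keep everything except b elements
cleanup(root, lambda e: e.tag != "b")

print(ET.tostring(root, encoding="unicode"))
# <p>Hello big <i>bold</i> world</p>
```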

Note that the top element itself is never checked; if you need to remove it, you have to handle that at the application level.

Instead of writing a filter function, you can iterate over the tree and set the tag to None for the elements you want to remove. When you’ve checked all elements, call the cleanup function as follows:

cleanup(elem, lambda e: e.tag)
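A sketch of the two-pass approach, using a made-up document in which all span elements should go. The first pass only marks elements by setting their tag to None. The second pass runs cleanup with the tag itself as the filter, so marked elements (tag None) are spliced out. The tree walk is copied into a list before any tags change, so the marking doesn't interfere with iteration.

```python
import xml.etree.ElementTree as ET

def cleanup(elem, filter):
    out = []

    def add_text(text):
        # text and tail may be None, so don't use += on them directly
        if not text:
            return
        if out:
            out[-1].tail = (out[-1].tail or "") + text
        else:
            elem.text = (elem.text or "") + text

    for e in elem:
        cleanup(e, filter)
        if not filter(e):
            add_text(e.text)
            out.extend(e)
            add_text(e.tail)
        else:
            out.append(e)
    elem[:] = out

# made-up sample document
root = ET.fromstring("<doc><span>one</span> and <span>two <em>2</em></span>!</doc>")

# pass 1: mark the elements to remove
for e in list(root.iter("span")):
    e.tag = None

# pass 2: remove every element whose tag is None
cleanup(root, lambda e: e.tag)

print(ET.tostring(root, encoding="unicode"))
# <doc>one and two <em>2</em>!</doc>
```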

In ElementTree 1.3, the serialization code will leave out the tags for elements that have their tag attribute set to None.
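If you only need the serialized output, the marking step may be enough on its own. The sketch below assumes the xml.etree.ElementTree module in current Python versions, which is based on ElementTree 1.3. Its tostring function writes the text, children, and tail of a None-tagged element, but not the tags. The in-memory tree is not changed: the marked elements are still there.

```python
import xml.etree.ElementTree as ET

# made-up sample document
root = ET.fromstring("<doc><span>one</span> and <span>two <em>2</em></span>!</doc>")

for e in list(root.iter("span")):
    e.tag = None  # the element stays in the tree; only its tags are dropped on output

print(ET.tostring(root, encoding="unicode"))
# <doc>one and two <em>2</em>!</doc>
print(len(root))  # 2: both marked elements are still children of root
```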