<html><head><title>Generating SeeAlso Indexes for the Python Library Reference</title><meta content="2007-08-10" name="published" /></head><body>
<p class="info">December 10, 2005 | Fredrik Lundh</p><p>Here&#8217;s a simple script that uses <a href="http://docs.python.org/modindex.html">the global module index</a>
to generate <a href="http://effbot.org/zone/idea-seealso.htm">a seealso index</a> for the
Python library reference.</p><p>Note that it queries python.org for all reference pages, so you
shouldn&#8217;t run it more often than needed.  If you just want a sample
file to play with, you can use <a href="http://effbot.org/zone/seealso-python-library.xml">
this copy</a>.</p><p>Required components (elementtree and elementtidy) can be found at
the <a href="http://effbot.org/downloads">effbot.org downloads site</a>.

<pre class="example python">
<span class="pykeyword">import</span> os, re, urllib, urlparse

<span class="pykeyword">from</span> elementtidy.TidyHTMLTreeBuilder <span class="pykeyword">import</span> parse
<span class="pykeyword">from</span> elementtree.ElementTree <span class="pykeyword">import</span> dump

URI = <span class="pystring">"http://docs.python.org/modindex.html"</span>

<span class="pycomment">#</span>
<span class="pycomment"># helpers</span>

<span class="pykeyword">class</span> <span class="pyclass">NS</span>:
    <span class="pykeyword">def</span> <span class="pyfunction">__init__</span>(self, uri):
        self.uri = uri
    <span class="pykeyword">def</span> <span class="pyfunction">__getattr__</span>(self, name):
        <span class="pykeyword">return</span> self.uri + name

XHTML = NS(<span class="pystring">"{http://www.w3.org/1999/xhtml}"</span>)

<span class="pykeyword">def</span> <span class="pyfunction">innertext</span>(elem):
    text = elem.text <span class="pykeyword">or</span> <span class="pystring">""</span>
    <span class="pykeyword">for</span> elem <span class="pykeyword">in</span> elem.getiterator():
        <span class="pykeyword">if</span> elem.text: text += elem.text
        <span class="pykeyword">if</span> elem.tail: text += elem.tail
    <span class="pykeyword">return</span> text

<span class="pykeyword">def</span> <span class="pyfunction">gettitle</span>(href, default):
    s = urllib.urlopen(href).read(1024)
    m = re.search(<span class="pystring">"&lt;title&gt;[\d\s.]+(.*)&lt;/title&gt;"</span>, s)
    <span class="pykeyword">if</span> m:
        <span class="pykeyword">return</span> m.group(1)
    <span class="pykeyword">return</span> default

<span class="pycomment">#</span>
<span class="pycomment"># get going</span>

tree = parse(urllib.urlopen(URI))

<span class="pycomment"># the index is in the second table on the page</span>
tables = tree.findall(<span class="pystring">".//"</span> + XHTML.table)

<span class="pykeyword">print</span> <span class="pystring">"&lt;seealso target-domain='python'&gt;"</span>

<span class="pykeyword">print</span> <span class="pystring">" &lt;info xmlns:dc='http://purl.org/dc/elements/1.1/'&gt;"</span>
<span class="pykeyword">print</span> <span class="pystring">"  &lt;dc:title&gt;Python Library Reference&lt;/dc:title&gt;"</span>
<span class="pykeyword">print</span> <span class="pystring">"  &lt;dc:identifier&gt;http://docs.python.org/index.html&lt;/dc:identifier&gt;"</span>
<span class="pykeyword">print</span> <span class="pystring">"  &lt;dc:language&gt;en&lt;/dc:language&gt;"</span>
<span class="pykeyword">print</span> <span class="pystring">" &lt;/info&gt;"</span>

<span class="pykeyword">for</span> a <span class="pykeyword">in</span> tables[1].findall(<span class="pystring">".//"</span> + XHTML.a):
    href = urlparse.urljoin(URI, a.get(<span class="pystring">"href"</span>))
    text = innertext(a)
    target = text.split()[0]
    <span class="pykeyword">print</span> <span class="pystring">" &lt;item href='%s'&gt;"</span> % href
    <span class="pykeyword">print</span> <span class="pystring">"  &lt;title&gt;%s&lt;/title&gt;"</span> % gettitle(href, text)
    <span class="pykeyword">print</span> <span class="pystring">"  &lt;target&gt;%s&lt;/target&gt;"</span> % target
    <span class="pykeyword">print</span> <span class="pystring">" &lt;/item&gt;"</span>

<span class="pykeyword">print</span> <span class="pystring">"&lt;/seealso&gt;"</span></pre></p></body></html>