When importing module X, why do I get "undefined symbol: PyUnicodeUCS2*"?

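You are using a version of Python that uses a 4-byte representation for Unicode characters (a UCS4 or "wide" build), but some C extension module you are importing was compiled against a Python that uses a 2-byte representation (a UCS2 or "narrow" build, the default). Python's header files rename the PyUnicode_* API functions to PyUnicodeUCS2_* or PyUnicodeUCS4_* depending on the build, so the extension refers to symbols that your interpreter does not export.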

If instead the name of the undefined symbol starts with PyUnicodeUCS4, the problem is the reverse: Python was built using 2-byte Unicode characters, and the extension module was compiled using a Python with 4-byte Unicode characters.
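The rule in the two paragraphs above can be captured in a small helper. explain_unicode_import_error is a hypothetical name, not part of any library. It reads the text of the ImportError and reports which width each side uses. It assumes the message contains "undefined symbol: NAME", as in the question; the exact wording comes from the platform's dynamic loader and may differ on other systems.

```python
import re
import sys


def explain_unicode_import_error(message, maxunicode=sys.maxunicode):
    """Return (extension_width, interpreter_width) for a PyUnicodeUCS* error.

    Hypothetical helper. Returns None if the message does not mention an
    undefined PyUnicodeUCS2_* or PyUnicodeUCS4_* symbol.
    """
    match = re.search(r"undefined symbol: (PyUnicodeUCS[24])_\w+", message)
    if match is None:
        return None
    # The extension was built for the width named in the missing symbol...
    extension_width = match.group(1)[-4:]  # "UCS2" or "UCS4"
    # ...while the running interpreter's width follows from sys.maxunicode.
    interpreter_width = "UCS4" if maxunicode > 0xFFFF else "UCS2"
    return extension_width, interpreter_width
```

To use it, catch the ImportError from the failing import and pass str(exc) as the message.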

This can easily happen with pre-built extension packages. Red Hat Linux 7.x, in particular, shipped a “python2” binary compiled with 4-byte Unicode. The link failure occurs only if the extension calls PyUnicode_FromUnicode or another PyUnicode function. It is also a problem, even without a link error, if the extension uses any of the Unicode-related format codes (such as u and u#) of Py_BuildValue and similar functions, or of PyArg_ParseTuple, because these pass Py_UNICODE buffers whose element size differs between the two builds.

To find out which Unicode character size an interpreter uses, look at sys.maxunicode: it is 65535 on a UCS2 build and 1114111 on a UCS4 build.

>>> import sys
>>> if sys.maxunicode > 65535:
...     print('UCS4 build')
... else:
...     print('UCS2 build')
...

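The only way to solve this problem is to use extension modules compiled with a Python binary that uses the same Unicode character size as the interpreter you run them with, for example by rebuilding the extension from source with that interpreter.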
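Before installing a pre-built package, it can help to find out which width its extension modules expect. The sketch below is a crude heuristic, not a standard tool, and all of its function names are hypothetical: it searches the raw bytes of a compiled extension for the renamed PyUnicodeUCS2_ and PyUnicodeUCS4_ symbol prefixes and compares the result with sys.maxunicode. A symbol-table tool such as nm would be more precise. The check only makes sense for interpreters older than Python 3.3, where the two builds exist.

```python
import sys

# Python 2's headers rename PyUnicode_* functions to one of these prefixes,
# depending on the Unicode width of the Python the extension was built with.
UCS2_PREFIX = b"PyUnicodeUCS2_"
UCS4_PREFIX = b"PyUnicodeUCS4_"


def interpreter_width(maxunicode=sys.maxunicode):
    """Return 'UCS2' or 'UCS4' for an interpreter with this sys.maxunicode."""
    return "UCS4" if maxunicode > 0xFFFF else "UCS2"


def width_from_bytes(data):
    """Guess the Unicode width from the raw bytes of a compiled extension.

    Returns 'UCS2' or 'UCS4' if only one family of renamed symbols appears,
    or None if neither (or, unexpectedly, both) appear.
    """
    has_ucs2 = UCS2_PREFIX in data
    has_ucs4 = UCS4_PREFIX in data
    if has_ucs2 != has_ucs4:
        return "UCS2" if has_ucs2 else "UCS4"
    return None


def extension_width(path):
    """Read a shared library (.so or .pyd) and guess its Unicode width."""
    with open(path, "rb") as f:
        return width_from_bytes(f.read())


def has_symbol_mismatch(path, maxunicode=sys.maxunicode):
    """True if the extension references the other build's PyUnicode symbols.

    This only detects the link-time problem; use of the u/u# format codes
    leaves no such trace in the binary.
    """
    width = extension_width(path)
    return width is not None and width != interpreter_width(maxunicode)


if __name__ == "__main__":
    # Usage: python checkwidth.py ext1.so ext2.so ...
    for path in sys.argv[1:]:
        print("%s: extension=%s interpreter=%s" % (
            path, extension_width(path), interpreter_width()))
```

Run as a script, it prints the guessed width of each extension given on the command line next to the width of the running interpreter; None means no renamed PyUnicode symbols were found.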

CATEGORY: extending