What do 'UnicodeError: ASCII decoding error: ordinal not in range(128)' and 'UnicodeError: ASCII encoding error: ordinal not in range(128)' mean?
These messages usually mean that you’re trying either to mix Unicode strings with 8-bit strings, or to write Unicode strings to an output file or device that only handles ASCII.
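When you do this, Python implicitly converts between the two using its default encoding, normally ASCII, and raises an error if the data contains a byte or character outside the ASCII range (ordinals 0 to 127, hence the “ordinal not in range(128)” in the message).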
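Python 3 no longer converts implicitly between bytes and text (mixing them raises TypeError instead), but the two underlying errors can still be reproduced explicitly with the ASCII codec. The sketch below is a Python 3 illustration, not the original Python 2 behavior; `show_ascii_errors` is a hypothetical helper name.

```python
# Python 3 sketch: trigger both ASCII codec errors explicitly.

def show_ascii_errors() -> list:
    messages = []
    # Decoding (bytes -> str): byte 0xc3 (195) is outside range(128).
    try:
        b"caf\xc3\xa9".decode("ascii")
    except UnicodeDecodeError as exc:
        messages.append(str(exc))
    # Encoding (str -> bytes): character U+00E9 (233) is outside range(128).
    try:
        "caf\u00e9".encode("ascii")
    except UnicodeEncodeError as exc:
        messages.append(str(exc))
    return messages


for message in show_ascii_errors():
    print(message)
```

Both UnicodeDecodeError and UnicodeEncodeError are subclasses of UnicodeError, so code that catches UnicodeError handles either direction.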
The best way to avoid these errors is to convert all incoming strings to Unicode as soon as they enter the program, do all processing in Unicode, and convert back to encoded byte strings only on the way out.
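A minimal Python 3 sketch of this decode, process, encode pattern, where `bytes.decode()` and `str.encode()` play the role of the `unicode()` built-in. The function names `process_text` and `handle` are hypothetical, and UTF-8 is assumed as the encoding on both sides.

```python
def process_text(text: str) -> str:
    # Placeholder processing step: any logic that works only on str goes here.
    return text.upper()


def handle(raw: bytes, encoding: str = "utf-8") -> bytes:
    text = raw.decode(encoding)      # 1. decode incoming bytes once, at the boundary
    result = process_text(text)      # 2. do all processing on Unicode text
    return result.encode(encoding)   # 3. encode back to bytes only on output


print(handle("naïve café".encode("utf-8")))
```

Keeping the conversions at the edges means the middle of the program never has to guess what kind of string it is holding.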
For this to work, you need to know what encoding to use. In most cases, you can get this information from the environment the application runs in. For example, a program that handles email or web input will typically find character set encoding information in Content-Type headers. This can then be used to properly convert input data to Unicode. Assuming the string referred to by value is encoded as UTF-8:
value = unicode(value, "utf-8")
will return a Unicode object. If the data is not correctly encoded as UTF-8, the above call will raise a UnicodeError exception.
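For the Content-Type case mentioned above, a Python 3 sketch can let the standard `email` package parse the header and extract the `charset` parameter with `Message.get_content_charset()`. The helper name `decode_with_content_type` is hypothetical, and falling back to UTF-8 when no charset is given is an assumption; HTTP and email each have their own historical defaults.

```python
from email.message import Message


def decode_with_content_type(raw: bytes, content_type: str,
                             default: str = "utf-8") -> str:
    # Use the email package's header parser to read the charset=... parameter.
    header = Message()
    header["Content-Type"] = content_type
    # Returns the charset lower-cased, or `default` if the header has none.
    charset = header.get_content_charset(default)
    # Raises UnicodeDecodeError (a UnicodeError subclass) if raw is not
    # valid data in that charset.
    return raw.decode(charset)


print(decode_with_content_type(b"hello", "text/plain; charset=UTF-8"))
```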
On Windows, there is a codec named “mbcs” that encodes and decodes using the ANSI code page of your current locale. In many cases, and particularly when working with COM, this may be an appropriate encoding to use.
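Because “mbcs” exists only on Windows, portable code has to pick something else on other platforms. This Python 3 sketch falls back to `locale.getpreferredencoding(False)` elsewhere; that fallback is an assumed choice, not something the FAQ prescribes, and `platform_text_encoding` is a hypothetical name.

```python
import codecs
import locale
import sys


def platform_text_encoding() -> str:
    # "mbcs" is a Windows-only codec tied to the current ANSI code page.
    if sys.platform == "win32":
        return "mbcs"
    # Assumed fallback on other platforms: the locale's preferred encoding.
    return locale.getpreferredencoding(False)


encoding = platform_text_encoding()
# codecs.lookup() raises LookupError for unknown names, so this also
# confirms the chosen encoding is usable on the current platform.
print(encoding, codecs.lookup(encoding).name)
```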
If you only want to convert strings that contain non-ASCII data to Unicode, you can first try decoding them as ASCII and create a Unicode object only if that fails:
try:
    unicode(value, "ascii")
except UnicodeError:
    # Non-ASCII data: decode it, assuming UTF-8
    value = unicode(value, "utf-8")
else:
    # value was valid ASCII data; leave it as an 8-bit string
    pass
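The same try-ASCII-first logic in Python 3 might look like the sketch below, where 8-bit strings are `bytes` and Unicode strings are `str`. The function name `to_text_if_non_ascii` is hypothetical, and UTF-8 is assumed for the non-ASCII case, as in the Python 2 snippet.

```python
from typing import Union


def to_text_if_non_ascii(value: bytes) -> Union[bytes, str]:
    """Return value unchanged if it is pure ASCII, else decode it as UTF-8."""
    try:
        value.decode("ascii")
    except UnicodeDecodeError:
        # Non-ASCII bytes: assume UTF-8 (raises again if that is wrong too).
        return value.decode("utf-8")
    else:
        # Valid ASCII data: leave it as bytes.
        return value


print(to_text_if_non_ascii(b"hello"))
```

On Python 3.7 and later, `value.isascii()` performs the ASCII check without the try/except.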
It’s possible to change the default ASCII encoding by calling sys.setdefaultencoding() from a file called sitecustomize.py, which Python imports at startup. However, this isn’t recommended, because changing the Python-wide default encoding may cause third-party extension modules to fail.
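In Python 3 the default encoding is fixed at UTF-8 and `sys.setdefaultencoding()` no longer exists, so the usual alternative to a global default is to name the encoding explicitly at each I/O boundary. A minimal sketch, using a temporary file and a hypothetical helper `round_trip`:

```python
import os
import sys
import tempfile


def round_trip(text: str) -> str:
    # Write and read back a file, naming the encoding explicitly both times
    # instead of relying on any platform or interpreter default.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.read()


print(sys.getdefaultencoding())
print(round_trip("naïve café\n") == "naïve café\n")
```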