
XML-RPC and the ASCII limitation

Updated June 30, 2003 | June 29, 2003 | Fredrik Lundh


The original XML-RPC specification isn’t exactly clear on what characters you can put in an XML-RPC string; it starts out by saying that the body is XML (which supports the full Unicode character space), then talks about ASCII strings, and later clarifies that “any characters are allowed” and that you can use strings to encode binary data.

The last comment is a direct response to a question I asked while working on the first Python implementation. Since I asked the question, I also remember the context: it refers directly to the “ASCII string” item in the “scalar” list. Did “ASCII” really refer to the 7-bit US-ASCII standard (ISO 646, ANSI X3.4), or was it just a sloppy way for an American to say “text string”? (As non-US programmers know, it wouldn’t be the first time that has happened ;-)

Dave’s answer was clear: XML-RPC is XML, and you can use any character XML allows you to use.
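You can see what this means in practice with Python 3's standard-library `xmlrpc.client` module, the successor to `xmlrpclib`. The following minimal sketch marshals a call whose string argument contains non-ASCII characters and then parses it back. The method name `echo` and the sample text are made up for illustration.

```python
import xmlrpc.client

# Hypothetical sample text: Latin-1, Greek and CJK characters,
# none of which are in 7-bit US-ASCII.
text = "Grüße, Αθήνα, 東京"

# Marshal a method call. With the default (UTF-8) encoding, the
# characters are written as-is inside the <string> element.
request = xmlrpc.client.dumps((text,), methodname="echo")
print(request)

# Parse it back; loads() returns a (params, methodname) tuple.
params, method = xmlrpc.client.loads(request)
assert method == "echo"
assert params == (text,)
```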

(Unfortunately, Dave added the comments towards the end instead of reworking the relevant parts of the specification. As a result, people looking for clues get to the ASCII part and then tune out.)

Anyway, Dave has said the same thing many times since then, both in private conversations and in various public fora. If you check the early archives of the xml-rpc mailing list, you’ll find lots of talk about getting the XML encodings right, but no talk about any ASCII limitation.

But even if you don’t know all this, it’s not that hard to figure it out for yourself. Just make sure you read and digest the entire specification, apply some common sense to sort out the contradictions, and you’ll find that it’s pretty obvious that the intent is that you can use any character allowed by XML.

And as expected, most toolkit implementers have interpreted the specification in exactly this way, either by digesting the whole thing or by looking at prior art or the mailing list archives. Toolkits usually support at least the ISO-8859-1 and UTF-8 encodings, or use plain ASCII with character references for non-ASCII characters.
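The three approaches can be compared using Python 3's `xmlrpc.client`. The following minimal sketch serializes the same call three ways: as UTF-8, as ISO-8859-1, and as plain ASCII with numeric character references. It then checks that a standard XML parser recovers the same string from all three. The sample text and the `echo` method name are placeholders. Note that `dumps()` returns text even when you pass an `encoding`, so this sketch encodes the bytes itself to match the declaration.

```python
import xmlrpc.client

text = "naïve café"  # non-ASCII, but within ISO-8859-1

# 1. UTF-8: the default. The XML declaration carries no encoding
#    attribute, since UTF-8 is XML's default encoding.
utf8_body = xmlrpc.client.dumps((text,), methodname="echo").encode("utf-8")

# 2. ISO-8859-1: dumps() writes an encoding declaration, and we
#    encode the resulting text to match it.
latin1_body = xmlrpc.client.dumps(
    (text,), methodname="echo", encoding="iso-8859-1"
).encode("iso-8859-1")

# 3. Plain ASCII: each non-ASCII character becomes a decimal
#    character reference such as &#239; (no encoding declaration needed).
ascii_body = xmlrpc.client.dumps((text,), methodname="echo").encode(
    "ascii", "xmlcharrefreplace"
)

# All three bodies decode to the same string on the receiving side.
for body in (utf8_body, latin1_body, ascii_body):
    params, _ = xmlrpc.client.loads(body)
    assert params == (text,)
```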

(Note that even the full XML character space isn’t enough for shipping arbitrary binary data in text strings, because XML doesn’t allow control characters other than tab, newline, and carriage return. To deal with arbitrary binary data, use the base64 type.)
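The following minimal sketch shows the difference, again using Python 3's `xmlrpc.client`. The `store` method name and the payload bytes are hypothetical. If you push raw bytes through a string, `dumps()` produces a request body that is not well-formed XML, and the parser rejects it. If you pass a `bytes` object instead, the marshaller emits a `<base64>` element.

```python
import xml.parsers.expat
import xmlrpc.client

# Arbitrary binary data; \x01 and \x02 are control characters that
# XML 1.0 does not allow, not even as character references.
payload = b"\x01\x02\xfe\xff"

# Smuggling the bytes through a text string: the marshaller does not
# check for invalid characters, but the XML parser on the other end does.
bad = xmlrpc.client.dumps((payload.decode("latin-1"),), methodname="store")
try:
    xmlrpc.client.loads(bad)
    rejected = False
except xml.parsers.expat.ExpatError as exc:
    print("rejected:", exc)
    rejected = True

# Passing bytes makes dumps() use the base64 type instead.
good = xmlrpc.client.dumps((payload,), methodname="store")
params, _ = xmlrpc.client.loads(good, use_builtin_types=True)
assert params == (payload,)
```

Without `use_builtin_types=True`, `loads()` returns the value wrapped in an `xmlrpc.client.Binary` object, whose `data` attribute holds the bytes.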

Update: The XML-RPC specification was updated on June 30, 2003, to make things a bit less confusing. It no longer mentions ASCII.

Links

XML-RPC Specification

Unofficial XML-RPC Errata (on effbot.org)