Unofficial XML-RPC Errata
Updated June 15, 2004 | June 30, 2003 | Fredrik Lundh
This is an unofficial errata, intended to clarify certain details in the XML-RPC specification, as well as hint at “best practices” to use when designing your own XML-RPC implementations. This errata is mostly based on real-life experiences from early adopters and toolkit implementors (filtered through the brain of one such early adopter/implementor). Typos are not intentional.
The XML-RPC specification contains a number of contradictions when talking about encodings. It starts out by saying that XML-RPC is XML, then mentions ASCII, and finally says that “all characters can be used in a string”, and that strings can be used for binary data.
If you know your XML, it should be obvious that all four cannot be true at the same time; XML supports more than just ASCII, but you cannot store arbitrary binary data in XML character data.
The easiest (and most practical) way to resolve this is simply to ignore the “ASCII” term; the author has repeatedly stated that the use of “ASCII” was just a sloppy way to say “text string”, not a reference to a formal specification.
(Also see XML-RPC and the ASCII limitation.)
In other words, you’re free to use the full Unicode character set in an XML-RPC string, as long as you make sure to set the XML encoding correctly. (Remember that the default is UTF-8, not ISO-8859-1.) You can also stick to ASCII in the XML feed, and use character references for non-ASCII characters.
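A minimal sketch of the character-reference approach, assuming Python’s standard xmlrpc.client module as the toolkit: the request declares US-ASCII, and every non-ASCII character goes over the wire as a numeric character reference. The method name echo is a placeholder.

```python
import xmlrpc.client

params = ("Linköping, Ærø",)

# dumps() returns a str; because the encoding is not UTF-8, it also
# writes an XML header that names the encoding.
body = xmlrpc.client.dumps(params, methodname="echo", encoding="us-ascii")

# Encode for the wire, replacing each non-ASCII character with a
# numeric character reference such as &#246; (for "ö").
wire = body.encode("us-ascii", "xmlcharrefreplace")
print(wire.decode("ascii"))

# The parser on the other end turns the references back into text.
decoded, method = xmlrpc.client.loads(wire)
```

Python’s ServerProxy applies the same xmlcharrefreplace step when it encodes outgoing requests, so doing it by hand mostly matters when you generate or inspect the XML yourself.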
Update: The XML-RPC specification was updated on June 30, 2003, to make the situation a bit clearer. It no longer mentions ASCII.
Do not use whitespace in scalar values (unless it’s part of the actual value, of course). The exception is the base64 scalar element, where whitespace doesn’t matter.
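Assuming Python’s xmlrpc.client as the parser, the difference is easy to see: padding inside a string element becomes part of the value, while line breaks and indentation inside a base64 element are skipped by the decoder.

```python
import xmlrpc.client

# The string value is padded with spaces; the base64 value is indented
# and wrapped in line breaks.
xml = """<?xml version='1.0'?>
<methodResponse><params>
<param><value><string>  padded  </string></value></param>
<param><value><base64>
  aGVsbG8=
</base64></value></param>
</params></methodResponse>"""

# use_builtin_types=True returns base64 values as plain bytes.
params, _ = xmlrpc.client.loads(xml, use_builtin_types=True)
print(params)  # ('  padded  ', b'hello')
```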
All XML characters can be used in a string. There is no “ASCII” limitation.
Some implementations may not allow non-ASCII characters (based on a strict reading of the original specification, but not so strict that they noticed the comments about binary data being allowed ;-). That’s perfectly okay, as long as you don’t need to work with international characters.
Note that XML doesn’t support control characters in strings, and that line endings are normalized. To transfer binary data, use the base64 type (see below).
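The XML 1.0 Char production defines which characters may appear in character data: tab, line feed, carriage return, and everything from U+0020 upward except the surrogate range, U+FFFE and U+FFFF. Below is a sketch of a pre-flight check that falls back to base64 when a string cannot be carried as XML text; the helper names are hypothetical, and Python’s xmlrpc.client is assumed for the base64 wrapper and the line-ending demonstration.

```python
import xmlrpc.client

def is_xml_safe(text: str) -> bool:
    """Hypothetical helper: True if every character matches XML 1.0's Char production."""
    for ch in text:
        cp = ord(ch)
        if cp in (0x9, 0xA, 0xD):
            continue
        if 0x20 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF:
            continue
        return False
    return True

def to_xmlrpc_value(text: str):
    """Hypothetical helper: send text as a string when XML can carry it,
    otherwise as base64. The receiver must know to decode the bytes as UTF-8."""
    if is_xml_safe(text):
        return text
    return xmlrpc.client.Binary(text.encode("utf-8"))

# Line endings are normalized by the XML parser: "\r\n" arrives as "\n".
normalized, _ = xmlrpc.client.loads(xmlrpc.client.dumps(("a\r\nb",)))
```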
The int/i4 integer range is -2147483648 to 2147483647.
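In Python’s xmlrpc.client (assumed here), these limits are available as MININT and MAXINT, and the marshaller raises OverflowError rather than truncating out-of-range values. fits_i4 is a hypothetical name for a pre-flight check.

```python
import xmlrpc.client

# Signed 32-bit limits for <int> and <i4>.
print(xmlrpc.client.MININT, xmlrpc.client.MAXINT)  # -2147483648 2147483647

def fits_i4(n: int) -> bool:
    """Hypothetical helper: True if n can be sent as an XML-RPC int."""
    return xmlrpc.client.MININT <= n <= xmlrpc.client.MAXINT

# Larger values are rejected by the marshaller instead of being truncated.
try:
    xmlrpc.client.dumps((2**31,))
    overflow_rejected = False
except OverflowError:
    overflow_rejected = True
```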
Many implementations support C-style floating point literals, in addition to the simple [sign] [digits] [dot] [digits] format mentioned in the specification. Don’t rely on this, if you can avoid it.
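Python’s xmlrpc.client, for example, writes doubles with repr(), which can produce exponent notation such as 1e-07. If you need output that a strict reader will accept, a formatting helper along these lines is one option. format_double is a hypothetical name, and the helper rejects infinities and NaN because XML-RPC has no representation for them.

```python
import math
from decimal import Decimal

def format_double(x: float) -> str:
    """Hypothetical helper: format x as [sign] digits "." digits, with no exponent."""
    if math.isinf(x) or math.isnan(x):
        raise ValueError("XML-RPC doubles cannot represent inf or NaN")
    # repr() gives the shortest string that round-trips to the same float;
    # Decimal's "f" format then expands any exponent into plain digits.
    text = format(Decimal(repr(x)), "f")
    if "." not in text:
        text += ".0"
    return text
```

For example, format_double(1e-7) gives "0.0000001", where repr() would give "1e-07".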
Many implementations (or rather, applications) accept zero and non-zero int values instead of boolean 0 (false) and 1 (true). Don’t rely on this, if you can avoid it.
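On the sending side, the safe course is to use the real boolean type. In Python’s xmlrpc.client (assumed here), bool values are marshalled as boolean elements and plain integers as int elements, so the two do not get mixed up:

```python
import xmlrpc.client

# True becomes <boolean>1</boolean>; 1 becomes <int>1</int>.
body = xmlrpc.client.dumps((True, 1))
print(body)

# The types survive the round trip: (True, 1), with a real bool first.
params, _ = xmlrpc.client.loads(body)
```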
The time value is “naive time”, and does not include a timezone. You can either use a fixed timezone in your application (such as UTC), or ship the timezone offset as a separate value.
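A minimal sketch of both options, assuming Python’s xmlrpc.client (which marshals datetime.datetime values as dateTime.iso8601). The method name schedule and the struct member names when and utc_offset_minutes are hypothetical, not part of any standard.

```python
import xmlrpc.client
from datetime import datetime, timedelta, timezone

# A timezone-aware time: 14:30 at UTC+2 (an arbitrary example).
local = datetime(2004, 6, 15, 14, 30, tzinfo=timezone(timedelta(hours=2)))

# Option 1: agree that all times are UTC. Convert, then drop tzinfo
# explicitly, since dateTime.iso8601 has no field for it.
utc_naive = local.astimezone(timezone.utc).replace(tzinfo=None)

# Option 2: send the naive local time plus its offset as a separate value.
offset = local.utcoffset()
event = {
    "when": local.replace(tzinfo=None),
    "utc_offset_minutes": int(offset.total_seconds() // 60),
}

body = xmlrpc.client.dumps((utc_naive, event), methodname="schedule")

# use_builtin_types=True returns dateTime values as datetime.datetime.
params, _ = xmlrpc.client.loads(body, use_builtin_types=True)
```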
Binary data must be encoded in base64 elements. You cannot store arbitrary unencoded binary data in an XML-RPC string.
It’s okay to split base64 data over multiple lines. It’s not okay to remove trailing “=” characters, if any.
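A sketch assuming Python’s xmlrpc.client and base64 modules: wrapping bytes in xmlrpc.client.Binary makes the marshaller emit a base64 element, the output of base64.encodebytes() is split into 76-character lines and still decodes, and removing the trailing padding makes decoding fail.

```python
import base64
import binascii
import xmlrpc.client

# 100 bytes, including NUL and other control characters that XML
# character data cannot carry.
raw = bytes(range(100))

# Binary makes the marshaller emit <base64> instead of <string>.
body = xmlrpc.client.dumps((xmlrpc.client.Binary(raw),), methodresponse=True)
(decoded,), _ = xmlrpc.client.loads(body, use_builtin_types=True)

# encodebytes() inserts a newline after every 76 output characters;
# the decoder skips the line breaks.
encoded = base64.encodebytes(raw)
multiline_ok = base64.decodebytes(encoded) == raw

# Stripping the trailing "=" padding breaks decoding.
try:
    base64.decodebytes(encoded.rstrip().rstrip(b"="))
    padding_required = False
except binascii.Error:
    padding_required = True
```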
Most implementations cannot handle struct elements where the same member name occurs more than once, since structs are usually mapped to dictionaries/hashes.
The members of a struct are not ordered.
There are no restrictions on the member names; anything you can put in a string can be used as a name, including whitespace. If your implementation supports Unicode, you can use Unicode in member names too.
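Assuming Python’s xmlrpc.client again, the sketch below shows both points: arbitrary member names survive a round trip, and a struct with a repeated member collapses to a single dictionary entry. Keeping the last value is what this particular implementation does; other toolkits may keep the first value or reject the struct.

```python
import xmlrpc.client

# Member names may contain whitespace and non-ASCII characters.
record = {"förnamn": "Fredrik", "last name": "Lundh"}
(echoed,), _ = xmlrpc.client.loads(
    xmlrpc.client.dumps((record,), methodresponse=True))

# Members are unordered, so compare structs as mappings, not as XML text.
same = echoed == record

# A struct with a repeated member. Python maps structs to dicts, so only
# one value survives; here it is the last one.
xml = """<?xml version='1.0'?>
<methodResponse><params><param><value><struct>
<member><name>x</name><value><int>1</int></value></member>
<member><name>x</name><value><int>2</int></value></member>
</struct></value></param></params></methodResponse>"""
(collapsed,), _ = xmlrpc.client.loads(xml)
```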
The specification only talks about HTTP, but many implementations also support secure transfer over HTTPS.
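With Python’s xmlrpc.client (assumed here), HTTPS is a matter of using an https URL; the context argument lets you control certificate checking. The URL below is a placeholder.

```python
import ssl
import xmlrpc.client

# Default context: verifies the server certificate and host name.
context = ssl.create_default_context()

# Creating the proxy does not connect; the TLS handshake happens
# when the first remote method is called.
proxy = xmlrpc.client.ServerProxy("https://example.com/RPC2", context=context)
```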
Many implementations also provide separate access to marshalling and unmarshalling functions, which allows you to use XML-RPC encoding for other purposes (e.g. to store configuration data).
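Python’s xmlrpc.client, for instance, exposes its marshaller through dumps() and loads(). The sketch below uses them to store a configuration dictionary; the configuration keys are illustrative, and file handling is left out.

```python
import xmlrpc.client

config = {"server": "localhost", "port": 8080, "debug": False,
          "paths": ["/tmp", "/var/tmp"]}

# methodresponse=True wraps the single value in a <methodResponse>,
# giving a complete, self-contained XML document.
text = xmlrpc.client.dumps((config,), methodresponse=True)

# ... write text to a file, read it back later ...

(loaded,), _ = xmlrpc.client.loads(text)
```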
Best Practices: Generating XML-RPC Requests and Responses
For maximum interoperability, use only basic XML components: elements, character data, predefined entities, and numeric character references. CDATA sections can be used, but may not be supported by all implementations.
For maximum interoperability, use US-ASCII or a US-ASCII compatible character encoding (e.g. ISO-8859-1 or UTF-8). Some implementations may not support less common US-ASCII compatible encodings; when in doubt, use US-ASCII, with character references where necessary.
The <?xml ...?> header is optional, but must be included if you’re using an encoding other than UTF-8 (or US-ASCII, which is a subset of UTF-8).
Do not assume that the other end will understand or use internal or external DTDs.
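To make these guidelines concrete, here is a hand-written generator sketch that sticks to elements, character data, predefined entities and numeric character references, emits pure US-ASCII, and declares no DTD. The helper names are hypothetical and only string parameters are handled; Python’s xmlrpc.client.loads() is used just to check that a standard parser accepts the result.

```python
import xmlrpc.client
from xml.sax.saxutils import escape

def ascii_chardata(text: str) -> str:
    """Escape text using only predefined entities and numeric character references."""
    # escape() replaces &, < and > with &amp;, &lt; and &gt;. It must run
    # before the next step, or the "&" in "&#246;" would be escaped too.
    escaped = escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

def build_string_call(method: str, *args: str) -> bytes:
    """Hypothetical minimal generator: plain elements and character data,
    no CDATA sections, no DTD."""
    parts = [
        # The declaration is optional for US-ASCII; it is included for clarity.
        '<?xml version="1.0" encoding="us-ascii"?>',
        "<methodCall><methodName>", ascii_chardata(method), "</methodName><params>",
    ]
    for arg in args:
        parts += ["<param><value><string>", ascii_chardata(arg),
                  "</string></value></param>"]
    parts.append("</params></methodCall>")
    return "".join(parts).encode("ascii")

request = build_string_call("echo", "a < b & Ærø")
params, method = xmlrpc.client.loads(request)
```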
Best Practices: Parsing XML-RPC Requests and Responses
For maximum interoperability, make sure you support elements (including empty elements), character data, character entities (predefined entities, decimal character references, and hexadecimal character references), and CDATA sections.
For maximum interoperability, make sure you check the encoding attribute of the <?xml ...?> header, if present. If the encoding attribute is not present, you must treat the request/response as UTF-8.
(Note that, strictly speaking, the HTTP layer may use a charset parameter in the Content-Type header to override the encoding given in the XML header, but that’s not very common in practice. To be on the safe side, you may wish to reject requests where the HTTP-level charset, if given, differs from the document encoding.)
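The encoding check described above could look like the following sketch. The function names are hypothetical; the regular expression only handles an XML declaration at the very start of the body in an ASCII-compatible encoding (no byte order mark, no UTF-16), and encoding names are compared after normalizing aliases with codecs.lookup(), which raises LookupError for names Python does not know.

```python
import codecs
import re
from email.message import Message
from typing import Optional

_DECL = re.compile(rb"""<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")

def document_encoding(body: bytes) -> str:
    """Encoding named in the XML declaration, defaulting to UTF-8."""
    m = _DECL.match(body)
    name = m.group(1).decode("ascii") if m else "utf-8"
    return codecs.lookup(name).name  # e.g. "ISO-8859-1" -> "iso8859-1"

def check_encoding(body: bytes, content_type: Optional[str]) -> str:
    """Return the document encoding; raise ValueError if an HTTP charset disagrees."""
    doc = document_encoding(body)
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased, or None if absent
        if charset and codecs.lookup(charset).name != doc:
            raise ValueError(f"HTTP charset {charset!r} differs from document encoding {doc!r}")
    return doc

# Usage: a mismatch between the HTTP charset and the declaration is rejected.
try:
    check_encoding(b"<?xml version='1.0' encoding='utf-8'?><methodCall/>",
                   "text/xml; charset=iso-8859-1")
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```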
Best Practices: Common Extensions
To be added: interface introspection, standard error codes, object models.
Best Practices: API Design
To be added (?): login tokens instead of user/password strings, extensible interfaces (simulating keyword arguments).
To be added: secure communication issues. XML-RPC system.ciphercall RFC (Robert Thomson)
To be added (?): encoding/escaping, valid integer range, struct members, pointers to supporting standards (XML, W3C dates, ISO 8601, Base64, etc).
Dave, Ken, the early xml-rpc mailing list (indirectly and directly). And many others (to be added).