Doesn't handle strings with non-UTF-8 contents #17
Comments
Probably blocked by netvl/xml-rs#10.
quick-xml seems to support non-UTF-8 data, so switching is another option.
I've read the XML spec and came to the conclusion that arbitrary binary data is never allowed in an XML document: the whole document must always be valid Unicode, encoded in the encoding specified in the XML declaration. Do any XML(-RPC) libs support arbitrary binary data? In any case, the lack of support for non-UTF-8 encodings should still be fixed at some point.
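To illustrate the point about the XML spec, here is a minimal sketch using Python's standard-library parser (not this crate). The helper name `parses` and the sample documents are made up for illustration. It checks whether a parser accepts a raw control byte, the same character written as a character reference, and a byte sequence that is not valid UTF-8.

```python
import xml.etree.ElementTree as ET


def parses(document: bytes) -> bool:
    """Return True if the stdlib XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Ordinary text with a properly escaped ampersand is fine.
plain_text = parses(b"<string>hello &amp; goodbye</string>")

# A raw control character (U+0001) is outside the XML 1.0 Char range.
raw_control = parses(b"<string>\x01</string>")

# Writing the same character as a numeric reference does not help.
escaped_control = parses(b"<string>&#1;</string>")

# With no encoding declaration the document is read as UTF-8,
# and a lone 0xFF byte is not valid UTF-8.
invalid_utf8 = parses(b"<string>\xff</string>")

print(plain_text, raw_control, escaped_control, invalid_utf8)
```

Only the first document should be accepted, which suggests that escaping the XML special characters alone is not enough to carry arbitrary bytes in a `<string>`.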
I might consider reverting f4a7959 in light of this ^
This reverts commit f4a7959. Refer to #17 (comment). Binary data cannot be placed into an XML document without further processing/encoding. In the XML-RPC case, base64 is the obvious encoding, so I consider this impl to be pretty intuitive.
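For comparison, this is roughly how the base64 approach looks in Python's standard-library `xmlrpc.client` (a sketch, not this crate's API). The method name `store` and the payload bytes are placeholders chosen for illustration.

```python
import xmlrpc.client

# Bytes that are not valid UTF-8, so they cannot be sent as a <string>.
payload = bytes([0x00, 0xFF, 0xFE, 0x80])

# Wrapping the bytes in Binary makes the marshaller emit a <base64> element.
request_xml = xmlrpc.client.dumps(
    (xmlrpc.client.Binary(payload),),
    methodname="store",  # hypothetical method name
)

# Round-trip: parse the request and recover the original bytes.
params, method = xmlrpc.client.loads(request_xml)
decoded = params[0].data

print(request_xml)
```

The request stays plain ASCII on the wire, and the receiver gets the original bytes back from the decoded `Binary` value.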
Yeah, the comment that "A string can be used to encode binary data." in the spec makes relatively little sense. I'm guessing it's for non-UTF-8 documents: e.g. if the XML document is encoded in iso-8859-1 (theoretically possible), then any byte content is valid and thus the payload could be interpreted directly as binary data, something along those lines. Considering this is from questions "as XML-RPC was being implemented in Python" back in 1999, before Python even had a proper unicode type (and before UTF-8 got really popular), it makes some amount of sense. Ideally somebody would find the actual original discussion, as it would shed light on the issue, but in the face of ambiguity I'd just ignore this case.

Fundamentally this is an argument for possibly adding support for non-UTF-8 input and output, but that would be a different issue / feature request, and that's assuming this is ever used. Python's stdlib seems to optionally allow non-UTF-8 encodings when dumping/serving (I assume the client sniffs the encoding in normal XML fashion, though I did not check), and so does Java's Apache XMLRPC. I can't find a way to specify an encoding in the Ruby stdlib client. It does look like phpxmlrpc defaults to iso-8859-1, but that thing looks… odd.

So I don't know that you should care. Either way, I think you can close this issue.
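A small sketch of the Python behaviour mentioned above, using the `encoding` parameter of `xmlrpc.client.dumps`. The method name `greet` and the sample string are placeholders. Note that `dumps` returns text with an encoding declaration, so the caller still has to encode it to bytes; on the receiving side, the parser picks the encoding up from the declaration.

```python
import xmlrpc.client

# Ask the marshaller to declare iso-8859-1 in the XML declaration.
xml_text = xmlrpc.client.dumps(
    ("café",),
    methodname="greet",  # hypothetical method name
    encoding="iso-8859-1",
)

# dumps() returns str; encoding to bytes is the caller's job.
wire = xml_text.encode("iso-8859-1")

# The bytes are not valid UTF-8 (0xE9 is a bare Latin-1 "é"),
# but the parser reads the declared encoding and decodes correctly.
params, method = xmlrpc.client.loads(wire)

print(wire)
print(params, method)
```

A UTF-8-only parser would reject these bytes, which is the gap that non-UTF-8 encoding support would close.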
Agreed, I've opened #43 to track non-UTF-8 encoding support.
According to the spec, <string> can contain binary data as long as the XML chars are escaped properly. Should also get an accessor for the common case where there is UTF-8 data.