This repository was archived by the owner on Oct 18, 2022. It is now read-only.

Doesn't handle strings with non-UTF-8 contents #17

Closed
jonas-schievink opened this issue Feb 19, 2017 · 6 comments

@jonas-schievink
Owner

jonas-schievink commented Feb 19, 2017

According to the spec, <string> can contain binary data as long as the XML chars are escaped properly.

There should also be an accessor for the common case where the data is UTF-8.

@jonas-schievink
Owner Author

Probably blocked by netvl/xml-rs#10

@jonas-schievink
Owner Author

quick-xml seems to support non-utf-8 data, so switching is another option

@jonas-schievink
Owner Author

I've read the XML spec and come to the conclusion that arbitrary binary data is never allowed in an XML document: the whole document must always be valid Unicode, encoded in the encoding specified in the XML declaration. Do any XML(RPC) libs support arbitrary binary data?

In any case, the lack of support for non-utf-8 encodings should still be fixed at some point.
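
To illustrate the point about the XML spec, here is a minimal Python sketch of the `Char` production from XML 1.0. The helper names `is_xml_char` and `is_xml_text` are made up for this example and are not part of any library. It shows that even when every byte is decoded as iso-8859-1, most control characters (NUL, for instance) are still outside what XML 1.0 allows.

```python
def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 `Char` production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml_text(text: str) -> bool:
    """Return True if every character of text is a legal XML 1.0 character."""
    return all(is_xml_char(ch) for ch in text)


# Every possible byte value, decoded as iso-8859-1 (one byte = one code point).
all_bytes = bytes(range(256)).decode("iso-8859-1")

print(is_xml_text("plain text\n"))  # True
print(is_xml_text(all_bytes))       # False: contains NUL and other control chars
```

This only checks the character range; escaping of `<`, `&` and friends is a separate concern.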

@jonas-schievink
Owner Author

I might consider reverting f4a7959 in light of this ^

jonas-schievink added a commit that referenced this issue Feb 19, 2018
This reverts commit f4a7959.

Refer to #17 (comment)

Binary data cannot be placed into an XML document without further
processing/encoding. In the XMLRPC case, base64 is the obvious encoding,
so I consider this impl to be pretty intuitive.
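
As a rough illustration of the base64 approach (using Python's stdlib `xmlrpc.client` rather than this crate, and with a made-up method name `store`), arbitrary bytes can be wrapped in `Binary`, which is serialized as a `<base64>` value and survives a round trip:

```python
import xmlrpc.client

# Bytes that are not valid UTF-8 and include a NUL byte.
raw = b"\x00\xff\x80"

# Binary values are marshalled as <base64> elements, not <string>.
# "store" is a hypothetical method name used only for this example.
payload = xmlrpc.client.dumps((xmlrpc.client.Binary(raw),), methodname="store")

# loads() returns a (params, methodname) tuple.
params, method = xmlrpc.client.loads(payload)
decoded = params[0].data  # the original bytes, recovered from base64

print(decoded == raw)
```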
@xmo-odoo

xmo-odoo commented Jul 7, 2018

Yeah, the comment in the spec that "A string can be used to encode binary data." makes relatively little sense. I'm guessing it's for non-UTF-8 documents: e.g. if the XML document is encoded in iso-8859-1 (theoretically possible), then any byte content is valid and thus the payload could be interpreted directly as binary data, or something along those lines. Considering this comes from questions raised "as XML-RPC was being implemented in Python" back in 1999, before Python even had a proper unicode type (and before UTF-8 got really popular), it makes some amount of sense. Ideally somebody would find the actual original discussion, as it would shed light on the issue, but in the face of ambiguity I'd just ignore this case.

Fundamentally this is an argument for possibly adding support for non-UTF8 input and output, but that would be a different issue / feature request, and that's assuming this is ever used.

Python's stdlib seems to optionally allow non-UTF-8 encodings when dumping/serving (I assume the client sniffs the encoding in the normal XML fashion, though I did not check), and so does Java's Apache XMLRPC. I can't find a way to specify an encoding in the Ruby stdlib client. It does look like phpxmlrpc defaults to iso-8859-1, but that thing looks… odd. So I don't know that you should care.
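
For reference, a small sketch of the Python side only, assuming the stdlib `xmlrpc.client` module; the method name `echo` is hypothetical. `dumps` accepts an `encoding` argument and writes it into the XML declaration, and the parser picks the encoding up from that declaration when the bytes are loaded again:

```python
import xmlrpc.client

# dumps() returns text; the declaration names the requested encoding.
xml_text = xmlrpc.client.dumps(("café",), methodname="echo", encoding="iso-8859-1")

# Encode to bytes the way they would go over the wire.
body = xml_text.encode("iso-8859-1")

# The parser reads the encoding from the XML declaration in the bytes.
params, method = xmlrpc.client.loads(body)

print(xml_text.splitlines()[0])
print(params, method)
```

Here "é" travels as the single byte 0xE9 rather than the two-byte UTF-8 sequence.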

Either way, I think you can close this issue.

@jonas-schievink
Owner Author

Agreed, I've opened #43 to track non-UTF-8 encoding support.
