ujson causes UnicodeEncodeError in email mirror #6332
Comments
On Python 2,
So that's why we haven't seen this until recently. Note the decoded object is a |
I think it's worth doing some benchmarking with Python 3.4. With Python 2, using |
I nerdsniped myself on this and sent a PR upstream: ultrajson/ultrajson#284. So we should be able to just use that fix.
Their README has results for both CPython 2.7.6 and CPython 3.4.3. It still shows them far ahead for some benchmarks.
Hello @zulip/server-notifications members, this issue was labeled with the area: notifications label, so you may want to check it out!
We've gotten a few exceptions like this recently on zulipchat.com:
The request data in this example looks like
The core of the issue appears to be that `ujson` will decode a JSON document that looks like this, but will then fail if you ask it to encode the result:

(The result of `loads` there is a string of three characters, each of which is not a real character but rather a surrogate value. The surrogates were originally carved out for use in UTF-16, to encode Unicode in 16-bit elements, but in Python they're now used for losslessly encoding mostly-UTF-8 bytestrings, like Unix filenames, in Python text strings. This particular sequence of three surrogates doesn't seem to fit either of those uses, so it's a mystery where it came from. If you take the low 8 bits from each of these surrogates as a sequence of bytes, you do get the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER -- so there are probably multiple layers of wrongness involved here.)

Compare `json` from the stdlib, which handles the round-trip with no trouble:

The ujson tracker has had this issue open since 2014.
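For anyone who wants to poke at this locally, here is a minimal sketch of the stdlib behaviour described above, assuming Python 3. The three surrogate values are hypothetical: they were picked only so that their low 8 bits are `EF BF BD`, and are not taken from the actual request data. The last part illustrates the `surrogateescape` use of surrogates for filename-like bytestrings; the filename is made up too.

```python
import json

# Hypothetical input (not the real payload): three lone surrogates whose
# low 8 bits are EF BF BD, written as JSON \u escapes.
doc = '"\\ud8ef\\ud8bf\\ud8bd"'

# stdlib json decodes the escapes into a 3-character str of lone surrogates...
decoded = json.loads(doc)
assert len(decoded) == 3
assert all(0xD800 <= ord(ch) <= 0xDFFF for ch in decoded)

# ...and encodes them back as \u escapes (ensure_ascii=True is the default).
round_tripped = json.dumps(decoded)
assert round_tripped == doc

# Taking the low 8 bits of each surrogate gives the UTF-8 bytes of U+FFFD.
low_bytes = bytes(ord(ch) & 0xFF for ch in decoded)
replacement = low_bytes.decode("utf-8")

# A strict UTF-8 encode of the same string is what raises UnicodeEncodeError.
try:
    decoded.encode("utf-8")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# For comparison: the surrogateescape error handler maps each undecodable
# byte 0xNN to U+DCNN, so arbitrary bytestrings survive a round trip via str.
raw = b"caf\xe9.txt"  # made-up Latin-1 filename, not valid UTF-8
name = raw.decode("utf-8", "surrogateescape")
assert name.encode("utf-8", "surrogateescape") == raw
```

The hypothetical values here are all in the high-surrogate range, so they are neither valid UTF-16 pairs nor `surrogateescape` output (which only produces U+DC80 through U+DCFF), matching the "fits neither use" observation above.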
The number of segfaults, crashes, and memory corruption issues in the ujson open issues suggests this may not be a super reliable or vigorously-maintained library. The benchmarks in its README are impressive, but if we can we may be better off using the stdlib's `json` (aka `simplejson`).