should not serialize bytes #264

glasserc · 2017-05-25T16:54:23Z

We recently hit a bug (Kinto/kinto#1224) where one code path round-tripped a bcrypt-hashed password (a bytes object) through ujson, and another code path didn't. The one that went through ujson converted everything to str, whereas the other one left it as bytes.

In my opinion, bytes should not be a serializable type. JSON has no equivalent of bytes, yet ujson encodes bytes as a JSON string, which is a sequence of code points, not bytes. This means that some bytes values cannot be represented in JSON. ujson tries its best, decoding bytes values as UTF-8 and failing if that isn't possible:

>>> import ujson
>>> ujson.dumps({"hi": b'\x30'})
'{"hi":"0"}'
>>> ujson.dumps({"hi": b'\xff'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 7: invalid start byte

This behavior made sense in the days of Python 2, where str objects were often used to encode text (see #74), but I think that if it's going to come out as strings, it shouldn't be allowed in as bytes.
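
To make the type mismatch concrete, here is a minimal sketch using only the standard library. `encode_like_ujson` is a hypothetical helper that mimics the implicit UTF-8 decoding described above; it is not ujson's actual code, and the hash value is a made-up stand-in rather than a real bcrypt digest.

```python
import json


def encode_like_ujson(value: bytes) -> str:
    """Hypothetical helper: decode bytes as UTF-8, then emit a JSON string."""
    return json.dumps(value.decode("utf-8"))


# Stand-in for a bcrypt hash (assumption: any ASCII-only bytes value behaves the same).
hashed = b"$2b$12$abcdefghijklmnopqrstuv"

# One code path round-trips through JSON, the other keeps the original object.
round_tripped = json.loads(encode_like_ujson(hashed))

print(type(hashed).__name__, type(round_tripped).__name__)  # bytes str
print(hashed == round_tripped)  # False: bytes and str never compare equal in Python 3

# Bytes that are not valid UTF-8 cannot take this path at all.
try:
    encode_like_ujson(b"\xff")
except UnicodeDecodeError as exc:
    print("not representable:", exc.reason)
```

The round-tripped value only matches the original again after an explicit `round_tripped.encode("utf-8")`, which is the step one of the two code paths would have to know about.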

The built-in json module refuses to encode bytes, either as a value or as an Object key:

>>> import json
>>> json.dumps({"hi": b'\x30'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib64/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'0' is not JSON serializable
>>> json.dumps({b"hi": b'\x30'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
TypeError: keys must be a string
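
For comparison, callers who do need to carry bytes through the built-in json module have to say how. The sketch below shows one possible convention (base64 inside a tagged object); the `__bytes__` tag and both helper names are assumptions made for illustration, not something this issue or either library prescribes. Note that the `default` hook only applies to values, so bytes keys are still rejected.

```python
import base64
import json


def encode_bytes(obj):
    """Hypothetical `default` hook: wrap bytes in a tagged, base64-encoded object."""
    if isinstance(obj, (bytes, bytearray)):
        return {"__bytes__": base64.b64encode(obj).decode("ascii")}
    raise TypeError(repr(obj) + " is not JSON serializable")


def decode_bytes(dct):
    """Hypothetical `object_hook`: turn the tagged object back into bytes."""
    if set(dct) == {"__bytes__"}:
        return base64.b64decode(dct["__bytes__"])
    return dct


payload = {"hi": b"\xff"}  # not valid UTF-8, so it cannot become a JSON string directly
text = json.dumps(payload, default=encode_bytes)
restored = json.loads(text, object_hook=decode_bytes)

print(text)
print(restored == payload)  # True: the value comes back as bytes
```

Because the encoding is explicit, both sides of a round trip agree on the type, which is the property the implicit bytes-to-str conversion loses.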

elelay · 2017-06-11T10:04:16Z

PR #266 leaves the default behaviour unchanged but adds an option to raise on bytes.
This way developers can choose the behaviour they want.
This shouldn't affect performance (only adds a pointer dereference to check if reject_bytes is active when bytes are encountered).
The UNLIKELY macro could be added if desired.
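
The opt-in semantics described here can be sketched in pure Python. Only the flag name `reject_bytes` comes from the PR; `dumps_sketch` is a hypothetical stand-in built on the standard json module, not ujson's C encoder, and it ignores bytes used as dict keys.

```python
import json


def dumps_sketch(obj, reject_bytes=False):
    """Hypothetical stand-in: decode bytes as UTF-8 unless reject_bytes is set."""

    def default(value):
        # json only calls this hook for values it cannot serialize itself.
        if isinstance(value, bytes):
            if reject_bytes:
                raise TypeError(repr(value) + " is not JSON serializable")
            return value.decode("utf-8")  # legacy behaviour: treat bytes as text
        raise TypeError(repr(value) + " is not JSON serializable")

    return json.dumps(obj, default=default, separators=(",", ":"))


print(dumps_sketch({"hi": b"0"}))  # {"hi":"0"}

try:
    dumps_sketch({"hi": b"0"}, reject_bytes=True)
except TypeError as exc:
    print("rejected:", exc)
```

As in the PR description, the check only runs when a bytes value is actually encountered, so objects without bytes take the same path as before.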

glasserc mentioned this issue May 25, 2017

specify record contents accepted by storage Kinto/kinto#1238

Closed

elelay added a commit to elelay/ultrajson that referenced this issue Jun 11, 2017

new reject_bytes option to raise on bytes

ad280fd

raise TypeError when encountering bytes in ujson.dumps() to prevent unexpected Unicode exceptions in production. Fixes ultrajson#264

elelay mentioned this issue Jun 11, 2017

new reject_bytes option to raise on bytes #266

Merged

hugovk closed this as completed in #266 May 8, 2020

maudetes mentioned this issue Apr 11, 2024

Fix failing captchetat responses datagouv/udata-front#392

Merged
