We recently hit a bug (Kinto/kinto#1224) where one code path round-tripped a bcrypt-hashed password (a bytes object) through ujson, and another code path didn't. The one that went through ujson converted everything to str, whereas the other one left it as bytes.
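The failure mode is easy to reproduce in isolation. The sketch below simulates the round trip with the stdlib json module (which refuses bytes outright, so ujson's UTF-8 decoding step is mimicked with an explicit .decode()); the hash value is made up for illustration:

```python
import json

# A bcrypt hash is bytes; a JSON round trip (which ujson performed by
# UTF-8-decoding the bytes) hands back str.  The stdlib json module
# rejects bytes, so the decode is done explicitly here to simulate it.
stored_hash = b"$2b$12$abcdefghijklmnopqrstuv"  # hypothetical hash value

round_tripped = json.loads(json.dumps(stored_hash.decode("utf-8")))

print(type(stored_hash))             # <class 'bytes'>
print(type(round_tripped))           # <class 'str'>
print(stored_hash == round_tripped)  # False: bytes never equal str in Python 3
```

The two code paths then disagree on the password's type, and any comparison between them silently fails.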
It's my opinion that bytes should not be a serializable type. There is no equivalent to bytes in JSON: ujson encodes bytes as a JSON string, but JSON strings hold code points, not bytes, which means there are bytes values that are not representable in JSON. ujson tries its best, decoding bytes values as UTF-8 and failing when that isn't possible:
>>> import ujson
>>> ujson.dumps({"hi": b'\x30'})
'{"hi":"0"}'
>>> ujson.dumps({"hi": b'\xff'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 7: invalid start byte
This behavior made sense in the days of Python 2, where str objects were often used to encode text (see #74), but I think that if it's going to come out as strings, it shouldn't be allowed in as bytes.
The built-in json module refuses to encode bytes, either as a value or as an Object key:
>>> import json
>>> json.dumps({"hi": b'\x30'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib64/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'0' is not JSON serializable
>>> json.dumps({b"hi": b'\x30'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
TypeError: keys must be a string
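For callers that genuinely need to carry binary data through JSON, the conventional workaround (not part of ujson or the stdlib) is to encode it explicitly as base64 text and decode it on the way out. A minimal sketch using the stdlib json module's `default` and `object_hook` parameters; the `"__bytes_b64__"` wrapper key is an arbitrary convention chosen here:

```python
import base64
import json

def encode_bytes(o):
    # Called by json.dumps for objects it can't serialize natively.
    if isinstance(o, bytes):
        return {"__bytes_b64__": base64.b64encode(o).decode("ascii")}
    raise TypeError(repr(o) + " is not JSON serializable")

def decode_bytes(d):
    # Called by json.loads for every decoded object; unwrap our marker.
    if "__bytes_b64__" in d:
        return base64.b64decode(d["__bytes_b64__"])
    return d

payload = {"digest": b"\xff\x00\xfe"}  # bytes that are not valid UTF-8
text = json.dumps(payload, default=encode_bytes)
restored = json.loads(text, object_hook=decode_bytes)
print(restored == payload)  # True: bytes survive the round trip
```

This keeps the serialization lossless and, unlike ujson's implicit UTF-8 decode, works for arbitrary byte values.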
PR #266 leaves the default behaviour unchanged but adds an option to raise on bytes.
This way developers can choose which behaviour they want.
This shouldn't affect performance (only adds a pointer dereference to check if reject_bytes is active when bytes are encountered).
The UNLIKELY macro could be added if desired.
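The real patch lives in ujson's C encoder, but the two behaviours the option selects between can be sketched in pure Python. This is a hypothetical model, not the PR's code; the keyword name `reject_bytes` is taken from the PR description:

```python
import json

def dumps(obj, reject_bytes=False):
    """Model of ujson's bytes handling: decode as UTF-8 by default
    (the historical behaviour), or refuse bytes entirely when
    reject_bytes=True (the behaviour the PR's option enables)."""
    def default(o):
        if isinstance(o, bytes):
            if reject_bytes:
                raise TypeError(repr(o) + " is not JSON serializable")
            return o.decode("utf-8")  # may raise UnicodeDecodeError, as ujson does
        raise TypeError(repr(o) + " is not JSON serializable")
    return json.dumps(obj, default=default)

print(dumps({"hi": b"\x30"}))                # '{"hi": "0"}' -- default behaviour
# dumps({"hi": b"\x30"}, reject_bytes=True)  # would raise TypeError
```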