Add support for orjson, including direct numpy support #210

kmsquire · 2022-03-24T05:44:46Z

This pull request builds on #209, adding support for orjson and modifying numpy support to pass values through to orjson when that module is chosen as the (de)serializer and the appropriate flags are passed.

Still needs tests and probably discussion around how much of orjson to pull in.

kigawas · 2022-05-11T10:10:42Z

Hi I think this can be superseded by #226

* `orjson` is supported as an additional submodule, as opposed to replacing the `json` submodule, as its behavior and output types differ slightly from those of the `json` module.

codecov · 2022-05-11T18:45:02Z

Codecov Report

Merging #210 (7178ef6) into master (4e1f8e1) will decrease coverage by 0.09%.
The diff coverage is 90.38%.

@@            Coverage Diff             @@
##           master     #210      +/-   ##
==========================================
- Coverage   90.57%   90.47%   -0.10%     
==========================================
  Files          11       12       +1     
  Lines        1432     1470      +38     
  Branches      315      320       +5     
==========================================
+ Hits         1297     1330      +33     
- Misses         96       98       +2     
- Partials       39       42       +3

| Impacted Files | Coverage Δ | |
|---|---|---|
| serde/de.py | `97.66% <ø> (ø)` | |
| serde/numpy.py | `72.36% <75.00%> (-0.61%)` | ⬇️ |
| serde/orjson.py | `87.87% <87.87%> (ø)` | |
| serde/core.py | `91.27% <100.00%> (+0.08%)` | ⬆️ |
| serde/se.py | `96.36% <100.00%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 508c54d...7178ef6.

kmsquire · 2022-05-11T18:46:48Z

> Hi I think this can be superseded by #226

Hi thanks! I didn't see your PR and actually updated this one today. I just pushed the changes.

This takes a slightly different approach than yours. Rather than replacing the default pyserde.json module, it instead creates a separate pyserde.orjson module, which is mostly a drop-in replacement. I chose to do this for a couple of reasons:

* Changing the return type of to_json to bytes instead of str is a breaking change, and a lot of code in the wild would have to be updated to support it. It is possible to, e.g., assume utf-8 and do that conversion in pyserde, but that also reduces flexibility in allowing the user to choose that conversion themselves (although I'm not sure how important that is).
* Recent versions of orjson support passing dataclasses directly, which is faster than converting to a dictionary and then using orjson, but which doesn't support all of the features that pyserde offers. In particular, it doesn't support pyserde's field methods. I added a manual fallback to use dictionary conversion if users need this functionality.
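
To illustrate the first point, here is a minimal sketch of how caller code that assumes str can break when the serializer starts returning bytes. It uses only the standard library: `to_json_str`, `to_json_bytes`, and `build_log_line` are hypothetical stand-ins made up for this example, not pyserde or orjson functions, and the bytes variant only imitates the kind of output orjson produces.

```python
import json


def to_json_str(obj) -> str:
    """Hypothetical stand-in for a serializer that returns str."""
    return json.dumps(obj)


def to_json_bytes(obj) -> bytes:
    """Hypothetical stand-in for a serializer that returns UTF-8 bytes."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")


def build_log_line(payload: str) -> str:
    # Typical downstream code that silently assumes a str payload.
    return "payload=" + payload


# Works today, with a str-returning serializer.
ok = build_log_line(to_json_str({"a": 1}))

# Fails once the serializer returns bytes: str + bytes raises TypeError.
try:
    build_log_line(to_json_bytes({"a": 1}))
    broke = False
except TypeError:
    broke = True
```

Any caller that concatenates, formats, or writes the result to a text-mode file would need a similar update, which is the migration cost described above.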

Either way, your changes are a bit simpler. @yukinarit should take a look at both PRs and decide which is the better direction, or maybe if some of the ideas from each should be combined.

Cheers!

yukinarit · 2022-05-11T23:44:04Z

@kmsquire @kigawas
Thanks! Let me look at both approaches and get back to you! 🎉

kigawas · 2022-05-12T05:17:24Z

Changing to bytes should not introduce significant breakage I think, since json.loads and orjson.loads both support str and bytes:

$ python
Python 3.9.12 (main, Mar 26 2022, 15:51:15)
[Clang 13.1.6 (clang-1316.0.21.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import orjson
>>> import json
>>> json.dumps("😊")
'"\\ud83d\\ude0a"'
>>> json.dumps("😊").encode()
b'"\\ud83d\\ude0a"'
>>> orjson.dumps("😊")
b'"\xf0\x9f\x98\x8a"'
>>> json.dumps("😊").encode() == orjson.dumps("😊")
False
>>> orjson.loads(json.dumps("😊"))
'😊'
>>> json.loads(orjson.dumps("😊"))
'😊'
>>>

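Note that orjson's output is more compact, since it writes UTF-8 directly rather than ASCII escape sequences. `json.dumps` can be made to do the same with `ensure_ascii=False`: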

>>> json.dumps("😊", ensure_ascii=False)
'"😊"'
>>> json.dumps("😊", ensure_ascii=False).encode()
b'"\xf0\x9f\x98\x8a"'

Additionally, mixing bytes and str increases inconsistency rather than reducing it, so I would prefer to return serialized JSON as bytes regardless of which JSON library is used.

yukinarit · 2022-05-15T13:49:23Z

Sorry for the delay, I have been busy in the new environment. I finally got time to review.

Both PRs look great 👍. kigawas's is closer to what I was originally thinking, but I share the concern that @kmsquire raised.

Changing the return type from str to bytes is a big breaking change for existing users. I guess most Pythonistas would prefer/expect a JSON library to produce its output as str, not bytes. The implicit conversion from bytes to str sounds like unnecessary overhead, but pyserde's design principle favors simplicity over raw performance, as opposed to orjson, which aims to (de)serialize with the least overhead. Also, if you don't want the decode overhead, you could create a wrapper which calls to_dict? 🤔

That being said,

@kigawas Would it be OK to ask you to add decode()?

@kigawas @kmsquire
Any feedback is appreciated 🙂
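
For reference, a rough sketch of what the decode() suggestion could look like, with an escape hatch for callers who want to skip the decode step. Everything here is an assumption for illustration: `backend_dumps` is a standard-library stand-in for a bytes-producing backend such as orjson.dumps, `Point` is a made-up dataclass, and `dataclasses.asdict` stands in for pyserde's to_dict. It is not the code from either PR.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Point:
    x: int
    label: str


def backend_dumps(obj) -> bytes:
    # Stand-in for a bytes-producing backend (assumption: behaves like
    # orjson.dumps in that it returns compact UTF-8 bytes).
    text = json.dumps(obj, ensure_ascii=False, separators=(",", ":"))
    return text.encode("utf-8")


def to_json(obj) -> str:
    # Keeps the existing str contract by decoding the backend output.
    return backend_dumps(asdict(obj)).decode("utf-8")


def to_json_bytes(obj) -> bytes:
    # Wrapper for callers who want raw bytes and no decode overhead.
    return backend_dumps(asdict(obj))


point = Point(1, "😊")
as_text = to_json(point)
as_bytes = to_json_bytes(point)
```

Both forms round-trip through `json.loads`, which accepts str as well as UTF-8 encoded bytes.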

kmsquire · 2022-05-23T18:40:36Z

Closing. Related functionality merged as part of #226.

kmsquire force-pushed the feature/orjson branch from 32d4cf7 to 1e0cd28 March 27, 2022 15:59

feat: Add support for orjson

7178ef6

* `orjson` is supported as an additional submodule, as opposed to replacing the `json` submodule, as its behavior and output types differ slightly from those of the `json` module.

kmsquire force-pushed the feature/orjson branch from 1e0cd28 to 7178ef6 May 11, 2022 18:32

kmsquire marked this pull request as ready for review May 11, 2022 18:35

kmsquire closed this May 23, 2022
