Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for orjson, including direct numpy support #210

Closed
wants to merge 1 commit into from

Conversation

kmsquire
Copy link
Collaborator

This pull request builds on #209, adding support for orjson and modifying numpy support to pass values through to orjson when that module is chosen as the (de)serializer and the appropriate flags are passed.

Still needs tests and probably discussion around how much of orjson to pull in.

@kigawas
Copy link
Contributor

kigawas commented May 11, 2022

Hi I think this can be superseded by #226

* `orjson` is supported as an additional submodule, as opposed to
  replacing the `json` submodule, as it's behavior and output types
  are slightly different than the `json` module.
@kmsquire kmsquire marked this pull request as ready for review May 11, 2022 18:35
@codecov
Copy link

codecov bot commented May 11, 2022

Codecov Report

Merging #210 (7178ef6) into master (4e1f8e1) will decrease coverage by 0.09%.
The diff coverage is 90.38%.

@@            Coverage Diff             @@
##           master     #210      +/-   ##
==========================================
- Coverage   90.57%   90.47%   -0.10%     
==========================================
  Files          11       12       +1     
  Lines        1432     1470      +38     
  Branches      315      320       +5     
==========================================
+ Hits         1297     1330      +33     
- Misses         96       98       +2     
- Partials       39       42       +3     
Impacted Files Coverage Δ
serde/de.py 97.66% <ø> (ø)
serde/numpy.py 72.36% <75.00%> (-0.61%) ⬇️
serde/orjson.py 87.87% <87.87%> (ø)
serde/core.py 91.27% <100.00%> (+0.08%) ⬆️
serde/se.py 96.36% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 508c54d...7178ef6. Read the comment docs.

@kmsquire
Copy link
Collaborator Author

Hi I think this can be superseded by #226

Hi thanks! I didn't see your PR and actually updated this one today. I just pushed the changes.

This takes a slightly different approach than yours. Rather than replacing the default pyserde.json module, it instead creates a separate pyserde.orjson module, which is mostly a drop-in replacement. I chose to do this for a couple of reasons:

  1. Changing the return type of to_json to bytes instead of str is a breaking change, and a lot of code in the wild would have to be updated to support it. It is possible to, e.g., assume utf-8 and do that conversion in pyserde, but that also reduces flexibility in allowing the user to choose that conversion themselves (although I'm not sure how important that is).
  2. Recent versions of orjson support passing dataclasses directly, which is faster than converting to a dictionary and then using orjson, but which doesn't support all of the features that pyserde offers. In particular, it doesn't support pyserde's field methods. I added a manual fallback to use dictionary conversion if users need this functionality.

Either way, your changes are a bit simpler. @yukinarit should take a look at both PRs and decide which is the better direction, or maybe if some of the ideas from each should be combined.

Cheers!

@yukinarit
Copy link
Owner

@kmsquire @kigawas
Thanks! Let me see both of the approaches get back to you! 🎉

@kigawas
Copy link
Contributor

kigawas commented May 12, 2022

Changing to bytes should not introduce significant breakage I think, since json.loads and orjson.loads both support str and bytes:

$ python
Python 3.9.12 (main, Mar 26 2022, 15:51:15)
[Clang 13.1.6 (clang-1316.0.21.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import orjson
>>> import json
>>> json.dumps("😊")
'"\\ud83d\\ude0a"'
>>> json.dumps("😊").encode()
b'"\\ud83d\\ude0a"'
>>> orjson.dumps("😊")
b'"\xf0\x9f\x98\x8a"'
>>> json.dumps("😊").encode() == orjson.dumps("😊")
False
>>> orjson.loads(json.dumps("😊"))
'😊'
>>> json.loads(orjson.dumps("😊"))
'😊'
>>>

Note that orjson's output is more compact. This behavior can be changed:

>>> json.dumps("😊", ensure_ascii=False)
'"😊"'
>>> json.dumps("😊", ensure_ascii=False).encode()
b'"\xf0\x9f\x98\x8a"'

Additionally, mixing bytes and str increases discrepancy rather than decreases, thus I prefer to unify serialized JSON with bytes no matter what JSON library it is.

@yukinarit
Copy link
Owner

Sorry for the delay, I have been busy in the new environment. I finally got time to review

The both of the PRs look great 👍 but kigawas's one is similar to what I was originally thinking, but I have the same concern as @kmsquire said.

Changing return type from bytes to str is a big breaking change for the existing users. I guess most of the Pythonistas would prefer/expect JSON library to produce the output in str not bytes. The implicit conversion from bytes to str sounds unnecessary overhead, but pyserde's design principle is the best simplicity than the best performance as opposed to orison which aims to (de)serialize with the least overhead. Also, If you don't the decode overhead, you can create a wrapper which calls to_dict? 🤔

That being said,

@kigawas Is it ok to request to add decode()?

@kigawas @kmsquire
any feedback is appreciated 🙂

@kmsquire
Copy link
Collaborator Author

Closing. Related functionality merged as part of #226.

@kmsquire kmsquire closed this May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants