New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support parsing NaN, Infinity and -Infinity #514
Conversation
Codecov Report
@@ Coverage Diff @@
## main #514 +/- ##
==========================================
+ Coverage 90.03% 90.41% +0.37%
==========================================
Files 6 6
Lines 1686 1752 +66
==========================================
+ Hits 1518 1584 +66
Misses 168 168
Continue to review full report at Codecov.
|
Note, I had to disable the black pre-commit check because it is failing due to an issue in the click library. Once they issue a patch (which I hope will be soon), I'll revert that. |
I merged the Black fix in #515, so if you sync with |
Can we chuck in a couple of failure tests too. i.e. Stuff like: with pytest.raises(ujson.JSONDecodeError, match="Unexpected .*'Infinity'"):
ujson.dumps("Infin") Just so that we know that exceptions do get raised instead of segfaults or garbled output |
for more information, see https://pre-commit.ci
@hugovk Rebased on main, |
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
for more information, see https://pre-commit.ci
Thanks! |
Fixes #146
This ports the changes from pandas (https://github.com/pandas-dev/pandas/pull/30295/files
) mentioned by @WillAyd in #146 back to ujson.
Thus far I've only done a nearly verbatim port of the diff. There seems to be an issue. Currently:
is printing
Also
is returning
So there are some details missing from the direct patch. I'll try and check for them, but my C is much worse than my Python, so any insights would be helpful.
EDIT
I've fixed the above issue, although I'm not sure why the ordering of the function pointers in ultrajson.h
__JSONObjectDecoder
was important. I assume it's a C thing, if anyone wants to tell me why that fixed worked, I'd like to know.While working on this I discovered a (hidden) bug in the pandas port. When they parse NaN in C, they actually return None instead of NaN. The pandas part after that seems to interpret None as Nan, which is why their test passes:
But the parser itself is actually wrong
And that can be demonstrated by adding a dict to the list, which prevents the pandas engine from realizing that None should have been a nan. This probably needs to be a pandas bug report (see here).
But I think I can fix it here. I'm getting a handle on how this is working.