BUG: fix read_json ignoring the dtype with the pyarrow engine #60997
base: main
Conversation
Co-authored-by: William Ayd <william.ayd@icloud.com>
Could any reviewers point me in the right direction with the failing test? Can't seem to figure out what is causing it.
I believe that issue appears in pyarrow < 16. In your test you can check that pyarrow version and filter the warning if that is detected.
Looks like that has worked, thanks!
pandas/tests/io/json/test_pandas.py
Outdated
@@ -2183,6 +2186,30 @@ def test_read_json_dtype_backend(
        # string_storage setting -> ignore that for checking the result
        tm.assert_frame_equal(result, expected, check_column_type=False)

    @td.skip_if_no("pyarrow")
    @pytest.mark.filterwarnings("ignore::DeprecationWarning")
Can you add this filterwarnings mark only if pyarrow < 16?
I couldn't find a way to do that with pytest filterwarnings.
def test_foo(..., request):
    if pa_version_under16p0:
        request.applymarker(pytest.mark.filterwarnings(...))
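The suggested pattern can be sketched as a runnable test. The test name is a placeholder, and `pa_version_under16p0` is defined locally here as a stand-in for the flag of the same name in pandas' compat layer:

```python
import pytest

# Stand-in for the pandas compat flag pa_version_under16p0.
pa_version_under16p0 = True


def test_read_json_pyarrow_dtype(request):
    # Apply the DeprecationWarning filter only when pyarrow < 16 is installed,
    # instead of unconditionally marking the whole test.
    if pa_version_under16p0:
        request.applymarker(
            pytest.mark.filterwarnings("ignore::DeprecationWarning")
        )
```

This keeps the warning filter scoped to the environments that actually emit the warning, so a stray DeprecationWarning on newer pyarrow would still surface.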
pandas/io/json/_json.py
Outdated
df = arrow_table_to_pandas(pa_table, dtype_backend=self.dtype_backend)

if isinstance(self.dtype, dict):
    df = df.astype(self.dtype)
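The diff above casts after reading the full table. A minimal standalone sketch of that cast-after-read behavior (illustrative data, not the PR's code):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Mirror the PR's post-read cast: only columns named in the dict
# change dtype; every other column keeps its inferred dtype.
dtype = {"a": "float64"}
if isinstance(dtype, dict):
    df = df.astype(dtype)
```

The drawback the reviewer raises is that the data is first materialized with inferred types and only then converted, rather than being parsed directly into the requested types.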
Although more complicated, I think the original design of passing the data types as parsing options is better than reading everything in and casting after the fact.
I thought there wasn't a way to cast the types with pyarrow.read_json without ignoring the other columns, but I had missed one of the parse options that lets you infer the other types. I have updated that now.
lgtm ex @mroeschke comments
pandas/tests/io/json/test_pandas.py
Outdated
version_tuple = tuple(map(int, pa.__version__.split(".")))

if version_tuple[0] < 16:
You can use `from pandas.util.version import Version` to just do `Version(pa.__version__) < Version("16.0")`.
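That suggestion looks like the following; the version string is a stand-in for `pa.__version__`, so the sketch runs without pyarrow installed:

```python
from pandas.util.version import Version

# Stand-in for pyarrow.__version__.
pa_version = "15.0.2"

# PEP 440-aware comparison; avoids hand-splitting the version string,
# which breaks on suffixes like "16.0.0.dev123".
needs_filter = Version(pa_version) < Version("16.0")
```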
thanks!
Added code to check whether dtypes have been passed in; if they have, read the JSON with a schema built from those dtypes.
Split the read JSON function up for the respective engines.
closes BUG: read_json silently ignores the dtype when engine=pyarrow #59516
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.