
BUG: fix read_json ignoring the dtype with the pyarrow engine #60997

Open · wants to merge 32 commits into main
Conversation


@will-larkin will-larkin commented Feb 24, 2025

@WillAyd WillAyd added IO JSON read_json, to_json, json_normalize Arrow pyarrow functionality labels Feb 26, 2025
@will-larkin (Author)

Could any reviewers point me in the right direction with the failing test? I can't seem to figure out what is causing it.

@will-larkin will-larkin requested a review from WillAyd March 5, 2025 18:28

WillAyd commented Mar 5, 2025

I believe that issue appears in pyarrow < 16. In your test you can check the pyarrow version and filter the warning if an older version is detected.

@will-larkin (Author)

> I believe that issue appears in pyarrow < 16. In your test you can check that pyarrow version and filter the warning if that is detected

Looks like that has worked, thanks!

@@ -2183,6 +2186,30 @@ def test_read_json_dtype_backend(
# string_storage setting -> ignore that for checking the result
tm.assert_frame_equal(result, expected, check_column_type=False)

@td.skip_if_no("pyarrow")
@pytest.mark.filterwarnings("ignore::DeprecationWarning")
@mroeschke (Member) commented Mar 5, 2025

Can you add this filterwarnings mark only if pyarrow < 16?

@will-larkin (Author)

I couldn't find a way to do that with pytest's filterwarnings mark.

@mroeschke (Member)

def test_foo(..., request):
    if pa_version_under16p0:
        request.applymarker(pytest.mark.filterwarnings(...))
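For context, a pytest `filterwarnings("ignore::DeprecationWarning")` mark corresponds to a stdlib `warnings` filter, so applying the mark conditionally just toggles that filter per test. A minimal stdlib-only sketch of what the conditionally applied mark does (`emit_deprecation` is an invented stand-in for the pyarrow call that warns on older versions):

```python
import warnings

def emit_deprecation():
    # Stand-in for the code path that warns on pyarrow < 16.
    warnings.warn("old API", DeprecationWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start from a clean slate, record everything
    # Equivalent of pytest.mark.filterwarnings("ignore::DeprecationWarning"):
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    emit_deprecation()

print(len(caught))  # the DeprecationWarning was filtered out, so nothing is recorded
```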

df = arrow_table_to_pandas(pa_table, dtype_backend=self.dtype_backend)

if isinstance(self.dtype, dict):
    df = df.astype(self.dtype)
Member

Although more complicated, I think the original design of passing the data types as parsing options is better than reading the file in full and casting after the fact.

Author

I thought there wasn't a way to cast the types with pyarrow.read_json without dropping the other columns, but I had missed one of the parse options that lets the remaining types be inferred. I have updated that now.

@WillAyd (Member) left a comment

lgtm ex @mroeschke comments

Comment on lines 2193 to 2195
version_tuple = tuple(map(int, pa.__version__.split(".")))

if version_tuple[0] < 16:
Member

You can use `from pandas.util.version import Version` and just do `Version(pa.__version__) < Version("16.0")`.

Author

thanks!
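To illustrate why the suggested `Version` comparison is preferable to `tuple(map(int, pa.__version__.split(".")))`: tuple parsing raises `ValueError` on suffixed versions such as `"16.0.0.dev0"`, while `Version` compares them correctly. A small sketch, with stand-in version strings in place of `pa.__version__`:

```python
from pandas.util.version import Version  # vendored packaging-style version class

# PEP 440-aware comparisons instead of manual integer-tuple parsing:
pa_under_16 = Version("15.0.2") < Version("16.0")        # plain release
dev_under_16 = Version("16.0.0.dev0") < Version("16.0")  # dev builds sort before the release

# tuple(map(int, "16.0.0.dev0".split("."))) would raise: int("dev0") is invalid.
print(pa_under_16, dev_under_16)
```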

Successfully merging this pull request may close these issues.

BUG: read_json silently ignores the dtype when engine=pyarrow
3 participants