My feature branch to issue #19129 (read_json and orient='table' With Numeric Column) #60945

Open
wants to merge 1 commit into
base: main


@chandra-teajunkie chandra-teajunkie commented Feb 16, 2025

  • closes #19129
  • Tests added and passed
  • All code checks passed
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

New Behavior

The following example will now raise a ValueError with the message shown after the code:

import pandas as pd

# Create DataFrame with numeric column names
df = pd.DataFrame([[1, 2, 3, 4]], columns=[5, 6, 7, 8])

# Attempt to serialize to JSON with 'table' orient
df.to_json('test.json', orient='table')


ValueError: Column names must be strings for JSON serialization with orient='table'.

Note: This is my first contribution, so there may be flaws. I would appreciate any feedback.

@chandra-teajunkie
Author

I changed the approach slightly. Instead of checking only for numeric column names, the implementation now raises the ValueError if any non-string column name is used.
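
A minimal sketch of the check described above, for illustration only. The helper name `check_string_columns` is hypothetical and is not the function in this PR; only the error message is taken from the description.

```python
from collections.abc import Hashable, Iterable


def check_string_columns(columns: Iterable[Hashable]) -> None:
    """Hypothetical helper: reject any column label that is not a str."""
    non_string = [col for col in columns if not isinstance(col, str)]
    if non_string:
        raise ValueError(
            "Column names must be strings for JSON serialization "
            "with orient='table'."
        )


# String labels pass silently.
check_string_columns(["a", "b"])

# Integer labels (as in the example DataFrame) are rejected.
try:
    check_string_columns([5, 6, 7, 8])
except ValueError as err:
    message = str(err)
```

Checking with `isinstance(col, str)` rather than testing for numeric types is what widens the check from "numeric" to "any non-string" labels, such as tuples or timestamps.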

@WillAyd
Member

WillAyd commented Feb 26, 2025

The issue here I don't think is as finely scoped as the comments would suggest. AFAIU there is nothing specific to the table format that takes a stance on whether numeric values can be the keys of an object. However, the base JSON specification requires that object keys are strings:

https://www.json.org/json-en.html

There is a balance that needs to be struck here between being pedantic and pragmatic; do none of the other formats serialize object keys as strings?
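
To illustrate the point about object keys, here is a small sketch using Python's standard-library `json` module. This is not pandas' own serializer, so it only shows the general JSON behaviour: integer keys are written as strings and come back as strings.

```python
import json

# A mapping with integer keys, mirroring the column labels 5..8.
record = {5: 1, 6: 2, 7: 3, 8: 4}

# json.dumps coerces the integer keys to strings, as the JSON grammar requires.
encoded = json.dumps(record)

# Decoding does not restore the original key type.
decoded = json.loads(encoded)
key_types = [type(key).__name__ for key in decoded]

print(encoded)
print(key_types)
```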

@WillAyd WillAyd added the IO JSON read_json, to_json, json_normalize label Feb 26, 2025
@chandra-teajunkie
Author

Hi @WillAyd,

I iterated over the other orient values to check their behaviour. As you suggested, the other formats do serialize the keys as strings when saved to a JSON file.

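import pandas as pd

# DataFrame with integer column names, as in the PR description
df = pd.DataFrame([[1, 2, 3, 4]], columns=[5, 6, 7, 8])

orients = ['records', 'index', 'columns', 'split', 'values', 'table']
results = {}

for orient in orients:
    try:
        # Save JSON
        df.to_json('test.json', orient=orient)

        # Read JSON
        read_df = pd.read_json('test.json', orient=orient)

        # Check column types after reading
        col_types = [type(col) for col in read_df.columns]

        results[orient] = {
            "Success": True,
            "Column Types": col_types,
            "DataFrame": read_df
        }
    except Exception as e:
        # Record the failure message instead of stopping the loop
        results[orient] = {
            "Success": False,
            "Error": str(e)
        }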

For the other orients, reading the saved JSON back yields int column labels. Only orient='table' fails:

{'records': {'Success': True,
  'Column Types': [int, int, int, int],
  'DataFrame':    5  6  7  8
  0  1  2  3  4},
 'index': {'Success': True,
  'Column Types': [int, int, int, int],
  'DataFrame':    5  6  7  8
  0  1  2  3  4},
 'columns': {'Success': True,
  'Column Types': [int, int, int, int],
  'DataFrame':    5  6  7  8
  0  1  2  3  4},
 'split': {'Success': True,
  'Column Types': [int, int, int, int],
  'DataFrame':    5  6  7  8
  0  1  2  3  4},
 'values': {'Success': True,
  'Column Types': [int, int, int, int],
  'DataFrame':    0  1  2  3
  0  1  2  3  4},
 'table': {'Success': False,
  'Error': "Cannot convert non-finite values (NA or inf) to integer: Error while type casting for column '5'"}}
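
One possible explanation for the 'table' failure, sketched with plain Python. The payload below is an assumption about the shape of the written file (field names stored as JSON numbers in the schema, record keys stored as strings); it is not copied from actual output. Under that assumption, looking up each record by the schema's names finds nothing.

```python
import json

# Assumed payload shape: numeric names in the schema, string keys in the data.
payload = (
    '{"schema": {"fields": ['
    '{"name": 5, "type": "integer"}, {"name": 6, "type": "integer"}]}, '
    '"data": [{"5": 1, "6": 2}]}'
)
table = json.loads(payload)

# Column order taken from the schema: these are ints after decoding.
col_order = [field["name"] for field in table["schema"]["fields"]]

# Record keys are strings, so lookups by the int names all miss.
row = table["data"][0]
looked_up = [row.get(name) for name in col_order]

print(col_order)
print(looked_up)
```

If every lookup misses, the resulting columns would hold only missing values, and casting those to an integer dtype would plausibly produce the "Cannot convert non-finite values" error above.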
       
 

@WillAyd
Member

WillAyd commented Feb 27, 2025

Cool thanks. So if we do anything here I think it makes the most sense to serialize as strings but keep the dtype in the metadata of the table format as integral.

@chandra-teajunkie chandra-teajunkie force-pushed the my_feature_branch_to_json_issue branch from 96e8b59 to b0923d4 Compare February 28, 2025 00:12
@chandra-teajunkie
Author

I tried a new approach in the parse_table_schema function: numeric column names are first converted to strings, and their original types are restored after the DataFrame is built.

With this change, orient='table' behaves the same as the other orients:
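
The idea can be sketched in plain Python, without pandas. The helper `restore_labels` is hypothetical and is not the code in this PR; it assumes the schema still carries the original labels and the records carry their string forms.

```python
from collections.abc import Hashable
from typing import Any


def restore_labels(
    schema_names: list[Hashable], records: list[dict[str, Any]]
) -> list[dict[Hashable, Any]]:
    """Hypothetical helper: match records by string key, then restore labels."""
    # Map the string form of each label back to the original label.
    by_text = {str(name): name for name in schema_names}
    # Keys with no matching schema name are left unchanged.
    return [
        {by_text.get(key, key): value for key, value in record.items()}
        for record in records
    ]


restored = restore_labels(
    [5, 6, 7, 8], [{"5": 1, "6": 2, "7": 3, "8": 4}]
)
print(restored)
```

One limitation of this sketch: labels whose string forms collide, such as 5 and "5" in the same frame, cannot be told apart after the round trip.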

{'records': {'Success': True,
  'Column Types': [int, int, int, int],
  'DataFrame':    5  6  7  8
  0  1  2  3  4},
 'index': {'Success': True,
  'Column Types': [int, int, int, int],
  'DataFrame':    5  6  7  8
  0  1  2  3  4},
 'columns': {'Success': True,
  'Column Types': [int, int, int, int],
  'DataFrame':    5  6  7  8
  0  1  2  3  4},
 'split': {'Success': True,
  'Column Types': [int, int, int, int],
  'DataFrame':    5  6  7  8
  0  1  2  3  4},
 'values': {'Success': True,
  'Column Types': [int, int, int, int],
  'DataFrame':    0  1  2  3
  0  1  2  3  4},
 'table': {'Success': True,
  'Column Types': [int, int, int, int],
  'DataFrame':    5  6  7  8
  0  1  2  3  4}}
