My feature branch to issue #19129 (read_json and orient='table' With Numeric Column) #60945
Changed the approach slightly from checking only for numeric column names.
I don't think the issue here is as finely scoped as the comments would suggest. AFAIU, nothing specific to the table format takes a stance on whether numeric values can be the keys of an object. However, the base JSON specification (https://www.json.org/json-en.html) requires that object keys are strings. There is a balance to strike here between being pedantic and pragmatic; do none of the other formats serialize object keys as strings?
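To make the point about string-only object keys concrete, here is a minimal sketch using only Python's standard `json` module (not pandas). The `row` dict is an illustrative stand-in for a record keyed by numeric column labels; it is not taken from the PR.

```python
import json

# Illustrative data: a record keyed by integer labels (an assumption, not from the PR).
row = {5: 1, 6: 2}

encoded = json.dumps(row)      # integer keys are coerced to strings when written
decoded = json.loads(encoded)  # keys come back as str, not int

print(encoded)
print([type(key).__name__ for key in decoded])
```

Any reader that wants integer labels back therefore has to convert them itself after parsing.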
Hi @WillAyd, I iterated over the other orient values to check the behaviour. As you mentioned, the other formats do serialize the keys as strings when saved to a JSON file.

    import pandas as pd

    # Setup inferred from the output below; it was not part of the original snippet.
    df = pd.DataFrame([[1, 2, 3, 4]], columns=[5, 6, 7, 8])
    orients = ["records", "index", "columns", "split", "values", "table"]
    results = {}

    for orient in orients:
        try:
            # Save JSON
            df.to_json('test.json', orient=orient)
            # Read JSON
            read_df = pd.read_json('test.json', orient=orient)
            # Check column types after reading
            col_types = [type(col) for col in read_df.columns]
            results[orient] = {
                "Success": True,
                "Column Types": col_types,
                "DataFrame": read_df
            }
        except Exception as e:
            results[orient] = {
                "Success": False,
                "Error": str(e)
            }

I can see that the other orients handle this: when the saved JSON is read back, the int column types are preserved.
{'records': {'Success': True,
'Column Types': [int, int, int, int],
'DataFrame': 5 6 7 8
0 1 2 3 4},
'index': {'Success': True,
'Column Types': [int, int, int, int],
'DataFrame': 5 6 7 8
0 1 2 3 4},
'columns': {'Success': True,
'Column Types': [int, int, int, int],
'DataFrame': 5 6 7 8
0 1 2 3 4},
'split': {'Success': True,
'Column Types': [int, int, int, int],
'DataFrame': 5 6 7 8
0 1 2 3 4},
'values': {'Success': True,
'Column Types': [int, int, int, int],
'DataFrame': 0 1 2 3
0 1 2 3 4},
'table': {'Success': False,
'Error': "Cannot convert non-finite values (NA or inf) to integer: Error while type casting for column '5'"}}
Cool, thanks. So if we do anything here, I think it makes the most sense to serialize as strings but keep the dtype in the metadata of the table format as integral.
…o parse schema for numeric column names
96e8b59 to b0923d4
I have tried a new approach: the parse_table_schema function now handles numeric column names by first converting them to strings and then restoring their original types once the DataFrame is formed. With this change, orient='table' behaves the same as the other orients:
{'records': {'Success': True,
'Column Types': [int, int, int, int],
'DataFrame': 5 6 7 8
0 1 2 3 4},
'index': {'Success': True,
'Column Types': [int, int, int, int],
'DataFrame': 5 6 7 8
0 1 2 3 4},
'columns': {'Success': True,
'Column Types': [int, int, int, int],
'DataFrame': 5 6 7 8
0 1 2 3 4},
'split': {'Success': True,
'Column Types': [int, int, int, int],
'DataFrame': 5 6 7 8
0 1 2 3 4},
'values': {'Success': True,
'Column Types': [int, int, int, int],
'DataFrame': 0 1 2 3
0 1 2 3 4},
'table': {'Success': True,
'Column Types': [int, int, int, int],
'DataFrame': 5 6 7 8
0 1 2 3 4}}
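The "convert to string, then restore the original types" idea described in this comment can be sketched roughly as follows. This is not the PR's actual change to parse_table_schema: `restore_label` and `restore_labels` are hypothetical helpers, the sketch uses plain Python lists rather than a pandas Index, and it assumes only integer-like labels need restoring.

```python
def restore_label(label: str):
    """Return an int if the label is an exact integer string, else the label unchanged."""
    try:
        value = int(label)
    except ValueError:
        return label
    # Restore only when the round trip is exact, so labels like "05" stay strings.
    return value if str(value) == label else label


def restore_labels(labels):
    """Apply restore_label to every column label (hypothetical helper)."""
    return [restore_label(label) for label in labels]


print(restore_labels(["5", "6", "name", "05"]))
```

The exact round-trip check is a deliberate design choice in this sketch: it avoids silently turning a genuine string label such as "05" into the integer 5. A real implementation would more likely consult the dtype recorded in the table schema than guess from the label text.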
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
New Behavior
The following code will now raise a ValueError with the message:
Example:
Note: This is my first attempt at contributing, so there may be flaws. I would appreciate any feedback.