Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New defaults for concat, merge, combine_* #10062

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

jsignell
Copy link
Contributor

@jsignell jsignell commented Feb 19, 2025

Replaces #10051

  • Towards Stricter defaults for concat, combine, open_mfdataset, merge #8778
    • after a sufficiently long deprecation cycle there should be another PR that:
      • removes the option (but doesn't throw if people are still setting it)
      • removes all the FutureWarnings
    • after even more time one last PR that:
      • encodes the new defaults right into the function signature
      • removes the extra warnings that throw on failure
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

This PR attempts to throw warnings if and only if the output would change with the new kwarg defaults. To exercise the new default I am toggling the option back and forth and running the existing test suite.

With new kwarg values

  • Run all the tests with use_new_combine_kwarg_defaults=True -> 78 failed
  • Change to use_new_combine_kwarg_defaults=False and run the last failed -> 76 failed, 2 passed

Those 2 are missed alarms. In these two cases the behavior will be different when xarray switched to using new default kwarg values and there is no warning. But I am not totally sure whether we need to handle them because they are tests for conditions that used to raise an error and with the new defaults they do not.

FAILED xarray/tests/test_concat.py::TestConcatDataset::test_concat_do_not_promote - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_error - Failed: DID NOT RAISE <class 'xarray.core.merge.MergeError'>

With old kwarg values

  • Run all the tests with use_new_combine_kwarg_defaults=False -> 119 failed
  • Change to use_new_combine_kwarg_defaults=True and run the last failed -> 76 failed, 43 passed

Those 44 are potentially false alarms - they could also be tests that just don't assert that the result exactly matches some fixed expectation - but mostly they are false alarms.

Here is a list of them
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[outer-minimal-nested-t] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[outer-minimal-by_coords-None] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[inner-minimal-nested-t] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[inner-minimal-by_coords-None] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[left-minimal-nested-t] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[left-minimal-by_coords-None] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[right-minimal-nested-t] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_does_same_as_concat[right-minimal-by_coords-None] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[drop] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[override] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[no_conflicts] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[identical] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_combine_attrs[drop_conflicts] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataset_attr_by_coords - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_open_mfdataset_dataarray_attr_by_coords - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_common_coord_when_datavars_minimal - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestOpenMFDatasetWithDataVarsAndCoordsKw::test_invalid_data_vars_value_should_fail - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_backends.py::TestDask::test_open_mfdataset_concat_dim_none - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::TestDask::test_open_mfdataset_concat_dim_default_none - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_backends.py::test_h5netcdf_storage_options - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_combine.py::TestNestedCombine::test_auto_combine_2d - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_combine.py::TestNestedCombine::test_auto_combine_2d_combine_attrs_kwarg - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_combine.py::TestCombineDatasetsbyCoords::test_infer_order_from_coords - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_combine.py::TestCombineDatasetsbyCoords::test_combine_by_coords_no_concat - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_concat.py::TestConcatDataset::test_concat_coords_kwarg[dim1-minimal] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_concat.py::TestConcatDataset::test_concat_coords_kwarg[dim1-all] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_concat.py::TestConcatDataset::test_concat_errors - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_concat.py::TestConcatDataArray::test_concat_avoids_index_auto_creation - FutureWarning: In a future version of xarray the default value for coords will change from coords='different' to coords='minimal'. This is likely to lead to different results when multiple datasets have m...
FAILED xarray/tests/test_dask.py::test_map_blocks_da_ds_with_template[obj1] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_dask.py::test_map_blocks_errors_bad_template[obj1] - FutureWarning: In a future version of xarray the default value for data_vars will change from data_vars='all' to data_vars='minimal'. This is likely to lead to different results when multiple datasets hav...
FAILED xarray/tests/test_dataset.py::TestDataset::test_dataset_math_auto_align - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_datasets - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs10-attrs20-expected_attrs0-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs11-attrs21-expected_attrs1-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs12-attrs22-expected_attrs2-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[no_conflicts-attrs13-attrs23-expected_attrs3-True] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[drop-attrs14-attrs24-expected_attrs4-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[identical-attrs15-attrs25-expected_attrs5-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[identical-attrs16-attrs26-expected_attrs6-True] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[override-attrs17-attrs27-expected_attrs7-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[drop_conflicts-attrs18-attrs28-expected_attrs8-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_arrays_attrs_variables[<lambda>-attrs19-attrs29-expected_attrs9-False] - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...
FAILED xarray/tests/test_merge.py::TestMergeFunction::test_merge_no_conflicts_preserve_attrs - FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlappin...

About half of them are triggered by my attempt to catch cases where different datasets have matching overlapping variables and with compat='no_conflicts' you might get different results than with compat='override'. There might be a better way, but we want to be careful to avoid calling compute.

Updating tests

After investigating the impact on existing tests, I updated the tests to make sure they still test what they were meant to test by passing in any required kwargs and I added new tests that explicitly ensure that for a bunch of different inputs, using old defaults throws a warning OR output is the same with new and old defaults.

Notes

  • I used sentinel values to indicate a kwarg value that the user has not explicitly set. The benefit of using this approach is that it makes it more obvious that people should not be setting these kwargs to None in their own code. Also with this approach it is possible to encode both the old and the new default value in the function signature.
  • When users have opted in to the new default values there is logic that tries to catch any errors that might be related to the new default kwargs and adds that context to the error message. In case people don't pay attention to the warnings this will at least give them some indication that things have changed and explain new errors that crop up. This only applies when there is already an error, so they don't catch places where the code succeeds but the result is different than before.

coords="different",
compat="equals",
join="outer",
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hard-coded these to the old defaults since there is no way for the user to set them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this approach. These options result in confusing groupby behaviour (#2145) but we can tackle that later

@jsignell jsignell force-pushed the concat_default_kwargs branch from 0e65034 to 5461a9f Compare February 24, 2025 20:07
@jsignell
Copy link
Contributor Author

The last test file that I need to work on is test_concat.py

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on!

I'd like to avoid adding the extra option. Can you remind us of what that would be useful please?

coords="different",
compat="equals",
join="outer",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this approach. These options result in confusing groupby behaviour (#2145) but we can tackle that later

assert_identical(ds2, actual)

actual = ds2.merge(ds1)
actual = ds2.merge(ds1, compat="override")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is dangerous, the dimensionality of 'x' in the output depends on the order of merging.

I intended compat="override" to be used when "x" is the same in all datasets, not merely 'compatible' as is here. It seems like we would want to at minimum, assert that ndim is same for all x in all datasets.

Opened #10094


assert_identical(old, new)

def test_merge(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test.

compat,
positions,
dim=dim,
data_vars="all",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is unsettable - I hardcoded it to the old default.

@jsignell
Copy link
Contributor Author

jsignell commented Mar 4, 2025

I'd like to avoid adding the extra option. Can you remind us of what that would be useful please?

The option is part of the deprecation process. Basically how it works is in this PR the default values for these kwargs do not change. BUT you get FutureWarnings if the new default values would lead to a different result or an error. The use_new_combine_kwarg_defaults (this absolutely does not have to be the final name) allows you to opt into the new defaults prematurely. Setting that option to True lets you encounter any issues that might arise and it turns off the FutureWarnings.

This is how I'm thinking about the plan after this PR lands:

  1. after a sufficiently long deprecation cycle there should be another PR that:
    • removes the option (but (probably) doesn't throw if people are still setting it)
    • removes all the FutureWarnings
  2. after even more time one last PR that:
    • encodes the new defaults right into the function signature
    • removes the extra context about new defaults that is included in some error messages

I guess you could get rid of the option the tradeoff is that to get rid of the FutureWarnings people may have to hardcode to the new defaults. But there aren't too many false alarms (I need to update the PR description UPDATED) so most of the time when there is a FutureWarning it does indicate that hardcoding the kwarg value is a good idea.

@jsignell jsignell marked this pull request as ready for review March 4, 2025 21:01
@jsignell
Copy link
Contributor Author

jsignell commented Mar 4, 2025

This is ready for review! The failing test looks like it is also failing on main.

@jsignell
Copy link
Contributor Author

jsignell commented Mar 7, 2025

Wellll maybe the unit tests will pass now. I'll fix mypy and the doctests next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants