Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimally invasive workflow for xarray transition #275

Draft
wants to merge 11 commits into
base: RC_v1.5.x
Choose a base branch
from

Conversation

fsenf
Copy link
Member

@fsenf fsenf commented May 7, 2023

this is a draft PR to document our collaborative activities in the xarray to iris transition. The idea for a step-by-step transition to xarray is laid out in #143 . This is also a test for full collaborative development, in contrast to the earlier strategy with a single feature prepared by one developer. Let's see ....

I suggest:

  • that the people that work on a certain module indicate what they currently do and also plan to do in a comment to this draft PR, maybe making a checkbox list of already migrated functions within that module

  • that everybody runs testing locally with the standard tests and the notebooks from the example directory using

    tobac> python  -m pytest
    
    tobac> jupyter nbconvert --to notebook --inplace --execute examples/Example_*/*.ipynb

@fsenf fsenf added the xarray transition Part of the transition to xarray support label May 7, 2023
@fsenf fsenf added this to the Version 1.6 milestone May 7, 2023
@codecov
Copy link

codecov bot commented May 7, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +1.49 🎉

Comparison is base (a0fd1ff) 47.47% compared to head (7e9b328) 48.96%.

Additional details and impacted files
@@              Coverage Diff              @@
##           RC_v1.5.0     #275      +/-   ##
=============================================
+ Coverage      47.47%   48.96%   +1.49%     
=============================================
  Files             14       14              
  Lines           2711     2851     +140     
=============================================
+ Hits            1287     1396     +109     
- Misses          1424     1455      +31     
Flag Coverage Δ
unittests 48.96% <100.00%> (+1.49%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
tobac/feature_detection.py 83.16% <100.00%> (+0.16%) ⬆️
tobac/testing.py 96.51% <100.00%> (+0.57%) ⬆️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@fsenf
Copy link
Member Author

fsenf commented May 7, 2023

I started work on feature_detection.py.

  • feature_detection_multithreshold
  • feature_detection_multithreshold_timestep
  • feature_detection_threshold
  • feature_position
  • filter_min_distance
  • remove_parents
  • test_overlap

@fsenf
Copy link
Member Author

fsenf commented May 7, 2023

notes: def feature_detection_threshold is using numpy arrays as input type -> this is inconsistent with our interface strategy -> however, I keep it for the moment.

@fsenf
Copy link
Member Author

fsenf commented May 7, 2023

OK, tests are in place for function feature_detection_multithreshold.

However, they fail because xarray input is not convert into the correct iris.Cube format. .coords is empty:

        if "time" not in [coord.name() for coord in field_in.coords()]:
>           raise ValueError(
                "input to feature detection step must include a dimension named 'time'"
            )
E           ValueError: input to feature detection step must include a dimension named 'time'

Two options:

  1. correct the respective wrapper in utils
  2. proceed and remove iris internals in feature_detection_multithreshold such that .coords() is not used anymore

I favor option 2.

@freemansw1 , @w-k-jones , @JuliaKukulies : What do you think?

@w-k-jones
Copy link
Member

w-k-jones commented May 7, 2023

I think this is failing because testing.make_dataset_from_arr doesn't actually set any coordinate names. I'm not even sure why it works in the first place, given that it initially creates an xarray.DataArray and then converts it to iris, but we should probably explicitly set coordinate names.

Overall, I think we need to create a robust xarray/iris conversion method before we convert any of the internals so that we can ensure that output/behaviour is identical before and after the switch. I've starting making a list of elements we need to add on top of the existing xarray conversion methods, which includes:

  1. Coordinate/dim/variable name uniformity. I think that xarray.to_iris uses the standard_name attribute of variables when converting to iris, which can be different to their name in xarray. This isn't symmetric in the other direction either!
  2. Handling times, as it's a mess in iris. Proleptic gregorian calendar nonsense etc.

@JuliaKukulies
Copy link
Member

JuliaKukulies commented May 8, 2023

We are not setting coordinate names in testing.making_dataset_from_arrin the current version, but we are also not testing feature_detection_multithreshold.

In RC_v1.5.0, we are actually setting the coordinate name for the time dimension, but only if the datatype is set to iris (see here):

if has_time:
    out_arr_iris.add_dim_coord(
        iris.coords.DimCoord(
            pd.date_range(start=time_min, periods=time_num)
            .values.astype("datetime64[s]")
            .astype(int),
            standard_name="time",
            units="seconds since epoch",
        ),
        time_dim_num,
)

The tests in the current version do not fail because we have only tested with iris cubes. So agree with @w-k-jones , that we should set the coordinate names when the test dataset output is set to be a xarray.DataArray.

Also agree that it makes sense to start working on more robust conversion routines, and working with the decorators as suggested by @fsenf might also help to identify more of the oddities that are currently covered. Or were you suggesting to fix the conversion routines entirely before continuing with this strategy, @w-k-jones ?

@fsenf
Copy link
Member Author

fsenf commented May 8, 2023

@JuliaKukulies : maybe I misunderstood, but we actually have two testing function for feature_detection_multithreshold starting in line

def test_feature_detection_multiple_z_coords(
            )


def test_feature_detection_setting_multiple():
        )

I extended the first test function to also test for xarray and iris as part of the transition to xarray. I will look at the test data soon to find a solution.

@JuliaKukulies
Copy link
Member

JuliaKukulies commented May 9, 2023

@fsenf, sorry I figured my comment was a bit unclear. I was just saying that these test functions have only been introduced with the 3D changes. Hence, they exist in the RC_v1.5.0 branch, but not in current main. That is not relevant for the changes you made, but was just a note as to why testing.making_dataset_from_arr is not yet equipped with setting the coordinate names for both iris and xarray.

The code snippet I posted was from RC_v1.5.0 and shows why the tests you mention currently work: for iris test data we are explicitly setting the time dimension and we are currently only testing iris. With your changes, we also test xarray which is not covered by the code. I think the easiest for now would be to set the coordinate names for xarray test data sets and then make sure the conversion routine is robust in converting these correctly. Your option 2 would already internally change the code which wanted to stay away for the moment or not?

@fsenf
Copy link
Member Author

fsenf commented May 10, 2023

@JuliaKukulies : Thanks for the hint! make_dataset_from_arr is now adjusted to output xarray with coordinates properly!

@JuliaKukulies
Copy link
Member

Excellent, @fsenf !

…st and than go to xarray in a conversion steps; this is consistent now for `make_dataset_from_arr` with the other approaches in the module so that the planned conversion tio xarray internals can always use the same scheme
@JuliaKukulies JuliaKukulies changed the base branch from RC_v1.5.0 to RC_v1.5.x July 14, 2023 20:15
@freemansw1
Copy link
Member

FWIW, I now have a mostly working copy of feature detection fully transitioned to xarray from iris, using many of the same principles as discussed here, including a migration of add_coordinates to xarray, which turned out to be a more difficult problem than I had imagined! I need to incorporate some of the tests here, but I wanted to pass it along. I'm happy to commit to the exp branch or do a separate PR into either the exp branch or into the RC_v1.5.x branch.

I still have some more testing and validation to do, and some of the folks running tracking on large datasets at CSU have volunteered to do more testing on this. Once I finish my testing, I'll be happy to contribute it back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
xarray transition Part of the transition to xarray support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants