
Adding an xarray wrapper with apply_ufunc #15

Merged
merged 25 commits into from
Jun 15, 2022

Conversation

jbusecke
Contributor

@jbusecke jbusecke commented May 12, 2022

This draft PR represents the progress @TomNicholas and I made today.

We achieved the following:

Testing the output for given shapes of input

We wrote a little script (dev_numpy_wrapper.py) that tests the return shape of values from the 'raw' f2py function.

The results can be summarized like this:

# original: (200, 200, 10)    | output: (200, 200, 10)
# original: (200, 200, 1)     | output: (200, 200, 1)
# original: (200, 200)        | output: (200, 200, 1)
# original: 200               | output: (200, 1, 1)
# original: ()                | output: (1, 1, 1)
# original: (1, 1, 1)         | output: (1, 1, 1)
# original: (200, 200, 10, 4) | ValueError: too many axes: 4 (effrank=4), expected rank=3
# original: (200, 200, 0)     | ValueError: unexpected array size: new_size=40000, got array with arr_size=0
# original: (200, 0, 3)       | ValueError: unexpected array size: new_size=40000, got array with arr_size=0

From this you can see that any input with at most 3 dims/axes returns a 3D array, while more than 3 dims/axes or a zero-size axis results in an error.

We concluded from this that the easiest way to use apply_ufunc would be to ensure len(dims) == 3 before invoking xr.apply_ufunc, because handling changes in array dimensionality is complicated. Our high-level approach for now is to simply assert that input arrays have 3 dimensions after broadcasting. In the future we could expand along dummy dimensions, pass to apply_ufunc, and then squeeze the dummy dimensions out again (so the user would get back the same shape of array that they put in).

Working with chunked dask arrays

We have an initial test running in dev_xarray_wrapper.py that seems to successfully parallelize over a chunked array.

Outstanding issues:

  • Refactor this into a proper module + write tests
  • Check that results are 'transpose invariant'. If I am right that the calculation is in principle pointwise, it doesn't matter for the results which dimension the Fortran code internally iterates over. I will write a test that confirms that with some synthetic data. If this turns out to be right, we could tweak the performance of the Fortran code by transposing the arrays appropriately outside of the wrapper (and add a comment about this). If the results are NOT the same, we need to think more deeply about how to make sure the time dimension is identified (and then parallelized over, since at that point it will be a core dimension).
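A minimal version of such a transpose-invariance test (using a stand-in pointwise function rather than the real aerobulk routine) could look like this:

```python
import numpy as np

def pointwise_flux(a, b):
    """Stand-in for the aerobulk routine; any pointwise function behaves the same way."""
    return a * np.sqrt(np.abs(b))

rng = np.random.default_rng(42)
a = rng.normal(size=(4, 5, 6))
b = rng.normal(size=(4, 5, 6))

result = pointwise_flux(a, b)

# Transpose the inputs, compute, and transpose back: for a truly pointwise
# calculation the values must match no matter which axis is iterated first.
transposed = pointwise_flux(
    a.transpose(2, 0, 1), b.transpose(2, 0, 1)
).transpose(1, 2, 0)

np.testing.assert_allclose(result, transposed)
```

If the equivalent test passes with the Fortran routine swapped in, transposing for performance outside the wrapper is safe.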

@jbusecke
Contributor Author

UPDATE: I have added the xarray wrapper to flux.py and added some initial tests.

I believe the tests confirm that this is indeed true:

Check that results are 'transpose invariant'. If I am right that the calculation is in principle pointwise, it doesn't matter for the results which dimension the Fortran code internally iterates over. I will write a test that confirms that with some synthetic data. If this turns out to be right, we could tweak the performance of the Fortran code by transposing the arrays appropriately outside of the wrapper (and add a comment about this).

Which is great news! It means we can already use this wrapper to process our data in the cloud (as long as we are careful about units).

@jbusecke
Contributor Author

Hmm, interesting. The CI seems to be failing, but it worked for me locally. Could this be platform dependent?

@jbusecke
Contributor Author

Just expanded the testing matrix to include macOS to rule that out.

Member

@TomNicholas TomNicholas left a comment


Tests look good, at least without a known answer value to compare to.

Lots of other small things to do still here though - do you want me to push to this PR?

@jbusecke
Copy link
Contributor Author

Yeah, sure. You can make suggestions or push directly. I will probably try to work on it later; just let me know when you break for the weekend.

Oh, I think the CI problems are caused by the None kwargs in the numpy wrapper, so if we could solve this with positional arguments only, that might fix it.
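One hypothetical way around the None-kwarg problem (a sketch, not the PR's actual code; the default values and the commented-out f2py call are illustrative) is to resolve the optional kwargs to concrete defaults on the Python side, so that only concrete positional arguments ever cross the Fortran boundary:

```python
def noskin(sst, t_zt, hum_zt, u_zu, v_zu,
           slp=None, algo=None, zt=None, zu=None, niter=None):
    """Resolve optional kwargs here, in Python, so the compiled routine
    only ever sees positional values (never None)."""
    slp = 101000.0 if slp is None else slp     # illustrative default, Pa
    algo = "coare3p0" if algo is None else algo
    zt = 10 if zt is None else zt
    zu = 10 if zu is None else zu
    niter = 6 if niter is None else niter
    # A real wrapper would now call the f2py routine positionally, e.g.:
    # return mod_aerobulk_wrap.aerobulk_model_noskin(
    #     sst, t_zt, hum_zt, u_zu, v_zu, slp, algo, zt, zu, niter)
    return (sst, t_zt, hum_zt, u_zu, v_zu, slp, algo, zt, zu, niter)
```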

@jbusecke
Contributor Author

Sorry, this should not have been closed; my bad.

@jbusecke jbusecke reopened this May 27, 2022
@jbusecke jbusecke mentioned this pull request Jun 9, 2022
@jbusecke
Contributor Author

jbusecke commented Jun 9, 2022

Ooof, this is really messed up. I am getting weird errors during test execution, yet the runs show up as passed even though only a subset of the tests actually pass.
I need to check this really carefully before this gets merged.
More tomorrow.


@jbusecke jbusecke marked this pull request as ready for review June 10, 2022 20:34
Contributor

@rabernat rabernat left a comment


This seems like a good way to keep the project moving forward.

There is a lot more that could be done to make the API more friendly and flexible towards different input configurations. I should be able to call this on 1D or 4D data and have it automatically expanded / contracted appropriately. The constraint to use 3 dimensions should really not be seen by the user.

However, I am absolutely fine with deferring those features to a future PR.


END SUBROUTINE AEROBULK_MODEL_NOSKIN

END MODULE mod_aerobulk_wrapper_noskin
Contributor

I think it would be simpler to put both functions (skin + noskin) in the same module / f90 file.

Contributor Author

So this seems to have caused the problems we had with pytest. The problem was actually not due to pytest, but arose when both Python wrappers were imported and called in the same module/script/notebook.
This was a brute-force solution, but I think the overhead is not too bad for now.
As much as I would like to understand the problem, I would rather have a working implementation now than sink more time into this somewhat obscure issue.


if len(sst.dims) < 3:
# TODO promote using expand_dims?
raise NotImplementedError
Contributor

Add an actual message to this error

Contributor Author

I refactored these warnings into a check_input decorator, which we can later use to implement logic to expand/reshape the input and revert that action after the values are returned.

Comment on lines +246 to +247
input_core_dims=[()] * 6,
output_core_dims=[()] * 5,
Contributor

@rabernat rabernat Jun 13, 2022


I'm curious about the lack of core dims. I think of 'time', 'lat', 'lon' as the core dims (input and output). Providing these would allow us to automatically work on arrays with more dimensions.

Contributor Author

On a theoretical level the calculation is pointwise (this test confirms that). For practical reasons we definitely do not want to define time as a core dim, since that would prevent us from using dask='parallelized' in most cases.
The restriction (of requiring 3D arrays) can be solved independently of the actual dimension names, so I think it is not appropriate to define core dims here.
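To illustrate the point about empty core dims with a toy example (not the PR's wrapper; `toy_flux` is a stand-in, and xarray must be installed): with `input_core_dims=[()]` per argument, apply_ufunc treats every dimension as a broadcast dimension, so a chunked time axis remains chunkable; declaring 'time' a core dim would instead force each dask chunk to hold the entire time axis.

```python
import numpy as np
import xarray as xr

def toy_flux(a, b):
    return a + b  # stand-in pointwise calculation

da = xr.DataArray(np.ones((2, 3, 4)), dims=["time", "lat", "lon"])

out = xr.apply_ufunc(
    toy_flux,
    da,
    da,
    input_core_dims=[(), ()],  # no core dims: everything broadcasts
    output_core_dims=[()],
)
# out keeps the original dims and shape, and every dim stays chunkable
```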

@jbusecke
Contributor Author

I made #28 to serve as a reminder of @rabernat's concerns. Do you think we can merge this for now? Keen to get this released before our hack tomorrow.

cc @TomNicholas

algo,
zt,
zu,
niter,
):
"""Python wrapper for aerobulk without skin correction.
!ATTENTION If input not provided in correct units, will crash.
Member

This is a very vague warning. I would prefer that we actually listed the allowed value ranges under each input in the docstring.

Member

I also don't think we should describe this error in terms of "correct units". The function expects the values to be expressed in certain units, AND the function will error if the values given are outside some (completely arbitrary) range.

Member

Also you should be consistent with using the Warnings RST block.

Comment on lines +165 to +167
test_arg = args[0]  # assuming that all the input shapes are the same size. TODO: More thorough check
Member

Why are we not just looping over all the input arguments?

@jbusecke
Contributor Author

We need this to be available on conda for today's hack, so we will merge now, but record the concerns raised here in issues.

Successfully merging this pull request may close these issues: xarray wrapper.