Adding an xarray wrapper with apply_ufunc #15
Conversation
UPDATE: I have added the xarray wrapper, and I believe that the tests confirm this to be indeed true:
Which is great news! This means that we can use this wrapper to process our data in the cloud already (given that we are careful about units).
Hmm, interesting. The CI seems to be failing, but locally it worked for me. Could this be platform dependent?
Just expanded the testing matrix to include macOS to rule that out.
Tests look good, at least without a known answer value to compare to.
Lots of other small things to do still here though - do you want me to push to this PR?
Yeah, sure. You can make suggestions or push directly. I will probably try to work on it later. Just let me know when you break for the weekend. Oh, I think the CI problems are caused by the
Sorry, this should not have been closed; my bad.
Ooof, this is really messed up. I am getting these weird errors during the test execution that actually show up as passed, even though they only pass a subset of the tests.
This seems like a good way to keep the project moving forward.
There is a lot more that could be done to make the API more friendly and flexible towards different input configurations. I should be able to call this on 1D or 4D data and have it automatically expanded / contracted appropriately. The constraint to use 3 dimensions should really not be seen by the user.
However, I am absolutely fine with deferring those features to a future PR.
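A possible shape of that future expand/contract feature could look like the following sketch. The helper name `_pad_to_3d` and dummy-dimension naming are hypothetical, not part of this PR:

```python
import numpy as np
import xarray as xr


def _pad_to_3d(da):
    # Hypothetical helper: expand an array up to 3 dims with dummy
    # dimensions, remembering their names so the wrapper can squeeze
    # them back out after the computation.
    added = []
    while da.ndim < 3:
        name = f"dummy_{len(added)}"
        da = da.expand_dims(name)
        added.append(name)
    return da, added


da_1d = xr.DataArray(np.arange(4.0), dims=["lon"])
padded, added = _pad_to_3d(da_1d)
restored = padded.squeeze(added)  # the user gets back the shape they put in
```

With this pattern the 3D constraint stays internal: the wrapper pads before `xr.apply_ufunc` and squeezes after, so the caller never sees it.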
```fortran
END SUBROUTINE AEROBULK_MODEL_NOSKIN
...
END MODULE mod_aerobulk_wrapper_noskin
```
I think it would be simpler to put both functions (skin + noskin) in the same module / f90 file.
So this seems to have caused the problems we had with pytest. The problem was actually not due to pytest itself, but arose when both Python wrappers were imported and called in the same module/script/notebook.
This was a brute force solution, but I think the overhead is not too bad for now.
As much as I would like to understand the problem, I think I really want to have a working implementation for now, instead of sinking more time into this somewhat obscure problem.
source/aerobulk/flux.py (outdated):

```python
if len(sst.dims) < 3:
    # TODO promote using expand_dims?
    raise NotImplementedError
```
Add an actual message to this error
I refactored these warnings into a check_input decorator, which we can later use to implement logic to expand/reshape the input and revert that action after the values are returned.
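For reference, a minimal sketch of what such a check could look like. The decorator body, the demo function `noskin_demo`, and the exact error message are illustrative, not the code in this PR:

```python
import functools

import numpy as np
import xarray as xr


def check_input(func):
    # Illustrative sketch: validate dimensionality up front; the
    # expand/reshape-and-revert logic could later live here too.
    @functools.wraps(func)
    def wrapper(sst, *args, **kwargs):
        if len(sst.dims) != 3:
            raise NotImplementedError(
                f"Input must be 3-dimensional, got dims {sst.dims}"
            )
        return func(sst, *args, **kwargs)

    return wrapper


@check_input
def noskin_demo(sst):
    return sst  # stand-in for the real wrapper


out = noskin_demo(xr.DataArray(np.zeros((2, 3, 4)), dims=["time", "lat", "lon"]))
```

Keeping the check in a decorator means the skin and noskin wrappers share one validation path instead of duplicating it.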
```python
input_core_dims=[()] * 6,
output_core_dims=[()] * 5,
```
I'm curious about the lack of core dims. I think of 'time', 'lat', 'lon' as the core dims (input and output). Providing these would allow us to automatically work on arrays with more dimensions.
On a theoretical level the calculation is pointwise (this test confirms that). For practical reasons we definitely do not want to define time as a core dim, since it would prevent us from using dask='parallelized' in most cases.
The restriction (of passing 3D arrays) can be solved independently of the actual dimensions, so I think it is not appropriate to define core dims here.
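A toy illustration of the trade-off, where `toy_flux` stands in for the aerobulk call (it is not the PR's code): with empty core dims the computation is pointwise, so any dimension, including time, may stay chunked, and dask='parallelized' simply maps over the blocks.

```python
import numpy as np
import xarray as xr


def toy_flux(sst, t_zt):
    # pointwise stand-in for the aerobulk computation
    return sst - t_zt


sst = xr.DataArray(np.full((2, 3, 4), 300.0), dims=["time", "lat", "lon"])
t_zt = sst - 2.0

out = xr.apply_ufunc(
    toy_flux,
    sst.chunk({"time": 1}),    # chunking along time is fine...
    t_zt.chunk({"time": 1}),
    input_core_dims=[(), ()],  # ...because time is not a core dim
    output_core_dims=[()],
    dask="parallelized",
    output_dtypes=[sst.dtype],
)
```

Had 'time' been declared a core dim, apply_ufunc would require it to be a single chunk, defeating the parallelization this wrapper is for.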
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
for more information, see https://pre-commit.ci
I made #28 to serve as a reminder for @rabernat's concerns. Do you think we can merge this for now? Keen to get this released before our hack tomorrow. cc @TomNicholas
```python
    algo,
    zt,
    zu,
    niter,
):
    """Python wrapper for aerobulk without skin correction.
    !ATTENTION If input not provided in correct units, will crash.
```
This is a very vague warning. I would prefer that we actually listed the allowed value ranges under each input in the docstring.
I also don't think we should describe this error in terms of "correct units". The function expects the values to be expressed in certain units, AND the function will error if the values given are outside some (completely arbitrary) range.
Also you should be consistent with using the Warnings RST block.
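Something along these lines could work, where the parameter subset shown and every numeric range are illustrative placeholders (the real bounds would need to be taken from the aerobulk Fortran source):

```python
def noskin(sst, t_zt, hum_zt, u_zu, v_zu, slp):
    """Python wrapper for aerobulk without skin correction.

    Parameters
    ----------
    sst : array_like
        Sea surface temperature [K]. Placeholder accepted range: 270-320 K.
    t_zt : array_like
        Absolute air temperature at height zt [K]. Placeholder range: 180-330 K.
    slp : array_like
        Sea level pressure [Pa]. Placeholder range: 80000-110000 Pa.

    Warnings
    --------
    Aerobulk does not raise a Python exception for out-of-range values;
    it crashes. Validate inputs against the ranges above before calling.
    """
```

Listing the expected unit and accepted range under each parameter makes the "will crash" behaviour actionable instead of vague.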
```python
test_arg = args[
    0
]  # assuming that all the input shapes are the same size. TODO: More thorough check
```
Why are we not just looping over all the input arguments?
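For example, checking every argument's shape instead of only inspecting the first. This is a sketch, and `check_shapes` is a hypothetical name:

```python
import numpy as np


def check_shapes(*args):
    # Compare every argument's shape against the others rather than
    # assuming args[0] is representative of all inputs.
    shapes = [np.shape(a) for a in args]
    if len(set(shapes)) > 1:
        raise ValueError(f"Input shapes are not all identical: {shapes}")
    return shapes[0]
```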
We need this to be available on conda for today's hack, so we will merge now, but record the concerns raised in issues.
This draft PR represents the progress @TomNicholas and I made today.
We achieved the following:
Testing the output for given shapes of input
We wrote a little script (dev_numpy_wrapper.py) that tests the return shape of values from the 'raw' f2py function. The results can be summarized like this:
From this you can see that any input with <=3 dims/axes will return a 3D array, while >3 dims/axes or a zero-size axis will result in errors.
We concluded from this that the easiest way to use apply_ufunc would be to ensure len(dims)==3 before invoking xr.apply_ufunc, because handling changes in array dimensionality is complicated. Our high-level approach for now is to simply assert that input arrays have 3 dimensions after broadcasting. In the future we could expand along dummy dimensions, pass to apply_ufunc, and then squeeze the dummy dimensions out again (the user would get back the same shape of array that they put in).
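The current approach can be sketched like this (illustrative values and variable names, not the PR's exact code): broadcast first, then assert that everything is 3D.

```python
import numpy as np
import xarray as xr

sst = xr.DataArray(np.full((2, 3, 4), 300.0), dims=["time", "lat", "lon"])
slp = xr.DataArray(np.full(4, 101000.0), dims=["lon"])  # lower-dimensional input

# Broadcasting first means only genuinely low-dimensional cases fail
# the check; anything that can be expanded against the other inputs
# comes out 3D automatically.
sst_b, slp_b = xr.broadcast(sst, slp)
for arg in (sst_b, slp_b):
    assert arg.ndim == 3, "wrapper currently requires 3D input after broadcasting"
```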
Working with chunked dask arrays
We have an initial test running in dev_xarray_wrapper.py that seems to successfully parallelize over a chunked array.

Outstanding issues: