Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downsample total counts #474

Merged
merged 6 commits into from Mar 21, 2019

Conversation

Projects
None yet
4 participants
@ivirshup
Copy link
Collaborator

commented Feb 11, 2019

Update to downsample_counts to allow downsampling total counts, similar to normalization by cellranger aggr (I'm pretty sure on this, there's a lot going on in their code). Additionally, enabled caching for the numba'd function, which cuts down on test time.

As adding this feature meant renaming target_counts to counts_per_cell, this becomes a breaking change. Since it's breaking, I've also gone ahead and set replace=False by default as mentioned before (#340).

Definitely willing to make changes. I've implemented this since I'm doing some integration work and figured it'd be nice to be able to try the basic cellranger strategy.

Edit: The failing PAGA test occurs locally on master as well, but I don't think I broke that.

@flying-sheep flying-sheep force-pushed the theislab:master branch 2 times, most recently from 3efb194 to fc84096 Feb 12, 2019

ivirshup added some commits Feb 11, 2019

Allow downsampling counts from entire matrix
This changes the API to `sc.pp.dowsample_counts` to allow downsample on both a per-cell and whole dataset level. Downsampling the whole count matrix is done with the argument `"total_counts"` while downsampling cells is done with `"counts_per_cell"`.
Minor refactor/ cleanup.
* Internal `_downsample*` functions take the count matrix instead of the AnnData object
* Added caching to `_downsample_array` (speeds tests up a lot)
* Doc updates
* Made it so original sparse matrix type is returned for downsampling total counts
* Eliminate zeros after downsampling
* Minor miscellaneous cleanup/ formatting
Add decorator for deprecating keyword argument names.
Example usage:

```python
>>> @deprecated_arg_names({"b": "c"})
    def foo(a, c=2):
        return a + c

>>> foo(a, b=1)
 __main__:1: DeprecationWarning: Keyword argument 'b' has been deprecated in favour of 'c'. 'b' will be removed in a future version.
 2
```
Preserve backwards compatability
Made it so `sc.pp.downsample_counts` will still run if keyword argument "downsample_counts" is used, but will throw a deprecation warning.

@ivirshup ivirshup force-pushed the ivirshup:downsample_experiment branch from 91601d2 to f318741 Feb 18, 2019

@ivirshup

This comment has been minimized.

Copy link
Collaborator Author

commented Feb 18, 2019

I've added a decorator for renaming arguments, @deprecated_arg_names. Should this have gone in AnnData.utils?

Decorator example usage:

@deprecated_arg_names({"b": "c"})
def foo(a, c=2):
    return a + c

foo(a, b=1)
# __main__:1: DeprecationWarning: Keyword argument 'b' has been deprecated in favour of 'c'. 'b' will be removed in a future version.
# 2
@flying-sheep

This comment has been minimized.

Copy link
Member

commented Mar 8, 2019

Hi! I’ve wanted to introduce https://github.com/flying-sheep/legacy-api-wrap for some time.

do you think that would be sufficient or should we take the deprecation of kwargs into account?

There’s also tantale/deprecated#8

@falexwolf

This comment has been minimized.

Copy link
Member

commented Mar 8, 2019

Right, I'd also suspect that a non-scanpy-custom solution for deprecated arguments is the way to go. For now, this is fine; we can replace it if we transition there.

If @flying-sheep would go about replacing this with legaciy_api_wrap (after or before merging), this looks like a very good solution to me?

@ivirshup

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 10, 2019

@flying-sheep I think changing the name of a keyword argument is specific enough to warrant it's own functionality

  • This allows giving a very specific warning about what should change in the user's code
  • This can fully abstract away the functional change (checking if the old argument was used, and renaming), so the implementation doesn't have any code dealing with the deprecation

As far as I can tell, legacy_api_wrap doesn't quite cover this use case, as it deals with positional arguments. Does that sound right to you?

@falexwolf If there was a commonly used, high quality library which provided tools for handling deprecations like this I wouldn't mind switching over to that. After some light googling, I didn't find one. Ultimately, it's a pretty trivial decorator, and I don't think it's adding much complexity here.

@falexwolf

This comment has been minimized.

Copy link
Member

commented Mar 10, 2019

I completely get your point, @ivirshup.

@falexwolf falexwolf referenced this pull request Mar 10, 2019

Open

TODO: Backwards-compat breaking changes #453

0 of 13 tasks complete
@flying-sheep

This comment has been minimized.

Copy link
Member

commented Mar 11, 2019

OK, so now the question is: should this become part of legacy-api-wrap?

I’d rather have the API fixed once than using multiple decorators. I think It’s clearer to see what the new API is like if you don’t have to think about the order of multiple decorators being applied.

Also, I think

@renamed_args(new="old")

feels more natural than

@deprecated_arg_names({"old": "new"})
@ivirshup

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 16, 2019

On legacy_api_wrap, I don't think I have enough experience maintaining stable APIs to make a call.

I'm not too worried about the api for this decorator if it's just in scanpy. Since it'd be only meant for internal use there aren't any promises about the api that should be kept. It also means the issue of multiple decorators can be dealt with when it occurs, and an example case could help guide the decision.

I think that I prefer passing a dict to using kwargs because it might make sense to give this decorator keyword arguments of its own. For example, if you can specify the version it'll be removed. If keyword arguments we used I agree new="old" would make sense to me, but with a dict I see "old" maps to "new" as more intuitive.

@flying-sheep

This comment has been minimized.

Copy link
Member

commented Mar 18, 2019

I agree with all you said 😄

@ivirshup

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 19, 2019

Great!

Is this all good to merge?

@falexwolf

This comment has been minimized.

Copy link
Member

commented Mar 19, 2019

@ivirshup, you mentioned it's a backwards compat breaking change; that's why I put it on the list of those changes. But I agree that the break is so little that we can also merge it now. It might take a while until the rest gets done. So, I'm happy to go ahead and merge.

@ivirshup

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 19, 2019

Right, there were two changes, the bigger one I've fixed with that decorator. Since the smaller breaking change is just a change of default bool, I could split that out, but I'm not sure how you want to handle that. If there was a v1.5 branch, it could go there?

Cells with fewer counts than `target_counts` are unaffected by this. This
has been implemented by M. D. Luecken.
If `counts_per_cell` in specified, each cell will downsampled. If

This comment has been minimized.

Copy link
@fidelram

fidelram Mar 20, 2019

Collaborator

Maybe a typo in: If counts_per_cell is specified?

This comment has been minimized.

Copy link
@ivirshup

ivirshup Mar 23, 2019

Author Collaborator

Yep, should definitely be is. I've gotta start proofreading my docs better.

@falexwolf

This comment has been minimized.

Copy link
Member

commented Mar 21, 2019

Thanks for clarifying this again, @ivirshup! We should have changed the default value already for 1.4.

I'll add a note to the release notes and it's fine... ;)

@falexwolf

This comment has been minimized.

Copy link
Member

commented Mar 21, 2019

PS: @ivirshup, I tried writing you an email recently. Did you get it?

@falexwolf falexwolf merged commit a9b0175 into theislab:master Mar 21, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.