Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix zero HVG bug emerging when batch_key is given #1180

Merged
merged 9 commits into from May 18, 2020
Merged

Fix zero HVG bug emerging when batch_key is given #1180

merged 9 commits into from May 18, 2020

Conversation

gokceneraslan
Copy link
Collaborator

No description provided.

Copy link
Member

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I'm just requesting a minor documentation update.

I'm a little unsure about changing the return type, but also I would like to see if anyone complains. My expectation is that no one wants a recarray.

Comment on lines +221 to +224
if min_disp is None: min_disp = 0.5
if min_mean is None: min_mean = 0.0125
if max_mean is None: max_mean = 3
if max_disp is None: max_disp = np.inf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two thoughts on this:

  • Should this function throw an error if the user passes n_top and one of these values?
  • These values should get documented. If we're not using None as a sentinel for missing values, could these just be the argument defaults?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Should this function throw an error if the user passes n_top and one of these values?

This is handled already with a "logg.info" line:

    if n_top_genes is not None and not all(m is None for m in [
        min_disp, max_disp, min_mean, max_mean
    ]):
        logg.info('If you pass `n_top_genes`, all cutoffs are ignored.')
  • These values should get documented. If we're not using None as a sentinel for missing values, could these just be the argument defaults?

I actually like that they are None by default, it seems more consistent with the rest of scanpy. I added the defaults to the documentation.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default values

What I like about default arguments is that they get automatically documented and are probably right, while manually documented values can go out of date if the values change. I think we only use None as a default value when we want to be able to check if the user provided a value.

n_top_genes / value filters behavior

I think the behavior for the filters (i.e. max, min, and n_top_genes) should either work together or be mutually exclusive. If they're mutually exclusive, an error is better than a warning. Otherwise I think it's too easy to miss a warning message and think the code is doing something it's not.

For example, I think this line could select the top 100 genes that have a mean of at least 2:

sc.pp.highly_variable_genes(adata, min_mean=2, n_top_genes=100)

If this goes in a script in a pipeline, it'd be very easy not to see the logged message. Then the user would have the wrong idea of what their output means. I think failing fast is better in this case.

In addition, I'm not sure that it makes sense for n_top_genes to have a higher priority than the value filters.

@ivirshup ivirshup merged commit 4ceb5ea into master May 18, 2020
@ivirshup
Copy link
Member

Was there an issue that should be closed by this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants