Density calculation and plotting #543

Merged
merged 23 commits into from Mar 22, 2019

5 participants
@LuckyMD
Contributor

commented Mar 20, 2019

This pull request is for calculating and plotting cell densities on an embedded representation. This is especially useful together with an .obs covariate to calculate and visualize cell densities over conditions.

Code is adapted from a raw version by @sophietr.

Still work in progress...
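For readers unfamiliar with the idea, here is a minimal sketch of a per-condition density estimate on a 2-D embedding. It is not the code from this pull request: the function name `density_per_condition`, the toy data, and the min-max scaling to [0, 1] within each condition are assumptions made for illustration, and it relies only on `scipy.stats.gaussian_kde`.

```python
import numpy as np
from scipy.stats import gaussian_kde


def density_per_condition(coords, conditions):
    """Gaussian KDE of each cell's 2-D position, computed separately per condition.

    Hypothetical helper: returns one value per cell, min-max scaled to [0, 1]
    within its own condition (the scaling is an assumption of this sketch).
    """
    coords = np.asarray(coords, dtype=float)
    conditions = np.asarray(conditions)
    density = np.zeros(coords.shape[0])
    for cond in np.unique(conditions):
        mask = conditions == cond
        xy = coords[mask].T            # gaussian_kde expects shape (n_dims, n_points)
        values = gaussian_kde(xy)(xy)  # evaluate the estimate at the cells themselves
        span = values.max() - values.min()
        density[mask] = (values - values.min()) / span if span > 0 else 0.0
    return density


# Toy embedding: two conditions with different spreads (made-up data).
rng = np.random.default_rng(0)
coords = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 0.5, (80, 2))])
conditions = np.array(["control"] * 100 + ["treated"] * 80)
density = density_per_condition(coords, conditions)
```

Scaling within each condition makes the densest region of every condition reach 1, so conditions with very different cell numbers can share one color scale.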

@falexwolf

Member

commented Mar 20, 2019

This looks good so far! Thank you!

Can we call it embedding_density or something similar? density is a little generic; we might have other methods in the future...

Some form of test would also be great! ;)

Let me know when this should be merged.

'dm' : Diffusion map
'pca' : PCA
'tsne' : t-SNE
'fa' : Force-directed graph layout by Force Atlas 2

@gokceneraslan

gokceneraslan Mar 20, 2019

Collaborator

I think this is the same as the basis option we have in other plotting functions (https://github.com/theislab/scanpy/blob/master/scanpy/plotting/_anndata.py#L79). There is no need to hard-code the basis; it shouldn't be limited to a predefined set as it is here. We can use a spatial, dca or scvi basis etc., as long as it is stored in obsm.

You can use the same processing here: https://github.com/theislab/scanpy/blob/master/scanpy/plotting/_anndata.py#L300

There is also a trick to skip the useless dimension of diffmap there.

@LuckyMD

LuckyMD Mar 20, 2019

Author Contributor

This is indeed the same... I should rename the argument basis. And yes, the adaptation for diffmap is very helpful.

In terms of not hard-coding, I guess I can just throw an error to say that adata.obsm['X_'+basis] does not exist then?

@gokceneraslan

gokceneraslan Mar 20, 2019

Collaborator

Sounds good.
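The approach agreed on in this thread could look roughly like the sketch below. It is an illustration, not the merged code: `get_embedding` is a hypothetical helper, a plain dict stands in for `adata.obsm`, and the key name `diffmap` for the diffusion map is an assumption.

```python
import numpy as np


def get_embedding(obsm, basis, components=(0, 1)):
    """Return the requested columns of obsm['X_' + basis] (hypothetical helper)."""
    if basis == "fa":  # shorthand taken from the snippet under review
        basis = "draw_graph_fa"
    key = "X_" + basis
    if key not in obsm:
        raise ValueError(
            f"Cannot find the embedded representation obsm[{key!r}]. "
            "Compute the embedding first."
        )
    components = np.asarray(components)
    if basis == "diffmap":  # assumed key name for the diffusion map
        components = components + 1  # skip the uninformative first component
    return np.asarray(obsm[key])[:, components]


# A plain dict stands in for adata.obsm here.
obsm = {
    "X_umap": np.arange(10.0).reshape(5, 2),
    "X_diffmap": np.arange(15.0).reshape(5, 3),
}
umap_xy = get_embedding(obsm, "umap")
diffmap_xy = get_embedding(obsm, "diffmap")
```

Because the lookup only requires that the key exists, any embedding stored under an `X_` prefix works without being listed in advance.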

@LuckyMD

Contributor Author

commented Mar 20, 2019

@fidelram I guess you are the right person to ask for help with this... I'm struggling to work nicely with plot_scatter(). I am trying to generate a plot where density values for non-selected conditions are grey, while density values for the selected condition are on 'YlOrRd' or another color map. It seems this is not ideal with a single plot_scatter() call (which I was hoping to use as the facet wrapping is already done there). For the grey values I am using a color value of -1, while the others are between 0 and 1. However, when I define a color map that is symmetric around 0, positive values near 0 are mapped to grey instead of colours... any idea why?

LuckyMD
renamed embedding to basis, removed hard coded options for basis, and added diff map components starting at 1
if basis == 'fa':
    basis = 'draw_graph_fa'

if 'X_'+basis not in adata.obsm.dtype.names:

@gokceneraslan

gokceneraslan Mar 21, 2019

Collaborator

adata.obsm_keys() might be better here.

@LuckyMD

LuckyMD Mar 21, 2019

Author Contributor

Thanks! Wasn't aware of that.


components = [0,1]

if basis == 'dm':

@gokceneraslan

@LuckyMD

LuckyMD Mar 21, 2019

Author Contributor

good spot :)

@falexwolf

Member

commented Mar 21, 2019

Can't you use vmin and vmax? I remember there was a similar issue asking for greying out some data points a couple of weeks ago. And the person asking seemed to be happy with that answer...

@fidelram

Collaborator

commented Mar 21, 2019

@LuckyMD You can do something like this:

import matplotlib as mpl
import matplotlib.pyplot as plt
import scanpy as sc

adata = sc.datasets.blobs()
# store the blob labels as integers so they are colored on a continuous scale
adata.obs['n_blob'] = adata.obs['blobs'].astype(int).to_numpy()
sc.tl.pca(adata)

# values outside [vmin, vmax] get the dedicated "under" and "over" colors
cmap = plt.get_cmap('YlOrRd')
norm = mpl.colors.Normalize(vmin=1, vmax=3)
cmap.set_over('blue')
cmap.set_under('lightgray')

sc.pl.pca(adata, color='n_blob', cmap=cmap, norm=norm)

[Image: output of the sc.pl.pca call above]

In this case any value less than 1 (vmin) gets a lightgray color and any value greater than 3 (vmax) gets a blue color. You can pass cmap and norm to any scatter plot function, including plot_scatter.
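Applied to the case asked about earlier (a value of -1 for cells outside the selected condition, densities between 0 and 1 otherwise), the same idea can be checked without drawing anything. This is a sketch using matplotlib only; the sample values are made up, and the explanation that a colormap symmetric around 0 places near-zero values at its grey midpoint is an assumption about the original problem.

```python
import copy

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize, to_rgba

cmap = copy.copy(plt.get_cmap("YlOrRd"))  # copy so the shared colormap stays untouched
cmap.set_under("lightgray")               # color for anything below vmin
norm = Normalize(vmin=0, vmax=1)          # densities are assumed to lie in [0, 1]

values = np.array([-1.0, 0.0, 0.05, 1.0])  # -1 marks cells outside the selected condition
rgba = cmap(norm(values))                  # one RGBA row per value
is_gray = [bool(np.allclose(row, to_rgba("lightgray"))) for row in rgba]
```

Only the -1 entry falls below vmin, so only it takes the "under" color; small positive densities stay on the YlOrRd scale instead of fading to grey.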

@LuckyMD

Contributor Author

commented Mar 21, 2019

I had no idea that I could define out-of-range colours like this. That's amazing! Thanks a lot!

LuckyMD added some commits Mar 21, 2019

"""Plot the density of cells in an embedding (per condition)
Plots the gaussian kernel density estimates (over condition) from the
`sc.tl.density()` output.

@fidelram

fidelram Mar 21, 2019

Collaborator

I think this should be sc.tl.embedding_density here.

key : `str`
    Name of the `.obs` covariate that contains the density estimates
group : `str`, optional (default: `None`)
    Category of the observed covariate which will be plotted.

@fidelram

fidelram Mar 21, 2019

Collaborator

It is not clear to me from the description what group means or does. Is it a categorical value from .obs? If so, in other tools we tend to use groupby.

@LuckyMD

LuckyMD Mar 21, 2019

Author Contributor

It's a specific category from a categorical covariate in .obs. I struggle to define this in a nicer way.

@sophietr

sophietr Mar 21, 2019

Maybe something similar to sc.pl.umap (and other plotting functions):
groups : str, None
    Restrict to a few categories in categorical observation annotation. The default is not to restrict to any groups.

@LuckyMD

LuckyMD Mar 21, 2019

Author Contributor

Good idea, thanks. I don't think I can use plot_scatter to plot multiple categories like this though, because you need different dot sizes, and density values for each category. I don't think I can just wrap all those things in a list for multiple plots.

Name of the `.obs` covariate that contains the density estimates
group : `str`, optional (default: `None`)
Category of the observed covariate which will be plotted.
"""

@fidelram

fidelram Mar 21, 2019

Collaborator

Can you add an example here?

Something like this (I am copying from the sc.tl.dendrogram function):

    Examples
    --------
    >>> adata = sc.datasets.pbmc68k_reduced()
    >>> sc.tl.dendrogram(adata, 'bulk_labels')
    >>> sc.pl.dendrogram(adata, 'bulk_labels')

@LuckyMD

LuckyMD Mar 21, 2019

Author Contributor

done, thanks for the tip!

components = [1,2]

if key not in adata.obs:
    raise ValueError('Please run `sc.tl.density()` first and specify the correct key.')

@fidelram

fidelram Mar 21, 2019

Collaborator

This should also be sc.tl.embedding_density, here and in all other lines.

Gaussian kernel density estimation is used to calculate the density of
cells in an embedded space. This can be performed per category over a
categorical cell annotation. The cell density can be plotted using the
`sc.pl.density()` function.

@fidelram

fidelram Mar 21, 2019

Collaborator

change to .tl.embedding_density

@LuckyMD

Contributor Author

commented Mar 21, 2019

@fidelram I think I found a bug in plot_scatter. While the sort_order parameter is reordering the data points to plot the highest obs covariate last, which is great... I don't think this is happening correctly with the dot sizes. Thus, with sort_order=True you cannot really use dot sizes. I don't have a minimal example for this yet, but I think this is what I see while calibrating defaults for dot sizes with this code. I will confirm later, but let me know if you think this is possible.

@fidelram

Collaborator

commented Mar 21, 2019

@LuckyMD plot_scatter does not sort the sizes. Let me submit a PR for this quickly.

@LuckyMD

Contributor Author

commented Mar 21, 2019

@fidelram Thanks a lot! I was getting confused why my dot sizes weren't working (only sometimes, strangely...). If the dot sizes are sorted according to the color variable it should hopefully work.
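The issue described here comes down to applying one permutation to every per-point array. A minimal sketch with NumPy follows; `sort_for_plotting` is a hypothetical helper and the data are made up, so this is not the fix that went into plot_scatter.

```python
import numpy as np


def sort_for_plotting(coords, color_values, sizes):
    """Reorder all per-point arrays so that the highest color values are drawn last."""
    order = np.argsort(color_values, kind="stable")
    # Indexing every array with the same `order` keeps each point's
    # position, color value and dot size together.
    return coords[order], color_values[order], sizes[order]


coords = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
color_values = np.array([0.9, 0.1, 0.5])
sizes = np.array([30.0, 10.0, 20.0])  # each size belongs to the point in the same row

sorted_coords, sorted_colors, sorted_sizes = sort_for_plotting(coords, color_values, sizes)
```

If only the coordinates and color values were reordered, the unsorted sizes would be assigned to the wrong points, which would match the intermittent behavior reported above.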

LuckyMD added some commits Mar 21, 2019

@LuckyMD

Contributor Author

commented Mar 21, 2019

I think everything should be there now pending @fidelram's fix. If that solves my dot size issues, then it's ready to merge. So I'm waiting for #546 to be merged.

@fidelram

Collaborator

commented Mar 21, 2019

@LuckyMD #546 is merged.

@falexwolf

Member

commented Mar 21, 2019

Wow! This looks absolutely fantastic! So I guess we can merge and make a 1.4.1?

@LuckyMD

Contributor Author

commented Mar 22, 2019

Let me check if dot sizes work now. Also, I still have another function I would like to add, which probably requires less work: just a marker gene overlap test that takes a dictionary as input. Could you give me until Wednesday next week for that?

@LuckyMD

Contributor Author

commented Mar 22, 2019

Dot sizes now work, this is ready to be merged... assuming you are happy with how I integrated it into the documentation (I added a separate "density" subsection for the plotting tools as I felt it shouldn't really be put in the same category as embeddings.)

@falexwolf

Member

commented Mar 22, 2019

Great! 😄

Sure, next Wednesday is fine. Also, we can always simply make 1.4.2. I just need to check one tiny thing before we actually make the release.

@falexwolf falexwolf merged commit 72429bb into master Mar 22, 2019

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
continuous-integration/travis-ci/push: The Travis CI build passed

@flying-sheep flying-sheep deleted the density_plots branch Mar 22, 2019

@fidelram

Collaborator

commented Mar 25, 2019

@LuckyMD Why don't you add a test to scanpy/tests/test_plotting.py? That way we can guarantee that future changes do not break your code.

@LuckyMD

Contributor Author

commented Mar 25, 2019

I added two tests in scanpy/tests/test_embedding_density.py... one of them just to test if the plotting functions run. Should this have been placed in a different file?
