Dotplot where sizes are proportional to p-value and the color to log2-fold change? #562
I think this is not so difficult to achieve. I'll submit a PR soon.
Fidel Ramírez
On 26 Mar 2019, at 22:02, Alex Wolf wrote:
@fidelram, as discussed today, could we adapt `pl.rank_genes_groups_dotplot` so that it reads this information from `.uns['rank_genes_groups']`?
Maybe just a simple switch? Or the arguments `color` and `size` could each be a choice from the selection {`pvals`, `pvals_adj`, `log2FC`, `expression`, `frac-genes-expressed`}.
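A minimal sketch of the proposed switch, assuming the results for one group are available as a plain dict of per-gene lists. This is a simplified stand-in for the structured arrays in `.uns['rank_genes_groups']` (where the fold-change field is actually named `logfoldchanges`). The function name `dot_encoding` and the size range are hypothetical. P-value keys are mapped to dot size through -log10, so smaller p-values give larger dots.

```python
import math

# Keys the proposal allows for `size` and `color`; p-value keys get a -log10 transform.
ALLOWED_KEYS = {"pvals", "pvals_adj", "log2FC", "expression", "frac-genes-expressed"}
PVAL_KEYS = {"pvals", "pvals_adj"}


def _transform(key, values):
    """Return plottable values; p-values become -log10(p), floored to avoid log(0)."""
    if key in PVAL_KEYS:
        return [-math.log10(max(p, 1e-300)) for p in values]
    return list(values)


def dot_encoding(results, size="pvals_adj", color="log2FC", min_size=10.0, max_size=200.0):
    """Map per-gene statistics to (dot size, dot color) pairs.

    `results` is a dict {key: [one value per gene]} for a single group (hypothetical layout).
    """
    for key in (size, color):
        if key not in ALLOWED_KEYS:
            raise ValueError(f"{key!r} is not one of {sorted(ALLOWED_KEYS)}")
    raw_sizes = _transform(size, results[size])
    lo, hi = min(raw_sizes), max(raw_sizes)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    # Rescale linearly so the most significant gene gets max_size, the least gets min_size.
    sizes = [min_size + (s - lo) / span * (max_size - min_size) for s in raw_sizes]
    colors = _transform(color, results[color])
    return list(zip(sizes, colors))


group_results = {
    "pvals_adj": [1e-10, 1e-3, 0.5],  # toy values
    "log2FC": [2.5, 1.0, -0.3],
}
for dot_size, dot_color in dot_encoding(group_results):
    print(f"size={dot_size:.1f} color={dot_color:+.2f}")
```

Swapping `size` and `color` (for example `size="frac-genes-expressed", color="pvals"`) only changes which column goes through which transform.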
I would also love that actually 😄
@gokceneraslan, what about this idea from @falexwolf? Should we give it a try, or do you see any problems with it?
If this is stored in
Which is tracked in #134, I think.
Why is it that .obs, .var, and .uns don't have data frames in them? Also, I'd like to suggest that storing all differential expression results within the AnnData object might get complicated and deserve its own class. It'd be nice if it were easy to tell which cells and genes were compared, what exactly was being tested, and which direction is "up". That said, the results should definitely be easily accessible as a data frame.
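One way to make a comparison self-describing, as suggested above, is to keep the metadata (which groups were compared, which test was run, and which direction counts as "up") next to the per-gene statistics, and to expose the statistics as rows a data frame can be built from. A minimal sketch; the class and field names are hypothetical and not an existing Scanpy or AnnData API, and the gene values are toy data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DiffExpResult:
    """A single differential-expression comparison (hypothetical container)."""

    group: str                 # cells being tested, e.g. one cluster
    reference: str             # cells they are compared against, e.g. "rest"
    method: str                # statistical test that produced the p-values
    genes: List[str]
    stats: Dict[str, List[float]] = field(default_factory=dict)

    @property
    def up_direction(self) -> str:
        """Human-readable meaning of a positive log fold change."""
        return f"higher in {self.group!r} than in {self.reference!r}"

    def to_records(self) -> List[dict]:
        """One dict per gene; pandas.DataFrame(records) would turn this into a table."""
        return [
            {"gene": gene, **{key: values[i] for key, values in self.stats.items()}}
            for i, gene in enumerate(self.genes)
        ]


res = DiffExpResult(
    group="0", reference="rest", method="wilcoxon",
    genes=["CD3D", "LYZ"],
    stats={"logfoldchanges": [3.1, -2.0], "pvals_adj": [1e-8, 1e-4]},
)
print(res.up_direction)
for row in res.to_records():
    print(row)
```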
Huh? The reason why they haven’t originally been that is that
Ah, totally miswrote that. I meant
I think enabling that
Definitely would be. I'd thought the alternative would be to subclass something like a
Sure, but I think it’s about the same effort to subclass a mapping while adding array features as the other way round.
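To make the mapping-versus-array trade-off concrete, here is a small sketch of the "subclass a mapping, then add array features" direction: a read-only `Mapping` from group name to per-gene values that also reports a `shape` and supports row lookup by position. `GroupedColumns` is a hypothetical name, and a real implementation would more likely wrap NumPy arrays than Python lists.

```python
from collections.abc import Mapping
from typing import Dict, List, Tuple


class GroupedColumns(Mapping):
    """Read-only mapping {group: column of values} with a few array-like extras."""

    def __init__(self, columns: Dict[str, List[float]]):
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all groups must have the same number of rows")
        self._columns = {name: list(col) for name, col in columns.items()}

    # Mapping interface: dict-style access by group name.
    def __getitem__(self, group: str) -> List[float]:
        return self._columns[group]

    def __iter__(self):
        return iter(self._columns)

    def __len__(self) -> int:
        return len(self._columns)

    # Array-like extras layered on top of the mapping.
    @property
    def shape(self) -> Tuple[int, int]:
        n_rows = len(next(iter(self._columns.values()), []))
        return (n_rows, len(self._columns))

    def row(self, i: int) -> Tuple[float, ...]:
        """Values of row i across all groups, in group order."""
        return tuple(col[i] for col in self._columns.values())


pvals = GroupedColumns({"0": [1e-5, 0.2], "1": [0.03, 1e-9]})
print(list(pvals), pvals.shape, pvals.row(1))
```

`Mapping` supplies `keys()`, `items()`, `get()` and `in` automatically once the three abstract methods exist, so most of the effort goes into the array side.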
Back on the topic of making dot plots a bit more flexible: I've been working on an approach that could work. You can check it out in this binder environment; it's based on two main ideas:
Here's a quick example of the output:
This looks really cool... but then I haven't used dot plots much before, so I'm not sure what this is replacing... I just wonder if you can put different thresholds on the
Completely agreed!
We only allowed rec arrays in
I think there are also two separate problems here: "what's a better way to store differential expression results" and "what's a good API for differential expression". I'm interested in the
Sounds great!

Re tidy: Storing things internally in tidy format also seems inefficient to me... I remember a long discussion with Philipp more than 2 years ago... 😄

Re diffxpy: If you say that diffxpy has a good solution, why should we build a new one? Can't we just use their solution?
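The inefficiency of tidy storage can be seen by converting a wide, per-group layout into tidy (long) rows: every number gets its own row, and the group, gene and statistic labels are repeated in each one, whereas the wide layout stores each gene name once per group and each statistic name once. A minimal sketch with toy values; `to_tidy` is a hypothetical helper.

```python
from typing import Dict, List


def to_tidy(wide: Dict[str, Dict[str, List[float]]], genes: Dict[str, List[str]]) -> List[tuple]:
    """Convert {stat: {group: values}} into (group, gene, stat, value) rows."""
    rows = []
    for stat, per_group in wide.items():
        for group, values in per_group.items():
            for gene, value in zip(genes[group], values):
                rows.append((group, gene, stat, value))
    return rows


genes = {"0": ["CD3D", "IL7R"], "1": ["LYZ", "S100A8"]}
wide = {
    "pvals": {"0": [1e-8, 1e-5], "1": [1e-9, 1e-6]},
    "logfoldchanges": {"0": [3.0, 2.2], "1": [4.1, 3.5]},
}
tidy = to_tidy(wide, genes)
n_values = sum(len(v) for per_group in wide.values() for v in per_group.values())
print(f"{len(tidy)} rows, {3 * len(tidy)} label cells for {n_values} numbers")
print(tidy[0])
```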
Completely agreed.
Re `sc.extract`: One of the core ideas of Scanpy (as opposed to, say, scikit-learn) was to take the burden of bookkeeping off the user as much as possible. This design messed up, in particular, the return values of

There is a trade-off between having nice APIs and return values (such as dataframes) and a transparent and efficient on-disk representation in HDF5, zarr or another format. These days, I'd even consider simply pickling things, which would have saved us a lot of work; but I thought that we'd need established compression facilities, concatenation possibilities, and some way to manually "look into" an on-disk object (both from R and from the command line) so that it's maximally transparent. The widely established, cross-language, but old-school and not entirely scalable HDF5 then seemed the best choice. The Human Cell Atlas has meanwhile decided in favor of zarr. But that's not a drama, because Scanpy only writes "storage-friendly" values to AnnData, that is, arrays and dicts. HDF5 knows how to handle them, and so does zarr. If one uses xarray or dataframes, one has to think about how this gets written to disk.

That being said: it's likely that we'll continue to choose representations for on-disk (and in-memory) storage that aren't convenient (rec arrays, for instance), a three-dimensional xarray and dicts. A general solution for this problem would be the mentioned

The first function in that namespace should be

Now, we can apply this logic to every single function that doesn't have a simple return value. Upon calling the function with

I think DataFrames (a case like

Other possible names for the API would be

PS: I'd love to move away from the name
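A minimal sketch of the accessor idea: AnnData keeps the storage-friendly form (here a dict of per-group columns, standing in for the record arrays in `.uns['rank_genes_groups']`), and a separate function converts it into data-frame-ready rows on demand, so nothing new has to be written to disk. The function name `rank_genes_groups_rows` and its `pval_cutoff` parameter are hypothetical; later Scanpy releases ship a similar accessor, `sc.get.rank_genes_groups_df`. The values are toy data.

```python
from typing import Dict, List, Optional


def rank_genes_groups_rows(
    uns_entry: Dict[str, Dict[str, list]],
    group: str,
    pval_cutoff: Optional[float] = None,
) -> List[dict]:
    """Turn the stored {field: {group: column}} layout into one dict per gene.

    `uns_entry` is a simplified stand-in for adata.uns['rank_genes_groups'];
    filtering by `pval_cutoff` assumes a 'pvals_adj' field is present.
    """
    names = uns_entry["names"][group]
    stat_fields = [f for f in uns_entry if f != "names"]
    rows = [
        {"names": name, **{f: uns_entry[f][group][i] for f in stat_fields}}
        for i, name in enumerate(names)
    ]
    if pval_cutoff is not None:
        rows = [r for r in rows if r["pvals_adj"] <= pval_cutoff]
    return rows


stored = {
    "names": {"0": ["CD3D", "IL7R", "MALAT1"], "1": ["LYZ", "S100A8", "FTL"]},
    "logfoldchanges": {"0": [3.0, 2.2, 0.1], "1": [4.1, 3.5, 0.4]},
    "pvals_adj": {"0": [1e-8, 1e-5, 0.4], "1": [1e-9, 1e-6, 0.7]},
}
for row in rank_genes_groups_rows(stored, "0", pval_cutoff=0.05):
    print(row)
```

Passing the returned list to `pandas.DataFrame` gives the convenient table, while the on-disk layout stays plain arrays and dicts.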
I really like the
I like the idea for the "nice data accessors module"! Maybe

> If you say that diffxpy has a good solution, why should we build a new one? Can't we just use their solution?

I think this would involve throwing away recarrays, unless someone wants to write a converter (not me). I'm also not so sure how mature/stable

> That being said: it's likely that we'll continue to choose representations for on-disk

I like that the current representations are pretty easy to read in other languages, as they're mostly standard HDF5 types. I think there are definitely cases where it makes sense to break cross-compatibility, like complicated data structures for a specific package (an index, for example).

> If one uses xarray or dataframes, one has to think about how this gets written to disk

My impression is
@davidsebfischer: do you feel you have a mature solution for storing simple difftest results that could be reused for
If xarray does everything we want (sparse and categorical data), that would be great, of course. I investigated pandas' HDF5 support early on and decided against it, as it was very opaque (e.g., I couldn't see how to easily implement on-disk concatenation with it) and it didn't seem to offer performance gains.
xarray doesn't do sparse :(. They're also holding off for the csd/csf formats in pydata/sparse, I believe.
OK, we have these alternatives:
I think
Based on discussion in: scverse#562
* Unifies common code between the `dotplot`, `matrixplot` and `stacked_violin` plots while adding flexibility to the plots. The `sc.pl.dotplot`, `sc.pl.matrixplot` and `sc.pl.stacked_violin` methods have been transformed into wrappers for the new `DotPlot`, `MatrixPlot` and `StackedViolin` classes. Accessing the new classes directly allows further fine-tuning of the plots.
* The new plot classes are all descendants of the `BasePlot` class, which captures the common code. The design of the classes follows method chaining (as found in Pandas or Altair). This allows independent features to be added to the plot (via well-documented methods) without increasing the number of parameters of a single function. This was first suggested in #956. All objects have consistent methods: `legend()` to set up titles and width, `style()` to set visual parameters specific to each plot (such as colormap, edge color and linewidth), `swap_axes()` to transpose the figure, `add_dendrogram()` with options to change the width of the dendrogram, and `add_totals()` to show a bar plot of the total number of cells per category. There are also options to sort the categories.
* Previous functionality is maintained, but plots will look slightly different.
* This commit addresses issues from #979 and #1103 related to `sc.pl.dotplot`.
* It is now possible to plot fold changes, log fold changes and p-values from `sc.tl.rank_genes_groups`, as suggested in #562.

Specific changes:

**all figures**:
* Set a title for the image.
* Pass an `ax` on which to plot the image.
* Return a dictionary of axes for further manipulation.
* Using `return_fig`, the plot object is returned; it can be used to adjust the proportions of the legend and to fine-tune other visual aspects.
* A bar plot with the totals per category can be added. It aligns at the left of the image if the categories are plotted in rows, or at the top if they are plotted in columns.
* The legend can be removed.
* `groupby` can be a list of categories.
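The method-chaining design described above can be illustrated with a small, library-free sketch: each method records one independent option and returns `self`, so features are added one call at a time instead of as extra keyword arguments on a single function. `ChainedPlot` and its methods are hypothetical stand-ins; they only collect settings, draw nothing, and are not Scanpy's actual `DotPlot` API.

```python
from typing import Any, Dict, Optional


class ChainedPlot:
    """Collects plot options through chainable methods (drawing is omitted)."""

    def __init__(self, var_names, groupby):
        self.var_names = list(var_names)
        self.groupby = groupby
        self.options: Dict[str, Any] = {"swap_axes": False}

    def style(self, cmap: str = "Reds", dot_edge_color: Optional[str] = None):
        self.options.update(cmap=cmap, dot_edge_color=dot_edge_color)
        return self  # returning self is what makes chaining possible

    def legend(self, show: bool = True, title: Optional[str] = None):
        self.options.update(show_legend=show, legend_title=title)
        return self

    def swap_axes(self, swap: bool = True):
        self.options["swap_axes"] = swap
        return self

    def add_totals(self):
        self.options["totals"] = True
        return self


plot = (
    ChainedPlot(["CD3D", "LYZ"], groupby="louvain")
    .style(cmap="bwr", dot_edge_color="black")
    .legend(title="log2 fold change")
    .swap_axes()
    .add_totals()
)
print(plot.options)
```

Adding a new feature means adding one method; existing calls and signatures stay unchanged.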
**dotplot**
* Improved the colorbar and size legend for dotplots. The colorbar and size legend now have titles, which can be modified using the `colorbar_title` and `size_title` arguments. They also align at the bottom of the image and do not shrink if the dotplot image is smaller.
* Genes can be plotted in rows and categories in columns (`swap_axes`).
* Using the `DotPlot` object, the `dot_edge_color` and line width can be set, a grid can be added, and several other features are available.
* `sc.pl.rank_genes_groups_dotplot` can now plot `pvals` and log fold changes.

**matrixplot**
* Added a title for the colorbar, positioned as in the dotplot.
* `sc.pl.rank_genes_groups_matrixplot` can now plot `pvals` and log fold changes.

**stacked_violin**
* Violins can be colored based on average gene expression, as in dotplots.
* Made the linewidth of the violin plots smaller.
* Removed the ticks for the y axis, as they tend to overlap with each other. They can be shown using the `style` method.

**other**
* `sc.pl.heatmap` and `sc.pl.tracksplot` now return a dictionary of axes when `show=False`, as the other plots do.
* `interpolation` can now be passed as a parameter to `sc.pl.heatmap`.
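When log fold changes are used for dot color, a diverging colormap is easier to read if it is centered on zero, so that up- and down-regulated genes of equal magnitude get equally saturated colors. A minimal sketch of choosing symmetric limits and normalizing values into [0, 1] the way a colormap expects, following matplotlib's `vmin`/`vmax` convention; `symmetric_limits` and `normalize` are hypothetical helpers, and the fold changes are toy values.

```python
from typing import List, Tuple


def symmetric_limits(values: List[float]) -> Tuple[float, float]:
    """Color limits centered on zero: (-m, m), with m the largest absolute value."""
    m = max((abs(v) for v in values), default=0.0) or 1.0  # fall back to (-1, 1) if all zero
    return -m, m


def normalize(values: List[float], vmin: float, vmax: float) -> List[float]:
    """Linearly map values to [0, 1], clipping anything outside [vmin, vmax]."""
    span = vmax - vmin
    return [min(1.0, max(0.0, (v - vmin) / span)) for v in values]


lfc = [3.0, -1.5, 0.0, -4.0]
vmin, vmax = symmetric_limits(lfc)
print(vmin, vmax)
print(normalize(lfc, vmin, vmax))  # 0.5 corresponds to a log fold change of zero
```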