merging with a loom file #36
Comments
Hi Samuele, so Could you just load all loom files into three That should hopefully give you the same dimensions. |
Hej Volker, thanks for the hint, I thought it was the right way to go, based on some issues I read on the velocyto command line tool page. I'll post back when I try it :) Cheers, |
Hej again, I tried using |
You're right, I forgot to mention that we only recently enabled the |
ok thanks. |
|
OK, then I'll try with anndata. Should I just update it from conda? I can see that version 0.6.17 is available :) |
It's not released yet. Just install the latest commits from source via
|
Did that work for you? |
Hi Volker,
I was going to write about this.
I get the same object by concatenating my 3 loom files opened in scanpy and
by loading the file combined with loompy.
However, when I merge with my previously annotated object, I still
get an error. I worked around the issue by selecting cells with matching
names, but I lose about a thousand cells that do not seem to be in there.
I will post the commands as soon as I have my hands on the laptop.
Cheers,
Samuele
On Wed, 30 Jan 2019 at 5:25 PM, Volker Bergen <notifications@github.com> wrote:
… Did that work for you?
|
Hi again. As I said, concatenating the 3 loom files or using the one combined by loompy gives the same thing. all_data_merged = scv.utils.merge(all_data, all_data_loom) again outputs this error:
As a workaround, I extracted the layers (magic and raw) from my AnnData object and removed them from it.
After successfully merging with the loom file, I added the layers back to the resulting object. However, I get an error again because the shapes no longer match: before merging I had 7379 cells, and now I get a little over 6000. I think the problem lies in recognizing cell names that are duplicated when the merging is done:
>>> all_data  # previous AnnData object
AnnData object with n_obs × n_vars = 7379 × 15000
>>> all_data_merged = scv.utils.merge(all_data, all_data_loom)
>>> all_data_merged  # merged object
AnnData object with n_obs × n_vars = 6717 × 33694
The number of genes does not match either, but that is easy to fix manually. |
Tested the
|
I get the two lengths 0 15009 in the data and in the loom data, respectively. I am again using release 0.1.16, the one available from pip, not the development version. |
The merge module first "cleans up" names (basically removing everything that is not from 'ACTG'), what you can do with
Then intersect. Can you run that again with cleaned names and also check if they're still unique? |
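The clean-then-intersect idea can be sketched in plain Python. This is only an illustration under stated assumptions, not scvelo's implementation: clean_name is a hypothetical helper that keeps the longest run of A/C/G/T characters, and the cell names are made up to mimic concatenated samples.

```python
import re
from collections import Counter


def clean_name(name):
    """Hypothetical helper: keep the longest run of A/C/G/T characters."""
    runs = re.findall(r"[ACGT]+", name)
    return max(runs, key=len) if runs else name


# Made-up names: suffixes as produced by concatenating three samples,
# prefixes as typically found in loom files.
adata_names = ["AAACCTGAGCGT-1-0", "AAACCTGAGCGT-1-1", "TTTGGTCAGACC-1-2"]
loom_names = ["sample0:AAACCTGAGCGTx", "sample1:AAACCTGAGCGTx", "sample2:TTTGGTCAGACCx"]

cleaned_adata = [clean_name(n) for n in adata_names]
cleaned_loom = [clean_name(n) for n in loom_names]

# Names that are no longer unique after cleaning.
duplicates = sorted(n for n, c in Counter(cleaned_adata).items() if c > 1)

# Names present in both objects after cleaning.
shared = sorted(set(cleaned_adata) & set(cleaned_loom))

print("duplicates after cleaning:", duplicates)
print("shared names:", len(shared), "of", len(adata_names), "cells")
```

In this toy case the same barcode occurs in two samples, so the cleaned names collide and the intersection is smaller than the number of cells. That is one possible way for cells to go missing in a merge.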
If I use the cleaning tools, I obtain
I get exactly 7219, while I would expect as well 6557. |
Would need to figure out which cells get sorted out and why (after cleaning)
|
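One way to find out which cells get dropped is to compare the two sets of cleaned names directly. A minimal sketch, assuming the names have already been cleaned; the barcodes below are made up and not taken from the data in this thread.

```python
# Made-up cleaned cell names; in practice these would come from the
# obs_names of the annotated object and of the loom object.
adata_names = ["AAACCTGAGCGT", "AAACGGGTCTAA", "TTTGGTCAGACC", "TTTGTCATCGGA"]
loom_names = ["AAACCTGAGCGT", "TTTGGTCAGACC", "CCCAATCGTGAT"]

loom_set = set(loom_names)
adata_set = set(adata_names)

# Cells in the annotated object that have no partner in the loom data.
only_in_adata = [n for n in adata_names if n not in loom_set]

# Cells in the loom data that are absent from the annotated object.
only_in_loom = [n for n in loom_names if n not in adata_set]

# Cells that would survive an intersection-based merge.
kept = [n for n in adata_names if n in loom_set]

print(len(kept), "kept;", len(only_in_adata), "only in adata;",
      len(only_in_loom), "only in loom")
```

Looking at the names in only_in_adata (for instance, whether they share a suffix or a sample prefix) usually hints at why they fail to match.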
There are actually some names popping out that seem not to be clean, but have still "-1" in the name
The other names seem ok. If I run on the loom data
then I get a vector of length
I noticed that here there are as well names with some remaining |
The You might need a tailored |
That would be nice, of course only when you have time to look at it :) I guess this kind of problem might arise pretty often when one has concatenated different datasets into a single annotated data object. But it is probably also impossible to make a merge function that generalizes to all possible cases. |
I sent the data in a mail to your institutional address :) |
I conclude that the Rather something is corrupted with the data. Detailed discussion via mail. |
That is reassuring in a way: now I know that this kind of thing should not happen and that things should normally go more smoothly. |
Thanks a lot for the help and patience. Still owe a beer ;) |
Hi |
If the var_names are the same, but in different order, you can simply put them in the right order with Good point though. I'll include that in the |
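Putting variables into a common order amounts to permuting the columns of the data matrix. The sketch below shows this in plain Python on toy data; reorder_columns is a hypothetical helper and not the command referred to in the comment above.

```python
# Toy data: rows are cells, columns are genes.
target_order = ["GeneA", "GeneB", "GeneC"]   # order used by the other object
var_names = ["GeneC", "GeneA", "GeneB"]      # current column order
X = [
    [3, 1, 2],
    [30, 10, 20],
]


def reorder_columns(matrix, names, new_order):
    """Hypothetical helper: permute columns so they follow new_order."""
    if len(set(names)) != len(names):
        raise ValueError("names must be unique before reordering")
    if set(names) != set(new_order) or len(names) != len(new_order):
        raise ValueError("both name lists must contain the same genes")
    position = {name: i for i, name in enumerate(names)}
    cols = [position[name] for name in new_order]
    return [[row[c] for c in cols] for row in matrix]


X_reordered = reorder_columns(X, var_names, target_order)
print(X_reordered)
```

With AnnData objects, indexing by a list of names (for example adata[:, other.var_names]) should give the same reordering; treat that as an assumption to check against the anndata documentation.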
Hi, Do you have an example of how the Thanks in advance, |
Sure, here you find the module description along with several examples. It's basically The multiple .var issue is addressed in scverse/anndata#162 and will be resolved soon. If it bothers you, you can just delete the duplicated .var columns. The You raised quite a few questions. Let me know, if I overlooked anything. |
Thanks for your answer. Thanks, HM |
Are you sure, that you didn't have any duplicates in Seurat? You can check that in python with As of satijalab/seurat#1238 it looks like it is handling/removing duplicates internally. In scanpy/scvelo, this is fixed by calling |
Thanks, Since I am new to python, could you please write an example code that will sum the rows of the same gene names by columns. The output will be a table of unique gene names by cells, and the values are the summed counts. Thanks a lot, HM |
Get the names of your duplicates:
subset AnnData to the first duplicate var_name:
Here, you can check whether you have If they have unique identifiers, also the number of cells expressed for that gene with duplicate names would probably be different:
|
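For the request above (summing the counts of rows that share a gene name), a minimal sketch in plain Python could look as follows. The gene names, cell names, and counts are made up, and the table is laid out as genes by cells, as in the question.

```python
from collections import Counter, defaultdict

# Made-up example: rows are genes, columns are cells.
gene_names = ["Gapdh", "Actb", "Gapdh", "Xist"]
cell_names = ["cell1", "cell2"]
counts = [
    [5, 0],
    [2, 7],
    [1, 3],
    [0, 4],
]

# Names that occur more than once.
duplicated = [g for g, c in Counter(gene_names).items() if c > 1]

# Sum the rows of each gene name, column by column.
summed = defaultdict(lambda: [0] * len(cell_names))
for gene, row in zip(gene_names, counts):
    for j, value in enumerate(row):
        summed[gene][j] += value

# Table of unique gene names by cells, in order of first appearance.
table = dict(summed)
for gene, row in table.items():
    print(gene, row)
```

Note that AnnData stores cells as rows and genes as columns, so the same logic would apply to the transposed matrix. With pandas, a genes-by-cells DataFrame indexed by gene name can be collapsed in one step with df.groupby(level=0).sum(). Whether summing is appropriate depends on whether the duplicates really are the same gene, which is the point of the checks suggested above.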
Hi @VolkerBergen - My apologies for asking another q for closed issue, but when I read my data as below, a warning stated Variable names are not unique. To make them unique, call Thank you very much for your help! |
@denvercal1234GitHub, as the function name suggests and indicated by the warning message, variable names are made unique (e.g. by appending a suffix) and not duplicate variables removed. |
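The difference between making names unique and removing duplicates can be illustrated with a small sketch. This is not anndata's implementation; make_unique is a hypothetical helper that keeps the first occurrence of a name and appends a numeric suffix to later ones.

```python
def make_unique(names, join="-"):
    """Hypothetical helper: suffix repeated names, keep every entry."""
    used = set(names)
    counts = {}
    result = []
    for name in names:
        n = counts.get(name, 0)
        if n == 0:
            # First occurrence keeps its original name.
            result.append(name)
            counts[name] = 1
            continue
        candidate = f"{name}{join}{n}"
        # Skip suffixes that would collide with an existing name.
        while candidate in used:
            n += 1
            candidate = f"{name}{join}{n}"
        used.add(candidate)
        counts[name] = n + 1
        result.append(candidate)
    return result


var_names = ["Gapdh", "Actb", "Gapdh", "Gapdh"]
unique_names = make_unique(var_names)
print(unique_names)
print(len(var_names), "variables before,", len(unique_names), "after")
```

The number of variables is unchanged: every column is kept and only the labels differ.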
Ah, I see. Thank you Weiler. |
Hej,
I am trying to use scv.utils.merge(adata, adata_loom) to merge my dataset used with scanpy and the related loom dataset opened with scvelo. However, adata has dimension 7370x15000, while adata_loom has dimension 282016x33694. adata is the concatenation of 3 datasets, one from each individual, and the loom file I used is the combination of the 3 separate individuals' files through loompy.combine.
I am fine with the second dimension (I kept the 15000 most expressed genes in scanpy), but the first dimension is not clear to me: it is really large, and it generates an error because adata and adata_loom have the attribute layers containing matrices of incompatible shape.
Is the first dimension to be interpreted in some way other than just cells? Should I run the velocity analysis on the loom file without merging?
Cheers,
Samuele