Support for Drop-seq bam files? #17

Closed
FloWuenne opened this issue Nov 6, 2017 · 11 comments

Comments

@FloWuenne

Hi there,

First of all, thank you very much for the great package and the awesome preprint. I really enjoyed reading your analysis using RNA velocity and the theory behind it.

I was going through the Python tutorial and was wondering whether you are planning to add a run mode that would make it easy to run velocyto with BAM files from the Drop-seq pipeline.

The format for BAM files created with this pipeline is described here:
Dropseq Alignment Cookbook

Basically, the cell barcode for each read is stored in the XC tag and the molecular barcode (UMI) in the XM tag. I think it would be really cool to use your approach on some Drop-seq data. Maybe there is already a way to use the package on these kinds of BAM files and I haven't noticed. If so, please excuse me and close the issue. If not, I think there are quite a few people using Drop-seq who would be interested in applying velocyto to their data!
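To illustrate the tag layout described above, here is a minimal sketch that pulls the XC and XM values out of one alignment in SAM text form. The helper name parse_sam_tags and the example read are made up for illustration; on a real BAM file you would normally go through a library such as pysam rather than parsing text by hand.

```python
def parse_sam_tags(sam_line):
    """Return the optional fields of one SAM text line as a {tag: value} dict.

    Values are kept as strings; the TYPE part of TAG:TYPE:VALUE is ignored.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are mandatory; optional fields follow as TAG:TYPE:VALUE.
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


# Hypothetical read, invented for this example (not from a real Drop-seq run).
example = "\t".join([
    "read1", "0", "chr1", "1000", "255", "10M", "*", "0", "0",
    "ACGTACGTAC", "FFFFFFFFFF",
    "XC:Z:ATCGGCTAAGTC",  # cell barcode
    "XM:Z:GGTTAACC",      # molecular barcode (UMI)
    "NH:i:1",
])

tags = parse_sam_tags(example)
cell_barcode = tags.get("XC")
umi = tags.get("XM")
print(cell_barcode, umi)
```

Reads that lack either tag would come back as None here, so a caller can decide whether to skip them.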

Best,

Florian

@gioelelm
Member

gioelelm commented Nov 6, 2017

Yes, we plan to support Drop-seq soon. Actually, we were planning to use the bam output of the DropEst pipeline (which is in a cellranger-like format). However, supporting the .bam output of the pipeline you linked sounds like a good idea. It would be easier for me if I could get an example of such a .bam file. Could you provide a test file that I can test on?

Some extra questions:
Are the XC and XM tags really the only differences from cellranger bam files?
Would there be a need to specify a list of "valid barcodes", or will the bam file contain only reads mapping to valid barcodes?
Are UMIs already error-corrected, as in cellranger?

gioelelm self-assigned this Nov 6, 2017
gioelelm added this to the 1.0 milestone Nov 6, 2017
@FloWuenne
Author

Great to hear that this is already in the pipe 👍 !

I can provide you with a BAM file that you would get from running the standard pipeline. The BAM still contains ALL barcodes, not just the real ones. So to get a gene x cell matrix you usually still need to supply a file with valid barcodes, but since people running Drop-seq will have this list anyway, it would be no extra effort for them to provide it. If the user just supplies velocyto with a list of cell barcodes, one barcode per row, that would be the easiest for most Drop-seq users, I believe.
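As a rough sketch of the one-barcode-per-row idea, the snippet below loads such a list and keeps only reads whose cell barcode is on it. The function names and the example records are hypothetical and only show the filtering logic; this is not how velocyto implements it.

```python
import io


def load_barcodes(handle):
    """Read one cell barcode per line, ignoring blank lines and whitespace."""
    return {line.strip() for line in handle if line.strip()}


def keep_valid(reads, valid_barcodes):
    """Yield only the (barcode, umi, gene) records whose barcode is in the list."""
    for barcode, umi, gene in reads:
        if barcode in valid_barcodes:
            yield barcode, umi, gene


# Stand-in for a barcode file on disk (made-up barcodes, one per row).
barcode_file = io.StringIO("AAAACCCCGGGG\nTTTTGGGGCCCC\n\n")
valid = load_barcodes(barcode_file)

# Made-up reads: the second one carries a barcode that is not in the list.
reads = [
    ("AAAACCCCGGGG", "ACGTACGT", "Actb"),
    ("GGGGGGGGGGGG", "ACGTACGT", "Actb"),
    ("TTTTGGGGCCCC", "TTTTCCCC", "Gapdh"),
]
kept = list(keep_valid(reads, valid))
print(len(kept), "of", len(reads), "reads kept")
```

Using a set for the barcode list keeps the membership check cheap even when the BAM contains many more barcodes than real cells.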

Regarding the Cell Ranger questions, I have actually never analyzed 10x data with Cell Ranger, so I don't really know what steps are in that downstream pipeline. The only extra tags in the Drop-seq BAMs should be XM and XC, as far as I know.

The UMIs are error-corrected in the sense that they are merged based on Hamming distance and that some general synthesis errors that might come from the beads are corrected. Besides that, I don't think there is any specific correction of UMIs.
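To make "merged based on Hamming distance" concrete, here is a minimal greedy sketch: UMIs are visited from most to least abundant, and each one is folded into the first already-kept UMI within one mismatch. The function names, the distance threshold of 1, and the example UMIs are assumptions for illustration; the Drop-seq tools may use a different procedure.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("UMIs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umi_counts, max_distance=1):
    """Greedily merge UMIs that are within max_distance of a more abundant UMI."""
    merged = {}
    # Most abundant first; ties are broken alphabetically to stay deterministic.
    ordered = sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for umi, count in ordered:
        for parent in merged:
            if hamming(umi, parent) <= max_distance:
                merged[parent] += count  # treat as an error copy of parent
                break
        else:
            merged[umi] = count  # no close neighbour: keep as its own molecule
    return merged


# Made-up UMIs observed for one gene in one cell.
observed = Counter(["ACGTACGT"] * 5 + ["ACGTACGA"] + ["TTTTCCCC"] * 2)
collapsed = collapse_umis(observed)
print(len(observed), "observed UMIs ->", len(collapsed), "molecules")
```

In this toy input, ACGTACGA differs from ACGTACGT at a single position, so it is counted as the same molecule, while TTTTCCCC stays separate.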

@gioelelm
Member

gioelelm commented Nov 6, 2017

Ok, I made some changes to the code. In principle (as long as the .bam file contains the tags you described) Drop-seq bam files should work now! Just use the velocyto run subcommand and specify a barcode file.

Of course, the feature is still untested. I will try it as soon as you provide the link to a sample .bam and the corresponding barcode file. Could you also specify the .gtf corresponding to the version of the genome you are mapping against?

It might be faster for you to test the fix yourself. Pull the latest commit and install from source: pip install -e .

@FloWuenne
Author

I will try it out as soon as I can and let you know whether it works or not!

Thanks so much for the quick integration and I will see whether everything works out!

@FloWuenne
Author

I'm getting an error when trying to install via pip:

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-kdva_8wf/velocyto/

I tried on our cluster as well as on my local computer, where I have sudo rights, and get the same message. Any suggestions on how to fix this?!

I have the latest version of pip installed and Python 3.6.2...

@gioelelm
Member

gioelelm commented Nov 7, 2017

Have you uninstalled the previous version (pip uninstall velocyto)?

@gioelelm
Member

gioelelm commented Nov 7, 2017

I assume you are already following this guide, right?

Installing in a conda environment is fundamental. Dealing with dependencies could be a pain otherwise.

@gioelelm
Member

gioelelm commented Nov 9, 2017

Updates on the issue?

@FloWuenne
Author

I contacted support on our cluster because this is the main place where I would like velocyto to run. I guess it's an issue with dependencies on the cluster. However, I also tried to install on my local machine and ran into the same problem. I do think it is just an issue with dependencies, like you pointed out.

Will get back to you as soon as I get it installed ;).

@FloWuenne
Author

So, update: I got it installed after fixing dependencies and updating conda for Python.

I ran it on the Drop-seq BAM, supplying a list of barcodes. It terminated successfully without any errors.

What's the easiest way to check whether your updated code worked in terms of cell barcode and UMI identification? Should I just follow one of the examples from here?
https://github.com/velocyto-team/velocyto.R

@gioelelm
Member

Start by calling plot_fractions and continue by following the tutorial or this notebook: https://github.com/velocyto-team/velocyto-notebooks/tree/master/python. I would also quickly check some genes that I am familiar with in the dataset. If the counts are similar to what you were getting with your standard pipeline, the result is probably right, but you can check more extensively with a proper cell-by-cell pairwise comparison.
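One possible shape for such a pairwise comparison is sketched below: for every cell barcode present in both outputs, compute the Pearson correlation of its per-gene counts. The data layout ({cell: {gene: count}}), the function names, and the toy counts are assumptions for illustration. Loading the real matrices from the standard pipeline and from the velocyto output is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return float("nan")
    return cov / (norm_x * norm_y)


def compare_cells(counts_a, counts_b, genes):
    """Correlate per-gene counts for every cell found in both pipelines."""
    shared = sorted(set(counts_a) & set(counts_b))
    return {
        cell: pearson(
            [counts_a[cell].get(g, 0) for g in genes],
            [counts_b[cell].get(g, 0) for g in genes],
        )
        for cell in shared
    }


# Toy counts, invented for this example.
genes = ["Actb", "Gapdh", "Sox2"]
standard = {
    "AAAACCCCGGGG": {"Actb": 10, "Gapdh": 5, "Sox2": 0},
    "TTTTGGGGCCCC": {"Actb": 2, "Gapdh": 8, "Sox2": 1},
}
from_velocyto = {
    "AAAACCCCGGGG": {"Actb": 10, "Gapdh": 5, "Sox2": 0},
    "TTTTGGGGCCCC": {"Actb": 3, "Gapdh": 7, "Sox2": 1},
}

for cell, r in compare_cells(standard, from_velocyto, genes).items():
    print(cell, round(r, 3))
```

Cells with a correlation well below the rest would be the first place to look for barcode or UMI handling differences.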
