Support for Drop-seq bam files? #17

FloWuenne · 2017-11-06T21:05:38Z

Hi there,

First of all, thank you very much for the great package and the awesome preprint. I really enjoyed reading your analysis using RNA velocity and the theory behind it.

I was going through the Python tutorial and was wondering whether you are planning to add a run mode that would make it easy to run velocyto on BAM files from the Drop-seq pipeline.

The format for BAM files created with this pipeline is described here:
Dropseq Alignment Cookbook

Basically, the cell barcode of each read is tagged with the XC tag and the molecular barcode (UMI) with the XM tag. I think it would be really cool to use your approach on some Drop-seq data. Maybe there is already a way to use the package on these kinds of BAM files and I haven't noticed; if so, please excuse me and close the issue. If not, I think there are quite a few Drop-seq users who would be interested in applying velocyto to their data!
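The tag layout described above can be checked directly on a BAM file. This is a minimal sketch assuming pysam-style read objects (pysam's `AlignedSegment` provides `has_tag` and `get_tag`). The function name `summarize_dropseq_tags` is hypothetical and is not part of velocyto. UMIs are counted per cell regardless of gene, so this is only a sanity check on the tags, not a molecule count.

```python
from collections import Counter, defaultdict
from typing import Iterable, Protocol


class TaggedRead(Protocol):
    """Minimal read interface; pysam.AlignedSegment provides both methods."""

    def has_tag(self, tag: str) -> bool: ...

    def get_tag(self, tag: str): ...


def summarize_dropseq_tags(reads: Iterable[TaggedRead],
                           cell_tag: str = "XC",
                           umi_tag: str = "XM"):
    """Count reads and distinct UMIs per cell barcode from Drop-seq-style tags.

    Returns (reads_per_cell, distinct_umis_per_cell, n_skipped), where reads
    lacking either tag are skipped. UMIs are pooled across genes, so the
    per-cell UMI number is only a rough check, not a molecule count.
    """
    reads_per_cell = Counter()
    umis_per_cell = defaultdict(set)
    skipped = 0
    for read in reads:
        if not (read.has_tag(cell_tag) and read.has_tag(umi_tag)):
            skipped += 1  # e.g. reads where barcode tagging failed
            continue
        cell = read.get_tag(cell_tag)
        reads_per_cell[cell] += 1
        umis_per_cell[cell].add(read.get_tag(umi_tag))
    distinct_umis = {cell: len(umis) for cell, umis in umis_per_cell.items()}
    return reads_per_cell, distinct_umis, skipped
```

With pysam installed, passing `pysam.AlignmentFile("sample.bam", "rb")` (hypothetical file name) as `reads` iterates over every record in the file.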

Best,

Florian

gioelelm · 2017-11-06T21:24:43Z

Yes, we plan to support Drop-seq soon. Actually, we were planning to use the .bam output of the DropEst pipeline (which is in a cellranger-like format). However, supporting the .bam output of the pipeline you linked sounds like a good idea. It would be easier for me if I could get an example of such a .bam file. Could you provide a test file that I can test on?

Some extra questions:
Are the XC and XM tags really the only differences from cellranger bam files?
Would there be a need to specify a list of "valid barcodes", or will the bam file contain only reads mapping to valid barcodes?
Are UMIs already error-corrected as in cellranger?

FloWuenne · 2017-11-06T21:56:37Z

Great to hear that this is already in the pipe 👍 !

I can provide you with a BAM file like the one you would get from running the standard pipeline. The BAM still contains ALL barcodes, not just the real ones. So usually, to get a gene x cell matrix you still need to supply a file with valid barcodes, but since people running Drop-seq will have this list anyway, it would be no extra effort for them. Letting the user supply a list of cell barcodes to velocyto, with one barcode per row, would be the easiest option for most Drop-seq users, I believe.
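Reading such a whitelist is simple. This sketch assumes a plain-text file with one cell barcode per row (any extra tab-separated columns are ignored) and shows how it would filter per-barcode counts so that only real cells remain. The helper names are hypothetical, and velocyto's own barcode parser may treat the file differently.

```python
from pathlib import Path
from typing import Dict, Set


def load_barcodes(path: str) -> Set[str]:
    """Read a barcode whitelist with one barcode per row.

    Blank lines are skipped; only the first tab-separated field is kept,
    so a file with an extra column (e.g. read counts) also works.
    """
    barcodes = set()
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            barcodes.add(line.split("\t")[0])
    return barcodes


def keep_valid(counts: Dict[str, int], valid: Set[str]) -> Dict[str, int]:
    """Drop barcodes that are not in the whitelist (e.g. empty or noisy beads)."""
    return {bc: n for bc, n in counts.items() if bc in valid}
```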

Regarding the CellRanger questions, I have actually never analyzed 10x data with CellRanger, so I don't really know what steps are in that downstream pipeline. As far as I know, the only extra tags in the Drop-seq BAMs are XM and XC.

The UMIs are error-corrected in the sense that they are merged based on Hamming distance, and some general synthesis errors that might come from the beads are corrected. Besides that, I don't think there is any specific correction of UMIs.
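To make the Hamming-distance merging more concrete, here is a simplified greedy illustration for the UMIs observed for one cell and one gene. It is not the exact algorithm used by the Drop-seq tools (which also handle bead synthesis errors separately). The function names and the one-mismatch threshold are assumptions chosen for the example.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umi_counts: Dict[str, int], max_dist: int = 1) -> Dict[str, int]:
    """Greedily merge UMIs within max_dist mismatches of a more abundant UMI.

    UMIs are visited from most to least abundant (ties broken alphabetically).
    Each one either becomes a new representative or has its reads added to the
    first existing representative within max_dist. The number of keys in the
    result is the estimated number of distinct molecules.
    """
    merged: Dict[str, int] = {}
    for umi in sorted(umi_counts, key=lambda u: (-umi_counts[u], u)):
        parent = next((rep for rep in merged if hamming(rep, umi) <= max_dist), None)
        if parent is None:
            merged[umi] = umi_counts[umi]
        else:
            merged[parent] += umi_counts[umi]
    return merged
```

For example, with read counts {"AAAA": 10, "AAAT": 2, "TTTT": 5, "TTTA": 1}, the two low-count UMIs are each one mismatch from a more abundant one, so four observed UMIs collapse to two molecules.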

gioelelm · 2017-11-06T22:46:44Z

Ok, I made some changes to the code. In principle (as long as the .bam file contains the tags you described), Drop-seq bam files should work now! Just use the velocyto run subcommand and specify a barcode file.
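As a sketch of that invocation, the helper below only assembles the argument list. It assumes the option names documented for later velocyto releases (-b for the barcode file, -o for the output folder, an optional -m repeat-mask GTF, then the BAM and GTF as positional arguments). The 2017 commit may have differed, so check `velocyto run --help` for your installed version. The function and file names are hypothetical.

```python
from typing import List, Optional


def velocyto_run_cmd(bam: str, gtf: str, barcodes: str, outdir: str,
                     mask: Optional[str] = None) -> List[str]:
    """Build the argv list for a `velocyto run` call on a Drop-seq BAM.

    Option names are assumed from the velocyto CLI documentation; verify
    them against `velocyto run --help` before use.
    """
    cmd = ["velocyto", "run", "-b", barcodes, "-o", outdir]
    if mask is not None:
        cmd += ["-m", mask]  # optional GTF of repeat regions to mask
    cmd += [bam, gtf]  # positional arguments: BAM file, then annotation GTF
    return cmd
```

Passing the result to `subprocess.run(cmd, check=True)` would launch the run and raise an error if velocyto exits with a non-zero status.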

Of course, the feature is still untested. I will try it as soon as you provide the link to a sample .bam and the corresponding barcode file. Could you also specify the .gtf corresponding to the version of the genome you are mapping against?

It might be faster for you to test the fix yourself. Pull the latest commit and install from source: pip install -e .

FloWuenne · 2017-11-07T00:12:52Z

I will try it out as soon as I can and let you know whether it works or not!

Thanks so much for the quick integration and I will see whether everything works out!

FloWuenne · 2017-11-07T00:30:06Z

I'm getting an error when trying to install via pip:

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-kdva_8wf/velocyto/

I tried on our cluster as well as on a local computer where I have sudo rights, and I get the same message. Any suggestions on how to fix this?

I have the latest version of pip installed and Python 3.6.2.

gioelelm · 2017-11-07T00:33:32Z

Have you uninstalled the previous version (pip uninstall velocyto)?

gioelelm · 2017-11-07T00:34:59Z

I assume you are already following this guide, right?

Installing in a conda environment is fundamental. Dealing with dependencies could be a pain otherwise.

gioelelm · 2017-11-09T14:42:39Z

Updates on the issue?

FloWuenne · 2017-11-09T15:02:02Z

I contacted support on our cluster because that is the main place where I would like to run velocyto. I guess it's an issue with dependencies on the cluster. However, I also tried to install on my local machine and ran into the same problem, so I do think it is just a dependency issue, as you pointed out.

I will get back to you as soon as I get it installed ;).

FloWuenne · 2017-11-09T22:55:24Z

Update: I got it installed after fixing dependencies and updating conda for Python.

I ran it on the Drop-seq BAM, supplying a list of barcodes. It terminated successfully without any errors.

What's the easiest way to check whether your updated code worked in terms of cell barcode and UMI identification? Should I just follow one of the examples from here?
https://github.com/velocyto-team/velocyto.R

gioelelm · 2017-11-10T02:52:12Z

Start by calling plot_fractions and continue by following the tutorial or this notebook: https://github.com/velocyto-team/velocyto-notebooks/tree/master/python. I would also quickly check some genes I am familiar with in the dataset. If they look similar to what you were getting with your standard pipeline, it is probably right, but you can check more extensively with a proper cell-pairwise comparison.
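A cell-pairwise comparison like the one suggested above could look like this sketch. It assumes both count tables have already been loaded into plain dictionaries keyed by cell barcode and then gene (for example, velocyto's spliced counts versus the standard Drop-seq digital expression matrix); extracting them from the .loom and DGE files is not shown, and the function names are hypothetical. Raw-count Pearson correlation is dominated by highly expressed genes, so log-transforming the counts first may give a fairer picture.

```python
import math
from typing import Dict, List, Optional

Counts = Dict[str, Dict[str, float]]  # cell barcode -> gene -> count


def pearson(xs: List[float], ys: List[float]) -> Optional[float]:
    """Pearson correlation of two equal-length vectors; None if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def per_cell_agreement(a: Counts, b: Counts) -> Dict[str, Optional[float]]:
    """Correlate two count tables cell by cell over the union of their genes.

    Only cells present in both tables are compared; a gene missing from one
    table counts as 0. Cells with fewer than two genes get None.
    """
    result: Dict[str, Optional[float]] = {}
    for cell in a.keys() & b.keys():
        genes = sorted(a[cell].keys() | b[cell].keys())
        xs = [a[cell].get(g, 0) for g in genes]
        ys = [b[cell].get(g, 0) for g in genes]
        result[cell] = pearson(xs, ys) if len(genes) > 1 else None
    return result
```

Cells whose correlation is much lower than the rest would be the first ones to inspect for barcode or UMI assignment problems.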

gioelelm self-assigned this Nov 6, 2017

gioelelm added the enhancement and good first issue labels Nov 6, 2017

gioelelm added this to the 1.0 milestone Nov 6, 2017

gioelelm closed this as completed Nov 10, 2017

Hoohm mentioned this issue in grst/single_cell_data_integration#6 (RNA velocity, open) Nov 14, 2018
