
Querying various single-cell "atlases"

TODO:

find datasets, download them
download cluster data, calculate cluster means?
test various methods for similarity search
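The "calculate cluster means" item could be sketched as follows. This is only an illustration in plain Python, assuming a cells-by-genes count matrix and one cluster label per cell; the function name `cluster_means` and the toy values are hypothetical and not taken from this repository. In practice NumPy or pandas would be the more natural tool.

```python
def cluster_means(expression, labels):
    """Return {cluster label: mean expression vector}.

    expression: list of per-cell lists of gene counts (all the same length).
    labels: one cluster label per cell, in the same order as the rows.
    """
    if len(expression) != len(labels):
        raise ValueError("need exactly one label per cell")
    sums = {}    # label -> running per-gene sum
    counts = {}  # label -> number of cells seen
    for row, label in zip(expression, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in gene_sums]
        for label, gene_sums in sums.items()
    }


# Toy data (made up): 4 cells x 3 genes, 2 clusters.
expression = [[1, 0, 4], [3, 0, 2], [0, 5, 0], [0, 7, 2]]
labels = ["T cell", "T cell", "B cell", "B cell"]
means = cluster_means(expression, labels)
```

The resulting per-cluster vectors can then serve as the database entries that a query profile is compared against.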

Datasets

Tabula Muris

https://figshare.com/projects/Tabula_Muris_Transcriptomic_characterization_of_20_organs_and_tissues_from_Mus_musculus_at_single_cell_resolution/27733

This dataset contains cells annotated by cell type and tissue, where the cell type annotations come from the Cell Ontology.

There are two single-cell sequencing datasets:

10x droplet-seq dataset, with 422,803 droplets, 55,656 of which passed a QC cutoff of at least 500 genes and 1,000 UMIs
Smart-Seq2 sequencing of FACS-sorted cells, with 53,760 cells, 44,879 of which passed a QC cutoff of at least 500 genes and 50,000 reads
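For illustration, a QC cutoff of this kind might be applied per cell as in the sketch below. It assumes that "genes" means genes with a nonzero count and that the cutoffs are inclusive; the function name `passes_qc` and its parameters are hypothetical, and this is not the Tabula Muris pipeline itself.

```python
def passes_qc(counts, min_genes=500, min_total=1000):
    """Check one cell against a genes-detected and total-count cutoff.

    counts: per-gene counts for a single cell (UMIs or reads).
    min_genes: minimum number of genes with a nonzero count (assumed inclusive).
    min_total: minimum total UMIs or reads in the cell (assumed inclusive).
    """
    genes_detected = sum(1 for c in counts if c > 0)
    return genes_detected >= min_genes and sum(counts) >= min_total


# Droplet-style cutoff: 500 genes, 1,000 UMIs.
droplet_cell = [2] * 600 + [0] * 400   # 600 genes detected, 1,200 UMIs
droplet_ok = passes_qc(droplet_cell, min_genes=500, min_total=1000)

# FACS-style cutoff: 500 genes, 50,000 reads.
facs_cell = [100] * 600 + [0] * 400    # 600 genes detected, 60,000 reads
facs_ok = passes_qc(facs_cell, min_genes=500, min_total=50000)
```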

There are x cells and y cell types.

Microwell-seq

https://figshare.com/articles/MCA_DGE_Data/5435866

This dataset includes some cells whose expression data have had batch effects removed.

Querying

The main problem with querying is that the input data and the database might have different gene sets. Right now we just subset to the genes present in both the database and the query. Is there a better way to do this?
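The current gene-subsetting step can be sketched roughly as follows. This is a minimal illustration, assuming gene names are unique within each list and match exactly (same symbols, same capitalization); the function name `align_to_shared_genes` is hypothetical and not part of this package.

```python
def align_to_shared_genes(query_genes, query_values, db_genes, db_values):
    """Restrict a query profile and a database profile to their shared genes.

    Returns (shared_genes, query_subset, db_subset), all in query order.
    Assumes gene names are unique within each list.
    """
    db_index = {gene: i for i, gene in enumerate(db_genes)}
    shared, query_subset, db_subset = [], [], []
    for i, gene in enumerate(query_genes):
        if gene in db_index:
            shared.append(gene)
            query_subset.append(query_values[i])
            db_subset.append(db_values[db_index[gene]])
    return shared, query_subset, db_subset


# Toy example (values are made up).
shared, q, d = align_to_shared_genes(
    ["Cd3e", "Actb", "Xist"], [5, 10, 0],
    ["Actb", "Cd19", "Cd3e"], [8.0, 1.0, 3.0],
)
```

Genes that appear on only one side are dropped entirely, which is the information loss the question above is about.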

Ideas:

Hamming distance between the binarized query and database profiles?
Rank correlation using only the nonzero elements of the query?
Somehow combining Hamming distance with rank correlation, e.g. by averaging?
Extracting the top genes from each cell type in Tabula Muris
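The first three ideas above could be prototyped along these lines. This is a sketch in plain Python under several assumptions that are not from this repository: a gene counts as "on" when its value is greater than zero, the Hamming distance is reported as a similarity (the fraction of agreeing genes), rank correlation means Spearman correlation with average ranks for ties, and the combined score rescales the correlation to [0, 1] before taking an unweighted mean. All function names are hypothetical.

```python
import math


def hamming_similarity(query, reference):
    """Fraction of genes whose on/off status (value > 0) agrees."""
    if not query:
        raise ValueError("profiles must be non-empty")
    matches = sum((q > 0) == (r > 0) for q, r in zip(query, reference))
    return matches / len(query)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def nonzero_rank_correlation(query, reference):
    """Spearman correlation over the genes that are nonzero in the query."""
    pairs = [(q, r) for q, r in zip(query, reference) if q > 0]
    if len(pairs) < 2:
        return 0.0
    qs, rs = zip(*pairs)
    return pearson(average_ranks(qs), average_ranks(rs))


def combined_score(query, reference):
    """Unweighted mean of Hamming similarity and rescaled rank correlation."""
    rho = nonzero_rank_correlation(query, reference)
    return (hamming_similarity(query, reference) + (rho + 1) / 2) / 2


# Toy profiles over the same five genes (values are made up).
query = [0, 3, 1, 7, 0]
ref_a = [0.0, 2.5, 0.5, 9.0, 0.1]
ref_b = [4.0, 0.0, 6.0, 0.2, 3.0]
scores = {"a": combined_score(query, ref_a), "b": combined_score(query, ref_b)}
```

Whether a plain average is the right way to combine the two measures is exactly the open question; the weighting here is arbitrary and would need to be evaluated against labeled queries.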


yjzhang/mouse_cell_query

Folders and files

Latest commit

History

Repository files navigation

Datasets

Tabula Muris

Microwell-seq

Querying

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages