Package to query our single cell genomics KnowledgeBase.
If you use Cytopus 🐙 or its gene sets, please cite the original Cytopus publications.
For details, see the license.
The KnowledgeBase is provided in graph format based on the networkx package. Central to the KnowledgeBase are a cell type hierarchy and the cellular processes that correspond to the cell types in this hierarchy. Cell types are supported by gene sets indicative of their cellular identities. The KnowledgeBase also contains metadata about the gene sets, such as authorship and gene set topic.
The KnowledgeBase can be queried to retrieve gene sets for specific cell types and organize them in a dictionary format for downstream use with the Spectra package.
Install from PyPI:
pip install cytopus
Install from source:
pip install git+https://github.com/wallet-maker/cytopus.git
Some plotting functions require pygraphviz or pyvis. Install either or both:
pygraphviz using conda:
conda install --channel conda-forge pygraphviz
pyvis using pip:
pip install pyvis
Retrieve default KnowledgeBase (human only):
import cytopus as cp
G = cp.KnowledgeBase()
Retrieve custom KnowledgeBase (documentation to build KnowledgeBase object here):
file_path = '~/dir1/dir2/knowledgebase_file.txt'
G = cp.KnowledgeBase(file_path)
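The example path above starts with `~`. Whether `cp.KnowledgeBase` expands the home-directory shortcut is not stated here, so one cautious option is to expand it explicitly with the standard library before passing it on. This is only a sketch; the path itself is the placeholder from the example above.

```python
import os

# Expand '~' to the current user's home directory (standard library only).
# The path is the placeholder from the example above, not a real file.
file_path = os.path.expanduser('~/dir1/dir2/knowledgebase_file.txt')
print(file_path)
```

The expanded `file_path` can then be passed to `cp.KnowledgeBase(file_path)` as shown above.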
Access data in KnowledgeBase:
#list of all cell types in KnowledgeBase
G.celltypes
#dictionary of all cellular processes in KnowledgeBase: {'process_1':['gene_a','gene_e','gene_y',...],'process_2':['gene_b','gene_u',...],...}
G.processes
#dictionary of all cellular identities in KnowledgeBase: {'identity_1':['gene_j','gene_k','gene_z',...],'identity_2':['gene_y','gene_p',...],...}
G.identities
#dictionary with gene set properties (for cellular processes or identities)
G.graph.nodes['gene_set_name']
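Because `G.processes` and `G.identities` are described above as plain dictionaries mapping gene set names to gene lists, ordinary Python is enough to search them. The following is a minimal sketch that looks up which gene sets contain a given gene. The helper `gene_sets_containing` and the toy dictionary are hypothetical; they stand in for real KnowledgeBase content and are not part of the cytopus API.

```python
def gene_sets_containing(gene, gene_sets):
    """Return the sorted names of all gene sets that contain `gene`."""
    return sorted(name for name, genes in gene_sets.items() if gene in genes)


# Toy stand-in with the same shape as G.processes (not real KnowledgeBase content)
toy_processes = {
    'process_1': ['gene_a', 'gene_e', 'gene_y'],
    'process_2': ['gene_b', 'gene_u'],
    'process_3': ['gene_y', 'gene_p'],
}

hits = gene_sets_containing('gene_y', toy_processes)
print(hits)  # ['process_1', 'process_3']
```

With a loaded KnowledgeBase, the same helper should work on `G.processes` or `G.identities`, assuming they have the dictionary shape shown in the comments above.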
Plot the cell type hierarchy stored in the KnowledgeBase as a directed graph with edges pointing from each cell type toward its parent:
G.plot_celltypes()
Prepare a nested dictionary assigning cell types to their cellular processes and cellular processes to their corresponding genes. This dictionary can be used as an input for Spectra.
First, select the cell types for which you want to retrieve gene sets. These cell types can be selected from the cell type hierarchy (see the .plot_celltypes() method above).
celltype_of_interest = ['M','T','B','epi']
Second, select the cell types whose gene sets you want to merge and set as global gene sets for the Spectra package. These gene sets should be valid for all cell types in the data.
##e.g. if you are working with different human cells
global_celltypes = ['all-cells']
##e.g. if you are working with human leukocytes
global_celltypes = ['all-cells','leukocyte']
##e.g. if you are working with B cells
global_celltypes = ['all-cells','leukocyte','B']
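One way to reason about the global cell types is as the part of the hierarchy shared by every cell type in the data. The sketch below illustrates this idea on a toy child-to-parent map. It is not how cytopus selects anything: the helpers `lineage` and `shared_lineage` are hypothetical, the map assumes each cell type has exactly one parent, and only the chain B, leukocyte, all-cells is implied by the examples above.

```python
# Toy child -> parent map; edges point from a cell type to its parent.
# Only the chain B -> leukocyte -> all-cells is implied by the examples above;
# the remaining entries are illustrative assumptions.
toy_parents = {
    'leukocyte': 'all-cells',
    'epi': 'all-cells',
    'B': 'leukocyte',
    'T': 'leukocyte',
    'M': 'leukocyte',
}


def lineage(celltype, parents):
    """Return the cell type followed by all of its ancestors, up to the root."""
    chain = [celltype]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


def shared_lineage(celltypes, parents):
    """Return the cell types on the lineage of every input cell type, root first."""
    lineages = [lineage(ct, parents) for ct in celltypes]
    common = set(lineages[0]).intersection(*lineages[1:])
    # Keep the order of the first lineage, reversed so the root comes first
    return [ct for ct in reversed(lineages[0]) if ct in common]


print(shared_lineage(['B', 'T', 'epi'], toy_parents))  # ['all-cells']
print(shared_lineage(['B', 'T'], toy_parents))         # ['all-cells', 'leukocyte']
print(shared_lineage(['B'], toy_parents))              # ['all-cells', 'leukocyte', 'B']
```

On this toy hierarchy the three results match the three `global_celltypes` examples above.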
Third, retrieve a dictionary of the format {celltype_a:{process_a:[gene_a,gene_b,...],...},...}. Decide whether you want to merge gene sets for all children or all parents (unusual) of the selected cell types.
G.get_celltype_processes(celltype_of_interest, global_celltypes=global_celltypes, get_children=True, get_parents=False)
Fourth, the resulting dictionary is stored in the KnowledgeBase:
G.celltype_process_dict
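Before passing such a nested dictionary to a downstream tool, it can help to restrict each gene set to the genes actually measured in the dataset and to drop gene sets that become too small. The sketch below shows one way to do this on a toy dictionary of the documented shape. The helper `restrict_to_measured_genes`, the `min_genes` threshold, and all names in the toy data are assumptions for illustration and are not part of cytopus or Spectra.

```python
def restrict_to_measured_genes(nested, measured_genes, min_genes=2):
    """Keep only measured genes and drop gene sets with fewer than `min_genes` left."""
    measured = set(measured_genes)
    filtered = {}
    for celltype, processes in nested.items():
        kept = {}
        for process, genes in processes.items():
            present = [g for g in genes if g in measured]
            if len(present) >= min_genes:
                kept[process] = present
        filtered[celltype] = kept
    return filtered


# Toy stand-in with the shape {celltype: {process: [genes]}} (not real content)
toy_dict = {
    'celltype_a': {'process_a': ['gene_a', 'gene_b', 'gene_c']},
    'celltype_b': {
        'process_b': ['gene_b', 'gene_d'],
        'process_c': ['gene_x', 'gene_y', 'gene_z'],
    },
}
# Hypothetical list of genes measured in the dataset
measured = ['gene_a', 'gene_b', 'gene_c', 'gene_d', 'gene_x']

filtered = restrict_to_measured_genes(toy_dict, measured, min_genes=2)
for celltype, processes in filtered.items():
    print(celltype, {name: len(genes) for name, genes in processes.items()})
```

Here `process_c` is dropped because only one of its genes is in the measured list. With real data, `toy_dict` would be replaced by `G.celltype_process_dict`, assuming it has the format given above.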
Learn how to explore the KnowledgeBase and retrieve a dictionary which can be used for Spectra:
Learn how to create a KnowledgeBase object from gene set annotations and cell type hierarchies stored in .csv files: here
Learn how to label marker genes from factor analysis, determine factor cell type specificity and export the KnowledgeBase content as .gmt files for other applications:
Hierarchically annotate and query cells using AnnData and Cytopus:
Submit gene sets to be added to the KnowledgeBase here:
All submissions will be reviewed and, if needed, revised before they are added to the database. This ensures consistent annotations and avoids gene set duplication. Authorship will be acknowledged in the KnowledgeBase for all submitted gene sets which pass review and are added to the KnowledgeBase. You can also create entirely new KnowledgeBase objects with this package.
For gene sets from external sources, you must also abide by the licenses of the original gene sets. To make this easier, we have stored these in the KnowledgeBase object:
import cytopus as cp
G = cp.KnowledgeBase()
gene_set_of_interest = 'all_macroautophagy_regulation_positive'
print(G.graph.nodes[gene_set_of_interest])