Allow to specify the AnnData var field that has the gene_symbols instead of only relying on var_names #87

Closed
pcm32 opened this issue Sep 20, 2023 · 2 comments

@pcm32
pcm32 commented Sep 20, 2023

Hi there, thanks for this great tool.

Sometimes AnnData files are indexed by Ensembl or NCBI gene identifiers rather than gene symbols. Could you add a CLI parameter to specify which field of var holds the gene symbols? Otherwise, to use the CLI, one has to read the AnnData into memory, change the index of var (if that is possible at all), and write the AnnData back out, which can cost a lot of time and disk space. It would be much nicer if the CLI could handle this in memory.

ChuanXu1 added a commit that referenced this issue Sep 22, 2023
@ChuanXu1
Collaborator

@pcm32, you can provide a symbol-to-ID mapping file (one column being gene symbols and the other column being IDs), and use it to convert the model from gene symbols to Ensembl IDs.

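import celltypist

# Load a model
model = celltypist.Model.load("some_model.pkl")
# Convert the model from gene symbols to Ensembl IDs using the mapping file
model.convert("path_to_your_mapping_file")
# Predict with the converted model (input_data is your AnnData object or input file)
predictions = celltypist.annotate(input_data, model = model)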

In the new version (1.6.1), I also added a mapping file based on GENCODE version 44, so you can use it if you cannot find a mapping file yourself. Details are in the online tutorial (https://github.com/Teichlab/celltypist -> Usage (classification) -> Supplemental guidance -> Model conversion from gene symbols to Ensembl IDs).
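
If you build your own mapping file, it may be worth checking it for ambiguous entries before converting a model. The sketch below is a hypothetical helper, not a celltypist function. It assumes a comma-separated file with no header row, symbols in the first column and IDs in the second; the actual layout expected by `model.convert` should be checked against the tutorial. The two `ENSG_FAKE_*` IDs are made up to demonstrate a duplicated symbol.

```python
import csv
import io
from collections import Counter

def load_symbol_to_id(handle, delimiter=","):
    """Read (symbol, ID) rows and keep only unambiguous one-to-one pairs."""
    rows = [
        (row[0].strip(), row[1].strip())
        for row in csv.reader(handle, delimiter=delimiter)
        if len(row) >= 2
    ]
    # Drop exact duplicate rows while preserving order.
    rows = list(dict.fromkeys(rows))
    symbol_counts = Counter(symbol for symbol, _ in rows)
    id_counts = Counter(gene_id for _, gene_id in rows)
    # A symbol mapped to several IDs (or the reverse) is ambiguous, so skip it.
    return {
        symbol: gene_id
        for symbol, gene_id in rows
        if symbol_counts[symbol] == 1 and id_counts[gene_id] == 1
    }

# In-memory stand-in for a mapping file; "DUP" maps to two made-up IDs.
example = io.StringIO(
    "TP53,ENSG00000141510\n"
    "MS4A1,ENSG00000156738\n"
    "DUP,ENSG_FAKE_1\n"
    "DUP,ENSG_FAKE_2\n"
)
mapping = load_symbol_to_id(example)
```

With a real file, the same function could be called on `open("path_to_your_mapping_file", newline="")` in place of the `StringIO` object.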

@pcm32
Author

pcm32 commented Oct 9, 2023

Converting the models as suggested works, thanks!

@pcm32 pcm32 closed this as completed Oct 9, 2023