
Suggestion: More informative "Gene_set" information in custom Enrichr run #181

Closed
dnjst opened this issue Dec 13, 2022 · 2 comments


dnjst commented Dec 13, 2022

Thank you for the program!

I am getting the same "connection lost" error that many others have reported (#117, #153, #174). The error message does not reveal whether it is caused by throttling on the website's side or by connection errors.

I would like to contact the Enrichr team to find out whether this is intentional throttling or a server issue.

I downloaded the libraries from https://maayanlab.cloud/Enrichr/#libraries as .gmt files, as you suggested in issue #153, and running offline works as a temporary fix.
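For reference, a .gmt file from the Enrichr libraries page is plain text with one gene set per line: the term name, a description field (often empty), then the member genes, all tab-separated. gseapy parses these files itself when given their paths; the hypothetical `read_gmt` helper below is only a sketch of the format, turning a file into the same `{term: [genes]}` dictionary shape that can also be passed directly as a gene set.

```python
def read_gmt(path):
    """Hypothetical helper: parse a .gmt file into {term: [genes]}.

    Each line is tab-separated: term name, description, then genes.
    """
    gene_sets = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            term, _description, *genes = fields
            # drop empty fields left by trailing tabs
            gene_sets[term] = [g for g in genes if g]
    return gene_sets
```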

However, the "Gene_set" column in the results dataframes now just lists a name like "CUSTOM47448322283840", which is understandable, because it no longer knows which gene set each result came from.

Would it be possible to give more informative names in the report?

For example:

gene_sets=["./Enrichr/KEGG_2021_Human.gmt", "./Enrichr/GO_Biological_Process_2021.gmt"]

would result in "KEGG_2021_Human.gmt" and "GO_Biological_Process_2021.gmt" as the names in the results dataframe, with the numbers appended only if the filenames are identical, or something similar?

Gene_set	Term	Overlap	P-value	Adjusted P-value	Odds Ratio	Genes
CUSTOM47448322283840	aerobic electron transport chain (GO:0019646)	6/70	1.03010752590765E-07	3.78221977804328E-05	30.6190878775069	UQCRC1;COX6A1;COX8A;UQCRB;COX7C;COX7B
CUSTOM47448322283840	mitochondrial ATP synthesis coupled electron transport (GO:0042775)	6/71	1.12232040891492E-07	3.78221977804328E-05	30.1896447922938	UQCRC1;COX6A1;COX8A;UQCRB;COX7C;COX7B
CUSTOM47448322283840	platelet aggregation (GO:0070527)	5/36	1.06724547431451E-07	3.78221977804328E-05	49.4551884680813	MYL12A;ACTG1;ACTB;HSPB1;GNAS

would turn into

Gene_set	Term	Overlap	P-value	Adjusted P-value	Odds Ratio	Genes
GO_Biological_Process_2021.gmt	aerobic electron transport chain (GO:0019646)	6/70	1.03010752590765E-07	3.78221977804328E-05	30.6190878775069	UQCRC1;COX6A1;COX8A;UQCRB;COX7C;COX7B
GO_Biological_Process_2021.gmt	mitochondrial ATP synthesis coupled electron transport (GO:0042775)	6/71	1.12232040891492E-07	3.78221977804328E-05	30.1896447922938	UQCRC1;COX6A1;COX8A;UQCRB;COX7C;COX7B
GO_Biological_Process_2021.gmt	platelet aggregation (GO:0070527)	5/36	1.06724547431451E-07	3.78221977804328E-05	49.4551884680813	MYL12A;ACTG1;ACTB;HSPB1;GNAS

That would be nice.
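The naming rule proposed above could look roughly like the sketch below. `label_gmt_files` is a hypothetical helper, not part of gseapy: each file's basename becomes its Gene_set label, and a numeric suffix is appended only when two basenames collide.

```python
import os
from collections import Counter


def label_gmt_files(paths):
    """Hypothetical labelling: basename per file, suffix only on collisions."""
    basenames = [os.path.basename(p) for p in paths]
    counts = Counter(basenames)
    seen = Counter()
    labels = []
    for name in basenames:
        if counts[name] == 1:
            labels.append(name)  # unique filename: use it unchanged
        else:
            seen[name] += 1
            labels.append(f"{name}_{seen[name]}")  # disambiguate duplicates
    return labels
```

With the two files from the example, the labels would be `KEGG_2021_Human.gmt` and `GO_Biological_Process_2021.gmt`; only two files that share a basename (e.g. from different folders) would get `_1`, `_2` suffixes.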

Another option would be a way to manually specify the names of the gene sets:

gene_sets={"KEGG" : "./Enrichr/KEGG_2021_Human.gmt", "GO BP" : "./Enrichr/GO_Biological_Process_2021.gmt"}

but I think that would conflict with the existing dictionary syntax for passing a single gene set list.
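One possible way around that conflict, sketched under the assumption that a custom library dict always maps terms to lists of genes, is to tell the two dict shapes apart by their value types. `interpret_gene_sets_dict` is hypothetical and not how gseapy actually handles dict input.

```python
def interpret_gene_sets_dict(gene_sets):
    """Hypothetical check: does the dict map display names to .gmt paths,
    or is it a single custom library mapping terms to gene lists?"""
    values = list(gene_sets.values())
    if values and all(isinstance(v, str) for v in values):
        # e.g. {"KEGG": "./Enrichr/KEGG_2021_Human.gmt"}
        return "named_libraries"
    if all(isinstance(v, (list, tuple, set)) for v in values):
        # e.g. {"term": ["GENE1", "GENE2"]}
        return "custom_library"
    raise ValueError("mixed dict values: cannot tell names from gene lists")
```

The ambiguity only disappears if the two shapes never overlap, which is why a mixed dict raises an error rather than guessing.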

zqfang pushed a commit that referenced this issue Dec 16, 2022
zqfang (Owner) commented Dec 16, 2022

Hi @dnjst,

Yes. With the default behaviour, an ID number is appended only if the filenames are the same.

The latest push now does what you expected.

However, for dictionary input, the solution I came up with stores its index value in the list instead.

e.g.

gene_sets=["/path/to/a.gmt", {'term':[] ...}, "KEGG_2021"]

then the Gene_set column in the output will be:

a.gmt
gs_ind_1
KEGG_2021
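The labelling described above can be sketched as follows. `gene_set_labels` is a hypothetical reconstruction, not gseapy's code, and it assumes a string ending in `.gmt` is a local file while any other string is an Enrichr library name. The index is 0-based, which matches the dict in second position being labelled `gs_ind_1`.

```python
import os


def gene_set_labels(gene_sets):
    """Hypothetical reconstruction of the Gene_set labels described above."""
    labels = []
    for i, gs in enumerate(gene_sets):
        if isinstance(gs, dict):
            labels.append(f"gs_ind_{i}")  # in-memory dict: no name, use its list index
        elif gs.endswith(".gmt"):
            labels.append(os.path.basename(gs))  # local file: use its filename
        else:
            labels.append(gs)  # Enrichr library name: keep as given
    return labels
```

For the example list, this returns `["a.gmt", "gs_ind_1", "KEGG_2021"]`.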

I'll upload a new version to PyPI soon.

zqfang (Owner) commented Dec 20, 2022

Fixed in the new release, v1.0.3.

@zqfang zqfang closed this as completed Dec 20, 2022