Suggestion: More informative "Gene_set" information in custom Enrichr run #181

dnjst · 2022-12-13T22:01:04Z

Thank you for the program!

I am hitting the same lost-connection error that many others have reported (#117, #153, #174). The error message does not make clear whether the cause is throttling by the website or a connection failure.

I would like to contact Enrichr to find out whether this is intentional throttling or a server issue.

I downloaded the files from https://maayanlab.cloud/Enrichr/#libraries to .gmt files as you suggested in issue #153 and it works offline as a temporary fix.

However, the "Gene_set" column of the results dataframe now just lists a name like "CUSTOM47448322283840", which is understandable because the program no longer knows which gene set it is.

Would it be possible to give more informative names in the report?

For example:

gene_sets=["./Enrichr/KEGG_2021_Human.gmt", "./Enrichr/GO_Biological_Process_2021.gmt"]

would result in "KEGG_2021_Human.gmt" and "GO_Biological_Process_2021.gmt" as the labels in the results dataframe, with the numbers appended only if the filenames are identical, or something similar?

Gene_set	Term	Overlap	P-value	Adjusted P-value	Odds Ratio	Genes
CUSTOM47448322283840	aerobic electron transport chain (GO:0019646)	6/70	1.03010752590765E-07	3.78221977804328E-05	30.6190878775069	UQCRC1;COX6A1;COX8A;UQCRB;COX7C;COX7B
CUSTOM47448322283840	mitochondrial ATP synthesis coupled electron transport (GO:0042775)	6/71	1.12232040891492E-07	3.78221977804328E-05	30.1896447922938	UQCRC1;COX6A1;COX8A;UQCRB;COX7C;COX7B
CUSTOM47448322283840	platelet aggregation (GO:0070527)	5/36	1.06724547431451E-07	3.78221977804328E-05	49.4551884680813	MYL12A;ACTG1;ACTB;HSPB1;GNAS

turns into

Gene_set	Term	Overlap	P-value	Adjusted P-value	Odds Ratio	Genes
GO_Biological_Process_2021.gmt	aerobic electron transport chain (GO:0019646)	6/70	1.03010752590765E-07	3.78221977804328E-05	30.6190878775069	UQCRC1;COX6A1;COX8A;UQCRB;COX7C;COX7B
GO_Biological_Process_2021.gmt	mitochondrial ATP synthesis coupled electron transport (GO:0042775)	6/71	1.12232040891492E-07	3.78221977804328E-05	30.1896447922938	UQCRC1;COX6A1;COX8A;UQCRB;COX7C;COX7B
GO_Biological_Process_2021.gmt	platelet aggregation (GO:0070527)	5/36	1.06724547431451E-07	3.78221977804328E-05	49.4551884680813	MYL12A;ACTG1;ACTB;HSPB1;GNAS

would be nice.
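To make the request concrete, here is a minimal sketch of the labelling rule described above: use the file's basename and add a suffix only when two basenames collide. The helper name `label_gene_sets` and the `_<index>` suffix format are assumptions for illustration, not part of gseapy.

```python
import os
from collections import Counter


def label_gene_sets(paths):
    """Return one label per path: the basename, suffixed only on collisions.

    Hypothetical helper for illustration; not gseapy's implementation.
    """
    basenames = [os.path.basename(p) for p in paths]
    counts = Counter(basenames)
    labels = []
    for index, name in enumerate(basenames):
        if counts[name] > 1:
            # Identical filenames: append the list position to keep labels unique.
            labels.append(f"{name}_{index}")
        else:
            labels.append(name)
    return labels


labels = label_gene_sets([
    "./Enrichr/KEGG_2021_Human.gmt",
    "./Enrichr/GO_Biological_Process_2021.gmt",
])
print(labels)
```

With the two paths from the example, the labels are just the two filenames; a suffix appears only if the same filename is passed from two different directories.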

Another option would be a way to specify the gene set names manually:

gene_sets={"KEGG" : "./Enrichr/KEGG_2021_Human.gmt", "GO BP" : "./Enrichr/GO_Biological_Process_2021.gmt"}

but I think that would conflict with the dictionary syntax for passing a single list.
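One possible way around that conflict, sketched under the assumption that a name-to-path dictionary always has string values while a term-to-genes dictionary always has list-like values, is to inspect the value types. The function name and the returned tags are hypothetical and only illustrate the idea.

```python
def classify_gene_sets_dict(mapping):
    """Guess which kind of dictionary was passed, based on its value types.

    Hypothetical helper: "name_to_path" means {label: "file.gmt"},
    "term_to_genes" means {term: [gene, ...]}.
    """
    values = list(mapping.values())
    if values and all(isinstance(v, str) for v in values):
        return "name_to_path"
    if all(isinstance(v, (list, tuple, set)) for v in values):
        return "term_to_genes"
    raise ValueError("mixed or unsupported dictionary values")


named = {
    "KEGG": "./Enrichr/KEGG_2021_Human.gmt",
    "GO BP": "./Enrichr/GO_Biological_Process_2021.gmt",
}
print(classify_gene_sets_dict(named))
```

This only works if the two shapes never mix in one dictionary, which is why the sketch raises an error for mixed values rather than guessing.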

zqfang · 2022-12-16T20:19:04Z

Hi @dnjst,

Yes, for the default behaviour, an id number is appended if the filenames are the same.

The latest push now does what you expected.

However, the solution I came up with for dictionary input stores its index value instead.

e.g.

gene_sets=["/path/to/a.gmt", {'term':[] ...}, "KEGG_2021"]

then, the output Gene_set column will be

a.gmt
gs_ind_1
KEGG_2021
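As a rough sketch of the naming rule described here (not the actual gseapy code): a .gmt path is labelled by its filename, a dictionary by its position in the list, and a library name is kept as is. The function name `gene_set_labels` and the example dictionary contents are assumptions, and the zero-based index is inferred from the `gs_ind_1` output above.

```python
import os


def gene_set_labels(gene_sets):
    """Derive a Gene_set label for each entry of a mixed gene_sets list.

    Hypothetical helper for illustration only.
    """
    labels = []
    for index, item in enumerate(gene_sets):
        if isinstance(item, dict):
            # A dictionary has no name of its own, so use its list index.
            labels.append(f"gs_ind_{index}")
        elif isinstance(item, str) and item.lower().endswith(".gmt"):
            # A local .gmt file is labelled by its filename.
            labels.append(os.path.basename(item))
        else:
            # Anything else is treated as an Enrichr library name.
            labels.append(str(item))
    return labels


# The dictionary contents are placeholders, not real gene sets.
example = ["/path/to/a.gmt", {"term_a": ["GENE1", "GENE2"]}, "KEGG_2021"]
print(gene_set_labels(example))
```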

I'll upload a new version to PyPI soon.

zqfang · 2022-12-20T17:59:00Z

Fixed in new release v1.0.3

zqfang pushed a commit that referenced this issue Dec 16, 2022

gene_set name for dict, gmt, #181

132d68e

zqfang added the enhancement label Dec 20, 2022

zqfang closed this as completed Dec 20, 2022
