
Different results when restricting background set to only include genes with at least one annotation term #259

Open
Maryam-Haghani opened this issue Jan 27, 2023 · 1 comment


@Maryam-Haghani

Hi,

I performed the enrichment analysis using my background set, and then repeated it after restricting the background set to genes with at least one annotation term (based on the annotation file the analysis uses).

I realized that GOATOOLS takes all background set genes into account, whether or not each gene has an annotation term. As a result, the two analyses produced different p-values and different sets of significant GO terms.

I'm wondering why GOATOOLS does not apply this filter by default in order to make the enrichment study more accurate.

Thanks!

@Maryam-Haghani Maryam-Haghani changed the title Different GO enrichments when filtering background set to those genes having at least one annotation term Different results when restricting background set to only include genes with at least one annotation term Jan 27, 2023
@dvklopfenstein
Collaborator

Changing the background population genes will most likely result in different p-values than using the original background population genes. This is correct behavior.

If the background population genes are reduced by removing unannotated genes, the same should be done with the study genes.

Even when both the population and the study genes are reduced, the p-values will still likely differ from those obtained without removing any genes, because, by random chance, the proportion of unannotated genes in the total population and the proportion in the study set will differ from gene set to gene set. This is expected behavior.
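To illustrate why the p-values move, here is a minimal sketch of a one-sided hypergeometric tail probability (the quantity behind a one-sided Fisher's exact test for over-representation). It depends on the population size and the study size, not just on the GO term's own counts, so dropping unannotated genes changes it even when every gene annotated to the term survives the filter. The counts are hypothetical, and this is not GOATOOLS's own implementation.

```python
from math import comb


def enrichment_pvalue(study_hits: int, study_n: int, pop_hits: int, pop_n: int) -> float:
    """One-sided hypergeometric p-value, P(X >= study_hits).

    study_hits: study genes annotated to the GO term
    study_n:    total study genes
    pop_hits:   population genes annotated to the GO term
    pop_n:      total population genes
    """
    total = comb(pop_n, study_n)
    # Sum the probabilities of seeing study_hits or more term genes in the study set.
    tail = sum(
        comb(pop_hits, k) * comb(pop_n - pop_hits, study_n - k)
        for k in range(study_hits, min(study_n, pop_hits) + 1)
    )
    return tail / total


# Hypothetical counts: 5 of 100 study genes hit a term annotated to 200 of 20,000
# population genes. Removing unannotated genes from BOTH sets leaves 80 study genes
# and 15,000 population genes; the 200 term genes are annotated, so they all remain.
p_all = enrichment_pvalue(5, 100, 200, 20_000)
p_annotated = enrichment_pvalue(5, 80, 200, 15_000)
print(f"all genes: {p_all:.3g}  annotated only: {p_annotated:.3g}")
```

Nothing about the GO term itself changed between the two calls; only the sizes of the study and population sets did, which is enough to shift the p-value.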

GOA Tools keeps all study genes and population genes by default. However, researchers who want to develop criteria for removing population genes can do so, because the GOA Tools architecture separates managing the databases (GO ontology DAG and annotations) from running the statistical tests.

Please feel free to apply any filtering functions on the population genes, but also ensure the same filter is applied to the study gene sets.
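A minimal sketch of applying one filter to both gene sets before running the enrichment. It assumes the annotations are held as a plain dict mapping each gene ID to a set of GO IDs (the form GOATOOLS association readers generally produce). The helper names `keep_annotated` and `filter_study_and_population`, and the gene IDs, are hypothetical. The filtered sets would then be used in place of the original population and study sets.

```python
from typing import Dict, Iterable, Set, Tuple


def keep_annotated(genes: Iterable[str], id2gos: Dict[str, Set[str]]) -> Set[str]:
    """Hypothetical helper: keep only genes with at least one GO annotation."""
    return {g for g in genes if id2gos.get(g)}


def filter_study_and_population(
    study: Iterable[str],
    population: Iterable[str],
    id2gos: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: apply the SAME annotation filter to both gene sets."""
    pop_kept = keep_annotated(population, id2gos)
    # The study set must stay a subset of the filtered population.
    study_kept = keep_annotated(study, id2gos) & pop_kept
    return study_kept, pop_kept


# Hypothetical annotations: geneC is listed with no terms, geneD is missing entirely.
id2gos = {
    "geneA": {"GO:0008150"},
    "geneB": {"GO:0003674", "GO:0005575"},
    "geneC": set(),
}
population = {"geneA", "geneB", "geneC", "geneD"}
study = {"geneA", "geneC", "geneD"}

study_kept, pop_kept = filter_study_and_population(study, population, id2gos)
print(sorted(study_kept), sorted(pop_kept))  # ['geneA'] ['geneA', 'geneB']
```

Filtering the study set through the filtered population, rather than filtering it independently, keeps the two sets consistent even if the population was also trimmed by other criteria.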
