Why are some GO IDs (and associated genes) not included in the "main data table"? #12

laurahspencer · 2022-07-11T20:37:37Z

I'm interested in particular GO terms in my input files, but they do not get included in the "main data table" (GO division)_(input filename). I understand that all original GO terms aren't actually analyzed because they are represented by either a) a more specific term, or b) a highly similar term. I would like to see which redundant/similar GO term absorbed the GO terms of interest, but can't find my GO term of interest in any of the GO_MWU output. Are all original GO IDs then supposed to be accounted for in the main data table?

Any insight would be very helpful. Attached is some GO_MWU code and results showing that the GO term of interest (and its associated genes) is missing from the GO_MWU output. You'll see that I have relaxed the filtering settings to not remove any GO categories that contain a large fraction of genes or only a few genes, in an effort to not throw out GO terms.

testing_gomwu.zip <https://github.com/z0on/GO_MWU/files/9087390/testing_gomwu.zip>

z0on · 2022-07-13T07:55:04Z

Hi Laura - can you check in your GO annotations file whether the term is there initially? There are four ways an existing (i.e. actually among annotations) GO term might disappear:

- it has fewer than 5 genes assigned to it (option “smallest” in gomwuStats)
- it has more than 10% of all genes assigned to it (option “largest” in gomwuStats)
- it shares 75% or more of its genes with some GO term that contains more genes (option “clusterCutHeight” in gomwuStats, set it to 1-(proportion of shared genes for grouping); setting it to 0 should suppress clustering)
- it fully overlaps (contains exactly the same genes) with some GO term that is lower in the GO hierarchy. Really cannot change this one in any way unless you are willing to edit the perl code...

Misha
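The two size filters and the clusterCutHeight relation listed above can be sketched as below. This is only an illustration of the rules as stated in this comment, not GO_MWU's actual code; the function names and the example gene counts are hypothetical, and the strict comparisons ("fewer than", "more than") are taken from the wording above.

```python
def size_filter_reason(n_genes_in_term, n_genes_total, smallest=5, largest=0.1):
    """Return why a term would fail the size filters, or None if it passes.

    Assumes the rules as worded in the thread: fewer than `smallest` genes
    is too small, more than `largest` * total genes is too broad.
    """
    if n_genes_in_term < smallest:
        return "too small"
    if n_genes_in_term > largest * n_genes_total:
        return "too broad"
    return None


def cluster_cut_height(shared_fraction):
    """clusterCutHeight = 1 - (proportion of shared genes for grouping)."""
    return 1 - shared_fraction


# Hypothetical example: 1000 genes in total.
print(size_filter_reason(3, 1000))     # fails the "smallest" filter
print(size_filter_reason(250, 1000))   # fails the default "largest" filter
print(size_filter_reason(250, 1000, largest=0.3))  # passes when relaxed
print(cluster_cut_height(0.75))        # the 75% sharing rule as a cut height
```

The fourth case (full overlap with a term lower in the hierarchy) is not covered here, since it is not controlled by any option.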


laurahspencer · 2022-07-14T18:16:19Z

Hi Misha,

Thanks for the info!

I've double checked that the GO term is indeed in the GO annotations file (go.obo) and is assigned to the namespace "biological_process" (see below), and that the GO term is in the background list of GO terms that I input into GO_MWU.

[Term]
id: GO:0006313
name: transposition, DNA-mediated
namespace: biological_process
alt_id: GO:0006317
alt_id: GO:0006318
def: "Any process involved in a type of transpositional recombination which occurs via a DNA intermediate." [GOC:jp, ISBN:0198506732, ISBN:1555812090]
synonym: "Class II transposition" EXACT []
synonym: "DNA transposition" EXACT [GOC:dph]
synonym: "P-element excision" NARROW []
synonym: "P-element transposition" NARROW []
synonym: "Tc1/mariner transposition" NARROW []
synonym: "Tc3 transposition" NARROW []
is_a: GO:0006310 ! DNA recombination
is_a: GO:0032196 ! transposition

I have also experimented with relaxing the smallest and largest options (smallest=1, largest=.99), and set clusterCutHeight=0, but my GO terms are still missing. Regarding your fourth bullet, I found one offspring of my GO term in the results, but it isn't associated with any of the genes that map to my GO term of interest (the genes it does map to aren't significant). Further, the genes that are associated with my GO term of interest are also missing from the output - are all significant genes supposed to be re-assigned to other GO terms, or are they supposed to be removed?

Thanks for the help!

z0on · 2022-07-14T23:04:45Z

Sorry, what is the “background list of GO terms”? GO_MWU does not have that… Question is, does your favorite GO term ever appear among *your* genes’ annotations, I mean the genes for which you have expression measured? If yes, how many times? (go.obo is just the universal database of all possible GO terms, no wonder it is there)
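One way to answer the "how many times" question is to count the genes annotated with the term directly in the annotations table. The sketch below assumes a two-column, tab-delimited table with a gene ID followed by semicolon-separated GO IDs; that layout, the function name, and the sample rows are assumptions for illustration, not something confirmed in this thread.

```python
def count_genes_with_term(lines, go_id, measured=None):
    """Count genes annotated with go_id.

    lines    : iterable of "gene<TAB>GO:1;GO:2;..." strings (assumed layout)
    measured : optional set of gene IDs that have a measure; if given,
               genes outside this set are ignored
    Returns (genes_with_term, genes_counted).
    """
    hits = 0
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        gene, _, terms = line.partition("\t")
        if measured is not None and gene not in measured:
            continue
        total += 1
        if go_id in {t.strip() for t in terms.split(";")}:
            hits += 1
    return hits, total


# Hypothetical sample rows, for illustration only.
sample = [
    "gene1\tGO:0006313;GO:0006310",
    "gene2\tGO:0032196",
    "gene3\tGO:0006313",
]
hits, total = count_genes_with_term(sample, "GO:0006313")
print(hits, total)
```

An open file handle can be passed in place of the sample list, since the function only iterates over lines.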

-- cheers Misha matzlab.weebly.com

laurahspencer · 2022-07-14T23:42:22Z

By "background" I mean the goAnnotations list, and yes, that list is lousy with my GO term of interest (6,259 of 29,127 genes are associated with my GO term).

My genes contain 732 genes linked to my GO term of interest (it actually comprises ~28% of all "significant" genes).

For context, I'm analyzing WGCNA results, so my input includes all genes measured, and then module membership scores for genes assigned to the focal module. And the GO term relates to transposons, of which there are many in my focal species' genome.

I guess the big question now, and what I should have started this issue with, is why do so many of my genes get discarded despite me relaxing the settings that filter/merge GO terms?

z0on · 2022-07-15T06:50:40Z

Can this term be filtered out because it is too broad (is associated with more than 10% of all genes)? In that case try relaxing “largest” option to gomwuStats from 0.1 (default) to say 0.3
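As a quick check of that suggestion, using the counts quoted earlier in this thread (6,259 of 29,127 annotated genes), the term's share of genes can be compared with the default and relaxed thresholds. This is plain arithmetic on the numbers from the thread, not output from GO_MWU.

```python
genes_with_term = 6259   # from the thread
genes_total = 29127      # from the thread

fraction = genes_with_term / genes_total
print(f"fraction of genes with the term: {fraction:.3f}")
print("exceeds default largest=0.1:", fraction > 0.1)
print("exceeds relaxed largest=0.3:", fraction > 0.3)
```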


laurahspencer · 2022-07-15T15:10:21Z

Yes I have played with that setting quite a bit and tried various levels up to 0.99 (see code I attached in my first comment). Does that setting have a hard-coded ceiling? (E.g. nothing above 50% is analyzed)

z0on · 2022-07-15T19:37:09Z

Hmm, that’s surely possible… let me check


laurahspencer · 2022-07-25T18:44:34Z

Thanks for checking! I've tried playing with the perl code but haven't had any breakthroughs

z0on · 2022-07-25T18:51:36Z

so what about this tho: does it print out something like this, and if yes, does the first number change when you change the “largest” option?

Run parameters:

largest GO category as fraction of all genes (largest)  : 0.1
         smallest GO category as # of genes (smallest)  : 5
                clustering threshold (clusterCutHeight) : 0.25


laurahspencer · 2022-07-25T20:26:15Z

Yes the output changes- for example here's the output when I used the following settings: largest=0.99 smallest=1 cutHeight=0 (genes of interest still get discarded).

go.obo WGCNA-genes_for-GOMWU.tab WGCNA-module_lightgreen.csv BP largest=0.99 smallest=1 cutHeight=0

Run parameters:

largest GO category as fraction of all genes (largest)  : 0.99
         smallest GO category as # of genes (smallest)  : 1
                clustering threshold (clusterCutHeight) : 0

-----------------
retrieving GO hierarchy, reformatting data...

-------------
go_reformat:
Genes with GO annotations, but not listed in measure table: 1

Terms without defined level (old ontology?..): 0
-------------
-------------
go_nrify:
1174 categories, 2585 genes; size range 1-2559.15
	1 too broad
	0 too small
	1173 remaining

removing redundancy:

calculating GO term similarities based on shared genes...
598 non-redundant GO categories of good size
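The upper end of the reported size range in the log above is consistent with the "largest" setting multiplied by the number of genes reported on the same line. The check below is arithmetic on the logged values only; that GO_MWU computes the bound exactly this way is an assumption.

```python
n_genes = 2585    # "2585 genes" from the go_nrify line
largest = 0.99    # the "largest" run parameter

upper_bound = round(largest * n_genes, 2)
print(upper_bound)  # compare with the "size range 1-2559.15" line
```

If that reading is right, the one category reported as "too broad" would have to contain more than this many of the 2585 genes.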

z0on · 2022-07-25T23:02:07Z

hmm. Just making sure: the option is clusterCutHeight (not cutHeight as your last email says) - is this how you ran it?


laurahspencer · 2022-07-25T23:11:21Z

yes, sorry, i definitely used option clusterCutHeight

z0on · 2022-07-25T23:56:16Z

Ah, I see! here is the modified gomwu.functions.R file, plop it into your GO_MWU directory (replace old file) and give it a shot?


z0on · 2022-10-11T07:49:52Z

does it print out something like this, and if yes, does the first number change when you change the “largest” option?

Run parameters:

largest GO category as fraction of all genes (largest)  : 0.1
         smallest GO category as # of genes (smallest)  : 5
                clustering threshold (clusterCutHeight) : 0.25

