Why are some GO IDs (and associated genes) not included in the "main data table"? #12
Hi Laura - can you check your GO annotations file: is the term there to begin with?
There are four ways an existing GO term (i.e., one that is actually among the annotations) might disappear:
- it has fewer than 5 genes assigned to it (option “smallest” in gomwuStats)
- it has more than 10% of all genes assigned to it (option “largest” in gomwuStats)
- it shares 75% or more of its genes with some GO term that contains more genes (option “clusterCutHeight” in gomwuStats; set it to 1 - (proportion of shared genes for grouping); setting it to 0 should suppress clustering)
- it fully overlaps (contains exactly the same genes) with some GO term that is lower in the GO hierarchy. This one really cannot be changed in any way unless you are willing to edit the Perl code...
Misha
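The first three criteria in the list above can be checked by hand for a term of interest. Below is a minimal Python sketch of that check. It is not GO_MWU's own code (which is R and Perl): the function names and the toy gene sets are hypothetical, and the exact boundary handling inside GO_MWU is an assumption based only on the wording of the list.

```python
def size_filter(n_genes_in_term, n_genes_total, smallest=5, largest=0.1):
    """Classify a term by size, following the wording of the first two criteria."""
    if n_genes_in_term < smallest:
        return "too small"
    if n_genes_in_term > largest * n_genes_total:
        return "too broad"
    return "kept"


def shared_fraction(term_genes, other_genes):
    """Fraction of the term's own genes that also belong to the other term."""
    term_genes, other_genes = set(term_genes), set(other_genes)
    if not term_genes:
        return 0.0
    return len(term_genes & other_genes) / len(term_genes)


def would_be_merged(term_genes, other_genes, cluster_cut_height=0.25):
    """Third criterion as described: the other term is bigger and shares
    at least 1 - clusterCutHeight of this term's genes (assumed reading)."""
    bigger = len(set(other_genes)) > len(set(term_genes))
    return bigger and shared_fraction(term_genes, other_genes) >= 1 - cluster_cut_height


# Toy data: a 4-gene term sharing 3 genes with a 5-gene term.
small_term = {"g1", "g2", "g3", "g4"}
big_term = {"g1", "g2", "g3", "g8", "g9"}
print(size_filter(len(small_term), 1000))     # size check with default options
print(shared_fraction(small_term, big_term))  # shared fraction of the small term
print(would_be_merged(small_term, big_term))  # merge check at clusterCutHeight=0.25
```

The fourth criterion (identical gene sets with a term lower in the hierarchy) needs the GO hierarchy itself and is left out of this sketch.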
|
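Hi Misha, Thanks for the info! I've double checked that the GO term is indeed in the GO annotations file (go.obo) and is assigned to the namespace "biological_process" (see below), and that the GO term is in the background list of GO terms that I input into GO_MWU.
I have also experimented with relaxing the smallest and largest options (smallest=1, largest=.99), and set clusterCutHeight=0, but my GO terms are still missing. To your fourth bullet: I found one offspring of my GO term in the results, but it isn't associated with any of the genes that map to my GO term of interest (the genes it does map to aren't significant). Further, the genes that are associated with my GO term of interest are also missing from the output. Are all significant genes supposed to be re-assigned to other GO terms, or are they supposed to be removed? Thanks for the help! |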
Sorry, what is the “background list of GO terms”? GO_MWU does not have that…
The question is, does your favorite GO term ever appear among *your* genes’ annotations, I mean the genes for which you have expression measured? If yes, how many times?
(go.obo is just the universal database of all possible GO terms, so no wonder it is there.)
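The count asked for above can be obtained with a few lines of code. The following Python sketch is one way to do it, under stated assumptions: the annotation table is taken to be two-column and tab-delimited (gene ID, then GO IDs separated by semicolons), and the measure table a CSV with a header row and the gene ID in the first column. The function name and the toy data are hypothetical. It counts direct annotations only and does not follow the GO hierarchy.

```python
import csv
import io


def count_term_in_measured_genes(annotations_tsv, measures_csv, go_id):
    """Return (genes annotated with go_id among measured genes, measured genes)."""
    reader = csv.reader(measures_csv)
    next(reader, None)  # skip the header row (assumed present)
    measured = {row[0] for row in reader if row}

    hits = set()
    for line in annotations_tsv:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip malformed or empty lines
        gene, terms = parts[0], parts[1]
        if gene in measured and go_id in terms.split(";"):
            hits.add(gene)
    return len(hits), len(measured)


# Toy stand-ins for the two input files.
annotations = io.StringIO(
    "gene1\tGO:0006313;GO:0006310\n"
    "gene2\tGO:0032196\n"
    "gene3\tGO:0006313\n"
)
measures = io.StringIO("gene,measure\ngene1,0.9\ngene2,0\ngene4,0.1\n")

result = count_term_in_measured_genes(annotations, measures, "GO:0006313")
print(result)
```

With real files, the two `io.StringIO` objects would be replaced by open file handles.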
On Thu, Jul 14, 2022 at 8:16 PM Laura H Spencer ***@***.***> wrote:
Hi Misha,
Thanks for the info!
I've double checked that the GO term is indeed in the GO annotations file
(go.obo) and is assigned to the namespace "biological_process" (see below),
and that the GO term is in the background list of GO terms that I input
into GO_MWU.
[Term]
id: GO:0006313
name: transposition, DNA-mediated
namespace: biological_process
alt_id: GO:0006317
alt_id: GO:0006318
def: "Any process involved in a type of transpositional recombination which occurs via a DNA intermediate." [GOC:jp, ISBN:0198506732, ISBN:1555812090]
synonym: "Class II transposition" EXACT []
synonym: "DNA transposition" EXACT [GOC:dph]
synonym: "P-element excision" NARROW []
synonym: "P-element transposition" NARROW []
synonym: "Tc1/mariner transposition" NARROW []
synonym: "Tc3 transposition" NARROW []
is_a: GO:0006310 ! DNA recombination
is_a: GO:0032196 ! transposition
I have also experimented with relaxing the smallest and largest options
(smallest=1, largest=.99), and set clusterCutHeight=0, but my GO terms
are still missing. To your fourth bullet I found one offspring of my GO
term in the results, but it isn't associated with any of the genes that map
to my GO term of interest (the genes it does map to aren't significant).
Further, the genes that are associated with my GO term of interest are also
missing from the output - are all significant genes supposed to be
re-assigned to other GO terms, or are they supposed to be removed?
Thanks for the help!
--
cheers
Misha
matzlab.weebly.com
|
By "background" I mean the goAnnotations list, and yes, that list is lousy with my GO term of interest (6,259 of 29,127 genes are associated with my GO term). My genes contain 732 genes linked to my GO term of interest (it actually comprises ~28% of all "significant" genes). For context, I'm analyzing WGCNA results, so my input includes all genes measured, and then module membership scores for genes assigned to the focal module. And the GO term relates to transposons, of which there are many in my focal species' genome. I guess the big question now, and what I should have started this issue with, is why do so many of my genes get discarded despite me relaxing the settings that filter/merge GO terms? |
Could this term be getting filtered out because it is too broad (associated with more than 10% of all genes)? In that case, try relaxing the “largest” option to gomwuStats from 0.1 (the default) to, say, 0.3.
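As a quick check of the numbers reported in the thread (6,259 of 29,127 annotated genes carry the term), the arithmetic can be sketched in Python. The helper name is hypothetical, and which gene total GO_MWU actually uses as the denominator is an assumption here.

```python
def fraction_of_genes(n_genes_in_term, n_genes_total):
    """Share of all genes assigned to one GO term."""
    return n_genes_in_term / n_genes_total


frac = fraction_of_genes(6259, 29127)  # counts quoted in the thread

for largest in (0.1, 0.3):
    verdict = "too broad" if frac > largest else "within limit"
    print(f"largest={largest}: term covers {frac:.1%} of genes -> {verdict}")
```

On these counts the term exceeds the default of 0.1 but falls under 0.3.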
--
cheers
Misha
matzlab.weebly.com
|
Yes I have played with that setting quite a bit and tried various levels up to 0.99 (see code I attached in my first comment). Does that setting have a hard-coded ceiling? (E.g. nothing above 50% is analyzed) |
Hmm, that’s surely possible… let me check
On Fri, Jul 15, 2022 at 5:11 PM Laura H Spencer ***@***.***> wrote:
Yes I have played with that setting quite a bit and tried various levels
up to 0.99 (see code I attached in my first comment). Is it possible that
setting isn’t actually registered by the underlying functions?
--
cheers
Misha
matzlab.weebly.com
|
Thanks for checking! I've tried playing with the Perl code but haven't had any breakthroughs. |
so what about this tho:
does it print out something like this, and if yes, does the first number
change when you change the “largest” option?
Run parameters:
largest GO category as fraction of all genes (largest) : 0.1
smallest GO category as # of genes (smallest) : 5
clustering threshold (clusterCutHeight) : 0.25
|
Yes, the output changes. For example, here's the output when I used the following settings: largest=0.99 smallest=1 cutHeight=0 (genes of interest still get discarded).
|
hmm. Just making sure: the option is clusterCutHeight (not cutHeight as
your last email says) - is this how you ran it?
…On Mon, Jul 25, 2022 at 3:26 PM Laura H Spencer ***@***.***> wrote:
Yes the output changes- for example here's the output when I used the
following settings: largest=0.99 smallest=1 cutHeight=0 (genes of
interest still get discarded).
go.obo WGCNA-genes_for-GOMWU.tab WGCNA-module_lightgreen.csv BP largest=0.99 smallest=1 cutHeight=0
Run parameters:
largest GO category as fraction of all genes (largest) : 0.99
smallest GO category as # of genes (smallest) : 1
clustering threshold (clusterCutHeight) : 0
-----------------
retrieving GO hierarchy, reformatting data...
-------------
go_reformat:
Genes with GO annotations, but not listed in measure table: 1
Terms without defined level (old ontology?..): 0
-------------
-------------
go_nrify:
1174 categories, 2585 genes; size range 1-2559.15
1 too broad
0 too small
1173 remaining
removing redundancy:
calculating GO term similarities based on shared genes...
598 non-redundant GO categories of good size
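The go_nrify summary quoted above can be checked for internal consistency. The Python sketch below parses the counts and compares them. The helper `parse_go_nrify` is hypothetical, and reading the upper end of the size range as largest × number of genes is an inference from the numbers, not something the output states.

```python
import math
import re

# Summary lines copied from the go_nrify output quoted above.
LOG = """\
1174 categories, 2585 genes; size range 1-2559.15
1 too broad
0 too small
1173 remaining
598 non-redundant GO categories of good size
"""


def parse_go_nrify(log_text):
    """Pull the counts out of a go_nrify summary (hypothetical helper)."""
    head = re.search(
        r"(\d+) categories, (\d+) genes; size range ([\d.]+)-([\d.]+)", log_text
    )

    def count(label):
        # First integer that directly precedes the given label.
        return int(re.search(rf"(\d+) {label}", log_text).group(1))

    return {
        "categories": int(head.group(1)),
        "genes": int(head.group(2)),
        "size_min": float(head.group(3)),
        "size_max": float(head.group(4)),
        "too_broad": count("too broad"),
        "too_small": count("too small"),
        "remaining": count("remaining"),
        "non_redundant": count("non-redundant"),
    }


stats = parse_go_nrify(LOG)
implied_largest = stats["size_max"] / stats["genes"]
dropped_as_redundant = stats["remaining"] - stats["non_redundant"]
print(f"implied largest: {implied_largest:.2f}")
print(f"dropped at the redundancy step: {dropped_as_redundant}")
```

On this reading the largest=0.99 setting was registered, the size filters removed a single category, and the rest of the reduction (1173 to 598 categories) happened at the redundancy step.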
|
yes, sorry, i definitely used option clusterCutHeight |
Ah, I see!
here is the modified gomwu.functions.R file, plop it into your GO_MWU
directory (replace old file) and give it a shot?
…On Mon, Jul 25, 2022 at 6:11 PM Laura H Spencer ***@***.***> wrote:
yes, sorry, i definitely used option clusterCutHeight
|
I'm interested in particular GO terms in my input files, but they do not get included in the "main data table" (GO division)_(input filename). I understand that all original GO terms aren't actually analyzed because they are represented by either a) a more specific term, or b) a highly similar term. I would like to see which redundant/similar GO term absorbed the GO terms of interest, but can't find my GO term of interest in any of the GO_MWU output. Are all original GO IDs then supposed to be accounted for in the main data table?
Any insight would be very helpful. Attached is some GO_MWU code and results showing that the GO term of interest (and its associated genes) is missing from the GO_MWU output. You'll see that I have relaxed the filtering settings to not remove any GO categories that contain a large fraction of genes or only a few genes, in an effort to not throw out GO terms.
testing_gomwu.zip