GO_MWU.R error #7

Closed
Ruiqi-CUB opened this issue Dec 9, 2020 · 34 comments

Comments

@Ruiqi-CUB

Ruiqi-CUB commented Dec 9, 2020

Hello Dr. Matz,

I was trying to run the GO_MWU.R but ended up with an error at the very first step. Would you mind having a look?

The code I tried to run was

gomwuStats(input, goDatabase, goAnnotations, goDivision,
	perlPath="perl", # replace with full path to perl executable if it is not in your system's PATH already
	largest=0.1,  # a GO category will not be considered if it contains more than this fraction of the total number of genes
	smallest=5,   # a GO category should contain at least this many genes to be considered
	clusterCutHeight=0.25, # threshold for merging similar (gene-sharing) terms. See README for details.
#	Alternative="g" # by default the MWU test is two-tailed; specify "g" or "l" if you want to test for "greater" or "less" instead. 
#	Module=TRUE,Alternative="g" # un-remark this if you are analyzing a SIGNED WGCNA module (values: 0 for not in module genes, kME for in-module genes). In the call to gomwuPlot below, specify absValue=0.001 (count number of "good genes" that fall into the module)
#	Module=TRUE # un-remark this if you are analyzing an UNSIGNED WGCNA module 
)

The error I got was

go.obo scruposum_gene2go.tab scruposum_foldchange.csv CC largest=0.1 smallest=5 cutHeight=0.25

Run parameters:

largest GO category as fraction of all genes (largest)  : 0.1
         smallest GO category as # of genes (smallest)  : 5
                clustering threshold (clusterCutHeight) : 0.25

-----------------
retrieving GO hierarchy, reformatting data...

-------------
go_reformat:
Genes with GO annotations, but not listed in measure table: 41394

Terms without defined level (old ontology?..): 0
-------------
-------------
go_nrify:
0 categories, 0 genes; size range 5-0
	0 too broad
	0 too small
	0 remaining

removing redundancy:

calculating GO term similarities based on shared genes...

 Error in read.table(inname, sep = "\t", header = T, check.names = F) : 
  no lines available in input 

I checked the format of my input files but they look fine to me.
[image not included]

Would you mind having a look? Thank you so much!
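The log reports 41394 annotated genes that are not listed in the measure table, and go_nrify then finds 0 categories and 0 genes. That pattern suggests the gene IDs in the two files may not match exactly (for example, different prefixes or isoform suffixes). Below is a minimal Python sketch for checking the overlap. It assumes the measure table is comma-separated with a header row and gene IDs in the first column, and that the gene-to-GO table is tab-delimited with the gene ID in the first column and no header. These format details are assumptions, and the function names are hypothetical, not part of GO_MWU.

```python
import csv
import io


def measure_ids(csv_text):
    """Gene IDs from a measure table: assumed comma-separated, header row first, ID in column 1."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return {row[0] for row in reader if row}


def annotated_ids(tab_text):
    """Gene IDs from a gene-to-GO table: assumed tab-delimited, ID in column 1, no header."""
    return {line.split("\t", 1)[0] for line in tab_text.splitlines() if line.strip()}


def overlap_report(csv_text, tab_text):
    """Count IDs in each file and how many match exactly (no trimming or case folding)."""
    measured = measure_ids(csv_text)
    annotated = annotated_ids(tab_text)
    return {
        "measured": len(measured),
        "annotated": len(annotated),
        "shared": len(measured & annotated),
        "annotated_not_measured": len(annotated - measured),
    }


# Usage (file names taken from the log above):
# with open("scruposum_foldchange.csv") as m, open("scruposum_gene2go.tab") as a:
#     print(overlap_report(m.read(), a.read()))
```

A `shared` count of 0 would be consistent with the empty go_nrify result above. IDs are compared exactly, so trailing spaces or differing case also show up as mismatches.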

@z0on
Owner

z0on commented Dec 9, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 9, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 9, 2020 via email

@z0on
Owner

z0on commented Dec 9, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 10, 2020 via email

@Ruiqi-CUB
Author

Hi Misha, do you have any suggestions for when there are too many GO terms (~70) on the final figure? I have already used the "strict" cutoffs.
Thanks a lot

Ruiqi

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 18, 2020 via email

@z0on
Owner

z0on commented Dec 18, 2020 via email

@Ruiqi-CUB
Author

Sorry, it seems the figures did not go through via Gmail.

Here is the figure with level1=0.01, level2=0.001, level3=0.0001
[image not included]

Here is the figure with all three thresholds lowered another 10-fold, but it still looks messy.
[image not included]

One thing I noticed in the second figure is that some GO terms were plotted even when the number is 0.

[image not included]

The figure gets even messier with BP, even when using level1=0.001, level2=0.0001, level3=0.00001.
[image not included]

Here is the R code in GO_MWU.R
[image not included]
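One possible reading of the plotting thresholds is that level1 decides which terms appear at all, while level2 and level3 only change how strongly a term is emphasized. This reading is an assumption and is not confirmed in this thread. If it holds, only level1 changes how many terms are drawn. Below is a Python sketch of that filtering. The helper names are hypothetical, and the strict `<` comparisons were chosen for illustration.

```python
def plot_tiers(results, level1, level2, level3):
    """Sort (term, p_adj) pairs into display tiers.

    Hypothetical helper. Assumes a term is drawn only when p_adj < level1,
    and that level2 and level3 are stricter cutoffs that change emphasis only.
    """
    tiers = {"plain": [], "level2": [], "level3": []}
    for term, p_adj in results:
        if p_adj >= level1:
            continue  # does not pass the plotting threshold: not drawn at all
        if p_adj < level3:
            tiers["level3"].append(term)
        elif p_adj < level2:
            tiers["level2"].append(term)
        else:
            tiers["plain"].append(term)
    return tiers


def n_plotted(tiers):
    """Total number of terms that would be drawn."""
    return sum(len(terms) for terms in tiers.values())


# Hypothetical adjusted p-values, not taken from the figures above.
example = [("term A", 0.02), ("term B", 0.005), ("term C", 0.0005), ("term D", 0.00005)]
```

Under this assumption, lowering level2 and level3 alone would not reduce the number of terms drawn.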

@z0on
Owner

z0on commented Dec 18, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 18, 2020 via email

@z0on
Owner

z0on commented Dec 18, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 18, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 18, 2020 via email

@z0on
Owner

z0on commented Dec 18, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 18, 2020 via email

@Ruiqi-CUB
Author

Hi Misha, I just sent you all the files by email. Please let me know if you can access them! Thanks a lot!

@z0on
Owner

z0on commented Dec 18, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 18, 2020 via email

@z0on
Owner

z0on commented Dec 18, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 18, 2020 via email

@Ruiqi-CUB
Author

I just want to confirm: is dissim_(GO division)_(go-to-gene table filename) the same for a given GO division and go-to-gene table filename, even if the input filenames are different?

I am using a loop in R to run GO_MWU on several datasets that share the same go-to-gene table file. I just found out that dissim_(GO division)_(go-to-gene table filename) is overwritten every time GO_MWU is run in the same GO division, e.g. input1.txt with BP and then input2.txt with BP.

@z0on
Owner

z0on commented Dec 19, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 20, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 20, 2020

Hi Misha,

I suspect that dissim_(GO division)_(go-to-gene table filename) is not the same for different input files (gene-log10P), even if the GO division and go-to-gene table are the same.

I ran gomwuStats on input1.csv and input2.csv with CC and the same goAnnotations file, gene2go.tab, so dissim_CC_gene2go.tab was overwritten once. After getting the output files, I ran gomwuPlot. The GO_MWU figure for input2 could be plotted, but there was an error for input1:

 Error in `[.data.frame`(diss, goods.names, goods.names) : 
  undefined columns selected 

Then I ran gomwuStats and gomwuPlot for input1.csv and input2.csv one at a time. Both figures were plotted successfully. The dissim_MF_gene2go.tab for input1 is 15.5MB, while the dissim_MF_gene2go.tab for input2.csv is 15.6MB.
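One way to see why the dissimilarity file could depend on the input is to note that only genes present in the measure table, and only terms that pass the size limits, can enter it. Two inputs can therefore produce different term sets. The Python sketch below illustrates this under stated assumptions, which are not taken from the GO_MWU source. Term size is counted over measured genes only. Dissimilarity is 1 minus the number of shared genes divided by the size of the smaller term. The in-memory structures and function names are hypothetical.

```python
from itertools import combinations


def term_gene_sets(annotations, measured_genes, smallest, largest_frac):
    """Map each GO term to its measured genes, keeping terms within the size limits.

    annotations: hypothetical dict of gene -> list of GO terms (an in-memory gene2go table).
    Term size is counted over measured genes only (an assumption for this sketch).
    """
    terms = {}
    for gene, go_terms in annotations.items():
        if gene not in measured_genes:
            continue
        for go in go_terms:
            terms.setdefault(go, set()).add(gene)
    max_size = largest_frac * len(measured_genes)
    return {go: genes for go, genes in terms.items() if smallest <= len(genes) <= max_size}


def dissimilarity(terms):
    """Pairwise dissimilarity: 1 - shared genes / size of the smaller term (assumed formula)."""
    result = {}
    for a, b in combinations(sorted(terms), 2):
        shared = len(terms[a] & terms[b])
        result[(a, b)] = 1 - shared / min(len(terms[a]), len(terms[b]))
    return result


annotations = {
    "g1": ["GO:A", "GO:B"],
    "g2": ["GO:A", "GO:B"],
    "g3": ["GO:A"],
    "g4": ["GO:B", "GO:C"],
    "g5": ["GO:C"],
}
terms_input1 = term_gene_sets(annotations, {"g1", "g2", "g3", "g4", "g5"}, smallest=2, largest_frac=1.0)
terms_input2 = term_gene_sets(annotations, {"g1", "g2", "g3"}, smallest=2, largest_frac=1.0)
```

In this toy example, the matrix built for input2 has no entry for GO:C, and even the shared pair GO:A/GO:B gets a different value. A lookup that expects input1's terms in input2's matrix would therefore fail. This is analogous to the `undefined columns selected` error above, although the actual cause should be confirmed against the GO_MWU code.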

Could you please check your code to see if that is the issue? If it is, would you mind modifying the code to rename dissim_(GO division)_(go-to-gene table filename) to dissim_(GO division)_(input filename)_(go-to-gene table filename) instead?

Thank you so much!
Ruiqi

@z0on
Owner

z0on commented Dec 21, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 21, 2020 via email

@z0on
Owner

z0on commented Dec 21, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 21, 2020

I think the fix is to change "dissim_".$div."_".$gen2go to "dissim_".$div."_".$measure."_".$gen2go at line 153 in gomwu_b.pl and at line 53 in gomwu_a.pl, and to change in.dissim=paste("dissim",goDivision,goAnnotations,sep="_") to in.dissim=paste("dissim",goDivision,input,goAnnotations,sep="_") at line 173 in gomwu.functions.R. I have tested this with two input files and it works well; at least, there are no error messages.
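For this edit to work, the Perl scripts and the R function must build exactly the same filename, and that name must differ for each input. The Python sketch below mirrors both naming schemes. The function names are hypothetical. It shows the collision that the old scheme causes when several inputs reuse one gene2go table.

```python
def old_dissim_name(division, gene2go):
    # Original scheme: no input name, so every input sharing a gene2go table collides.
    return "_".join(["dissim", division, gene2go])


def new_dissim_name(division, measure, gene2go):
    # Proposed scheme: same pieces and order as
    # paste("dissim", goDivision, input, goAnnotations, sep = "_") in R and
    # "dissim_".$div."_".$measure."_".$gen2go in Perl.
    return "_".join(["dissim", division, measure, gene2go])


inputs = ["input1.csv", "input2.csv"]
old_names = {old_dissim_name("CC", "gene2go.tab") for _ in inputs}
new_names = {new_dissim_name("CC", f, "gene2go.tab") for f in inputs}
```

If `input` is passed with a directory component (for example "data/input1.csv"), that path would also end up inside the name. Passing bare filenames from within the working directory therefore seems safer.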

@Ruiqi-CUB
Author

Hi Dr. Matz,
Sorry to bother you again. I am trying to interpret the best GO table. Does level mean the GO term level? Does nseqs mean the number of tested sequences (genes, isoforms, orthogroups, etc.) associated with the GO term? Thank you so much!

   delta.rank         pval level nseqs                                        term                   name        p.adj
41       -871 6.495308e-05     5   205 GO:0000428;GO:0030880;GO:0016591;GO:0055029 RNA polymerase complex 5.249926e-04
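The table also has a p.adj column next to pval. This thread does not confirm how p.adj is computed. The sketch below assumes it is a Benjamini-Hochberg adjustment across all tested terms in the division, as R's `p.adjust(..., method = "BH")` would produce. Under that assumption, a Python equivalent is shown below. The function name is hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    n = len(pvals)
    # Walk from the largest p-value down, keeping a running minimum of p * n / rank
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = n - offset  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Arbitrary example values, not from the table above:
# bh_adjust([0.01, 0.04, 0.03, 0.005]) gives approximately [0.02, 0.04, 0.04, 0.02]
```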

@z0on
Owner

z0on commented Dec 22, 2020 via email

@Ruiqi-CUB
Author

Ruiqi-CUB commented Dec 22, 2020

Thank you so much! The tree with the cut-off line works really well! It helps me pick representative GO terms out of the many GO terms in my analyses. Also, a negative delta.rank (the value before pval) means down-regulation, correct?
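Whether a negative delta.rank means down-regulation depends on how delta.rank is defined and on the sign of the input measure. The Python sketch below assumes delta.rank is the mean rank of the term's genes minus the mean rank of all other genes, with ranks assigned in ascending order of the measure and ties averaged. It pairs this with a two-sided Mann-Whitney U test using the normal approximation without tie correction. This is not necessarily what GO_MWU or R's `wilcox.test` compute exactly, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks in ascending order; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def delta_rank_and_p(values, in_term):
    """Return (delta rank, two-sided MWU p-value via normal approximation, no tie correction)."""
    ranks = average_ranks(values)
    inside = [r for r, flag in zip(ranks, in_term) if flag]
    outside = [r for r, flag in zip(ranks, in_term) if not flag]
    n1, n2 = len(inside), len(outside)
    delta = sum(inside) / n1 - sum(outside) / n2
    u = sum(inside) - n1 * (n1 + 1) / 2  # U statistic for the in-term genes
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return delta, p
```

With a signed measure such as log2 fold change, a negative delta under this definition means the term's genes sit toward the low (down-regulated) end. With an unsigned measure, it would only mean the term's genes have lower values.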

The edits work really well with a loop in R. Since BP usually takes about one hour on a server and I have many contrasts (3 species and 3 treatments), it is much more convenient to run gomwuStats in a loop first and then explore each result with gomwuPlot later. I have posted the changes I made in a previous comment.


@z0on
Owner

z0on commented Dec 22, 2020 via email

@Ruiqi-CUB
Author

Thank you! I am honored to contribute to your code!
