Some discrepancy in file WT.C_v_WT.NC.txt between DE analysis and enrichment #4

Open
yige-luo opened this issue Aug 14, 2020 · 1 comment

Comments

@yige-luo

Hi there,

I noticed a discrepancy, with cascading downstream effects, while following the instructions. During the DE analysis, a file called "WT.C_v_WT.NC.txt" is generated, and it is then used in the pathway analysis:

DE_Analysis_mm.md

9. Write top.table to a file, adding in cpms and annotation

top.table$Gene <- rownames(top.table)
top.table <- top.table[,c("Gene", names(top.table)[1:6])]
top.table <- data.frame(top.table,anno[match(top.table$Gene,anno$Gene.stable.ID),],logcpm[match(top.table$Gene,rownames(logcpm)),])

head(top.table)

                         Gene            logFC   AveExpr       t       P.Value
ENSMUSG00000020608 ENSMUSG00000020608 -2.494152 7.871119 -44.93189 1.633862e-19
ENSMUSG00000052212 ENSMUSG00000052212  4.544592 6.203043  40.16863 1.138575e-18
ENSMUSG00000049103 ENSMUSG00000049103  2.155498 9.892016  40.16019 1.142725e-18
ENSMUSG00000030203 ENSMUSG00000030203 -4.127795 7.005929 -34.51508 1.565016e-17
ENSMUSG00000027508 ENSMUSG00000027508 -1.906200 8.124895 -33.88940 2.145300e-17
ENSMUSG00000021990 ENSMUSG00000021990 -2.682202 8.368960 -33.73316 2.323138e-17

write.table(top.table, file = "WT.C_v_WT.NC.txt", row.names = FALSE, sep = "\t", quote = FALSE)
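One way to rule out a stale or mismatched file is to read the written table straight back in and check that its Gene and P.Value columns match top.table. The following is a minimal sketch under stated assumptions. It uses a small hypothetical stand-in for top.table (three rows copied from the head() output above) and writes to a temporary file rather than the real "WT.C_v_WT.NC.txt", so it runs without the workshop data.

```r
# Hypothetical stand-in for top.table: three rows copied from the head() output above.
top.table <- data.frame(
  Gene    = c("ENSMUSG00000020608", "ENSMUSG00000052212", "ENSMUSG00000049103"),
  logFC   = c(-2.494152, 4.544592, 2.155498),
  P.Value = c(1.633862e-19, 1.138575e-18, 1.142725e-18),
  stringsAsFactors = FALSE
)

# Write to a temporary file (the tutorial writes to "WT.C_v_WT.NC.txt").
outfile <- tempfile(fileext = ".txt")
write.table(top.table, file = outfile, row.names = FALSE, sep = "\t", quote = FALSE)

# Read it back the same way enrichment_mm.md does.
tmp <- read.delim(outfile, stringsAsFactors = FALSE)

# The round trip should preserve gene order and p-values.
same_genes   <- identical(tmp$Gene, top.table$Gene)
same_pvalues <- isTRUE(all.equal(tmp$P.Value, top.table$P.Value))
print(c(same_genes = same_genes, same_pvalues = same_pvalues))
```

If both checks pass on the real file, then the file read in the enrichment step is not the one that produced the head(top.table) output shown here.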

enrichment_mm.md

1. topGO Example - Using Kolmogorov-Smirnov Testing

Our first example uses Kolmogorov-Smirnov Testing for enrichment testing of our mouse DE results, with GO annotation obtained from the Bioconductor database org.Mm.eg.db.

The first step in each topGO analysis is to create a topGOdata object. This contains the genes, the score for each gene (here we use the p-value from the DE test), the GO terms associated with each gene, and the ontology to be used (here we use the biological process ontology).

library(org.Mm.eg.db)  # provides the org.Mm.egENSEMBL2EG mapping used below
infile <- "WT.C_v_WT.NC.txt"
tmp <- read.delim(infile)
geneList <- tmp$P.Value
xx <- as.list(org.Mm.egENSEMBL2EG)
names(geneList) <- xx[sapply(strsplit(tmp$Gene,split="\\."),"[[", 1L)]
head(geneList)

     74127        70686        14268        20112        67241        66775 
 9.057118e-18 3.288834e-17 6.570900e-17 6.921801e-17 2.519371e-16 2.746416e-16
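The strsplit() step above keeps everything before the first "." in each Gene ID. This drops an Ensembl version suffix if one is present. On unversioned IDs, such as those in the DE output, it leaves the ID unchanged. The base-R sketch below shows both cases. The versioned ID ".8" is a hypothetical example, and sub() is shown as an equivalent vectorised alternative.

```r
# Ensembl IDs with and without a version suffix; the ".8" version is hypothetical.
ids <- c("ENSMUSG00000020608", "ENSMUSG00000020608.8", "ENSMUSG00000052212")

# Same idiom as enrichment_mm.md: keep the part before the first ".".
stripped <- sapply(strsplit(ids, split = "\\."), "[[", 1L)

# Equivalent vectorised alternative using a regular expression.
stripped_sub <- sub("\\..*$", "", ids)

print(stripped)
# [1] "ENSMUSG00000020608" "ENSMUSG00000020608" "ENSMUSG00000052212"
```

Note that the pattern must be written "\\." in R source. A single backslash is not a valid escape in an R string literal, and an unescaped "." would match any character.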

I believe two different versions of WT.C_v_WT.NC.txt were used, because the p-values in the two .md files are not consistent. Compare the last column (P.Value) of the head(top.table) output in DE_Analysis_mm.md with the second line of the head(geneList) output in enrichment_mm.md. The first P.Value in the former is 1.633862e-19, while the first value in the latter is 9.057118e-18. This has serious consequences, since all the downstream analyses are affected, and I could not reproduce the results. As I cannot find any file called WT.C_v_WT.NC.txt on GitHub, I would appreciate it if you could confirm and resolve this issue.
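To see exactly where two copies of WT.C_v_WT.NC.txt disagree, you can join them on the Gene column and compare P.Value gene by gene. The following is a minimal sketch. The two data frames are hypothetical stand-ins for the files. The first uses the first three p-values from DE_Analysis_mm.md, and the second is made up for illustration only. With real files, each would come from read.delim().

```r
# Hypothetical stand-ins for two copies of WT.C_v_WT.NC.txt.
# With real data: copy_a <- read.delim(path_a); copy_b <- read.delim(path_b) (paths hypothetical).
copy_a <- data.frame(
  Gene    = c("ENSMUSG00000020608", "ENSMUSG00000052212", "ENSMUSG00000049103"),
  P.Value = c(1.633862e-19, 1.138575e-18, 1.142725e-18),
  stringsAsFactors = FALSE
)
copy_b <- data.frame(  # made-up values for illustration only
  Gene    = c("ENSMUSG00000049103", "ENSMUSG00000020608", "ENSMUSG00000052212"),
  P.Value = c(1.142725e-18, 5.0e-18, 1.138575e-18),
  stringsAsFactors = FALSE
)

# Join on Gene so that row order in the two files does not matter.
cmp <- merge(copy_a, copy_b, by = "Gene", suffixes = c(".a", ".b"))

# Flag genes whose p-values differ by more than a small relative tolerance.
rel_diff <- abs(cmp$P.Value.a - cmp$P.Value.b) / pmax(cmp$P.Value.a, cmp$P.Value.b)
cmp$mismatch <- rel_diff > 1e-6

print(cmp[cmp$mismatch, ])
```

A relative rather than absolute tolerance is used because the p-values here span many orders of magnitude.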

Best wishes,
Yige

@hslyman
Contributor

hslyman commented Aug 14, 2020

Thanks for the heads-up! I will investigate ASAP. We're out of the office with another workshop until 20 August.
