Skip to content

Commit

Permalink
new vignette S5a
Browse files Browse the repository at this point in the history
  • Loading branch information
vjcitn committed Jul 4, 2023
1 parent 2377fca commit 010bd39
Show file tree
Hide file tree
Showing 6 changed files with 155 additions and 1 deletion.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: CSHstats
Title: discussions and exercises on classical statistics for CSHL 2022
Version: 0.0.28
Version: 0.0.29
Authors@R:
person(given = "Vince",
family = "Carey",
Expand Down
13 changes: 13 additions & 0 deletions R/data.R
Original file line number Diff line number Diff line change
Expand Up @@ -47,3 +47,16 @@
#' @docType data
#' @format data.frame
"gtex_exc_chr20_b38"

#' vGene result for a recount dataset, SRP045638
#' @note see https://lcolladotor.github.io/cshl_rstats_genome_scale_2023/differential-gene-expression-analysis-with-limma.html#differential-expression
#' @format limma EList
"vGene"

#' topTable excerpt for SRP045638, see vGene
#' @format data.frame
"de_results"

#' model matrix for SRP045638, see vGene
#' @format data.frame
"mod"
16 changes: 16 additions & 0 deletions man/de_results.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 16 additions & 0 deletions man/mod.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

19 changes: 19 additions & 0 deletions man/vGene.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

90 changes: 90 additions & 0 deletions vignettes/S5a_lmdiagnostics.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
---
title: "S5a: Diagnostics of a differential gene expression exercise"
author: "Vincent J. Carey, stvjc at channing.harvard.edu"
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{S5a: Diagnostics of a differential gene expression exercise}
%\VignetteEncoding{UTF-8}
output:
BiocStyle::html_document:
highlight: pygments
number_sections: yes
theme: united
toc: yes
---

# Fast forward

[This document](https://lcolladotor.github.io/cshl_rstats_genome_scale_2023/differential-gene-expression-analysis-with-limma.html#differential-expression) shows how SRP045638 can be retrieved
and analyzed. We start with filtered and normalized data in the `vGene` object.

```{r work1,message=FALSE}
library(edgeR)
library(CSHstats)
data(vGene)
names(vGene)
head(vGene$E[,1:5])
```

The model matrix is also important:
```{r lkn}
data(mod)
head(mod)
```

Here are the top results from the limma-voom analysis:
```{r lklim}
data(de_results)
options(digits=3)
de_results[1:5, c("gene_name", "logFC", "t", "P.Value", "adj.P.Val")]
```
This is a little different from the `de_results` computed in the
class document -- because we sort by p and take the most significant
results.


# Assessing the association "by hand"

The `vGene$E` structure holds the
estimated expression values, and `vGene$weights` are
quantities that measure relative variability of
the quantities in `E`. We can pick a gene of
interest and examine the marginal distribution
and estimate association.

The top DE gene was
"ENSG00000121210.15". Let's make a data.frame with the E
values, the covariates, and the weights.

```{r getit}
target = "ENSG00000121210.15"
ind = which(rownames(vGene$E)==target)
mydf = data.frame(cbind(KIAA0922=vGene$E[ind,],
mod[,-1], wts=vGene$weights[ind,]))
```

Now examine the marginal distribution:
```{r lkdist}
hist(mydf$KIAA0922)
```

Interesting. There's a big gap in the KIAA0922
expression distribution.

Exercise: Is that true for the
"raw" data, or is it an artifact of all the
computations we've done?

```{r lkbs}
library(beeswarm)
beeswarm(mydf$KIAA0922, pwcol=as.numeric(factor(mydf$prenatalprenatal)))
legend(.6,6,legend=c("postnatal", "prenatal"), col=c(1,2), pch=19)
```

Finally, we fit the linear model:
```{r lklm}
m1 = lm(KIAA0922~.-wts, data=mydf, weights=wts)
summary(m1)
plot(m1, which=2, col=(as.numeric(mydf$prenatalprenatal)+1), pch=19)
```

0 comments on commit 010bd39

Please sign in to comment.