To determine whether the count for a transcript differs significantly between the control and treatment conditions, that is, whether the transcript is differentially expressed, we need to run a differential count analysis on the data. This is analogous to the differential gene expression analysis explained in Chapter 5, Analyzing Microarray Data with R. In this recipe, we do this analysis with the edgeR package.


1. First, install and load the required libraries (edgeR and goseq) as follows:

In [1]:
source("http://bioconductor.org/biocLite.R")
biocLite("edgeR")

Bioconductor version 3.7 (BiocInstaller 1.30.0), ?biocLite for help
A newer version of Bioconductor is available for this version of R,
  ?BiocUpgrade for help
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.1 (2018-07-02).
Installing package(s) 'edgeR'
also installing the dependency 'locfit'



package 'locfit' successfully unpacked and MD5 sums checked
package 'edgeR' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\Administrator\AppData\Local\Temp\RtmpQfKeYA\downloaded_packages


Old packages: 'ade4', 'ape', 'backports', 'BH', 'BiocManager', 'broom',
  'callr', 'caret', 'checkpoint', 'class', 'cli', 'clipr', 'codetools',
  'colorspace', 'curl', 'data.table', 'dbplyr', 'ddalpha', 'digest', 'dimRed',
  'doParallel', 'dplyr', 'evaluate', 'fansi', 'forcats', 'foreign', 'geometry',
  'ggplot2', 'haven', 'htmlwidgets', 'httpuv', 'httr', 'igraph', 'ipred',
  'IRdisplay', 'IRkernel', 'jsonlite', 'kernlab', 'knitr', 'later', 'lattice',
  'lava', 'magic', 'markdown', 'MASS', 'Matrix', 'mgcv', 'mime', 'MKmisc',
  'ModelMetrics', 'modelr', 'openssl', 'pillar', 'pkgconfig', 'pls',
  'processx', 'purrr', 'R.utils', 'R6', 'Rcpp', 'readr', 'readxl', 'recipes',
  'repr', 'reprex', 'rlang', 'rmarkdown', 'robustbase', 'rstudioapi', 'RUnit',
  'scales', 'sfsmisc', 'shiny', 'stringi', 'stringr', 'survival', 'testthat',
  'tibble', 'tidyr', 'tidyselect', 'tinytex', 'TTR', 'xfun', 'XML', 'xtable',
  'xts', 'zoo'


In [2]:
biocLite("goseq")

BioC_mirror: https://bioconductor.org
Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.1 (2018-07-02).
Installing package(s) 'goseq'
also installing the dependencies 'BiasedUrn', 'geneLenDataBase'



package 'BiasedUrn' successfully unpacked and MD5 sums checked
package 'goseq' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\Administrator\AppData\Local\Temp\RtmpQfKeYA\downloaded_packages


installing the source package 'geneLenDataBase'

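
The biocLite script used above matches the Bioconductor 3.7 session shown in the output. On newer R installations, Bioconductor packages are installed through the BiocManager package from CRAN instead. A minimal sketch, assuming a working internet connection; the wrapper name install_recipe_packages is hypothetical and is only there so that nothing is installed until you call it:

```r
# Hypothetical helper (not part of the recipe): install the two packages
# used here through BiocManager, skipping any that are already present.
install_recipe_packages <- function(pkgs = c("edgeR", "goseq")) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Keep only the packages that cannot be loaded yet
  to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  if (length(to_install) > 0) {
    BiocManager::install(to_install)
  }
  invisible(to_install)
}
```

Calling install_recipe_packages() once should be enough; afterwards library(edgeR) and library(goseq) work as in the cells that follow.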


In [3]:
library(edgeR)

Loading required package: limma


In [4]:
library(goseq)

Loading required package: BiasedUrn
"package 'BiasedUrn' was built under R version 3.5.2"
Loading required package: geneLenDataBase
"replacing previous import 'BiocGenerics::dims' by 'Biobase::dims' when loading 'AnnotationDbi'"


2. Now, read the input data directly from the extdata directory of the goseq package as follows:

In [5]:
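# Read the tab-separated count table shipped with goseq;
# the first column (Ensembl gene IDs) becomes the row names.
myData <- read.table(
  system.file("extdata", "Li_sum.txt", package = "goseq"),
  sep = "\t", header = TRUE, stringsAsFactors = FALSE, row.names = 1
)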

3. Take a look at the first few rows of the data as follows:

In [6]:
head(myData)

                lane1 lane2 lane3 lane4 lane5 lane6 lane8
ENSG00000215688     0     0     0     0     0     0     0
ENSG00000215689     0     0     0     0     0     0     0
ENSG00000220823     0     0     0     0     0     0     0
ENSG00000242499     0     0     0     0     0     0     0
ENSG00000224938     0     0     0     0     0     0     0
ENSG00000239242     0     0     0     0     0     0     0
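
The first rows are all zeros, which is common for genes that are not expressed in any of the samples. Such rows carry no information for the test, and an optional step that is often taken is to drop them before modelling; the recipe itself keeps every row. A minimal base R sketch on a made-up matrix (toyCounts and its gene names are hypothetical, not taken from Li_sum.txt):

```r
# Made-up count matrix: 3 genes x 4 lanes
toyCounts <- matrix(
  c(0, 0, 0, 0,
    5, 3, 8, 2,
    0, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("lane", 1:4))
)

# Keep genes that have at least one read in at least one lane
keep <- rowSums(toyCounts) > 0
filtered <- toyCounts[keep, , drop = FALSE]
print(filtered)
```

With the real data, the same idea would read myData[rowSums(myData) > 0, ], assuming you want to apply the filter at all.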


4. The first four columns of the data (lane1 to lane4) are controls and the last three (lane5, lane6, and lane8) are treatment samples (see the Getting ready section of this recipe). Assign these attributes to the data as follows:

In [7]:
myTreat <- factor(rep(c("Control","Treatment"),times = c(4,3)))
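
To see what this call produces, the sketch below rebuilds the factor and lines it up with the lane names from the head() output. The laneNames vector is typed in by hand here as an assumption; in the recipe it would be colnames(myData).

```r
# Lane names copied from the head(myData) output (assumed order)
laneNames <- c("lane1", "lane2", "lane3", "lane4", "lane5", "lane6", "lane8")

# Four "Control" labels followed by three "Treatment" labels
myTreat <- factor(rep(c("Control", "Treatment"), times = c(4, 3)))

# The factor must have exactly one label per count column
stopifnot(length(myTreat) == length(laneNames))

print(data.frame(lane = laneNames, group = myTreat))
print(table(myTreat))
```

Because the labels are assigned purely by position, the check on the length is worth keeping whenever columns are added, dropped, or reordered.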

5. Now, create a DGEList object from the count data and the treatment information as follows:

In [8]:
myDG <- DGEList(myData,lib.size = colSums(myData),group = myTreat)

The DGEList object is a list with two components: counts (the count matrix) and samples (the group, library size, and normalization factor of each sample), as shown in the following example:

In [9]:
myDG

$counts
                lane1 lane2 lane3 lane4 lane5 lane6 lane8
ENSG00000215688     0     0     0     0     0     0     0
ENSG00000215689     0     0     0     0     0     0     0
ENSG00000220823     0     0     0     0     0     0     0
ENSG00000242499     0     0     0     0     0     0     0
ENSG00000224938     0     0     0     0     0     0     0
ENSG00000239242     0     0     0     0     0     0     0
ENSG00000243140     0     0     0     0     0     0     0
ENSG00000240187     0     0     0     0     0     0     0
ENSG00000241444     0     0     0     0     0     0     0
ENSG00000242468     0     0     0     0     0     0     0

$samples
          group lib.size norm.factors
lane1   Control  1178832            1
lane2   Control  1384945            1
lane3   Control  1716355            1
lane4   Control  1767927            1
lane5 Treatment  2127868            1
lane6 Treatment  2142158            1
lane8 Treatment   816171            1
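
The lib.size column is simply the column sum of the count matrix, which is what colSums(myData) passes in, and a norm.factors value of 1 means that no normalization factor has been computed yet. The base R sketch below shows the same bookkeeping on a made-up matrix (toyCounts, g1 to g3, and s1 to s2 are hypothetical names) and uses the library sizes to turn raw counts into counts per million:

```r
# Made-up counts: 3 genes x 2 samples (matrix() fills column by column)
toyCounts <- matrix(
  c(10, 0, 90,     # sample s1
    20, 5, 175),   # sample s2
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2"))
)

# Library size = total number of reads per sample
libSize <- colSums(toyCounts)

# Counts per million: divide each column by its library size
cpm <- sweep(toyCounts, 2, libSize, "/") * 1e6

print(libSize)
print(cpm)
```

Sample s2 has twice as many reads as s1, so gene g1 has different raw counts (10 and 20) but the same value on the counts-per-million scale. This is why the library sizes have to travel with the counts inside the DGEList object.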


6. Now, estimate the common dispersion of the data, a single value shared by all tags, by typing the following command:

In [11]:
myDisp <- estimateCommonDisp(myDG)
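
The dispersion is the parameter phi of the negative binomial model that edgeR fits to the counts, under which a count with mean mu has variance mu + phi * mu^2. The sketch below illustrates only what phi measures: it simulates counts with a known dispersion in base R and recovers it with a simple method-of-moments estimate. This is not the estimator used by estimateCommonDisp, which maximizes a conditional likelihood across all tags; the values of mu and phi are arbitrary assumptions.

```r
set.seed(1)

mu  <- 50    # assumed mean count
phi <- 0.2   # assumed true dispersion

# rnbinom() uses size = 1 / phi, giving variance mu + phi * mu^2
x <- rnbinom(1e5, mu = mu, size = 1 / phi)

# Method-of-moments estimate: solve var = mean + phi * mean^2 for phi
phiHat <- (var(x) - mean(x)) / mean(x)^2

print(c(mean = mean(x), variance = var(x), phiHat = phiHat))

# Square root of the dispersion: the biological coefficient of variation
print(sqrt(phiHat))
```

In the recipe, the estimated value is stored in myDisp$common.dispersion after the call above.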

7. Then run an exact test for differences between the two groups as follows:

In [12]:
mytest <- exactTest(myDisp)

8. Extract the top DE tags ranked by p-value (or by absolute log fold change, with sort.by = "logFC") using the topTags function:

In [13]:
myRes <- topTags(mytest, sort.by = "PValue")
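
Besides the raw p-values, the table returned by topTags carries an FDR column with p-values adjusted for multiple testing; by default the adjustment is Benjamini-Hochberg (adjust.method = "BH"). The base R sketch below applies the same adjustment to a handful of made-up p-values to show how the two columns relate; the numbers are illustrative assumptions, not results from this data set.

```r
# Made-up raw p-values for five tags
p <- c(0.001, 0.01, 0.02, 0.03, 0.5)

# Benjamini-Hochberg adjustment
fdr <- p.adjust(p, method = "BH")

print(data.frame(PValue = p, FDR = fdr))

# Number of tags that pass a 5% false discovery rate threshold
print(sum(fdr < 0.05))
```

Ranking by PValue and by FDR gives the same order, since the adjustment is monotone; the FDR column is the one to threshold when many tags are tested at once.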

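9. topTags returns a TopTags object rather than a plain data.frame, and in the edgeR version used here calling head(myRes) directly fails with "Error: Two subscripts required". To see the results, check the head of the data.frame stored in its table component instead:

In [14]:
head(myRes$table)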
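
To check the whole workflow on data where the answer is known, the sketch below runs the same sequence of calls (DGEList, estimateCommonDisp, exactTest, topTags) on simulated counts. Everything about the simulation is an assumption made for illustration: 200 genes, seven lanes split 4/3 as in the recipe, and the first ten genes given an eightfold higher mean in the treatment lanes. The code is skipped if edgeR is not installed.

```r
if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)
  set.seed(42)

  nGenes <- 200
  group  <- factor(rep(c("Control", "Treatment"), times = c(4, 3)))

  # Expected counts: 100 everywhere, 800 for genes 1-10 in treated lanes
  mu <- matrix(100, nrow = nGenes, ncol = length(group))
  mu[1:10, group == "Treatment"] <- 800

  # Negative binomial counts with dispersion 1 / size = 0.1
  simCounts <- matrix(
    rnbinom(length(mu), mu = mu, size = 10),
    nrow = nGenes,
    dimnames = list(paste0("gene", seq_len(nGenes)),
                    paste0("lane", seq_along(group)))
  )

  dge <- DGEList(counts = simCounts, group = group)
  dge <- estimateCommonDisp(dge)
  et  <- exactTest(dge)   # compares the second group level to the first

  # n = Inf returns every tag, not only the top ten
  res <- topTags(et, n = Inf, sort.by = "PValue")$table

  print(head(res))
  print(rownames(res)[res$FDR < 0.05])
}
```

A positive logFC means higher counts in the Treatment group. If the pipeline behaves as expected, the simulated genes gene1 to gene10 should dominate the top of the table.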
