# Introduction

As I researched single-cell differential expression packages, I came across [this paper](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2599-6), which concluded that "methods developed specifically for scRNAseq data do not show significantly better performance compared to the methods designed for bulk RNAseq data; and methods that consider behavior of each individual gene (not all genes) in calling DE genes outperform the other tools." DESeq2 was recommended in that paper, so I apply it here, as I did for the bulk sequencing data.

In [1]:
```r
library("DESeq2")
```


# mRNA

## `ct2`

Define a path prefix:

In [2]:
```r
prefix <- "/data/clue/prod/mrna/vals/de/all/ct2/"
```

Define the input directory containing the count files and the output directory for the results.

In [3]:
```r
countsdir <- paste(prefix,"input/",sep="")
resdir <- paste(prefix,"res/",sep="")
```
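The `paste(..., sep="")` calls only yield valid paths because `prefix` ends in `/`, and `write.csv()` fails if `res/` does not already exist. A minimal sketch of a more defensive setup, assuming the same `input/` and `res/` layout; `setup_dirs()` is a hypothetical helper, and the demo runs under `tempdir()` rather than the real `/data/clue/...` prefix:

```r
# Hypothetical helper (not part of the original notebook): build the input and
# results directories with file.path(), which inserts the separator itself,
# and create the results directory if needed so write.csv() can write into it.
setup_dirs <- function(prefix) {
  countsdir <- file.path(prefix, "input")
  resdir <- file.path(prefix, "res")
  dir.create(resdir, recursive = TRUE, showWarnings = FALSE)
  list(countsdir = countsdir, resdir = resdir)
}

# Demo in a temporary location instead of the real data directory.
dirs <- setup_dirs(file.path(tempdir(), "ct2"))

# Output file names are then assembled the same way, e.g. for condition "A"
# and cell type "B_Naive":
out_file <- file.path(dirs$resdir, paste0("A", "_", "B_Naive", ".csv"))
```

With `file.path()`, the trailing slash on `prefix` is no longer what makes the paths work.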

List the files in `countsdir`.

In [4]:
```r
sampleFiles <- list.files(countsdir)
```

In [5]:
```r
sampleFiles
```

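`list.files()` returns file names in alphabetical order, so for each condition the `col.csv` (sample metadata) always comes before the `cts.csv` (counts). The loops rely on this when they read `cond_files[1]` as the metadata and `cond_files[2]` as the counts.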
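Because the loops index `cond_files[1]` and `cond_files[2]` by position, they can read the wrong files, or fail, if `grep(cond, sampleFiles)` matches more or fewer than two names (for example, if some other file name happens to contain the condition letter). A minimal sketch that selects the two files by suffix and stops on any ambiguity; `find_cond_files()` is hypothetical, and the `<cond>_col.csv` / `<cond>_cts.csv` naming is an assumption, since the actual file names are not shown here:

```r
# Hypothetical helper: pick the metadata and count files for one condition by
# suffix rather than by position, and fail loudly if the match is ambiguous.
# ASSUMPTION: files for condition "A" are named "A_col.csv" and "A_cts.csv".
find_cond_files <- function(cond, files) {
  cond_files <- grep(paste0("^", cond, "_"), files, value = TRUE)
  col_file <- grep("col\\.csv$", cond_files, value = TRUE)
  cts_file <- grep("cts\\.csv$", cond_files, value = TRUE)
  if (length(col_file) != 1 || length(cts_file) != 1) {
    stop("expected one col.csv and one cts.csv for condition ", cond,
         "; found: ", paste(cond_files, collapse = ", "))
  }
  list(col = col_file, cts = cts_file)
}

example_files <- c("A_col.csv", "A_cts.csv", "B_col.csv", "B_cts.csv")
pair <- find_cond_files("B", example_files)
```

Inside the loops, `pair$cts` would take the place of `cond_files[2]` and `pair$col` the place of `cond_files[1]`.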

In [6]:
```r
for (cond in c("A","B","G","P","R")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # reclassify FID as a factor; read.csv() reads it as an integer
    coldata$FID <- as.factor(coldata$FID)
  
    for (celltype in c("B_Naive", "pDC", "T4_Naive", "HSC", "T4_EM", 
                       "NK", "T_Tox", "B_Mem", "M_cDC", "T8_Naive")) {
        # count columns for this cell type start with "<celltype>-"
        subct <- subset(cts, select=grep(paste("^",as.character(celltype),"-",sep=""),colnames(cts),value = TRUE))
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        # log2 fold change of cond relative to control "C"
        res <- results(dds, contrast = c("COND",cond,"C"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"_",as.character(celltype),".csv",sep=""))
    }
    
}
```

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify 
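In the per-cell-type loops, the count columns are picked by name prefix (`^<celltype>-`) while the metadata rows are picked from the `CT` column, and DESeq2 treats column *j* of `countData` and row *j* of `colData` as the same sample. A minimal sketch of an explicit alignment check for the `ct2` cell-type names, assuming count columns are named `<celltype>-<sample>` and that `coldata` row names equal those column names; `subset_celltype()` and the toy data are hypothetical:

```r
# Hypothetical helper: subset counts and metadata for one cell type and stop if
# they do not line up, since DESeq2 pairs countData columns with colData rows
# by position.
subset_celltype <- function(cts, coldata, celltype) {
  # ASSUMPTION: count columns are named "<celltype>-<sample>"
  cols <- grep(paste0("^", celltype, "-"), colnames(cts), value = TRUE)
  subct <- cts[, cols, drop = FALSE]
  subcoldata <- coldata[coldata$CT == celltype, , drop = FALSE]
  if (!identical(colnames(subct), rownames(subcoldata))) {
    stop("count columns and metadata rows for ", celltype, " do not line up")
  }
  list(counts = subct, coldata = subcoldata)
}

# Toy data: two genes, four samples; "NK_CD16-s1" must not be picked up for "NK".
toy_cts <- matrix(1:8, nrow = 2,
                  dimnames = list(c("g1", "g2"),
                                  c("NK-s1", "NK-s2", "T4_EM-s1", "NK_CD16-s1")))
toy_coldata <- data.frame(CT = c("NK", "NK", "T4_EM", "NK_CD16"),
                          COND = c("C", "A", "C", "A"),
                          row.names = colnames(toy_cts))

parts <- subset_celltype(toy_cts, toy_coldata, "NK")
```

`parts$counts` and `parts$coldata` would then be passed to `DESeqDataSetFromMatrix()` unchanged.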

In [9]:
```r
for (cond in c("A","B","G","R")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # reclassify FID as a factor; read.csv() reads it as an integer
    coldata$FID <- as.factor(coldata$FID)
    
# #     # first, run the time course model outlined in the rnaseqgene vignette
# #     ddsTC <- DESeqDataSetFromMatrix(countData = cts,
# #                     colData = coldata,
# #                                 design = ~ cond + TIME + cond:TIME
# #                                )
# #     ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ cond + TIME)
# #     res <- results(ddsTC)
# #     write.csv(as.data.frame(res), file=paste(resdir,cond,".TC.csv",sep=""))
    
    # then, for each cell type, run a standard two-group DE test of the condition against control ("C")
    for (celltype in c("ncM", "cM", "cDC")) {
        subct <- subset(cts, select=grep(paste("^",as.character(celltype),"-",sep=""),colnames(cts),value = TRUE))
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"C"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"_",as.character(celltype),".csv",sep=""))
    }
    
}
```

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 13 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify

## `ct3`

Define a path prefix:

In [1]:
```r
prefix <- "/data/clue/prod/mrna/vals/de/all/ct3/"
```

Define the input directory containing the count files and the output directory for the results.

In [4]:
```r
countsdir <- paste(prefix,"input/",sep="")
resdir <- paste(prefix,"res/",sep="")
```

List the files in `countsdir`.

In [5]:
```r
sampleFiles <- list.files(countsdir)
```

In [6]:
```r
sampleFiles
```

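`list.files()` returns file names in alphabetical order, so for each condition the `col.csv` (sample metadata) always comes before the `cts.csv` (counts). The loop relies on this when it reads `cond_files[1]` as the metadata and `cond_files[2]` as the counts.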

In [7]:
```r
for (cond in c("A","B","G","R")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # reclassify FID as a factor; read.csv() reads it as an integer
    coldata$FID <- as.factor(coldata$FID)
  
    for (celltype in c("T_CD10+", "MAIT", "T4_RO+_Act", "T8_Naive_SELL+", "cDC2", 
                       "T8_TEMRA", "T4_Treg_Resting", "T4_Treg_Act", "NK_CD16+", 
                       "T8_CM", "T4_RO+_SELL+", "NK_CD56++", "T4_Naive_SELLint", 
                       "cDC1", "T8_HOBIT+HELIOS+", "T_gd", "T8_EM", "T4_CM")) {
        # "+" is a regex quantifier, so match the name literally (fixed = TRUE);
        # this is safe only because no cell-type name is a substring of another
        subct <- subset(cts, select=grep(as.character(celltype), colnames(cts), fixed=TRUE, value = TRUE)) 
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"C"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"_",as.character(celltype),".csv",sep=""))
    }
    
}
```

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 5 genes
-- DESeq argument 'min
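In the `ct3` loops, `fixed = TRUE` sidesteps the `+` quantifiers but also drops the `^` anchor and `-` separator used for `ct2`, so correctness rests on no cell-type name appearing inside another column name. A minimal sketch of a match that is both literal and anchored, using base R's `startsWith()` and assuming the same `<celltype>-<sample>` column naming; `select_celltype_cols()` and the example column names are hypothetical:

```r
# Hypothetical helper: keep the column names that begin with "<celltype>-".
# startsWith() compares plain strings, so "+" needs no escaping, and the
# trailing "-" stops a name from matching a longer name that starts with it
# (e.g. "T8_Naive" versus "T8_Naive_SELL+").
select_celltype_cols <- function(col_names, celltype) {
  col_names[startsWith(col_names, paste0(celltype, "-"))]
}

col_names <- c("NK_CD56++-s1", "NK_CD16+-s1", "T4_RO+_Act-s1",
               "T8_Naive_SELL+-s1", "NK_CD56++-s2")

select_celltype_cols(col_names, "NK_CD56++")
```

Inside the loop this would replace the `grep(..., fixed = TRUE)` call, e.g. `subct <- cts[, select_celltype_cols(colnames(cts), celltype), drop = FALSE]`.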

# ADTs

## `ct2`

Define a path prefix:

In [9]:
```r
prefix <- "/data/clue/prod/adts/vals/de/all/ct2/"
```

Define the input directory containing the count files and the output directory for the results.

In [10]:
```r
countsdir <- paste(prefix,"input/",sep="")
resdir <- paste(prefix,"res/",sep="")
```

List the files in `countsdir`.

In [11]:
```r
sampleFiles <- list.files(countsdir)
```

In [12]:
```r
sampleFiles
```

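`list.files()` returns file names in alphabetical order, so for each condition the `col.csv` (sample metadata) always comes before the `cts.csv` (counts). The loops rely on this when they read `cond_files[1]` as the metadata and `cond_files[2]` as the counts.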

In [13]:
```r
for (cond in c("A","B","G","P","R")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # reclassify FID as a factor; read.csv() reads it as an integer
    coldata$FID <- as.factor(coldata$FID)
  
    for (celltype in c("B_Naive", "pDC", "T4_Naive", "HSC", "T4_EM", 
                       "NK", "T_Tox", "B_Mem", "M_cDC", "T8_Naive")) {
        subct <- subset(cts, select=grep(paste("^",as.character(celltype),"-",sep=""),colnames(cts),value = TRUE))
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"C"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"_",as.character(celltype),".csv",sep=""))
    }
    
}
```

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 9 genes
-- DESeq argument 'min

In [15]:
```r
for (cond in c("A","B","G","R")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # reclassify FID as a factor; read.csv() reads it as an integer
    coldata$FID <- as.factor(coldata$FID)
    
# #     # first, run the time course model outlined in the rnaseqgene vignette
# #     ddsTC <- DESeqDataSetFromMatrix(countData = cts,
# #                     colData = coldata,
# #                                 design = ~ cond + TIME + cond:TIME
# #                                )
# #     ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ cond + TIME)
# #     res <- results(ddsTC)
# #     write.csv(as.data.frame(res), file=paste(resdir,cond,".TC.csv",sep=""))
    
    # then, for each cell type, run a standard two-group DE test of the condition against control ("C")
    for (celltype in c("ncM", "cM", "cDC")) {
        subct <- subset(cts, select=grep(paste("^",as.character(celltype),"-",sep=""),colnames(cts),value = TRUE))
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"C"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"_",as.character(celltype),".csv",sep=""))
    }
    
}
```

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



## `ct3`

Define a path prefix:

In [10]:
```r
prefix <- "/data/clue/prod/adts/vals/de/all/ct3/"
```

Define the input directory containing the count files and the output directory for the results.

In [11]:
```r
countsdir <- paste(prefix,"input/",sep="")
resdir <- paste(prefix,"res/",sep="")
```

List the files in `countsdir`.

In [12]:
```r
sampleFiles <- list.files(countsdir)
```

In [13]:
```r
sampleFiles
```

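`list.files()` returns file names in alphabetical order, so for each condition the `col.csv` (sample metadata) always comes before the `cts.csv` (counts). The loop relies on this when it reads `cond_files[1]` as the metadata and `cond_files[2]` as the counts.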

In [44]:
```r
for (cond in c("A","B","G","R")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # reclassify FID as a factor; read.csv() reads it as an integer
    coldata$FID <- as.factor(coldata$FID)
  
    for (celltype in c("T_CD10+", "MAIT", "T4_RO+_Act", "T8_Naive_SELL+", "cDC2", 
                       "T8_TEMRA", "T4_Treg_Resting", "T4_Treg_Act", "NK_CD16+", 
                       "T8_CM", "T4_RO+_SELL+", "NK_CD56++", "T4_Naive_SELLint", 
                       "cDC1", "T8_HOBIT+HELIOS+", "T_gd", "T8_EM", "T4_CM")) {
        # "+" is a regex quantifier, so match the name literally (fixed = TRUE);
        # this is safe only because no cell-type name is a substring of another
        subct <- subset(cts, select=grep(as.character(celltype), colnames(cts), fixed=TRUE, value = TRUE)) 
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"C"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"_",as.character(celltype),".csv",sep=""))
    }
    
}
```

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 4 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original coun

# IFNs

## mRNA

### `ct2`

Define a path prefix:

In [2]:
```r
prefix <- "/data/clue/prod/mrna/vals/de/IFNs/ct2/"
```

Define the input directory containing the count files and the output directory for the results.

In [3]:
```r
countsdir <- paste(prefix,"input/",sep="")
resdir <- paste(prefix,"res/",sep="")
```

List the files in `countsdir`.

In [4]:
```r
sampleFiles <- list.files(countsdir)
```

In [5]:
```r
sampleFiles
```

In [10]:
```r
for (cond in c("B")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # reclassify FID as a factor; read.csv() reads it as an integer
    coldata$FID <- as.factor(coldata$FID)
    
# #     # first, run the time course model outlined in the rnaseqgene vignette
# #     ddsTC <- DESeqDataSetFromMatrix(countData = cts,
# #                     colData = coldata,
# #                                 design = ~ cond + TIME + cond:TIME
# #                                )
# #     ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ cond + TIME)
# #     res <- results(ddsTC)
# #     write.csv(as.data.frame(res), file=paste(resdir,cond,".TC.csv",sep=""))
    
    # then, for each cell type, run a standard two-group DE test of cond (B) against G
    for (celltype in c("B_Naive", "pDC", "T4_Naive", "HSC", "T4_EM", "NK", 
                       "T_Tox", "B_Mem", "T8_Naive", "ncM", "cM", "cDC")) {
        print(celltype)
        subct <- subset(cts, select=grep(paste("^",as.character(celltype),"-",sep=""),colnames(cts),value = TRUE))
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"G"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"-G_",as.character(celltype),".csv",sep=""))
    }
    
}
```

[1] "B_Naive"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 7 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "pDC"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 10 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Naive"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "HSC"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 18 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_EM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "NK"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 4 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T_Tox"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 9 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "B_Mem"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 18 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_Naive"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "ncM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 11 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 30 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cDC"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 29 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing
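The log lines above (`estimating size factors`, dispersion estimation, then the per-gene fits) reflect DESeq2's median-of-ratios normalization followed by a negative binomial GLM per gene. In these IFN comparisons, `contrast = c("COND", cond, "G")` puts `cond` in the numerator, so a positive `log2FoldChange` means higher expression in `B` than in `G`. A toy base-R sketch of the size-factor step and a naive fold change on made-up counts; it is not DESeq2's code, and DESeq2's reported `log2FoldChange` comes from the fitted GLM, so on real data it will be close to, but not in general identical to, this ratio of normalized means:

```r
# Toy illustration (not DESeq2 itself): median-of-ratios size factors.
median_of_ratios <- function(counts) {
  # keep genes with no zero counts, so the log geometric mean is finite
  keep <- rowSums(counts == 0) == 0
  log_geo_means <- rowMeans(log(counts[keep, , drop = FALSE]))
  # per sample: median (on the log scale) of count / gene's geometric mean
  apply(counts[keep, , drop = FALSE], 2,
        function(col) exp(median(log(col) - log_geo_means)))
}

# Made-up counts: s2 and s4 are sequenced twice as deeply as s1 and s3,
# and gene g4 is 4x higher in the B samples.
counts <- matrix(c(10, 20, 30,  40,
                   20, 40, 60,  80,
                   10, 20, 30, 160,
                   20, 40, 60, 320),
                 nrow = 4,
                 dimnames = list(paste0("g", 1:4), paste0("s", 1:4)))
toy_cond <- c("G", "G", "B", "B")

sf <- median_of_ratios(counts)
norm <- sweep(counts, 2, sf, "/")  # divide each sample by its size factor

# naive log2 fold change with B in the numerator, mirroring c("COND", "B", "G")
lfc <- log2(rowMeans(norm[, toy_cond == "B"]) / rowMeans(norm[, toy_cond == "G"]))
```

Here the size factors absorb the twofold depth difference, so only `g4` shows a nonzero fold change (log2 of 4, i.e. 2).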



In [11]:
```r
for (cond in c("G")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # reclassify FID as a factor; read.csv() reads it as an integer
    coldata$FID <- as.factor(coldata$FID)
    
# #     # first, run the time course model outlined in the rnaseqgene vignette
# #     ddsTC <- DESeqDataSetFromMatrix(countData = cts,
# #                     colData = coldata,
# #                                 design = ~ cond + TIME + cond:TIME
# #                                )
# #     ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ cond + TIME)
# #     res <- results(ddsTC)
# #     write.csv(as.data.frame(res), file=paste(resdir,cond,".TC.csv",sep=""))
    
    # then, for each cell type, run a standard two-group DE test of cond (G) against B
    for (celltype in c("B_Naive", "pDC", "T4_Naive", "HSC", "T4_EM", "NK", 
                       "T_Tox", "B_Mem", "T8_Naive", "ncM", "cM", "cDC")) {
        print(celltype)
        subct <- subset(cts, select=grep(paste("^",as.character(celltype),"-",sep=""),colnames(cts),value = TRUE))
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"B"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"-B_",as.character(celltype),".csv",sep=""))
    }
    
}
```

[1] "B_Naive"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 7 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "pDC"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 10 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Naive"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "HSC"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 18 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_EM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "NK"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 4 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T_Tox"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 9 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "B_Mem"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 18 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_Naive"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "ncM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 11 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 30 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cDC"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 29 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing
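
In the loop above, the count columns are chosen by matching column names, while the metadata rows are chosen independently through the `CT` column. Nothing in the code guarantees that the two selections contain the same samples in the same order. The DESeq2 vignette recommends confirming that the count-matrix column names agree with the `colData` row names before building the dataset. Below is a minimal base-R sketch of such a check. The helper name `align_coldata` and the toy sample names are hypothetical, and the sketch assumes that the row names of `col.csv` use the same sample IDs as the column names of `cts.csv`.

```r
# Hypothetical helper: reorder metadata rows to match the count-matrix columns,
# and stop if the two selections do not contain exactly the same samples.
align_coldata <- function(counts, coldata) {
  samples <- colnames(counts)
  missing_meta <- setdiff(samples, rownames(coldata))
  missing_counts <- setdiff(rownames(coldata), samples)
  if (length(missing_meta) > 0 || length(missing_counts) > 0) {
    stop("count columns and colData rows disagree: ",
         paste(c(missing_meta, missing_counts), collapse = ", "))
  }
  coldata[samples, , drop = FALSE]
}

# Toy data with hypothetical sample names; metadata rows are deliberately shuffled
toy_counts <- matrix(1:6, nrow = 2,
                     dimnames = list(c("g1", "g2"), c("NK-s1", "NK-s2", "NK-s3")))
toy_meta <- data.frame(CT = "NK", COND = c("B", "G", "B"),
                       row.names = c("NK-s3", "NK-s1", "NK-s2"))
aligned <- align_coldata(toy_counts, toy_meta)
```

In the loop, this would amount to `subcoldata <- align_coldata(subct, subcoldata)` just before `DESeqDataSetFromMatrix`.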



## `ct3`

Define a path prefix:

In [17]:
prefix <- "/data/clue/prod/mrna/vals/de/IFNs/ct3/"

Define the directory with the counts and then a results directory.

In [18]:
countsdir <- paste(prefix,"input/",sep="")
resdir <- paste(prefix,"res/",sep="")
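
`write.csv` does not create missing directories. If `res/` does not exist yet, the loops below would therefore fail on their first write. The sketch below ensures that the directory exists. It uses base R's `file.path` and `dir.create(..., recursive = TRUE)`, and it writes to a hypothetical demo path under `tempdir()` so that it does not overwrite the notebook's own `prefix` or `resdir`.

```r
# Hypothetical demo prefix under tempdir(); not the notebook's real path.
demo_prefix <- file.path(tempdir(), "ct3_demo")
demo_resdir <- file.path(demo_prefix, "res")

# Create the results directory (and any missing parents) if it is absent;
# showWarnings = FALSE keeps repeated runs quiet.
dir.create(demo_resdir, recursive = TRUE, showWarnings = FALSE)

# file.path() inserts the separator itself, so no trailing "/" is needed.
write.csv(data.frame(x = 1), file = file.path(demo_resdir, "check.csv"))
```

In the notebook itself, the equivalent would be `dir.create(resdir, recursive = TRUE, showWarnings = FALSE)` right after `resdir` is defined.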

List the files in `countsdir`.

In [19]:
sampleFiles <- list.files(countsdir)

In [20]:
sampleFiles
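
The loops pick input files by position (`cond_files[1]` for `col.csv`, `cond_files[2]` for `cts.csv`). This relies on the alphabetical ordering noted earlier and on `grep(cond, sampleFiles)` returning exactly those two files. With one-letter condition labels such as `"B"` and `"G"`, any other file name containing that letter would also match. The sketch below selects each file by its suffix instead and fails loudly when the match is ambiguous. The helper `pick_file` and the example file names are hypothetical, because the actual contents of `countsdir` are not shown here.

```r
# Hypothetical helper: return the single file for `cond` that ends in `suffix`,
# stopping if zero or several files qualify.
pick_file <- function(files, cond, suffix) {
  hits <- files[grepl(cond, files, fixed = TRUE) & endsWith(files, suffix)]
  if (length(hits) != 1) {
    stop("expected exactly one '", suffix, "' file for condition '", cond,
         "', found ", length(hits))
  }
  hits
}

# Hypothetical file names for illustration only
example_files <- c("B.col.csv", "B.cts.csv", "G.col.csv", "G.cts.csv")
pick_file(example_files, "B", "cts.csv")  # returns "B.cts.csv"
```

The counts could then be read from `paste0(countsdir, pick_file(sampleFiles, cond, "cts.csv"))`, independent of the order in which `list.files` returns names.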

In [22]:
for (cond in c("B")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # read.csv parses FID as an integer; convert it to a factor so it is treated as a categorical ID
    coldata$FID <- as.factor(coldata$FID)
    
# #     # first, run the time course model outlined in the rnaseqgene vignette
# #     ddsTC <- DESeqDataSetFromMatrix(countData = cts,
# #                     colData = coldata,
# #                                 design = ~ cond + TIME + cond:TIME
# #                                )
# #     ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ cond + TIME)
# #     res <- results(ddsTC)
# #     write.csv(as.data.frame(res), file=paste(resdir,cond,".TC.csv",sep=""))
    
    # then, for each cell type, run a standard two-group DE test of this condition against the reference condition "G"
    for (celltype in c("T_CD10+", "MAIT", "T4_RO+_Act", "T8_Naive_SELL+", "cDC2", 
                       "T8_TEMRA", "T4_Treg_Resting", "T4_Treg_Act", "NK_CD16+", 
                       "T8_CM", "T4_RO+_SELL+", "NK_CD56++", "T4_Naive_SELLint", 
                       "cDC1", "T8_HOBIT+HELIOS+", "T_gd", "T8_EM", "T4_CM")) {
        print(celltype)
        # "+" is a regex metacharacter (e.g. in "NK_CD56++"), so match the name literally with fixed = TRUE; this is safe only because no cell-type name is a substring of another
        subct <- subset(cts, select=grep(as.character(celltype), colnames(cts), fixed=TRUE, value = TRUE)) 
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"G"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"-G_",as.character(celltype),".csv",sep=""))
    }
    
}

[1] "T_CD10+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "MAIT"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 4 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_RO+_Act"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 6 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_Naive_SELL+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cDC2"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 10 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_TEMRA"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 8 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Treg_Resting"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "T4_Treg_Act"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "NK_CD16+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_CM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_RO+_SELL+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "NK_CD56++"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Naive_SELLint"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cDC1"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 27 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_HOBIT+HELIOS+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T_gd"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "T8_EM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_CM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing
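
`fixed = TRUE` avoids the `+` metacharacters, but the match is unanchored. It is therefore only safe while no cell-type name occurs inside another column name. Suppose the column names follow the `<celltype>-<sample>` pattern implied by the `ct2` regex `^celltype-` (an assumption here). Then a literal prefix test gives anchored matching without any regex. The helper name and the column names below are hypothetical.

```r
# Hypothetical helper: columns whose name starts with "<celltype>-",
# compared literally, so "+" and other regex metacharacters need no escaping.
select_celltype_cols <- function(col_names, celltype) {
  col_names[startsWith(col_names, paste0(celltype, "-"))]
}

# Hypothetical column names for illustration only
example_cols <- c("NK_CD56++-s1", "NK_CD16+-s1", "T8_EM-s1",
                  "T8_TEMRA-s1", "NK_CD56++-s2")
select_celltype_cols(example_cols, "NK_CD56++")  # returns c("NK_CD56++-s1", "NK_CD56++-s2")
```

Inside the loop, this would read `subct <- cts[, select_celltype_cols(colnames(cts), celltype), drop = FALSE]`. The same line would work unchanged for the `ct2` cell types.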



In [23]:
for (cond in c("G")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # read.csv parses FID as an integer; convert it to a factor so it is treated as a categorical ID
    coldata$FID <- as.factor(coldata$FID)
    
# #     # first, run the time course model outlined in the rnaseqgene vignette
# #     ddsTC <- DESeqDataSetFromMatrix(countData = cts,
# #                     colData = coldata,
# #                                 design = ~ cond + TIME + cond:TIME
# #                                )
# #     ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ cond + TIME)
# #     res <- results(ddsTC)
# #     write.csv(as.data.frame(res), file=paste(resdir,cond,".TC.csv",sep=""))
    
    # then, for each cell type, run a standard two-group DE test of this condition against the reference condition "B"
    for (celltype in c("T_CD10+", "MAIT", "T4_RO+_Act", "T8_Naive_SELL+", "cDC2", 
                       "T8_TEMRA", "T4_Treg_Resting", "T4_Treg_Act", "NK_CD16+", 
                       "T8_CM", "T4_RO+_SELL+", "NK_CD56++", "T4_Naive_SELLint", 
                       "cDC1", "T8_HOBIT+HELIOS+", "T_gd", "T8_EM", "T4_CM")) {
        print(celltype)
        # "+" is a regex metacharacter (e.g. in "NK_CD56++"), so match the name literally with fixed = TRUE; this is safe only because no cell-type name is a substring of another
        subct <- subset(cts, select=grep(as.character(celltype), colnames(cts), fixed=TRUE, value = TRUE)) 
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"B"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"-B_",as.character(celltype),".csv",sep=""))
    }
    
}

[1] "T_CD10+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "MAIT"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 4 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_RO+_Act"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 6 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_Naive_SELL+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cDC2"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 10 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_TEMRA"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 8 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Treg_Resting"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "T4_Treg_Act"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "NK_CD16+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_CM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_RO+_SELL+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "NK_CD56++"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Naive_SELLint"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cDC1"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 27 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_HOBIT+HELIOS+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T_gd"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "T8_EM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_CM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing
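The note repeated throughout these logs refers to the parametric mean-dispersion trend that DESeq2 fits by default, y = a/x + b, where x is a gene's mean normalized count and y is its estimated dispersion. When that curve fits poorly, DESeq2 substitutes a local regression. Passing `fitType = "local"` to `DESeq()` would request that fit directly and suppress the message. To make the parametric form concrete, here is a minimal base-R sketch that fits it by ordinary least squares on synthetic values. It is a simplification, and `fit_dispersion_trend` is a hypothetical helper: DESeq2 itself fits this trend with a gamma-family GLM and iteratively excludes outlying genes.

```r
# Illustrative only: fit the parametric trend y = a/x + b by least squares.
# x = mean normalized count, y = gene-wise dispersion (synthetic values here).
# This is NOT DESeq2's own fitting routine, just the same functional form.
fit_dispersion_trend <- function(mean_counts, dispersions) {
  fit <- lm(dispersions ~ I(1 / mean_counts))
  c(a = unname(coef(fit)[2]),   # slope on 1/x
    b = unname(coef(fit)[1]))   # intercept = asymptotic dispersion
}

x <- c(1, 2, 5, 10, 50, 100)
y <- 2 / x + 0.1                # points lying exactly on a/x + b with a = 2, b = 0.1
ab <- fit_dispersion_trend(x, y)
```

When real gene-wise dispersions scatter far from this curve, the parametric fit breaks down, which is when the local-regression fallback in the logs kicks in.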



# ADTs

## `ct2`

Define a path prefix:

In [28]:
prefix <- "/data/clue/prod/adts/vals/de/IFNs/ct2/"

Define the directory with the counts and then a results directory.

In [29]:
countsdir <- paste(prefix,"input/",sep="")
resdir <- paste(prefix,"res/",sep="")
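`write.csv()` does not create missing directories, so the loops below fail if `resdir` does not exist yet. A small guard like this one could be run once after defining the paths. It is a sketch, and `ensure_dir` is a hypothetical helper, not part of DESeq2.

```r
# Create a directory (and any missing parents) if it does not already exist.
# Hypothetical helper; call it once, e.g. ensure_dir(resdir), before the loops.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}
```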

List the files in `countsdir`.

In [30]:
sampleFiles <- list.files(countsdir)

In [31]:
sampleFiles
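The loops below take the counts and colData files by position (`cond_files[2]` and `cond_files[1]`). This relies on `list.files()` returning names in alphabetical order and on exactly two files matching each condition. A slightly more defensive sketch selects them by suffix instead and stops if the match is ambiguous. It assumes the files end in `col.csv` and `cts.csv`, as described for the mRNA inputs; the example file names and the helper `pick_cond_files` are hypothetical.

```r
# Pick the colData and counts files for one condition by suffix rather than
# by position in the sorted file list (hypothetical helper).
pick_cond_files <- function(files, cond) {
  mine <- files[grepl(cond, files, fixed = TRUE)]  # literal version of grep(cond, ...)
  col  <- mine[endsWith(mine, "col.csv")]
  cts  <- mine[endsWith(mine, "cts.csv")]
  if (length(col) != 1 || length(cts) != 1) {
    stop("expected exactly one col.csv and one cts.csv for condition ", cond)
  }
  list(col = col, cts = cts)
}

# Illustrative file names only.
files <- c("B_col.csv", "B_cts.csv", "G_col.csv", "G_cts.csv")
f <- pick_cond_files(files, "G")
```

Inside the loop, `paste(countsdir, f$cts, sep="")` would then replace `paste(countsdir, cond_files[2], sep="")`.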

In [32]:
for (cond in c("B")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # FID is read in as an integer; convert it to a factor
    coldata$FID <- as.factor(coldata$FID)
    
# #     # first, run the time course model outlined in the rnaseqgene vignette
# #     ddsTC <- DESeqDataSetFromMatrix(countData = cts,
# #                     colData = coldata,
# #                                 design = ~ cond + TIME + cond:TIME
# #                                )
# #     ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ cond + TIME)
# #     res <- results(ddsTC)
# #     write.csv(as.data.frame(res), file=paste(resdir,cond,".TC.csv",sep=""))
    
    # then, for each cell type, run a standard Wald test of `cond` versus condition G
    for (celltype in c("B_Naive", "pDC", "T4_Naive", "HSC", "T4_EM", "NK", 
                       "T_Tox", "B_Mem", "T8_Naive", "ncM", "cM", "cDC")) {
        print(celltype)
        subct <- subset(cts, select=grep(paste("^",as.character(celltype),"-",sep=""),colnames(cts),value = TRUE))
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"G"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"-G_",as.character(celltype),".csv",sep=""))
    }
    
}
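The B-versus-G contrast can only be estimated when a cell-type subset contains samples from both conditions. Before launching the per-cell-type fits, a quick tabulation of samples per condition can flag subsets where one group is empty. This is a base-R sketch; `count_samples_per_group` is a hypothetical helper, and it assumes the `CT` and `COND` columns used in the loop above.

```r
# Tabulate samples per condition for each cell type (hypothetical helper).
count_samples_per_group <- function(coldata, celltypes,
                                    ct_col = "CT", cond_col = "COND") {
  sub   <- coldata[coldata[[ct_col]] %in% celltypes, , drop = FALSE]
  conds <- sort(unique(as.character(coldata[[cond_col]])))
  tab   <- table(factor(sub[[ct_col]], levels = celltypes),
                 factor(sub[[cond_col]], levels = conds))
  as.data.frame.matrix(tab)  # rows = cell types, columns = conditions
}

# Toy example: pDC has no G samples, so the B-vs-G contrast cannot be estimated.
toy <- data.frame(CT   = c("NK", "NK", "NK", "pDC"),
                  COND = c("B",  "G",  "G",  "B"))
n <- count_samples_per_group(toy, c("NK", "pDC"))
usable <- rownames(n)[apply(n > 0, 1, all)]  # cell types with samples in every condition
```

With the real data, `count_samples_per_group(coldata, <cell-type vector>)` could be inspected before the loop, and only the `usable` cell types passed to `DESeq()`.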

[1] "B_Naive"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "pDC"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Naive"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "HSC"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 14 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_EM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "NK"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T_Tox"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "B_Mem"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "T8_Naive"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "ncM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cDC"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers



In [34]:
for (cond in c("G")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # FID is read in as an integer; convert it to a factor
    coldata$FID <- as.factor(coldata$FID)
    
# #     # first, run the time course model outlined in the rnaseqgene vignette
# #     ddsTC <- DESeqDataSetFromMatrix(countData = cts,
# #                     colData = coldata,
# #                                 design = ~ cond + TIME + cond:TIME
# #                                )
# #     ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ cond + TIME)
# #     res <- results(ddsTC)
# #     write.csv(as.data.frame(res), file=paste(resdir,cond,".TC.csv",sep=""))
    
    # then, for each cell type, run a standard Wald test of `cond` versus condition B
    for (celltype in c("B_Naive", "pDC", "T4_Naive", "HSC", "T4_EM", "NK", 
                       "T_Tox", "B_Mem", "T8_Naive", "ncM", "cM", "cDC")) {
        print(celltype)
        subct <- subset(cts, select=grep(paste("^",as.character(celltype),"-",sep=""),colnames(cts),value = TRUE))
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"B"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"-B_",as.character(celltype),".csv",sep=""))
    }
    
}

[1] "cDC"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers



## `ct3`

Define a path prefix:

In [35]:
prefix <- "/data/clue/prod/adts/vals/de/IFNs/ct3/"

Define the directory with the counts and then a results directory.

In [36]:
countsdir <- paste(prefix,"input/",sep="")
resdir <- paste(prefix,"res/",sep="")

List the files in `countsdir`.

In [37]:
sampleFiles <- list.files(countsdir)

In [38]:
sampleFiles

In [39]:
for (cond in c("B")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # FID is read in as an integer; convert it to a factor
    coldata$FID <- as.factor(coldata$FID)
    
# #     # first, run the time course model outlined in the rnaseqgene vignette
# #     ddsTC <- DESeqDataSetFromMatrix(countData = cts,
# #                     colData = coldata,
# #                                 design = ~ cond + TIME + cond:TIME
# #                                )
# #     ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ cond + TIME)
# #     res <- results(ddsTC)
# #     write.csv(as.data.frame(res), file=paste(resdir,cond,".TC.csv",sep=""))
    
    # then, for each cell type, run a standard Wald test of `cond` versus condition G
    for (celltype in c("T_CD10+", "MAIT", "T4_RO+_Act", "T8_Naive_SELL+", "cDC2", 
                       "T8_TEMRA", "T4_Treg_Resting", "T4_Treg_Act", "NK_CD16+", 
                       "T8_CM", "T4_RO+_SELL+", "NK_CD56++", "T4_Naive_SELLint", 
                       "cDC1", "T8_HOBIT+HELIOS+", "T_gd", "T8_EM", "T4_CM")) {
        print(celltype)
        # '+' is a regex metacharacter (e.g. in "NK_CD56++"), so match the name as a fixed string;
        # this is safe only because no cell-type name is a substring of another
        subct <- subset(cts, select=grep(as.character(celltype), colnames(cts), fixed=TRUE, value = TRUE)) 
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"G"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"-G_",as.character(celltype),".csv",sep=""))
    }
    
}
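The `fixed = TRUE` match above is unanchored, so it depends on no cell-type name appearing inside another column name. An alternative sketch keeps the anchored `^<celltype>-` behaviour of the `ct2` loops without regular expressions by using base R's `startsWith()`. It assumes column names follow a `<celltype>-<sample>` pattern, as the `ct2` pattern implies; `celltype_columns` and the toy column names are hypothetical.

```r
# Return the count-matrix columns whose names start with "<celltype>-".
# startsWith() compares literal strings, so '+' needs no escaping.
celltype_columns <- function(cts, celltype) {
  cols <- colnames(cts)
  cols[startsWith(cols, paste0(celltype, "-"))]
}

# Toy count matrix with illustrative column names.
m <- matrix(0L, nrow = 2, ncol = 4,
            dimnames = list(c("CD3", "CD4"),
                            c("NK_CD56++-1", "NK_CD56++-2", "NK_CD16+-1", "T4_CM-1")))
sel   <- celltype_columns(m, "NK_CD56++")
subct <- m[, sel, drop = FALSE]
```

In the loop, `subct <- cts[, celltype_columns(cts, celltype), drop = FALSE]` would stand in for the `subset(..., grep(..., fixed=TRUE))` call.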

[1] "T_CD10+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 10 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "MAIT"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers



[1] "T4_RO+_Act"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_Naive_SELL+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "cDC2"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_TEMRA"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Treg_Resting"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Treg_Act"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 6 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "NK_CD16+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_CM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_RO+_SELL+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "NK_CD56++"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Naive_SELLint"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cDC1"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers



[1] "T8_HOBIT+HELIOS+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T_gd"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers



[1] "T8_EM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers



[1] "T4_CM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



In [40]:
for (cond in c("G")) {
    
    cond_files <- grep(cond, sampleFiles, value = TRUE)
    
    cts <- as.matrix(read.csv(paste(countsdir, 
                                    cond_files[2], # using the second position, which is the cts.csv
                                    sep=""
                                   ),row.names=1, check.names = FALSE))
    
    coldata <- read.csv(paste(countsdir, 
                              cond_files[1], # using the first position, which is the col.csv
                              sep=""
                             ), row.names=1)
    
    # FID is read in as an integer; convert it to a factor
    coldata$FID <- as.factor(coldata$FID)
    
# #     # first, run the time course model outlined in the rnaseqgene vignette
# #     ddsTC <- DESeqDataSetFromMatrix(countData = cts,
# #                     colData = coldata,
# #                                 design = ~ cond + TIME + cond:TIME
# #                                )
# #     ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ cond + TIME)
# #     res <- results(ddsTC)
# #     write.csv(as.data.frame(res), file=paste(resdir,cond,".TC.csv",sep=""))
    
    # then, for each cell type, run a standard Wald test of `cond` versus condition B
    for (celltype in c("T_CD10+", "MAIT", "T4_RO+_Act", "T8_Naive_SELL+", "cDC2", 
                       "T8_TEMRA", "T4_Treg_Resting", "T4_Treg_Act", "NK_CD16+", 
                       "T8_CM", "T4_RO+_SELL+", "NK_CD56++", "T4_Naive_SELLint", 
                       "cDC1", "T8_HOBIT+HELIOS+", "T_gd", "T8_EM", "T4_CM")) {
        print(celltype)
        # '+' is a regex metacharacter (e.g. in "NK_CD56++"), so match the name as a fixed string;
        # this is safe only because no cell-type name is a substring of another
        subct <- subset(cts, select=grep(as.character(celltype), colnames(cts), fixed=TRUE, value = TRUE)) 
        subcoldata <- subset(coldata, CT == celltype)
        dds <- DESeqDataSetFromMatrix(countData = subct,
                              colData = subcoldata,
                              design = ~ COND
                             )
        dds <- DESeq(dds, parallel = TRUE)
        res <- results(dds, contrast = c("COND",cond,"B"))
        write.csv(as.data.frame(res), file=paste(resdir, cond,"-B_",as.character(celltype),".csv",sep=""))
    }
    
}

[1] "T_CD10+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 10 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "MAIT"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers



[1] "T4_RO+_Act"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_Naive_SELL+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "cDC2"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_TEMRA"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Treg_Resting"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Treg_Act"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 6 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "NK_CD16+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T8_CM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 1 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_RO+_SELL+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers



[1] "NK_CD56++"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T4_Naive_SELLint"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 2 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "cDC1"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers



[1] "T8_HOBIT+HELIOS+"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing



[1] "T_gd"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers



[1] "T8_EM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers



[1] "T4_CM"


converting counts to integer mode

estimating size factors

estimating dispersions

gene-wise dispersion estimates: 14 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 14 workers

-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

