# Differential Expression with DESeq2

Now that I have my [count data](https://github.com/yaaminiv/yaaminiv-fish546-2016/blob/master/notebooks/2016-11-04-oly-gonad-OA-part3-kallisto.ipynb), I can use [DESeq2](https://github.com/yaaminiv/yaaminiv-fish546-2016/blob/master/tutorials/DESeq2-tutorial/2016-10-26-DESeq2-Tutorial-Part-2.ipynb) to analyze differential expression. The process I used was adapted from [this notebook](https://github.com/sr320/eimd-sswd/blob/master/eimd_analysis.ipynb).

My goal is to compare gene expression between the female-106 and female-108 samples, and between the male-106 and male-108 samples. To do this, I will use the `DESeq2` package in R. First, I need to make sure my data are in the right format for analysis.

### 1. Reformat `kallisto quant` output files

#### Convert to a `.txt` file

My first step is converting my count data into a `.txt` file to use in DESeq2. This is as simple as opening the `.tsv` file generated by `kallisto quant` in Excel and saving it as a tab-delimited `.txt` file.

![example file conversion](https://raw.githubusercontent.com/yaaminiv/yaaminiv-fish546-2016/master/analyses/kallisto-female-108/samplefileconversion.png)
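
The same conversion can also be scripted rather than done by hand. The following is a minimal sketch, not the method used here: it assumes the `kallisto quant` output is the usual tab-delimited `abundance.tsv` with `target_id`, `length`, `eff_length`, `est_counts` and `tpm` columns. The function name `tsv_to_txt`, the file names and the single data row are made up for illustration.

```python
import csv
import os
import tempfile


def tsv_to_txt(tsv_path, txt_path):
    """Rewrite a tab-delimited file under a new name, keeping tabs as the delimiter."""
    with open(tsv_path, newline="") as src, open(txt_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in csv.reader(src, delimiter="\t"):
            writer.writerow(row)


# Demo on a throwaway file; the data row below is invented, not real output.
workdir = tempfile.mkdtemp()
tsv_path = os.path.join(workdir, "abundance.tsv")
txt_path = os.path.join(workdir, "abundance.txt")
with open(tsv_path, "w", newline="") as handle:
    handle.write("target_id\tlength\teff_length\test_counts\ttpm\n")
    handle.write("contig_1\t500\t350.0\t12.4\t3.1\n")

tsv_to_txt(tsv_path, txt_path)
with open(txt_path, newline="") as handle:
    converted = handle.read()
print(converted)
```

Because the `.tsv` file is already tab-delimited, the script changes only the file name, not the contents.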

#### Merge count data files

Next, I need to merge all of my count data files so I have one `.txt` file with all of my count information (the relevant column in each original file has the word "count" in its label). The column headers will be as follows:

- Feature_ID
- Female_106 - Total gene reads
- Male_106 - Total gene reads
- Female_108 - Total gene reads
- Male_108 - Total gene reads

I merged these in Excel. Additionally, I converted all of my count data to integers, as DESeq2 will only work with nonnegative integers. My final file looks like this:

![oly-gonad-oa-counts](https://raw.githubusercontent.com/yaaminiv/yaaminiv-fish546-2016/master/data/countdatascreenshot.png)
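
For reference, the merge and integer conversion could be scripted along these lines. This is a sketch under assumptions, not the Excel procedure used above: it assumes each file has kallisto's `target_id` and `est_counts` columns, uses shortened column labels, and rounds to the nearest integer (the exact conversion done in Excel is not recorded here). The helper names and the demo values are invented.

```python
import csv
import io


def read_est_counts(handle):
    """Return {target_id: est_counts} from a tab-delimited kallisto table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["target_id"]: float(row["est_counts"]) for row in reader}


def merge_counts(tables):
    """Merge per-sample counts into one table of rounded integer counts.

    `tables` maps a column label to {target_id: est_counts}.
    Only features present in every sample are kept.
    """
    labels = list(tables)
    shared = set.intersection(*(set(table) for table in tables.values()))
    merged = [["Feature_ID"] + labels]
    for feature_id in sorted(shared):
        # Assumption: round to the nearest integer so DESeq2 receives whole counts.
        merged.append([feature_id] + [int(round(tables[label][feature_id])) for label in labels])
    return merged


# Invented demo data standing in for two of the four samples.
HEADER = "target_id\tlength\teff_length\test_counts\ttpm\n"
female_106 = io.StringIO(HEADER + "contig_1\t500\t350.0\t12.4\t3.1\ncontig_2\t800\t650.0\t0\t0\n")
male_106 = io.StringIO(HEADER + "contig_1\t500\t350.0\t6.7\t1.9\ncontig_2\t800\t650.0\t3.2\t0.5\n")

merged = merge_counts({
    "Female_106": read_est_counts(female_106),
    "Male_106": read_est_counts(male_106),
})
for row in merged:
    print("\t".join(str(value) for value in row))
```

Keying the merge on the feature ID, rather than pasting columns side by side, guards against rows being in a different order in different files.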

### 2. Use DESeq2 in R

I am now ready to complete my analyses in R!

The first thing I did was use DESeq2 to identify differentially expressed genes across all treatments:

[R Script](https://github.com/yaaminiv/yaaminiv-fish546-2016/blob/master/analyses/oly_oa_gonad_DESeq2/2016-11-16-alltreatments-DESeq2.R)

[List of differentially expressed genes](https://raw.githubusercontent.com/yaaminiv/yaaminiv-fish546-2016/master/analyses/oly_oa_gonad_DESeq2/alltreatments_DEG.tab)

![all treatments](https://raw.githubusercontent.com/yaaminiv/yaaminiv-fish546-2016/master/analyses/oly_oa_gonad_DESeq2/alltreatments.png)

I noticed that there weren't many differentially expressed genes, maybe because I was comparing four separate conditions: Female_106, Male_106, Female_108 and Male_108. I then created a new count file for each pairwise comparison and ran each one through DESeq2. The count data, scripts and results for each comparison are linked in the subsections that follow.
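
Creating the pairwise count files amounts to keeping the feature ID column plus two sample columns from the merged table. A rough sketch of that step, assuming the merged table has already been read into a list of rows with shortened column labels; the function name `subset_columns` and the counts shown are made up for illustration:

```python
def subset_columns(table, keep):
    """Keep the Feature_ID column plus the named sample columns, in that order."""
    header = table[0]
    wanted = [header.index("Feature_ID")] + [header.index(name) for name in keep]
    return [[row[i] for i in wanted] for row in table]


# Invented counts standing in for the merged count file.
counts = [
    ["Feature_ID", "Female_106", "Male_106", "Female_108", "Male_108"],
    ["contig_1", 12, 7, 15, 4],
    ["contig_2", 0, 3, 1, 2],
]

# The four comparisons covered in the subsections that follow.
comparisons = [
    ("Female_106", "Female_108"),
    ("Male_106", "Male_108"),
    ("Female_106", "Male_106"),
    ("Female_108", "Male_108"),
]

pairwise = {f"{a}-{b}": subset_columns(counts, (a, b)) for a, b in comparisons}
for name, table in pairwise.items():
    print(name, table[0])
```

Each entry in `pairwise` could then be written out as its own tab-delimited `.txt` file for the corresponding R script.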

#### Control vs. Ocean Acidification conditions

**Female_106 vs. Female_108**

Oddly enough, there were no differentially expressed genes between these two!

[Count data](https://raw.githubusercontent.com/yaaminiv/yaaminiv-fish546-2016/master/data/2016-11-16-oly-gonad-oa-count-data-female106-female108.txt)

[R Script](https://github.com/yaaminiv/yaaminiv-fish546-2016/blob/master/analyses/oly_oa_gonad_DESeq2/2016-11-16-female106-female108-DESeq2.R)

[List of differentially expressed genes](https://raw.githubusercontent.com/yaaminiv/yaaminiv-fish546-2016/master/analyses/oly_oa_gonad_DESeq2/female106-female108_DEG.tab)

![female-106 vs. female-108](https://raw.githubusercontent.com/yaaminiv/yaaminiv-fish546-2016/master/analyses/oly_oa_gonad_DESeq2/female106-female108.png)

**Male_106 vs. Male_108**

[Count data]()

[R Script]()

[List of differentially expressed genes]()

![male-106 vs. male-108]()

#### Male vs. Female Gonads

**Female_106 vs. Male_106**

[Count data]()

[R Script]()

[List of differentially expressed genes]()

![female-106 vs. male-106]()

**Female_108 vs. Male_108**

[Count data]()

[R Script]()

[List of differentially expressed genes]()

![female-108 vs. male-108]()