# Kallisto

I will now use [kallisto (v0.43.0, downloaded 2016-11-04)](https://pachterlab.github.io/kallisto/about) to quantify the abundance of my target sequences (fastq files) in my overall transcriptome.

First, I will set my working directory to the root of my repository, so the same notebook can be run from a remote machine if needed.

In [1]:
pwd

'/Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/notebooks'

In [2]:
cd /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016

/Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016


In [3]:
pwd

'/Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016'

Similar to running a blastx, I first need to create a database (kallisto index). The code is as follows:

1. Define the program, `kallisto index`
2. `-i` indicates the name for the new index
3. FASTA file to be used to create the index

In [10]:
!/Applications/kallisto/kallisto index \
-i /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/kallisto-index-OlyO-v6 \
/Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/OlyO_v6_transcriptome.fa


[build] loading fasta file /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/OlyO_v6_transcriptome.fa
[build] k-mer length: 31
        from 833 target sequences
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done 
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 599683 contigs and contains 74111966 k-mers 



I can now use my newly created index to run four separate commands, one per fastq file, to quantify my reads against the transcriptome. The code is as follows:

1. Define the program, `kallisto quant`
2. `-i` indicates which index to use
3. `-o` tells the program where to write the output
4. `--single` allows me to process single-end reads
5. `-l` estimated average fragment length from [FastQC output](https://github.com/yaaminiv/yaaminiv-fish546-2016/blob/master/notebooks/2016-10-19-oly-gonad-OA-part-1-FASTQC-results.ipynb)
6. `-s` estimated standard deviation of fragment length. I guessed 20% and entered `.20`. Note that kallisto reads this value in the same units as `-l`, not as a fraction of the mean: the log output below reports `sd = 0.2`, whereas 20% of 76 would be about 15.2.
7. fastq file to be used
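Because the four runs differ only in the output directory and the fastq file, the argument list above could be assembled in a loop instead of being copied four times. The following is a minimal sketch, not the code I actually ran: the helper name `quant_command` and the `sd_fraction` parameter are hypothetical, and converting 20% of the mean into an absolute standard deviation is an assumption about what the `-s` guess was meant to express, so it produces a different `-s` value than the cells below.

```python
from pathlib import Path

# Paths copied from the cells in this notebook.
REPO = Path("/Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016")
KALLISTO = "/Applications/kallisto/kallisto"

# Output directory name -> filtered fastq file name, one entry per run.
SAMPLES = {
    "kallisto-female-106": "filtered_106A_Female_Mix_GATCAG_L004_R1.fastq",
    "kallisto-male-106": "filtered_106A_Male_Mix_TAGCTT_L004_R1.fastq",
    "kallisto-female-108": "filtered_108A_Female_Mix_GGCTAC_L004_R1.fastq",
    "kallisto-male-108": "filtered_108A_Male_Mix_AGTCAA_L004_R1.fastq",
}


def quant_command(out_name, fastq_name, mean_len=76, sd_fraction=0.20):
    """Build the `kallisto quant` argument list for one single-end sample.

    ASSUMPTION: the standard deviation is derived as a fraction of the mean
    (0.20 * 76 = 15.2), which is NOT the `-s .20` used in the original runs.
    """
    sd = mean_len * sd_fraction
    return [
        KALLISTO, "quant",
        "-i", str(REPO / "data" / "kallisto-index-OlyO-v6"),
        "-o", str(REPO / "analyses" / out_name),
        "--single",
        "-l", str(mean_len),
        "-s", f"{sd:g}",
        str(REPO / "data" / fastq_name),
    ]


# One argument list per sample, in the same order as the sections below.
commands = [quant_command(out, fastq) for out, fastq in SAMPLES.items()]
```

Each list in `commands` could then be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids the shell line-continuation backslashes entirely.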

### **1. filtered_106A_Female_Mix_GATCAG_L004_R1.fastq**

In [7]:
!/Applications/kallisto/kallisto quant \
-i /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/kallisto-index-OlyO-v6 \
-o /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/analyses/kallisto-female-106 \
--single \
-l 76 \
-s .20 \
/Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/filtered_106A_Female_Mix_GATCAG_L004_R1.fastq


[quant] fragment length distribution is truncated gaussian with mean = 76, sd = 0.2
[index] k-mer length: 31
[index] number of targets: 148,557
[index] number of k-mers: 74,111,966
[index] number of equivalence classes: 349,214
[quant] running in single-end mode
[quant] will process file 1: /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/filtered_106A_Female_Mix_GATCAG_L004_R1.fastq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 39,823,239 reads, 34,794,164 reads pseudoaligned
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1,194 rounds



[`kallisto quant` analysis results]()

### **2. filtered_106A_Male_Mix_TAGCTT_L004_R1.fastq**

In [8]:
!/Applications/kallisto/kallisto quant \
-i /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/kallisto-index-OlyO-v6 \
-o /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/analyses/kallisto-male-106 \
--single \
-l 76 \
-s .20 \
/Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/filtered_106A_Male_Mix_TAGCTT_L004_R1.fastq


[quant] fragment length distribution is truncated gaussian with mean = 76, sd = 0.2
[index] k-mer length: 31
[index] number of targets: 148,557
[index] number of k-mers: 74,111,966
[index] number of equivalence classes: 349,214
[quant] running in single-end mode
[quant] will process file 1: /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/filtered_106A_Male_Mix_TAGCTT_L004_R1.fastq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 59,446,949 reads, 54,059,394 reads pseudoaligned
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1,141 rounds



[`kallisto quant` analysis results]()

### **3. filtered_108A_Female_Mix_GGCTAC_L004_R1.fastq**

In [9]:
!/Applications/kallisto/kallisto quant \
-i /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/kallisto-index-OlyO-v6 \
-o /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/analyses/kallisto-female-108 \
--single \
-l 76 \
-s .20 \
/Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/filtered_108A_Female_Mix_GGCTAC_L004_R1.fastq


[quant] fragment length distribution is truncated gaussian with mean = 76, sd = 0.2
[index] k-mer length: 31
[index] number of targets: 148,557
[index] number of k-mers: 74,111,966
[index] number of equivalence classes: 349,214
[quant] running in single-end mode
[quant] will process file 1: /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/filtered_108A_Female_Mix_GGCTAC_L004_R1.fastq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 45,936,627 reads, 41,710,716 reads pseudoaligned
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1,171 rounds



[`kallisto quant` analysis results]()

### **4. filtered_108A_Male_Mix_AGTCAA_L004_R1.fastq**

In [10]:
!/Applications/kallisto/kallisto quant \
-i /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/kallisto-index-OlyO-v6 \
-o /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/analyses/kallisto-male-108 \
--single \
-l 76 \
-s .20 \
/Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/filtered_108A_Male_Mix_AGTCAA_L004_R1.fastq


[quant] fragment length distribution is truncated gaussian with mean = 76, sd = 0.2
[index] k-mer length: 31
[index] number of targets: 148,557
[index] number of k-mers: 74,111,966
[index] number of equivalence classes: 349,214
[quant] running in single-end mode
[quant] will process file 1: /Users/yaaminivenkataraman/Documents/School/Year1/FISH-546/yaaminiv-fish546-2016/data/filtered_108A_Male_Mix_AGTCAA_L004_R1.fastq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 55,791,565 reads, 50,931,304 reads pseudoaligned
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 1,285 rounds



[`kallisto quant` analysis results]()
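A quick way to compare the four runs is the fraction of reads that pseudoaligned, which can be computed from the `[quant] processed ...` lines in the logs above. This is a small sketch for that arithmetic only; the short sample labels and the function name `pseudoalignment_rate` are my own shorthand, not kallisto output.

```python
import re

# "[quant] processed ..." lines copied from the four logs above.
LOG_LINES = {
    "female-106": "[quant] processed 39,823,239 reads, 34,794,164 reads pseudoaligned",
    "male-106": "[quant] processed 59,446,949 reads, 54,059,394 reads pseudoaligned",
    "female-108": "[quant] processed 45,936,627 reads, 41,710,716 reads pseudoaligned",
    "male-108": "[quant] processed 55,791,565 reads, 50,931,304 reads pseudoaligned",
}

# Captures the two comma-separated read counts.
PATTERN = re.compile(r"processed ([\d,]+) reads, ([\d,]+) reads pseudoaligned")


def pseudoalignment_rate(line):
    """Return pseudoaligned reads / processed reads for one log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a kallisto read-count line: {line!r}")
    processed, aligned = (int(group.replace(",", "")) for group in match.groups())
    return aligned / processed


rates = {sample: pseudoalignment_rate(line) for sample, line in LOG_LINES.items()}
for sample, rate in rates.items():
    print(f"{sample}: {rate:.1%}")
```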

I successfully used `kallisto` to estimate the abundances of my sequences! I can now take the estimated counts from each run and use them with DESeq2 to analyze differential expression in my samples.
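Each `kallisto quant` output directory contains an `abundance.tsv` file with the columns `target_id`, `length`, `eff_length`, `est_counts`, and `tpm`. DESeq2 expects a single table of integer counts, so one possible next step is to merge the `est_counts` columns. The sketch below is only one way to do that, under the assumption that simple rounding is acceptable; the function names are hypothetical, and dedicated importers (for example the Bioconductor `tximport` package) may be a better fit.

```python
import csv
from pathlib import Path


def read_est_counts(abundance_path):
    """Return {target_id: est_counts} from one kallisto abundance.tsv."""
    with open(abundance_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["target_id"]: float(row["est_counts"]) for row in reader}


def write_count_matrix(sample_dirs, out_path):
    """Write a tab-separated targets x samples table of rounded est_counts.

    sample_dirs maps a sample label to its kallisto output directory.
    ASSUMPTION: est_counts are rounded to integers with Python's round();
    targets missing from a sample are written as 0.
    """
    counts = {
        name: read_est_counts(Path(directory) / "abundance.tsv")
        for name, directory in sample_dirs.items()
    }
    names = list(counts)
    targets = sorted(set().union(*(c.keys() for c in counts.values())))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["target_id"] + names)
        for target in targets:
            writer.writerow([target] + [round(counts[n].get(target, 0.0)) for n in names])
    return out_path
```

For this notebook, `sample_dirs` would map labels such as `"female-106"` to the corresponding `analyses/kallisto-female-106` directory, and the resulting table could be read into R as the count matrix.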