# Setup
**Environment:** qiime2-2020.11

## How to use this notebook
1. [Add the R kernel](https://irkernel.github.io/installation/#binary-panel) to Jupyter Notebook.

2. Activate the QIIME 2 conda environment:
   ```
   conda activate qiime2-2020.11
   ```

3. Launch Jupyter Notebook:
   ```
   jupyter notebook
   ```

In [None]:
# change working directory to the project root directory
%cd ..

## Download SILVA 132 reference sequences and taxonomy

In [None]:
# Download SILVA132
!wget -P data/reference https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip

# Decompress and delete the downloaded zip file 
!unzip data/reference/Silva_132_release.zip -d data/reference/silva_132 && rm -f data/reference/Silva_132_release.zip

# Copy the reference sequence file, then copy the taxonomy file under a new name
!cp data/reference/silva_132/SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna data/reference
!cp data/reference/silva_132/SILVA_132_QIIME_release/taxonomy/16S_only/99/consensus_taxonomy_7_levels.txt data/reference/silva_132_consensus_taxonomy_l7.txt

# Delete the extracted archive to free up disk space
!rm -rf data/reference/silva_132
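
As an optional sanity check after the copy step, the sketch below compares the sequence IDs in the FASTA file with the IDs in the taxonomy file. It assumes the taxonomy file is tab-separated with the sequence ID in the first column, and that a FASTA ID is the first whitespace-delimited token of each header line. The helper names (`fasta_ids`, `taxonomy_ids`, `compare_ids`) are illustrative and not part of QIIME 2 or SILVA tooling.

```python
from pathlib import Path


def fasta_ids(lines):
    """Collect sequence IDs: the first token after '>' on each header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def taxonomy_ids(lines):
    """Collect IDs from the first tab-separated column (assumed layout)."""
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            ids.add(line.split("\t")[0])
    return ids


def compare_ids(fasta_lines, taxonomy_lines):
    """Return (IDs with no taxonomy row, IDs with no sequence), both sorted."""
    seqs = fasta_ids(fasta_lines)
    taxa = taxonomy_ids(taxonomy_lines)
    return sorted(seqs - taxa), sorted(taxa - seqs)


if __name__ == "__main__":
    # Paths produced by the download cell above
    fna = Path("data/reference/silva_132_99_16S.fna")
    tax = Path("data/reference/silva_132_consensus_taxonomy_l7.txt")
    if fna.exists() and tax.exists():
        with open(fna) as f, open(tax) as t:
            no_taxonomy, no_sequence = compare_ids(f, t)
        print(len(no_taxonomy), "sequences without a taxonomy row")
        print(len(no_sequence), "taxonomy rows without a sequence")
```

If both counts are zero, the two files describe the same set of reference sequences.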

## Download SILVA 128 reference phylogeny

In [None]:
!wget -P data/reference https://data.qiime2.org/2020.11/common/sepp-refs-silva-128.qza

## Download raw sequence data from the NCBI SRA database

### Download sequences using grabseqs

In [None]:
# run1: set the number of threads (-t) to match the resources available on your machine
!grabseqs sra $(cat data/raw/casava-18-paired-end-demultiplexed-run1/SRR_Acc_Run1.txt) -m metadata.csv -o data/raw/casava-18-paired-end-demultiplexed-run1/ -r 3 -t 16

# run2: set the number of threads (-t) to match the resources available on your machine
!grabseqs sra $(cat data/raw/casava-18-paired-end-demultiplexed-run2/SRR_Acc_Run2.txt) -m metadata.csv -o data/raw/casava-18-paired-end-demultiplexed-run2/ -r 3 -t 16
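
Downloads from SRA can fail partway through, so it may be worth confirming that every accession produced both read files before renaming anything. The sketch below is a minimal check. It assumes that paired-end files are named `<accession>_1.fastq.gz` and `<accession>_2.fastq.gz`, which is the pattern the renaming step in this notebook relies on. The function names are hypothetical.

```python
from pathlib import Path


def expected_files(accessions):
    """File names expected for each accession (forward = 1, reverse = 2)."""
    return [f"{acc}_{read}.fastq.gz" for acc in accessions for read in (1, 2)]


def missing_files(accessions, present):
    """Return the expected file names that are absent from `present`."""
    present = set(present)
    return [name for name in expected_files(accessions) if name not in present]


if __name__ == "__main__":
    run_dir = Path("data/raw/casava-18-paired-end-demultiplexed-run1")
    acc_file = run_dir / "SRR_Acc_Run1.txt"
    if acc_file.exists():
        # Split on any whitespace, mirroring how $(cat ...) is expanded by the shell
        accessions = acc_file.read_text().split()
        present = [p.name for p in run_dir.iterdir()]
        for name in missing_files(accessions, present):
            print("missing:", name)
```

The same check applies to run2 after swapping in its directory and accession list.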

### Rename downloaded fastq files

In [None]:
# R script: switch to the R kernel before running this cell
    
# change working directory to the project root directory
setwd('..')

# run1 ######################################################################################################## 
# file path
path1 <- "data/raw/casava-18-paired-end-demultiplexed-run1/"

# metadata
mtd1 <- read.csv(paste0(path1, "metadata.csv"))

# make a lookup table for renaming fastq files
## 1. select the "Run" and "SampleName" columns
lookup1 <- mtd1[, c("Run", "SampleName")]

## 2. duplicate each sample (row) so there is one row per read direction
lookup1 <- lookup1[rep(seq_len(nrow(lookup1)), each = 2), ]

## 3. add the read index (1 = forward, 2 = reverse)
lookup1$Index <- rep(1:2, nrow(mtd1))

## 4. build the original file names
lookup1$Run <- paste0(path1, lookup1$Run, "_", lookup1$Index, ".fastq.gz")

## 5. build the desired file names
lookup1$SampleName <- paste0(path1, lookup1$SampleName, "_R", lookup1$Index, "_001.fastq.gz")

# rename fastq files
file.rename(from = lookup1[["Run"]], to = lookup1[["SampleName"]])

# run2 ########################################################################################################
# file path
path2 <- "data/raw/casava-18-paired-end-demultiplexed-run2/"

# metadata
mtd2 <- read.csv(paste0(path2, "metadata.csv"))

# lookup table for renaming fastq files (same steps as for run1)
lookup2 <- mtd2[, c("Run", "SampleName")] 
lookup2 <- lookup2[rep(seq_len(nrow(lookup2)), each = 2), ] 
lookup2$Index <- rep(1:2, nrow(mtd2))
lookup2$Run <- paste0(path2, lookup2$Run, "_", lookup2$Index, ".fastq.gz")
lookup2$SampleName <- paste0(path2, lookup2$SampleName, "_R", lookup2$Index, "_001.fastq.gz") 

# rename fastq files
file.rename(from = lookup2[["Run"]], to = lookup2[["SampleName"]])
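
Because renaming is hard to undo, a dry run that lists the planned source and target names can help catch problems such as duplicate sample names before any file is touched. The sketch below mirrors the lookup-table logic of the R cell in Python. It is not the notebook's own method, and `rename_plan` and `duplicate_targets` are hypothetical helper names. It assumes `metadata.csv` has `Run` and `SampleName` columns, as the R code does.

```python
import csv
from collections import Counter
from pathlib import Path


def rename_plan(rows, run_dir):
    """Build (source, target) pairs for forward (1) and reverse (2) reads."""
    plan = []
    for row in rows:
        for read in (1, 2):
            src = f"{run_dir}{row['Run']}_{read}.fastq.gz"
            dst = f"{run_dir}{row['SampleName']}_R{read}_001.fastq.gz"
            plan.append((src, dst))
    return plan


def duplicate_targets(plan):
    """Return target names that appear more than once (these would collide)."""
    counts = Counter(dst for _, dst in plan)
    return sorted(dst for dst, n in counts.items() if n > 1)


if __name__ == "__main__":
    run_dir = "data/raw/casava-18-paired-end-demultiplexed-run1/"
    metadata = Path(run_dir) / "metadata.csv"
    if metadata.exists():
        with open(metadata, newline="") as handle:
            plan = rename_plan(list(csv.DictReader(handle)), run_dir)
        # Dry run: print the plan only, rename nothing
        for src, dst in plan:
            print(src, "->", dst)
        print("colliding targets:", duplicate_targets(plan))
```

An empty list of colliding targets means every renamed file gets a unique name.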