# Processing marker-gene data in QIIME 2, part 1

**Environment:** qiime2-2020.11

## How to use this notebook:
1. Activate the `qiime2-2020.11` conda environment.
   ```
   source $HOME/miniconda3/bin/activate  # adjust to the Miniconda path on your machine
   conda activate qiime2-2020.11         # activate the QIIME 2 conda environment
   ```

2. Launch Jupyter notebook:
   ```
   jupyter notebook
   ```

In [1]:
## Hide excessive warnings (optional):
import warnings
warnings.filterwarnings('ignore')

In [2]:
## change working directory to the project root directory
%cd ..

/media/md0/nutrition_group Dropbox/projects/yanxian/AqFl1_microbiota


## Import feature tables and representative sequences from DADA2

### Run 1

In [3]:
# Import feature table
!qiime tools import \
  --input-path data/intermediate/dada2/table-run1.biom \
  --type 'FeatureTable[Frequency]' \
  --input-format BIOMV100Format \
  --output-path data/intermediate/qiime2/asv/table-run1.qza

# Import representative sequences
!qiime tools import \
  --input-path data/intermediate/dada2/rep-seqs-run1.fna \
  --type 'FeatureData[Sequence]' \
  --output-path data/intermediate/qiime2/asv/rep-seqs-run1.qza

Imported data/intermediate/dada2/table-run1.biom as BIOMV100Format to data/intermediate/qiime2/asv/table-run1.qza
Imported data/intermediate/dada2/rep-seqs-run1.fna as DNASequencesDirectoryFormat to data/intermediate/qiime2/asv/rep-seqs-run1.qza


### Run 2

In [4]:
# Import feature table
!qiime tools import \
  --input-path data/intermediate/dada2/table-run2.biom \
  --type 'FeatureTable[Frequency]' \
  --input-format BIOMV100Format \
  --output-path data/intermediate/qiime2/asv/table-run2.qza

# Import representative sequences
!qiime tools import \
  --input-path data/intermediate/dada2/rep-seqs-run2.fna \
  --type 'FeatureData[Sequence]' \
  --output-path data/intermediate/qiime2/asv/rep-seqs-run2.qza

Imported data/intermediate/dada2/table-run2.biom as BIOMV100Format to data/intermediate/qiime2/asv/table-run2.qza
Imported data/intermediate/dada2/rep-seqs-run2.fna as DNASequencesDirectoryFormat to data/intermediate/qiime2/asv/rep-seqs-run2.qza

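The two import cells above differ only in the run name, so the commands could be generated rather than copied. The following is a minimal sketch, not part of the original workflow: the helper name `import_commands` and its default directories are assumptions taken from the paths used in this notebook. It only builds and prints the commands; nothing is executed.

```python
import shlex


def import_commands(run,
                    in_dir="data/intermediate/dada2",
                    out_dir="data/intermediate/qiime2/asv"):
    """Return the two `qiime tools import` argument lists for one run."""
    table = [
        "qiime", "tools", "import",
        "--input-path", f"{in_dir}/table-{run}.biom",
        "--type", "FeatureTable[Frequency]",
        "--input-format", "BIOMV100Format",
        "--output-path", f"{out_dir}/table-{run}.qza",
    ]
    seqs = [
        "qiime", "tools", "import",
        "--input-path", f"{in_dir}/rep-seqs-{run}.fna",
        "--type", "FeatureData[Sequence]",
        "--output-path", f"{out_dir}/rep-seqs-{run}.qza",
    ]
    return [table, seqs]


# Print shell-quoted commands for both runs without running them.
for run in ("run1", "run2"):
    for cmd in import_commands(run):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run a command, each argument list could be passed to `subprocess.run(cmd, check=True)` inside the activated QIIME 2 environment.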

## Merge feature tables and representative sequences

In [5]:
# merge feature table
!qiime feature-table merge \
  --i-tables data/intermediate/qiime2/asv/table-run1.qza \
  --i-tables data/intermediate/qiime2/asv/table-run2.qza \
  --p-overlap-method error_on_overlapping_sample \
  --o-merged-table data/intermediate/qiime2/asv/table.qza

# merge representative sequences
!qiime feature-table merge-seqs \
  --i-data data/intermediate/qiime2/asv/rep-seqs-run1.qza \
  --i-data data/intermediate/qiime2/asv/rep-seqs-run2.qza \
  --o-merged-data data/intermediate/qiime2/asv/rep-seqs.qza

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/asv/table.qza
Saved FeatureData[Sequence] to: data/intermediate/qiime2/asv/rep-seqs.qza

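To illustrate what `--p-overlap-method error_on_overlapping_sample` guards against, here is a minimal Python sketch. It is not QIIME 2's implementation: it assumes each table is a plain dictionary mapping sample ID to per-feature counts, and the function name `merge_tables` and the toy sample and feature IDs are hypothetical.

```python
def merge_tables(*tables):
    """Merge {sample_id: {feature_id: count}} tables, refusing shared sample IDs."""
    merged = {}
    for table in tables:
        # Sample IDs already seen in an earlier table are treated as an error.
        overlap = merged.keys() & table.keys()
        if overlap:
            raise ValueError(f"Overlapping sample IDs: {sorted(overlap)}")
        for sample_id, counts in table.items():
            merged[sample_id] = dict(counts)
    return merged


# Toy data, made up for illustration only.
run1 = {"S1": {"asv_a": 10, "asv_b": 3}}
run2 = {"S2": {"asv_a": 4, "asv_c": 7}}

merged = merge_tables(run1, run2)
# Features may be shared between runs; the merged table holds their union.
features = sorted({f for counts in merged.values() for f in counts})
print(sorted(merged))
print(features)
```

With this option, the same sample ID appearing in both runs stops the merge instead of silently combining counts, which is the safer choice when the runs are expected to contain different samples.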

## Visualize the merged feature table and representative sequences

In [6]:
# visualize feature table
!qiime feature-table summarize \
  --i-table data/intermediate/qiime2/asv/table.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/asv/table.qzv 

# visualize representative sequences
!qiime feature-table tabulate-seqs \
  --i-data data/intermediate/qiime2/asv/rep-seqs.qza \
  --o-visualization data/intermediate/qiime2/asv/rep-seqs.qzv

Saved Visualization to: data/intermediate/qiime2/asv/table.qzv
Saved Visualization to: data/intermediate/qiime2/asv/rep-seqs.qzv


## Taxonomic assignment

### Import reference sequences and taxonomy to train the feature classifier

In [7]:
!qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path data/reference/silva_132_99_16S.fna \
  --output-path data/intermediate/qiime2/asv/99-otus-silva132.qza

!qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path data/reference/silva_132_consensus_taxonomy_l7.txt \
  --output-path data/intermediate/qiime2/asv/ref-taxonomy-silva132.qza

Imported data/reference/silva_132_99_16S.fna as DNASequencesDirectoryFormat to data/intermediate/qiime2/asv/99-otus-silva132.qza
Imported data/reference/silva_132_consensus_taxonomy_l7.txt as HeaderlessTSVTaxonomyFormat to data/intermediate/qiime2/asv/ref-taxonomy-silva132.qza


### Extract V1-2 reference reads

In [8]:
%%time
!qiime feature-classifier extract-reads \
  --i-sequences data/intermediate/qiime2/asv/99-otus-silva132.qza \
  --p-f-primer AGAGTTTGATCMTGGCTCAG \
  --p-r-primer GCWGCCWCCCGTAGGWGT \
  --p-n-jobs 16 \
  --o-reads data/intermediate/qiime2/asv/ref-seqs-silva132.qza

Saved FeatureData[Sequence] to: data/intermediate/qiime2/asv/ref-seqs-silva132.qza
CPU times: user 12.2 s, sys: 1.69 s, total: 13.9 s
Wall time: 12min 34s
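
Both primers contain IUPAC degenerate bases (`M` = A or C, `W` = A or T), and the reverse primer binds the opposite strand, so on the reference sequence it appears as its reverse complement. The sketch below shows the idea with exact regular-expression matching. It is an illustration only, not how `extract-reads` works internally: QIIME 2 tolerates mismatches (see its `--p-identity` option), whereas this sketch does not. The helper names and the toy template sequence are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC code (e.g. M = A/C pairs with K = T/G).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_to_regex(primer):
    """Turn a degenerate primer into a regex with one character class per ambiguous base."""
    return "".join(
        IUPAC[base] if len(IUPAC[base]) == 1 else f"[{IUPAC[base]}]"
        for base in primer
    )


def extract_region(seq, f_primer, r_primer):
    """Return the sequence between the two primer sites, or None if either is missing."""
    pattern = re.compile(
        primer_to_regex(f_primer)
        + "(.*?)"
        + primer_to_regex(reverse_complement(r_primer))
    )
    match = pattern.search(seq)
    return match.group(1) if match else None


F_PRIMER = "AGAGTTTGATCMTGGCTCAG"
R_PRIMER = "GCWGCCWCCCGTAGGWGT"

# Toy template: flank + one concrete forward-primer site + insert
# + reverse complement of one concrete reverse-primer variant + flank.
template = (
    "TT" + "AGAGTTTGATCCTGGCTCAG" + "ACGTACGT"
    + reverse_complement("GCAGCCACCCGTAGGTGT") + "GG"
)
print(primer_to_regex(F_PRIMER))
print(extract_region(template, F_PRIMER, R_PRIMER))
```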


### Train the feature classifier

In [9]:
%%time
!qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads data/intermediate/qiime2/asv/ref-seqs-silva132.qza \
  --i-reference-taxonomy data/intermediate/qiime2/asv/ref-taxonomy-silva132.qza \
  --o-classifier data/intermediate/qiime2/asv/silva132-99otu-27-338-classifier.qza

Saved TaxonomicClassifier to: data/intermediate/qiime2/asv/silva132-99otu-27-338-classifier.qza
CPU times: user 29.9 s, sys: 5.68 s, total: 35.5 s
Wall time: 23min 36s
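
For intuition about what a naive Bayes sequence classifier learns, here is a toy pure-Python sketch that scores a query by the k-mers it shares with each reference taxon. It is not the `fit-classifier-naive-bayes` implementation, which builds on scikit-learn; the function names, the choice of `k=3`, the uniform class prior, and the reference sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k):
    """Count k-mers per taxon from (sequence, taxon) pairs."""
    model = {}
    for seq, taxon in references:
        model.setdefault(taxon, Counter()).update(kmers(seq, k))
    return model


def classify(model, seq, k, alpha=1.0):
    """Pick the taxon with the highest smoothed log-likelihood (uniform prior)."""
    query = kmers(seq, k)
    vocab = {km for counts in model.values() for km in counts} | set(query)
    best_taxon, best_score = None, -math.inf
    for taxon, counts in model.items():
        # Laplace smoothing keeps unseen k-mers from zeroing the likelihood.
        total = sum(counts.values()) + alpha * len(vocab)
        score = sum(math.log((counts[km] + alpha) / total) for km in query)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Made-up reference sequences and labels.
references = [
    ("ACGTACGTACGT", "Taxon A"),
    ("TTGGCCAATTGG", "Taxon B"),
]
model = train(references, k=3)
print(classify(model, "CGTACGTA", k=3))
print(classify(model, "GGCCAATT", k=3))
```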


### Assign taxonomy using the trained feature classifier

In [10]:
%%time
!qiime feature-classifier classify-sklearn \
  --i-classifier data/intermediate/qiime2/asv/silva132-99otu-27-338-classifier.qza \
  --i-reads data/intermediate/qiime2/asv/rep-seqs.qza \
  --p-n-jobs 16 \
  --o-classification data/intermediate/qiime2/asv/taxonomy-silva132.qza

Saved FeatureData[Taxonomy] to: data/intermediate/qiime2/asv/taxonomy-silva132.qza
CPU times: user 6.85 s, sys: 1.01 s, total: 7.86 s
Wall time: 6min 10s


### Visualize taxonomy 

In [11]:
# taxonomy file
!qiime metadata tabulate \
  --m-input-file data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --o-visualization data/intermediate/qiime2/asv/taxonomy-silva132.qzv

# taxonomic barplot
!qiime taxa barplot \
  --i-table data/intermediate/qiime2/asv/table.qza \
  --i-taxonomy data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --m-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/asv/taxa-bar-plots.qzv

Saved Visualization to: data/intermediate/qiime2/asv/taxonomy-silva132.qzv
Saved Visualization to: data/intermediate/qiime2/asv/taxa-bar-plots.qzv

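The taxonomic barplot shows feature counts summed at a chosen taxonomic level. As a rough sketch of that aggregation, assuming semicolon-separated lineage strings, the snippet below sums counts by the first `level` ranks. The function name `collapse`, the counts, and the lineages are made up for illustration; the labels in the real taxonomy artifact may be formatted differently.

```python
from collections import defaultdict


def collapse(counts, taxonomy, level):
    """Sum per-feature counts by the first `level` ranks of each lineage."""
    collapsed = defaultdict(int)
    for feature_id, count in counts.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        collapsed[";".join(ranks[:level])] += count
    return dict(collapsed)


# Toy counts for a single sample and made-up lineages.
counts = {"asv_a": 10, "asv_b": 3, "asv_c": 7}
taxonomy = {
    "asv_a": "Bacteria; Firmicutes; Bacilli",
    "asv_b": "Bacteria; Firmicutes; Clostridia",
    "asv_c": "Bacteria; Proteobacteria; Gammaproteobacteria",
}

level2 = collapse(counts, taxonomy, level=2)
print(level2)
```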