# Processing marker-gene data in QIIME 2, part 2

**Environment:** qiime2-2020.11

## How to use this notebook:
1. Activate the `qiime2-2020.11` conda environment.
    ```
    source $HOME/miniconda3/bin/activate # adjust this path to where miniconda is installed on your machine
    conda activate qiime2-2020.11 # activate the QIIME 2 conda environment
    ```
2. Install additional dependencies:
    ```
    conda install -c conda-forge deicode
    qiime dev refresh-cache
    ```

3. Restart and run the notebook:
    ```
    jupyter notebook
    ```
      

In [1]:
## Hide excessive warnings (optional):
import warnings
warnings.filterwarnings('ignore')

In [2]:
## change working directory to the project root directory
%cd ..

/media/md0/nutrition_group Dropbox/projects/yanxian/Li_AqFl1-Microbiota_2020


#  Analyzing sequences at ASV level 

##  Import and visualize the filtered feature table

In [3]:
# Import table
!qiime tools import \
  --input-path data/intermediate/filtering/table-filtered.biom \
  --type 'FeatureTable[Frequency]' \
  --input-format BIOMV100Format \
  --output-path data/intermediate/qiime2/asv/table-filtered.qza

# Feature table summary 
!qiime feature-table summarize \
  --i-table data/intermediate/qiime2/asv/table-filtered.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/asv/table-filtered.qzv 

# Filtered taxonomic barplot 
!qiime taxa barplot \
  --i-table data/intermediate/qiime2/asv/table-filtered.qza \
  --i-taxonomy data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --m-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/asv/taxa-bar-plots-filtered.qzv

Imported data/intermediate/filtering/table-filtered.biom as BIOMV100Format to data/intermediate/qiime2/asv/table-filtered.qza
Saved Visualization to: data/intermediate/qiime2/asv/table-filtered.qzv
Saved Visualization to: data/intermediate/qiime2/asv/taxa-bar-plots-filtered.qzv


##  Phylogeny 

###  Filter representative sequences based on the filtered feature table

In [4]:
!qiime feature-table filter-seqs \
  --i-data data/intermediate/qiime2/asv/rep-seqs.qza \
  --i-table data/intermediate/qiime2/asv/table-filtered.qza \
  --p-no-exclude-ids \
  --o-filtered-data data/intermediate/qiime2/asv/rep-seqs-filtered.qza

!qiime feature-table tabulate-seqs \
  --i-data data/intermediate/qiime2/asv/rep-seqs-filtered.qza \
  --o-visualization data/intermediate/qiime2/asv/rep-seqs-filtered.qzv

Saved FeatureData[Sequence] to: data/intermediate/qiime2/asv/rep-seqs-filtered.qza
Saved Visualization to: data/intermediate/qiime2/asv/rep-seqs-filtered.qzv
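
The filtering step above keeps only the representative sequences whose feature IDs are still present in the filtered feature table. A minimal Python sketch of that idea follows; the feature IDs, sequences, and the `filter_seqs_by_table` helper are hypothetical stand-ins, not QIIME 2 artifacts or API calls.

```python
def filter_seqs_by_table(rep_seqs, table_feature_ids):
    """Keep only sequences whose feature ID occurs in the feature table."""
    keep = set(table_feature_ids)
    return {fid: seq for fid, seq in rep_seqs.items() if fid in keep}


# Hypothetical toy data: three representative sequences, two of which
# survived the earlier table filtering.
rep_seqs = {"asv1": "ACGT", "asv2": "ACGA", "asv3": "TTGA"}
table_feature_ids = ["asv1", "asv3"]

filtered = filter_seqs_by_table(rep_seqs, table_feature_ids)
print(sorted(filtered))
```

Keeping the sequence set in sync with the table avoids spending SEPP run time on sequences that were already filtered out.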


###  Reference-based fragment insertion with SEPP

In [5]:
%%time
!qiime fragment-insertion sepp \
  --i-representative-sequences data/intermediate/qiime2/asv/rep-seqs-filtered.qza \
  --i-reference-database data/reference/sepp-refs-silva-128.qza \
  --o-tree data/intermediate/qiime2/asv/insertion-tree.qza \
  --o-placements data/intermediate/qiime2/asv/tree-placements.qza \
  --p-threads 16 \
  --p-debug

Saved Phylogeny[Rooted] to: data/intermediate/qiime2/asv/insertion-tree.qza
Saved Placements to: data/intermediate/qiime2/asv/tree-placements.qza
CPU times: user 10min 36s, sys: 1min 41s, total: 12min 17s
Wall time: 3h 12min 41s


### Filter uninserted representative sequences from the feature table  

In [6]:
!qiime fragment-insertion filter-features \
  --i-table data/intermediate/qiime2/asv/table-filtered.qza \
  --i-tree data/intermediate/qiime2/asv/insertion-tree.qza \
  --o-filtered-table data/intermediate/qiime2/asv/table-filtered-sepp-inserted.qza \
  --o-removed-table data/intermediate/qiime2/asv/table-filtered-sepp-uninserted.qza \
  --verbose

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/asv/table-filtered-sepp-inserted.qza
Saved FeatureTable[Frequency] to: data/intermediate/qiime2/asv/table-filtered-sepp-uninserted.qza
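
Conceptually, this step splits the feature table in two: features that appear as tips in the insertion tree, and features that do not. The sketch below illustrates the split with plain Python dictionaries; the table layout, IDs, and the `split_by_tree` helper are assumptions made for illustration only.

```python
def split_by_tree(table, tree_tip_ids):
    """Split a feature table into features found / not found among the tree tips."""
    tips = set(tree_tip_ids)
    inserted = {fid: counts for fid, counts in table.items() if fid in tips}
    uninserted = {fid: counts for fid, counts in table.items() if fid not in tips}
    return inserted, uninserted


# Hypothetical toy table: feature ID -> {sample ID: count}
table = {
    "asv1": {"s1": 10, "s2": 0},
    "asv2": {"s1": 3, "s2": 7},
    "asv3": {"s1": 0, "s2": 1},
}
tree_tip_ids = ["asv1", "asv2", "ref_0001"]  # tips include reference sequences

inserted, uninserted = split_by_tree(table, tree_tip_ids)
print(sorted(inserted), sorted(uninserted))
```

Phylogenetic metrics need every feature in the table to be present in the tree, which is why the inserted table is the one carried forward to the diversity analyses.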


### Feature table summary

In [7]:
!qiime feature-table summarize \
  --i-table data/intermediate/qiime2/asv/table-filtered-sepp-inserted.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/asv/table-filtered-sepp-inserted.qzv 

!qiime feature-table summarize \
  --i-table data/intermediate/qiime2/asv/table-filtered-sepp-uninserted.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/asv/table-filtered-sepp-uninserted.qzv 

Saved Visualization to: data/intermediate/qiime2/asv/table-filtered-sepp-inserted.qzv
Saved Visualization to: data/intermediate/qiime2/asv/table-filtered-sepp-uninserted.qzv


## Quality control: taxonomic composition of mock samples

### Import expected taxonomic composition of mock samples

In [8]:
!biom convert \
  -i data/reference/mock_expected.tsv \
  -o data/reference/mock-expected.biom \
  --table-type="OTU table" \
  --to-hdf5

!qiime tools import \
  --input-path data/reference/mock-expected.biom \
  --type 'FeatureTable[RelativeFrequency]' \
  --input-format BIOMV210Format \
  --output-path data/reference/mock-expected.qza

Imported data/reference/mock-expected.biom as BIOMV210Format to data/reference/mock-expected.qza


### Get the observed taxonomic composition of mock samples 

In [9]:
# Subset mock samples
!qiime feature-table filter-samples \
  --i-table data/intermediate/qiime2/asv/table-filtered.qza \
  --m-metadata-file data/metadata.tsv \
  --p-where "SampleType='Mock'" \
  --p-no-exclude-ids \
  --o-filtered-table data/intermediate/qiime2/asv/quality-control/mock-observed.qza

# Inspect the taxonomic composition of mock samples at ASV level
!qiime taxa barplot \
  --i-table data/intermediate/qiime2/asv/quality-control/mock-observed.qza \
  --i-taxonomy data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --m-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/asv/quality-control/mock-observed.qzv

# Agglomerate taxa at species level
!qiime taxa collapse \
  --i-table data/intermediate/qiime2/asv/quality-control/mock-observed.qza \
  --i-taxonomy data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --p-level 7 \
  --o-collapsed-table data/intermediate/qiime2/asv/quality-control/mock-observed-l7.qza

# Convert sequence counts into relative abundances
!qiime feature-table relative-frequency \
  --i-table data/intermediate/qiime2/asv/quality-control/mock-observed-l7.qza \
  --o-relative-frequency-table data/intermediate/qiime2/asv/quality-control/mock-observed-l7-rel.qza

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/asv/quality-control/mock-observed.qza
Saved Visualization to: data/intermediate/qiime2/asv/quality-control/mock-observed.qzv
Saved FeatureTable[Frequency] to: data/intermediate/qiime2/asv/quality-control/mock-observed-l7.qza
Saved FeatureTable[RelativeFrequency] to: data/intermediate/qiime2/asv/quality-control/mock-observed-l7-rel.qza
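
The last two commands collapse ASV counts by taxonomy and then rescale them to proportions. The following is a minimal sketch of both operations for a single sample. The taxonomy strings and counts are made up, the toy example collapses at level 2 rather than 7 to stay short, and padding missing ranks with `__` is an assumption about how unassigned ranks are represented.

```python
from collections import defaultdict


def collapse(counts, taxonomy, level):
    """Sum counts of features that share the same lineage down to `level` ranks."""
    collapsed = defaultdict(int)
    for fid, n in counts.items():
        ranks = [r.strip() for r in taxonomy[fid].split(";")]
        # Assumption: lineages shorter than `level` are padded with "__".
        ranks += ["__"] * (level - len(ranks))
        collapsed[";".join(ranks[:level])] += n
    return dict(collapsed)


def relative_frequency(counts):
    """Convert counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical counts for one mock sample and a two-rank toy taxonomy
counts = {"asv1": 60, "asv2": 20, "asv3": 20}
taxonomy = {"asv1": "k__A; p__X", "asv2": "k__A; p__X", "asv3": "k__A"}

collapsed = collapse(counts, taxonomy, level=2)
rel = relative_frequency(collapsed)
print(collapsed)
print(rel)
```

Relative frequencies are needed here because the expected composition was imported as `FeatureTable[RelativeFrequency]`, so both tables must be on the same scale before they are compared.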


### Compare observed and expected taxonomic composition of mock samples

In [10]:
!qiime quality-control evaluate-composition \
  --i-expected-features data/reference/mock-expected.qza \
  --i-observed-features data/intermediate/qiime2/asv/quality-control/mock-observed-l7-rel.qza \
  --o-visualization data/intermediate/qiime2/asv/quality-control/mock-comparison.qzv

Saved Visualization to: data/intermediate/qiime2/asv/quality-control/mock-comparison.qzv
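
Comparisons of this kind are often summarized with the taxon accuracy rate (TAR, the fraction of observed taxa that were expected) and the taxon detection rate (TDR, the fraction of expected taxa that were observed). The sketch below computes both from two toy relative-abundance dictionaries. It is a simplified illustration with invented taxa and a hypothetical `taxon_rates` helper, not the plugin's implementation.

```python
def taxon_rates(expected, observed):
    """Return (TAR, TDR) for two taxon -> relative abundance mappings."""
    exp_taxa = {t for t, v in expected.items() if v > 0}
    obs_taxa = {t for t, v in observed.items() if v > 0}
    shared = exp_taxa & obs_taxa
    tar = len(shared) / len(obs_taxa)  # observed taxa that were expected
    tdr = len(shared) / len(exp_taxa)  # expected taxa that were observed
    return tar, tdr


# Hypothetical mock community: four expected taxa, three observed
expected = {"A": 0.25, "B": 0.25, "C": 0.25, "E": 0.25}
observed = {"A": 0.6, "B": 0.3, "D": 0.1}

tar, tdr = taxon_rates(expected, observed)
print(round(tar, 3), round(tdr, 3))
```

A low TAR points to unexpected taxa (for example contaminants or misclassifications), whereas a low TDR points to expected taxa that went undetected.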


## Alpha and beta diversity analysis

### Exclude mock and negative control samples from the feature table

In [11]:
!qiime feature-table filter-samples \
  --i-table data/intermediate/qiime2/asv/table-filtered-sepp-inserted.qza \
  --m-metadata-file data/metadata.tsv \
  --p-where "SampleType IN ('Mock', 'Extraction-blank', 'PCR-blank')" \
  --p-exclude-ids \
  --o-filtered-table data/intermediate/qiime2/asv/table-filtered-sepp-inserted-no-control.qza

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/asv/table-filtered-sepp-inserted-no-control.qza
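
The `--p-where` argument takes an SQLite `WHERE` clause, and `--p-exclude-ids` inverts the selection so that the matching samples are dropped instead of kept. The sketch below reproduces that logic with Python's built-in `sqlite3` module. The sample IDs, the `SampleID` column name, and the `Digesta` sample type are hypothetical; only the `WHERE` clause itself is taken from the command above.

```python
import sqlite3


def ids_matching(metadata_rows, where):
    """Return sample IDs whose metadata satisfy an SQLite WHERE clause."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE metadata (SampleID TEXT, SampleType TEXT)")
    con.executemany("INSERT INTO metadata VALUES (?, ?)", metadata_rows)
    rows = con.execute(f"SELECT SampleID FROM metadata WHERE {where}").fetchall()
    con.close()
    return {row[0] for row in rows}


# Hypothetical metadata: (SampleID, SampleType)
metadata_rows = [
    ("s1", "Digesta"),
    ("s2", "Digesta"),
    ("m1", "Mock"),
    ("b1", "Extraction-blank"),
    ("b2", "PCR-blank"),
]

where = "SampleType IN ('Mock', 'Extraction-blank', 'PCR-blank')"
matched = ids_matching(metadata_rows, where)

# Excluding the matched IDs leaves only the biological samples.
kept = {sample_id for sample_id, _ in metadata_rows} - matched
print(sorted(matched), sorted(kept))
```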


### Rarefaction analysis 

In [12]:
!qiime diversity alpha-rarefaction \
  --i-table data/intermediate/qiime2/asv/table-filtered-sepp-inserted-no-control.qza \
  --i-phylogeny data/intermediate/qiime2/asv/insertion-tree.qza \
  --p-max-depth 10601 \
  --p-steps 10 \
  --m-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/asv/alpha-rarefaction.qzv


Saved Visualization to: data/intermediate/qiime2/asv/alpha-rarefaction.qzv
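
Rarefying a sample to a given depth means drawing that many reads at random without replacement; samples with fewer reads than the depth are dropped. The sketch below shows this for one sample at a time. It uses a toy depth of 10 rather than 10601, and the counts and the `rarefy` helper are hypothetical.

```python
import random


def rarefy(sample_counts, depth, seed=0):
    """Subsample one sample's counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads.
    """
    total = sum(sample_counts.values())
    if total < depth:
        return None
    # Expand the counts into one entry per read, then draw `depth` reads.
    pool = [fid for fid, n in sample_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = rng.sample(pool, depth)
    rarefied_counts = {}
    for fid in drawn:
        rarefied_counts[fid] = rarefied_counts.get(fid, 0) + 1
    return rarefied_counts


# Hypothetical samples: one deep enough, one too shallow
deep = {"asv1": 12, "asv2": 6, "asv3": 2}
shallow = {"asv1": 4, "asv2": 1}

rarefied = rarefy(deep, depth=10)
dropped = rarefy(shallow, depth=10)
print(rarefied, dropped)
```

The rarefaction curves help judge whether a candidate depth retains enough samples while still capturing most of the observed diversity.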


### Generate core metric results

In [13]:
!qiime diversity core-metrics-phylogenetic \
  --i-phylogeny data/intermediate/qiime2/asv/insertion-tree.qza \
  --i-table data/intermediate/qiime2/asv/table-filtered-sepp-inserted-no-control.qza \
  --m-metadata-file data/metadata.tsv \
  --p-sampling-depth 10601 \
  --output-dir data/intermediate/qiime2/asv/core-metrics-results

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/asv/core-metrics-results/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/asv/core-metrics-results/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/asv/core-metrics-results/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/asv/core-metrics-results/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/asv/core-metrics-results/evenness_vector.qza
Saved DistanceMatrix to: data/intermediate/qiime2/asv/core-metrics-results/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: data/intermediate/qiime2/asv/core-metrics-results/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: data/intermediate/qiime2/asv/core-metrics-results/jaccard_distance_matrix.qza
Saved DistanceMatrix to: data/intermediate/qiime2/asv/c

### Compare beta-diversity using robust Aitchison PCA 

In [14]:
!qiime deicode rpca \
  --i-table data/intermediate/qiime2/asv/table-filtered-sepp-inserted-no-control.qza \
  --p-min-feature-count 10 \
  --p-min-sample-count 1000 \
  --output-dir data/intermediate/qiime2/asv/robust-Aitchison-pca

!qiime emperor biplot \
  --i-biplot data/intermediate/qiime2/asv/robust-Aitchison-pca/biplot.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --m-feature-metadata-file data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --o-visualization data/intermediate/qiime2/asv/robust-Aitchison-pca/biplot.qzv \
  --p-number-of-features 8

Saved PCoAResults % Properties('biplot') to: data/intermediate/qiime2/asv/robust-Aitchison-pca/biplot.qza
Saved DistanceMatrix to: data/intermediate/qiime2/asv/robust-Aitchison-pca/distance_matrix.qza
Saved Visualization to: data/intermediate/qiime2/asv/robust-Aitchison-pca/biplot.qzv
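
Robust Aitchison PCA starts from a robust centered log-ratio (rclr) transform, which takes logarithms of the nonzero counts only and centers them on their mean, leaving zeros as missing values. The sketch below shows that transform for a single sample. It is an illustration under stated assumptions, not DEICODE's implementation, and it omits the matrix-completion step that follows; the `rclr` helper and the counts are hypothetical.

```python
import math


def rclr(sample_counts):
    """Robust centered log-ratio of one sample; zeros become None (missing)."""
    logs = [math.log(x) for x in sample_counts if x > 0]
    if not logs:
        raise ValueError("sample has no nonzero counts")
    center = sum(logs) / len(logs)
    return [math.log(x) - center if x > 0 else None for x in sample_counts]


# Hypothetical counts for three features in one sample
values = rclr([10, 0, 1000])
print(values)
```

Because zeros are never log-transformed, no pseudocount is needed, which is the main practical difference from an ordinary centered log-ratio.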


# Analyzing sequences at OTU level: 99% OTU

## Cluster sequences

### De novo OTU picking

In [15]:
%%time
!qiime vsearch cluster-features-de-novo \
  --i-table data/intermediate/qiime2/asv/table-filtered.qza \
  --i-sequences data/intermediate/qiime2/asv/rep-seqs-filtered.qza \
  --p-perc-identity 0.99 \
  --p-threads 16 \
  --o-clustered-table data/intermediate/qiime2/99otu/table-filtered-99otu.qza \
  --o-clustered-sequences data/intermediate/qiime2/99otu/rep-seqs-filtered-99otu.qza

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/99otu/table-filtered-99otu.qza
Saved FeatureData[Sequence] to: data/intermediate/qiime2/99otu/rep-seqs-filtered-99otu.qza
CPU times: user 112 ms, sys: 61.7 ms, total: 174 ms
Wall time: 7.71 s
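
De novo clustering groups sequences around centroids and sums the counts of all members of a cluster. The toy sketch below shows a greedy, abundance-ordered version of this idea. It is a deliberate simplification: identity is computed position by position on equal-length sequences, whereas VSEARCH uses pairwise alignment, and each sequence joins the first centroid that meets the threshold. All sequences, abundances, and helper names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; toy stand-in that requires equal lengths."""
    if len(a) != len(b):
        raise ValueError("toy identity only handles equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, abundances, threshold):
    """Return centroid ID -> summed abundance of the cluster."""
    # Process the most abundant sequences first so they become centroids.
    order = sorted(seqs, key=lambda fid: abundances[fid], reverse=True)
    centroids = []
    clustered = {}
    for fid in order:
        for centroid in centroids:
            if identity(seqs[fid], seqs[centroid]) >= threshold:
                clustered[centroid] += abundances[fid]
                break
        else:
            # No centroid was similar enough: start a new cluster.
            centroids.append(fid)
            clustered[fid] = abundances[fid]
    return clustered


# Hypothetical 10-nt sequences; "b" differs from "a" at one position
seqs = {"a": "AAAAAAAAAA", "b": "AAAAAAAAAT", "c": "AAAAAATTTT"}
abundances = {"a": 100, "b": 10, "c": 5}

clustered = greedy_cluster(seqs, abundances, threshold=0.9)
print(clustered)
```

Since each cluster keeps the feature ID of its centroid, the ASV-level taxonomy file can still be used to annotate the clustered tables in the cells that follow.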


### Visualize the clustered feature table

In [16]:
# Feature table summary 
!qiime feature-table summarize \
  --i-table data/intermediate/qiime2/99otu/table-filtered-99otu.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/99otu/table-filtered-99otu.qzv 

# Taxonomic barplot 
!qiime taxa barplot \
  --i-table data/intermediate/qiime2/99otu/table-filtered-99otu.qza \
  --i-taxonomy data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --m-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/99otu/taxa-bar-plots-filtered-99otu.qzv

Saved Visualization to: data/intermediate/qiime2/99otu/table-filtered-99otu.qzv
Saved Visualization to: data/intermediate/qiime2/99otu/taxa-bar-plots-filtered-99otu.qzv


## Phylogeny

### Reference-based fragment insertion with SEPP

In [17]:
%%time
!qiime fragment-insertion sepp \
  --i-representative-sequences data/intermediate/qiime2/99otu/rep-seqs-filtered-99otu.qza \
  --i-reference-database data/reference/sepp-refs-silva-128.qza \
  --o-tree data/intermediate/qiime2/99otu/insertion-tree-99otu.qza \
  --o-placements data/intermediate/qiime2/99otu/tree-placements-99otu.qza \
  --p-threads 16 \
  --p-debug

Saved Phylogeny[Rooted] to: data/intermediate/qiime2/99otu/insertion-tree-99otu.qza
Saved Placements to: data/intermediate/qiime2/99otu/tree-placements-99otu.qza
CPU times: user 4min 59s, sys: 49.5 s, total: 5min 49s
Wall time: 1h 43min 37s


### Filter uninserted sequences from the feature table

In [18]:
!qiime fragment-insertion filter-features \
  --i-table data/intermediate/qiime2/99otu/table-filtered-99otu.qza \
  --i-tree data/intermediate/qiime2/99otu/insertion-tree-99otu.qza \
  --o-filtered-table data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted.qza \
  --o-removed-table data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-uninserted.qza \
  --verbose

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted.qza
Saved FeatureTable[Frequency] to: data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-uninserted.qza


### Feature table summary 

In [19]:
!qiime feature-table summarize \
  --i-table data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted.qzv 

!qiime feature-table summarize \
  --i-table data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-uninserted.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-uninserted.qzv 

Saved Visualization to: data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted.qzv
Saved Visualization to: data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-uninserted.qzv


## Quality control: taxonomic composition of mock samples

### Get the observed taxonomic composition of mock samples 

In [20]:
# Subset mock samples
!qiime feature-table filter-samples \
  --i-table data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted.qza \
  --m-metadata-file data/metadata.tsv \
  --p-where "SampleType='Mock'" \
  --p-no-exclude-ids \
  --o-filtered-table data/intermediate/qiime2/99otu/quality-control/mock-observed.qza

# Inspect the taxonomic composition of mock samples at OTU level
!qiime taxa barplot \
  --i-table data/intermediate/qiime2/99otu/quality-control/mock-observed.qza \
  --i-taxonomy data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --m-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/99otu/quality-control/mock-observed.qzv

# Agglomerate taxa at species level
!qiime taxa collapse \
  --i-table data/intermediate/qiime2/99otu/quality-control/mock-observed.qza \
  --i-taxonomy data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --p-level 7 \
  --o-collapsed-table data/intermediate/qiime2/99otu/quality-control/mock-observed-l7.qza

# Convert sequence counts into relative abundances
!qiime feature-table relative-frequency \
  --i-table data/intermediate/qiime2/99otu/quality-control/mock-observed-l7.qza \
  --o-relative-frequency-table data/intermediate/qiime2/99otu/quality-control/mock-observed-l7-rel.qza

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/99otu/quality-control/mock-observed.qza
Saved Visualization to: data/intermediate/qiime2/99otu/quality-control/mock-observed.qzv
Saved FeatureTable[Frequency] to: data/intermediate/qiime2/99otu/quality-control/mock-observed-l7.qza
Saved FeatureTable[RelativeFrequency] to: data/intermediate/qiime2/99otu/quality-control/mock-observed-l7-rel.qza


### Compare observed and expected taxonomic composition of mock samples

In [21]:
!qiime quality-control evaluate-composition \
  --i-expected-features data/reference/mock-expected.qza \
  --i-observed-features data/intermediate/qiime2/99otu/quality-control/mock-observed-l7-rel.qza \
  --o-visualization data/intermediate/qiime2/99otu/quality-control/mock-comparison.qzv

Saved Visualization to: data/intermediate/qiime2/99otu/quality-control/mock-comparison.qzv


## Alpha and beta diversity analysis

### Exclude mock and negative control samples from the feature table

In [22]:
!qiime feature-table filter-samples \
  --i-table data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted.qza \
  --m-metadata-file data/metadata.tsv \
  --p-where "SampleType IN ('Mock', 'Extraction-blank', 'PCR-blank')" \
  --p-exclude-ids \
  --o-filtered-table data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted-no-control.qza

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted-no-control.qza


### Rarefaction analysis 

In [23]:
!qiime diversity alpha-rarefaction \
  --i-table data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted-no-control.qza \
  --i-phylogeny data/intermediate/qiime2/99otu/insertion-tree-99otu.qza \
  --p-max-depth 10601 \
  --p-steps 10 \
  --m-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/99otu/alpha-rarefaction-99otu.qzv

Saved Visualization to: data/intermediate/qiime2/99otu/alpha-rarefaction-99otu.qzv


### Generate core metric results

In [24]:
!qiime diversity core-metrics-phylogenetic \
  --i-phylogeny data/intermediate/qiime2/99otu/insertion-tree-99otu.qza \
  --i-table data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted-no-control.qza \
  --m-metadata-file data/metadata.tsv \
  --p-sampling-depth 10601 \
  --output-dir data/intermediate/qiime2/99otu/core-metrics-results

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/99otu/core-metrics-results/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/99otu/core-metrics-results/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/99otu/core-metrics-results/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/99otu/core-metrics-results/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/99otu/core-metrics-results/evenness_vector.qza
Saved DistanceMatrix to: data/intermediate/qiime2/99otu/core-metrics-results/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: data/intermediate/qiime2/99otu/core-metrics-results/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: data/intermediate/qiime2/99otu/core-metrics-results/jaccard_distance_matrix.qza
Saved DistanceMatrix to: data/intermedi

### Compare beta-diversity using robust Aitchison PCA 

In [25]:
!qiime deicode rpca \
  --i-table data/intermediate/qiime2/99otu/table-filtered-99otu-sepp-inserted-no-control.qza \
  --p-min-feature-count 10 \
  --p-min-sample-count 1000 \
  --output-dir data/intermediate/qiime2/99otu/robust-Aitchison-pca

!qiime emperor biplot \
  --i-biplot data/intermediate/qiime2/99otu/robust-Aitchison-pca/biplot.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --m-feature-metadata-file data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --o-visualization data/intermediate/qiime2/99otu/robust-Aitchison-pca/biplot.qzv \
  --p-number-of-features 8

Saved PCoAResults % Properties('biplot') to: data/intermediate/qiime2/99otu/robust-Aitchison-pca/biplot.qza
Saved DistanceMatrix to: data/intermediate/qiime2/99otu/robust-Aitchison-pca/distance_matrix.qza
Saved Visualization to: data/intermediate/qiime2/99otu/robust-Aitchison-pca/biplot.qzv


# Analyzing sequences at OTU level: 97% OTU

## Cluster sequences

### De novo OTU picking

In [26]:
!qiime vsearch cluster-features-de-novo \
  --i-table data/intermediate/qiime2/asv/table-filtered.qza \
  --i-sequences data/intermediate/qiime2/asv/rep-seqs-filtered.qza \
  --p-perc-identity 0.97 \
  --p-threads 16 \
  --o-clustered-table data/intermediate/qiime2/97otu/table-filtered-97otu.qza \
  --o-clustered-sequences data/intermediate/qiime2/97otu/rep-seqs-filtered-97otu.qza

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/97otu/table-filtered-97otu.qza
Saved FeatureData[Sequence] to: data/intermediate/qiime2/97otu/rep-seqs-filtered-97otu.qza


### Visualize the clustered feature table

In [27]:
# Feature table summary 
!qiime feature-table summarize \
  --i-table data/intermediate/qiime2/97otu/table-filtered-97otu.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/97otu/table-filtered-97otu.qzv 

# Taxonomic barplot 
!qiime taxa barplot \
  --i-table data/intermediate/qiime2/97otu/table-filtered-97otu.qza \
  --i-taxonomy data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --m-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/97otu/taxa-bar-plots-filtered-97otu.qzv

Saved Visualization to: data/intermediate/qiime2/97otu/table-filtered-97otu.qzv
Saved Visualization to: data/intermediate/qiime2/97otu/taxa-bar-plots-filtered-97otu.qzv


## Phylogeny

### Reference-based fragment insertion with SEPP

In [28]:
%%time
!qiime fragment-insertion sepp \
  --i-representative-sequences data/intermediate/qiime2/97otu/rep-seqs-filtered-97otu.qza \
  --i-reference-database data/reference/sepp-refs-silva-128.qza \
  --o-tree data/intermediate/qiime2/97otu/insertion-tree-97otu.qza \
  --o-placements data/intermediate/qiime2/97otu/tree-placements-97otu.qza \
  --p-threads 16 \
  --p-debug

Saved Phylogeny[Rooted] to: data/intermediate/qiime2/97otu/insertion-tree-97otu.qza
Saved Placements to: data/intermediate/qiime2/97otu/tree-placements-97otu.qza
CPU times: user 3min 24s, sys: 34.3 s, total: 3min 58s
Wall time: 1h 20min 36s


### Filter uninserted sequences from the feature table

In [29]:
!qiime fragment-insertion filter-features \
  --i-table data/intermediate/qiime2/97otu/table-filtered-97otu.qza \
  --i-tree data/intermediate/qiime2/97otu/insertion-tree-97otu.qza \
  --o-filtered-table data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted.qza \
  --o-removed-table data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-uninserted.qza \
  --verbose

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted.qza
Saved FeatureTable[Frequency] to: data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-uninserted.qza


### Feature table summary 

In [30]:
!qiime feature-table summarize \
  --i-table data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted.qzv 

!qiime feature-table summarize \
  --i-table data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-uninserted.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-uninserted.qzv 

Saved Visualization to: data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted.qzv
Saved Visualization to: data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-uninserted.qzv


## Quality control: taxonomic composition of mock samples

### Get the observed taxonomic composition of mock samples 

In [31]:
# Subset mock samples
!qiime feature-table filter-samples \
  --i-table data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted.qza \
  --m-metadata-file data/metadata.tsv \
  --p-where "SampleType='Mock'" \
  --p-no-exclude-ids \
  --o-filtered-table data/intermediate/qiime2/97otu/quality-control/mock-observed.qza

# Inspect the taxonomic composition of mock samples at OTU level
!qiime taxa barplot \
  --i-table data/intermediate/qiime2/97otu/quality-control/mock-observed.qza \
  --i-taxonomy data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --m-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/97otu/quality-control/mock-observed.qzv

# Agglomerate taxa at species level
!qiime taxa collapse \
  --i-table data/intermediate/qiime2/97otu/quality-control/mock-observed.qza \
  --i-taxonomy data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --p-level 7 \
  --o-collapsed-table data/intermediate/qiime2/97otu/quality-control/mock-observed-l7.qza

# Convert sequence counts into relative abundances
!qiime feature-table relative-frequency \
  --i-table data/intermediate/qiime2/97otu/quality-control/mock-observed-l7.qza \
  --o-relative-frequency-table data/intermediate/qiime2/97otu/quality-control/mock-observed-l7-rel.qza

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/97otu/quality-control/mock-observed.qza
Saved Visualization to: data/intermediate/qiime2/97otu/quality-control/mock-observed.qzv
Saved FeatureTable[Frequency] to: data/intermediate/qiime2/97otu/quality-control/mock-observed-l7.qza
Saved FeatureTable[RelativeFrequency] to: data/intermediate/qiime2/97otu/quality-control/mock-observed-l7-rel.qza


### Compare observed and expected taxonomic composition of mock samples

In [32]:
!qiime quality-control evaluate-composition \
  --i-expected-features data/reference/mock-expected.qza \
  --i-observed-features data/intermediate/qiime2/97otu/quality-control/mock-observed-l7-rel.qza \
  --o-visualization data/intermediate/qiime2/97otu/quality-control/mock-comparison.qzv

Saved Visualization to: data/intermediate/qiime2/97otu/quality-control/mock-comparison.qzv


## Alpha and beta diversity analysis

### Exclude mock and negative control samples from the feature table

In [33]:
!qiime feature-table filter-samples \
  --i-table data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted.qza \
  --m-metadata-file data/metadata.tsv \
  --p-where "SampleType IN ('Mock', 'Extraction-blank', 'PCR-blank')" \
  --p-exclude-ids \
  --o-filtered-table data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted-no-control.qza

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted-no-control.qza


### Rarefaction analysis 

In [34]:
!qiime diversity alpha-rarefaction \
  --i-table data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted-no-control.qza \
  --i-phylogeny data/intermediate/qiime2/97otu/insertion-tree-97otu.qza \
  --p-max-depth 10601 \
  --p-steps 10 \
  --m-metadata-file data/metadata.tsv \
  --o-visualization data/intermediate/qiime2/97otu/alpha-rarefaction-97otu.qzv

Saved Visualization to: data/intermediate/qiime2/97otu/alpha-rarefaction-97otu.qzv


### Generate core metric results

In [35]:
!qiime diversity core-metrics-phylogenetic \
  --i-phylogeny data/intermediate/qiime2/97otu/insertion-tree-97otu.qza \
  --i-table data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted-no-control.qza \
  --m-metadata-file data/metadata.tsv \
  --p-sampling-depth 10601 \
  --output-dir data/intermediate/qiime2/97otu/core-metrics-results

Saved FeatureTable[Frequency] to: data/intermediate/qiime2/97otu/core-metrics-results/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/97otu/core-metrics-results/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/97otu/core-metrics-results/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/97otu/core-metrics-results/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: data/intermediate/qiime2/97otu/core-metrics-results/evenness_vector.qza
Saved DistanceMatrix to: data/intermediate/qiime2/97otu/core-metrics-results/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: data/intermediate/qiime2/97otu/core-metrics-results/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: data/intermediate/qiime2/97otu/core-metrics-results/jaccard_distance_matrix.qza
Saved DistanceMatrix to: data/intermedi

### Compare beta-diversity using robust Aitchison PCA 

In [36]:
!qiime deicode rpca \
  --i-table data/intermediate/qiime2/97otu/table-filtered-97otu-sepp-inserted-no-control.qza \
  --p-min-feature-count 10 \
  --p-min-sample-count 1000 \
  --output-dir data/intermediate/qiime2/97otu/robust-Aitchison-pca

!qiime emperor biplot \
  --i-biplot data/intermediate/qiime2/97otu/robust-Aitchison-pca/biplot.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --m-feature-metadata-file data/intermediate/qiime2/asv/taxonomy-silva132.qza \
  --o-visualization data/intermediate/qiime2/97otu/robust-Aitchison-pca/biplot.qzv \
  --p-number-of-features 8

Saved PCoAResults % Properties('biplot') to: data/intermediate/qiime2/97otu/robust-Aitchison-pca/biplot.qza
Saved DistanceMatrix to: data/intermediate/qiime2/97otu/robust-Aitchison-pca/distance_matrix.qza
Saved Visualization to: data/intermediate/qiime2/97otu/robust-Aitchison-pca/biplot.qzv
