# QIIME 2 enables comprehensive end-to-end analysis of diverse microbiome data and comparative studies with publicly available data

This is a QIIME 2 Artifact API notebook that replicates the QIIME 2 CLI analyses.

**environment:** qiime2-2019.10

## How to use this notebook:

1. Activate the `qiime2-2019.10` conda environment.
    ```
    conda activate qiime2-2019.10
    ```

2. Make sure that the QIIME 2 Jupyter server extension is enabled.

    Close this notebook and the Jupyter session, then run:  
    `jupyter serverextension enable --py qiime2 --sys-prefix`  

3. Install additional dependencies:
    ```
    conda install songbird -c conda-forge
    conda install -c conda-forge redbiom
    conda install -c bioconda bowtie2
    pip install https://github.com/knights-lab/SHOGUN/archive/master.zip
    pip install https://github.com/qiime2/q2-shogun/archive/master.zip
    conda install cytoolz
    qiime dev refresh-cache
    ```  

4. Restart Jupyter and run the notebook.

## Import QIIME 2 plugins and other dependencies

In [3]:
import os
import subprocess
import warnings

import qiime2

warnings.filterwarnings('ignore')

# all plugins that are used throughout this notebook are imported here;
# songbird and shogun come from the extra dependencies installed in step 3
from qiime2.plugins import (composition,
                            deblur,
                            demux,
                            diversity,
                            feature_classifier,
                            feature_table,
                            fragment_insertion,
                            longitudinal,
                            metadata,
                            quality_filter,
                            shogun,
                            songbird,
                            taxa)

## Acquire data from the ECAM study

Define the working directory.

In [None]:
workdir='./'

In [None]:
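!mkdir -p $workdir/qiime2-ecam-tutorial-api
# `!cd` only changes directory inside a throwaway subshell; the %cd magic moves the notebook itself
%cd $workdir/qiime2-ecam-tutorial-api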

In [None]:
# NOTE: the file is 1.04GB in size
!wget -O $workdir/81253.zip "https://qiita.ucsd.edu/public_artifact_download/?artifact_id=81253"

In [None]:
!unzip $workdir/81253.zip

In [None]:
!mv $workdir/mapping_files/81253_mapping_file.txt $workdir/metadata.tsv

## Import DNA sequence data into QIIME 2 & create a visual summary

### 1. Create the manifest file with the required column headers

In [None]:
!printf "sample-id\tabsolute-filepath\n" > manifest.tsv

### 2. Use a loop to add each sample name to the sample-id column and the full path of its sequence file to the absolute-filepath column

In [None]:
!for f in per_sample_FASTQ/81253/*.gz; \
do n=`basename $f`; printf "12802.%s\t%s\n" "${n%.fastq.gz}" "$PWD/$f"; done >> manifest.tsv
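
If the shell loop misbehaves (for instance because `echo` and `printf` differ between systems), the same manifest can be written from Python. This is a minimal sketch, assuming the files sit in `per_sample_FASTQ/81253/`, end in `.fastq.gz`, and take the same `12802.` prefix as the loop above; `write_manifest` is a hypothetical helper, not part of QIIME 2.

```python
import csv
from pathlib import Path


def write_manifest(fastq_dir, manifest_path, id_prefix="12802."):
    """Write a single-end manifest (sample-id, absolute-filepath) for every *.fastq.gz file."""
    rows = []
    for fastq in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        # strip the double extension, mirroring ${n%.fastq.gz} in the shell loop
        sample_id = id_prefix + fastq.name[: -len(".fastq.gz")]
        rows.append((sample_id, str(fastq.resolve())))
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(("sample-id", "absolute-filepath"))
        writer.writerows(rows)
    return rows


# usage with the paths from the shell cells above:
# write_manifest("per_sample_FASTQ/81253", "manifest.tsv")
```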

### 3. Use the manifest file to import the sequences into QIIME 2

In [4]:
# the manifest built above lists single-end reads (sample-id, absolute-filepath)
manifest_single_end = qiime2.Artifact.import_data('SampleData[SequencesWithQuality]',
                                                  view='manifest.tsv',
                                                  view_type='SingleEndFastqManifestPhred33V2')

### 4. Create a summary of the demultiplexed artifact

In [5]:
demux_summary = demux.visualizers.summarize(manifest_single_end)

### 5. Visualize the demultiplexing summary

In [32]:
demux_summary.visualization
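
The interactive quality plot in this summary is built from Phred scores decoded from the FASTQ quality lines. As a rough illustration, assuming Phred+33 encoding (the `Phred33V2` format used at import) and plain four-line FASTQ records, per-position mean quality can be computed as below. `mean_quality_by_position` is a hypothetical helper; the visualizer itself subsamples reads (10,000 by default) and reports quantiles rather than only means.

```python
import gzip
from itertools import islice


def mean_quality_by_position(fastq_path, max_reads=10000):
    """Average Phred+33 quality at each read position over up to max_reads records."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    totals, counts = [], []
    with opener(fastq_path, "rt") as handle:
        n = 0
        while n < max_reads:
            record = list(islice(handle, 4))  # header, sequence, '+', quality
            if len(record) < 4:
                break
            quality = record[3].rstrip("\n")
            for pos, char in enumerate(quality):
                if pos == len(totals):
                    totals.append(0)
                    counts.append(0)
                totals[pos] += ord(char) - 33  # Phred+33 decoding
                counts[pos] += 1
            n += 1
    return [t / c for t, c in zip(totals, counts)]
```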

## Import metadata as an object

In [None]:
metadata_ecam = qiime2.Metadata.load(workdir+'/metadata.tsv')

## Sequence quality control and feature table construction

### 1. Apply initial quality filtering

In [None]:
demux_q_score = quality_filter.methods.q_score(manifest_single_end)
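
`q_score` truncates each read at the first run of low-quality bases and then discards reads that became too short or contain ambiguous calls. The sketch below is a simplified re-implementation of that documented behaviour, not the plugin's source. It assumes what I understand to be the defaults (`min_quality=4`, `quality_window=3`, `min_length_fraction=0.75`, `max_ambiguous=0`) and treats a run of more than `quality_window` scores below `min_quality` as the truncation point.

```python
def truncate_read(sequence, quality, min_quality=4, quality_window=3,
                  min_length_fraction=0.75, max_ambiguous=0):
    """Return the truncated (sequence, quality) pair, or None if the read is discarded.

    Simplified sketch: truncate at the start of the first run of more than
    `quality_window` consecutive Phred+33 scores below `min_quality`.
    """
    scores = [ord(c) - 33 for c in quality]
    cut = len(scores)
    run_start, run_length = 0, 0
    for pos, score in enumerate(scores):
        if score < min_quality:
            if run_length == 0:
                run_start = pos
            run_length += 1
            if run_length > quality_window:
                cut = run_start
                break
        else:
            run_length = 0
    sequence, quality = sequence[:cut], quality[:cut]
    if len(sequence) < min_length_fraction * len(scores):
        return None  # too short after truncation
    if sequence.upper().count("N") > max_ambiguous:
        return None  # too many ambiguous base calls
    return sequence, quality
```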

### 2. Apply Deblur workflow

In [None]:
# this step is time-consuming
deblur_sequences = deblur.methods.denoise_16S(demux_q_score.filtered_sequences,
                                              trim_length=150,
                                              sample_stats=True,
                                              jobs_to_start=4)
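
Before its error-profile step, Deblur cuts reads to a common length with `trim_length`; as I understand it, reads shorter than that length are dropped rather than padded, and the remaining reads are dereplicated. The sketch below shows only that trimming and dereplication (counting identical reads); it does **not** implement Deblur's denoising, and `trim_and_dereplicate` is a hypothetical name.

```python
from collections import Counter


def trim_and_dereplicate(reads, trim_length=150):
    """Trim reads to trim_length, drop shorter ones, and count identical sequences."""
    trimmed = (read[:trim_length] for read in reads if len(read) >= trim_length)
    # most_common() orders unique sequences by abundance, highest first
    return Counter(trimmed).most_common()
```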

### 3. Create a visualization summary of deblur statistics

In [None]:
deblur_viz = deblur.visualizers.visualize_stats(deblur_sequences.stats)
deblur_viz.visualization

### 4. Visualize representative sequences

In [None]:
deblur_seq_viz = feature_table.visualizers.tabulate_seqs(deblur_sequences.representative_sequences)
deblur_seq_viz.visualization

### 5. Visualize feature table

In [None]:
feature_table_viz = feature_table.visualizers.summarize(deblur_sequences.table,
                                                        metadata_ecam)
feature_table_viz.visualization

## Generate a phylogenetic tree

### 1. Download a backbone tree

In [None]:
!wget \
  -O $workdir/sepp-refs-gg-13-8.qza \
  "https://data.qiime2.org/2019.10/common/sepp-refs-gg-13-8.qza"

In [None]:
sepp_reference_db = qiime2.Artifact.load(workdir+'sepp-refs-gg-13-8.qza')

### 2. Create an insertion tree

In [None]:
sepp_tree = fragment_insertion.methods.sepp(representative_sequences=deblur_sequences.representative_sequences,
                                            reference_database=sepp_reference_db,
                                            threads=4)

### 3. Filter feature table

In [None]:
filtered_deblur_sequences = fragment_insertion.methods.filter_features(deblur_sequences.table,
                                                                       sepp_tree.tree)
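
This `filter_features` step keeps the features that SEPP placed into the insertion tree and moves the rest into a separate removed table. Conceptually that is a split on tip names. The sketch assumes a table held as a `{feature_id: {sample_id: count}}` dictionary and a set of tip IDs, both hypothetical stand-ins for the real artifacts.

```python
def split_by_tree_tips(table, tip_ids):
    """Split {feature_id: {sample_id: count}} into (kept, removed) by tree membership."""
    kept = {fid: counts for fid, counts in table.items() if fid in tip_ids}
    removed = {fid: counts for fid, counts in table.items() if fid not in tip_ids}
    return kept, removed
```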

## Taxonomic classification

### 1. Download and import required files

In [None]:
!wget -O $workdir'human-stool.qza' \
https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/human-stool.qza

In [None]:
human_stool = qiime2.Artifact.load(workdir+'human-stool.qza')

In [None]:
!wget -O $workdir'ref-seqs-v4.qza' \
https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/ref-seqs-v4.qza

In [None]:
ref_seqs_v4 = qiime2.Artifact.load(workdir+'ref-seqs-v4.qza')

In [None]:
!wget -O $workdir'ref-tax.qza' \
https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/ref-tax.qza

In [None]:
ref_tax = qiime2.Artifact.load(workdir+'ref-tax.qza')

### 2. Train a classifier

In [None]:
human_stool_v4_classifier = feature_classifier.methods.fit_classifier_naive_bayes(ref_seqs_v4,
                                                                                  ref_tax,
                                                                                  class_weight=human_stool)
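
`fit_classifier_naive_bayes` trains a multinomial naive Bayes model on k-mer counts of the reference reads, and the `human_stool` weights replace the uniform class prior. The following is a toy, stdlib-only illustration of that idea, not q2-feature-classifier's pipeline (which, as far as I know, hashes 7-mers with scikit-learn's `HashingVectorizer`). The k-mer size, smoothing constant, and function names are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train_naive_bayes(ref_seqs, ref_taxa, k=4, alpha=1.0):
    """Return per-taxon k-mer log-likelihood tables from {seq_id: seq} and {seq_id: taxon}."""
    counts = defaultdict(Counter)
    for seq_id, seq in ref_seqs.items():
        counts[ref_taxa[seq_id]].update(kmers(seq, k))
    vocab = set().union(*counts.values())
    model = {}
    for taxon, kmer_counts in counts.items():
        total = sum(kmer_counts.values()) + alpha * len(vocab)
        # Lidstone-smoothed log P(kmer | taxon); unseen k-mers share the floor value
        model[taxon] = ({km: math.log((kmer_counts[km] + alpha) / total) for km in vocab},
                        math.log(alpha / total))
    return model


def classify(seq, model, class_prior=None, k=4):
    """Return the taxon with the highest log posterior; class_prior maps taxon -> weight."""
    taxa = list(model)
    if class_prior is None:
        class_prior = {taxon: 1.0 for taxon in taxa}  # uniform prior
    norm = sum(class_prior[t] for t in taxa)
    best, best_score = None, -math.inf
    for taxon in taxa:
        loglik, floor = model[taxon]
        score = math.log(class_prior[taxon] / norm)
        score += sum(loglik.get(km, floor) for km in kmers(seq, k))
        if score > best_score:
            best, best_score = taxon, score
    return best
```

When a query carries no informative k-mers, the prior alone decides the label, which is why habitat-specific weights such as `human_stool` can shift assignments.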

### 3. Assign taxonomy

In [None]:
taxonomy = feature_classifier.methods.classify_sklearn(deblur_sequences.representative_sequences,
                                                       human_stool_v4_classifier.classifier)

### 4. Visualize taxonomies

In [None]:
taxonomy_viz = metadata.visualizers.tabulate(taxonomy.classification.view(qiime2.Metadata))
taxonomy_viz.visualization

## Filter ECAM data to contain child samples only

### 1. Filter feature table

In [None]:
child_only = feature_table.methods.filter_samples(deblur_sequences.table,
                                                  metadata=metadata_ecam,
                                                  where="[mom_or_child]='C'")
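
The `where` argument is an SQLite `WHERE` clause evaluated against the sample metadata, and only matching samples are kept. For a single equality test such as `[mom_or_child]='C'` the effect matches the plain-Python filter below, which assumes the metadata has been read into a `{sample_id: {column: value}}` dictionary and the table into `{sample_id: {feature_id: count}}` (hypothetical representations, not `qiime2.Metadata`).

```python
def filter_samples_where(table, sample_metadata, column, value):
    """Keep samples of a {sample_id: {feature_id: count}} table whose metadata column equals value."""
    keep = {sid for sid, row in sample_metadata.items() if row.get(column) == value}
    return {sid: counts for sid, counts in table.items() if sid in keep}
```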

### 2. Visualize new feature table

In [None]:
child_only_viz = feature_table.visualizers.summarize(child_only.filtered_table,
                                                     metadata_ecam)
child_only_viz.visualization

## Alpha rarefaction plots

In [None]:
alpha_rarefaction = diversity.visualizers.alpha_rarefaction(child_only.filtered_table,
                                                            phylogeny=sepp_tree.tree,
                                                            max_depth=10000,
                                                            metadata=metadata_ecam)
alpha_rarefaction.visualization
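
An alpha rarefaction curve repeatedly subsamples each sample's reads without replacement at increasing depths and averages a diversity metric at each depth. Below is a stdlib sketch for an observed-features metric, assuming counts as a `{feature_id: count}` dict; the step and iteration counts here are arbitrary illustrative choices (the QIIME 2 visualizer has its own `steps` and `iterations` parameters).

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from {feature_id: count}."""
    pool = [fid for fid, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        return None  # sample has too few reads; it is excluded at this depth
    return dict(Counter(rng.sample(pool, depth)))


def observed_features_curve(counts, max_depth, steps=5, iterations=10, seed=0):
    """Mean observed features at evenly spaced depths from max_depth/steps up to max_depth."""
    rng = random.Random(seed)
    curve = []
    for step in range(1, steps + 1):
        depth = max(1, max_depth * step // steps)
        richness = []
        for _ in range(iterations):
            sub = rarefy(counts, depth, rng)
            if sub is not None:
                richness.append(len(sub))
        curve.append((depth, sum(richness) / len(richness) if richness else None))
    return curve
```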

## Basic data exploration and diversity analyses

### 0. Filter feature table to include only one sample per subject per month

In [None]:
child_only_norep = feature_table.methods.filter_samples(child_only.filtered_table,
                                                           metadata=metadata_ecam,
                                                           where="[month_replicate]='no'")

In [None]:
child_only_norep_viz = feature_table.visualizers.summarize(child_only_norep.filtered_table,
                                                           metadata_ecam)
child_only_norep_viz.visualization

### 1. Generate taxonomic barplot

In [None]:
child_taxa = taxa.visualizers.barplot(child_only_norep.filtered_table,
                                      taxonomy.classification,
                                      metadata_ecam)
child_taxa.visualization
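
The bar plot aggregates feature counts by taxonomy at a chosen level and converts them to per-sample relative abundances. A rough equivalent, assuming Greengenes-style semicolon-separated taxonomy strings (as in the `gg_13_8` reference used above) and a `{feature_id: count}` dict per sample; `collapse_relative` is a hypothetical helper.

```python
from collections import defaultdict


def collapse_relative(sample_counts, taxonomy, level):
    """Sum counts of features sharing the first `level` ranks, then normalise to proportions."""
    collapsed = defaultdict(int)
    for fid, count in sample_counts.items():
        ranks = [r.strip() for r in taxonomy.get(fid, "Unassigned").split(";")]
        collapsed[";".join(ranks[:level])] += count
    total = sum(collapsed.values())
    return {taxon: n / total for taxon, n in collapsed.items()} if total else {}
```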

### 2. Compute alpha and beta diversity

In [None]:
child_only_norep_core_metrics = diversity.pipelines.core_metrics_phylogenetic(child_only_norep.filtered_table,
                                                                              phylogeny=sepp_tree.tree,
                                                                              sampling_depth=3400,
                                                                              metadata=metadata_ecam,
                                                                              n_jobs=4)
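
`core_metrics_phylogenetic` first rarefies every sample to `sampling_depth` (here 3400 sequences; samples with fewer are dropped) and then computes the metrics. One of them, Shannon diversity, is the entropy of the feature proportions, H = -Σ p_i log p_i. A sketch of that formula, assuming log base 2, which I believe is the default in the scikit-bio version bundled with this release:

```python
import math


def shannon(counts, base=2):
    """Shannon entropy of a list of feature counts (zero counts are ignored)."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)
```

For example, a sample with two equally abundant features has a Shannon diversity of 1 bit, and four equally abundant features give 2 bits.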

## Perform statistical tests on diversity and generate interactive visualization

### 1. Statistical test on alpha diversity

#### A. Across all time points

In [None]:
shannon_child_only_norep_viz = \
 diversity.visualizers.alpha_group_significance(child_only_norep_core_metrics.shannon_vector,
                                                metadata_ecam)
shannon_child_only_norep_viz.visualization
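
`alpha_group_significance` compares alpha diversity across the groups of each categorical metadata column with the Kruskal-Wallis test. The H statistic can be sketched as follows, using average ranks for ties and the standard tie correction; turning H into a p-value needs the chi-squared distribution, which is left out here to stay within the standard library.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction for two or more groups of values."""
    values = sorted((v, g) for g, group in enumerate(groups) for v in group)
    n = len(values)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1][0] == values[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based; tied values share the average rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, g), r in zip(values, ranks):
        rank_sums[g] += r
    h = 12 / (n * (n + 1)) * sum(rs ** 2 / len(grp) for rs, grp in zip(rank_sums, groups)) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))
```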

#### B. At the last time point (month 24)

In [None]:
child_only_norep_C24 = feature_table.methods.filter_samples(child_only_norep.filtered_table,
                                                            metadata=metadata_ecam,
                                                            where="[month]='24'")

In [None]:
child_only_norep_C24_core_metrics = diversity.pipelines.core_metrics_phylogenetic(child_only_norep_C24.filtered_table,
                                                                                  phylogeny=sepp_tree.tree,
                                                                                  sampling_depth=3400,
                                                                                  metadata=metadata_ecam,
                                                                                  n_jobs=4)

In [None]:
shannon_child_only_norep_C24_viz = \
 diversity.visualizers.alpha_group_significance(child_only_norep_C24_core_metrics.shannon_vector,
                                                metadata_ecam)
shannon_child_only_norep_C24_viz.visualization

### 2. Statistical test on beta diversity

In [None]:
uw_unifrac_delivery_child_only_norep_C24_viz = \
 diversity.visualizers.beta_group_significance(child_only_norep_C24_core_metrics.unweighted_unifrac_distance_matrix,
                                               metadata=metadata_ecam.get_column('delivery'),
                                               pairwise=True)
uw_unifrac_delivery_child_only_norep_C24_viz.visualization
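
By default `beta_group_significance` runs PERMANOVA: a pseudo-F statistic is computed from the distance matrix and compared with values obtained after shuffling the group labels (999 permutations by default). A compact sketch of that procedure, assuming the distance matrix is a square list of lists and the grouping is a list of labels in the same sample order:

```python
import random


def pseudo_f(dm, labels):
    """PERMANOVA pseudo-F from a square distance matrix and per-sample group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    ss_total = sum(dm[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dm[i][j] ** 2 for x, i in enumerate(idx) for j in idx[x + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dm, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dm, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dm, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```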

## Longitudinal data analysis

### 1. Linear mixed effects models

In [None]:
child_only_core_metrics = diversity.pipelines.core_metrics_phylogenetic(child_only.filtered_table,
                                                                        phylogeny=sepp_tree.tree,
                                                                        sampling_depth=3400,
                                                                        metadata=metadata_ecam,
                                                                        n_jobs=4)

In [None]:
metadata_ecam_w_shannon = metadata_ecam.merge(child_only_core_metrics.shannon_vector.view(qiime2.Metadata))

In [None]:
lme_shannon_child_only_viz = \
 longitudinal.visualizers.linear_mixed_effects(metadata=metadata_ecam_w_shannon,
                                               metric='shannon',
                                               random_effects='day_of_life',
                                               group_columns='delivery,diet',
                                               state_column='day_of_life',
                                               individual_id_column='host_subject_id')
lme_shannon_child_only_viz.visualization

### 2. Volatility visualization

In [None]:
volatility_shannon_child_only_viz = \
 longitudinal.visualizers.volatility(metadata_ecam_w_shannon,
                                     default_metric='shannon',
                                     default_group_column='delivery',
                                     state_column='month',
                                     individual_id_column='host_subject_id')
volatility_shannon_child_only_viz.visualization

## Differential abundance testing

### Option 1: ANCOM

In [None]:
# Create a new feature table that contains only samples from children at 6 months
child_only_norep_C6 = feature_table.methods.filter_samples(child_only_norep.filtered_table,
                                                           metadata=metadata_ecam,
                                                           where="[month]='6'")

In [None]:
# filter out low abundant features
filtered_child_only_norep_C6 = feature_table.methods.filter_features(child_only_norep_C6.filtered_table,
                                                                     min_samples=5,
                                                                     min_frequency=20)
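
With `min_samples=5` and `min_frequency=20`, only features present in at least 5 samples and with a total count of at least 20 across all samples are kept. In plain Python, assuming a `{feature_id: {sample_id: count}}` dictionary (a hypothetical representation):

```python
def filter_low_abundance(table, min_samples=5, min_frequency=20):
    """Keep features observed in >= min_samples samples with total count >= min_frequency."""
    return {
        fid: counts
        for fid, counts in table.items()
        if sum(1 for c in counts.values() if c > 0) >= min_samples
        and sum(counts.values()) >= min_frequency
    }
```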

In [None]:
# add a pseudocount
composition_table_C6 = composition.methods.add_pseudocount(filtered_child_only_norep_C6.filtered_table)
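
ANCOM compares log ratios between features, and the logarithm of zero is undefined, which is why a pseudocount (1 by default) is added first. The centred log-ratio transform below is a related log-ratio view of one sample; it is an illustration only and not ANCOM's W-statistic computation.

```python
import math


def clr(counts, pseudocount=1):
    """Centred log-ratio transform after adding a pseudocount to every count."""
    shifted = [c + pseudocount for c in counts]
    logs = [math.log(x) for x in shifted]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]
```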

In [None]:
# run ANCOM
ancom_C6_delivery = composition.visualizers.ancom(composition_table_C6.composition_table,
                                                  metadata_ecam.get_column('delivery'))
ancom_C6_delivery.visualization

### Option 2: songbird

In [None]:
# make a folder to store songbird results
!mkdir $workdir/songbird-results

In [None]:
# run songbird
songbird_norep_C6 = songbird.methods.multinomial(child_only_norep_C6.filtered_table,
                                                 metadata_ecam,
                                                 formula="delivery+abx_exposure+diet+sex",
                                                 epochs=10000,
                                                 differential_prior=0.5)

In [None]:
# examine estimated coefficients
songbird_norep_C6.differentials.export_data(workdir+'songbird-results/differentials6monthControlled')
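
The exported directory should contain a tab-separated table of per-feature coefficients (`differentials.tsv` in the versions I have used), with one column per model term. Ranking features by one term is a quick way to see which features are most associated with it. In this sketch the file name, the `featureid` index column, and the column label passed as `term` are assumptions to check against your own export.

```python
import csv


def rank_differentials(tsv_path, term, top=5):
    """Return the `top` most negative and most positive (feature, coefficient) pairs for `term`."""
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # 'featureid' is assumed to be the name of the first (index) column
        ranked = sorted(((row["featureid"], float(row[term])) for row in reader),
                        key=lambda pair: pair[1])
    return ranked[:top], ranked[-top:][::-1]


# hypothetical usage; check the header of your export for the exact column name
# low, high = rank_differentials(workdir + 'songbird-results/differentials6monthControlled/differentials.tsv',
#                                term='delivery[T.Vaginal]')
```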

## Meta-analysis through Qiita database using redbiom

NOTE: there is no redbiom Python API, so the commands below are copied from the CLI notebook.

In [None]:
# check the name of contexts and number of samples and features indexed
!redbiom summarize contexts

In [None]:
# identify samples in which the sequence of interest was observed
!redbiom search features --context Deblur-Illumina-16S-V4-150nt-780653 \
TACGTAGGGTGCAAGCGTTATCCGGAATTATTGGGCGTAAAGGGCTCGTAGGCGGTTCGTCGCGTCCGGTGTGAAAGTCCATCGCTTAACGGTGGATCTGCGCCGGGTACGGGCGGGCTGGAGTGCGGTAGGGGAGACTGGAATTCCCGG > observed_samples.txt

In [None]:
# search against only EMP samples
!redbiom summarize samples \
  --category empo_3 \
  --from observed_samples.txt

In [None]:
# search against infant samples
!redbiom select samples-from-metadata \
  --context Deblur-Illumina-16S-V4-150nt-780653 \
  --from observed_samples.txt "where (host_age < 3 or age < 3) and qiita_study_id != 10249" > infant_samples.txt

In [None]:
# summarize the metadata of infant samples
!redbiom search metadata \
  --categories birth

!redbiom summarize metadata birth_method birth_mode

!redbiom summarize samples \
     --category birth_mode \
     --from infant_samples.txt

In [None]:
# check sample balance in modes of delivery
!redbiom summarize metadata-category \
  --counter \
  --category birth_mode

In [None]:
# summarize samples over study id category
!redbiom summarize samples \
  --category qiita_study_id \
  --from infant_samples.txt

## _Support protocols:_ Exporting QIIME 2 data

Export the SEPP insertion tree as an example.

In [None]:
sepp_tree.tree.export_data('extracted-insertion-tree')
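
Exporting a phylogeny writes it as a Newick file (`tree.nwk`) inside the output directory. To check how many fragments ended up in the exported tree, a simple tip-label scan works for unquoted labels. This regex-based shortcut is an assumption-laden sketch; a proper parser such as scikit-bio's `TreeNode.read` is safer for real trees.

```python
import re

# a tip label follows '(' or ',' (internal node labels follow ')')
TIP_PATTERN = re.compile(r"[(,]\s*([^\s(),:;\[\]']+)")


def newick_tip_names(newick):
    """Return tip labels from a Newick string (unquoted labels only)."""
    return TIP_PATTERN.findall(newick)


# usage, assuming the export above produced extracted-insertion-tree/tree.nwk:
# with open('extracted-insertion-tree/tree.nwk') as fh:
#     print(len(newick_tip_names(fh.read())))
```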

## _Support protocols:_ Analysis of shotgun metagenomic data

### 1. Download sample data

In [None]:
!for i in query refseqs taxonomy bt2-database; \
 do wget https://github.com/qiime2/q2-shogun/raw/master/q2_shogun/tests/data/$i.qza; done

In [None]:
shogun_query = qiime2.Artifact.load(workdir + '/query.qza')

In [None]:
shogun_refseqs = qiime2.Artifact.load(workdir + '/refseqs.qza')

In [None]:
shogun_taxonomy = qiime2.Artifact.load(workdir + '/taxonomy.qza')

In [None]:
bowtie2_db = qiime2.Artifact.load(workdir + '/bt2-database.qza')

### 2. Run shotgun metagenomics pipeline

In [None]:
taxa_table = shogun.methods.nobunaga(query=shogun_query,
                                     reference_reads=shogun_refseqs,
                                     reference_taxonomy=shogun_taxonomy,
                                     database=bowtie2_db)