## Core Diversity Analyses of the GCMP dataset

This notebook performs a basic QIIME2 analysis of the GCMP dataset. The goal at this stage is not to produce finished analytical products, but to calculate basic diversity metrics that can be used in downstream phylogenetic comparisons.

In [5]:
from qiime2 import Artifact
from qiime2.plugins.feature_table.methods import filter_samples
from qiime2.plugins.taxa.methods import filter_table
#The below try/except block is unsightly but the alpha function got moved between recent versions of QIIME2
#and it's nice if the notebook is compatible with either
try:
    from qiime2.plugins.diversity.methods import alpha
except ImportError:
    from qiime2.plugins.diversity.pipelines import alpha
from qiime2.plugins.diversity.visualizers import alpha_group_significance

from qiime2.plugins.feature_table.methods import rarefy
from qiime2.plugins.feature_table.visualizers import summarize

from qiime2.metadata import Metadata
from os.path import abspath,exists,join
import shutil

In [6]:
#### Check that required files exist
mucus_feature_table = "../../organelle_removal/output/M_ft.qza"
tissue_feature_table = "../../organelle_removal/output/T_ft.qza"
skeleton_feature_table = "../../organelle_removal/output/S_ft.qza"
mapping_file = "../../organelle_removal/input/GCMP_EMP_map_r28_no_empty_samples.txt"
taxonomy_file = "../../organelle_removal/output/silva_metaxa2_reference_taxonomy.qza"
sequence_file = "../../organelle_removal/output/GCMP_seqs.qza"
output_dir = abspath("../output/")
input_directory = abspath("../input")

required_files = [mucus_feature_table,tissue_feature_table,skeleton_feature_table,mapping_file,taxonomy_file,sequence_file]



#### Check that all required files really exist and are named correctly

In [7]:
print("Verifying that all needed starting data files exist.")
for existing_file in required_files:
    if not exists(existing_file):
        raise IOError(f"Required file {existing_file} not found. Please ensure it is in that directory.")
print("Done.")

Verifying that all needed starting data files exist.
Done.
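
The check above stops at the first missing file. A minimal sketch of a variant that reports every missing file in one error is given below; the helper names `find_missing_files` and `verify_files_exist` are hypothetical and are not part of the notebook.

```python
from os.path import exists


def find_missing_files(paths):
    """Return the paths that do not exist on disk, in their input order."""
    return [path for path in paths if not exists(path)]


def verify_files_exist(paths):
    """Raise a single IOError that names every missing path."""
    missing = find_missing_files(paths)
    if missing:
        # List all missing files at once so they can be fixed in one pass
        raise IOError("Required files not found:\n" + "\n".join(missing))
```

In this notebook it could be called as `verify_files_exist(required_files)`.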


#### Copy input files to input folder

Copy all needed files to the input folder

In [72]:
mucus_feature_table = shutil.copy(mucus_feature_table,input_directory)
tissue_feature_table = shutil.copy(tissue_feature_table,input_directory)
skeleton_feature_table = shutil.copy(skeleton_feature_table,input_directory)
taxonomy_file = shutil.copy(taxonomy_file,input_directory)
mapping_file = shutil.copy(mapping_file,input_directory)
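
`shutil.copy` only copies into a directory if that directory already exists; otherwise it treats the destination as a file name. The sketch below shows one way to guard against that by creating the folder first. The helper name `copy_into_directory` is hypothetical.

```python
import os
import shutil


def copy_into_directory(source_path, target_directory):
    """Copy source_path into target_directory, creating the directory if needed.

    Returns the path of the new copy, exactly as shutil.copy does.
    """
    # exist_ok=True makes this safe to rerun when the folder is already there
    os.makedirs(target_directory, exist_ok=True)
    return shutil.copy(source_path, target_directory)
```

With this helper, a line such as `mucus_feature_table = copy_into_directory(mucus_feature_table, input_directory)` would behave the same whether or not the input folder exists yet.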

#### Load the metadata, sequences, feature tables, and taxonomy

In [2]:
from qiime2.plugins.feature_table.methods import filter_features

metadata = Metadata.load(mapping_file)
seqs = Artifact.load(sequence_file)
mucus_features = Artifact.load(mucus_feature_table)
tissue_features = Artifact.load(tissue_feature_table)
skeleton_features = Artifact.load(skeleton_feature_table)

feature_tables = {"mucus":mucus_features,"tissue": tissue_features,"skeleton":skeleton_features}

taxonomy = Artifact.load(taxonomy_file)

NameError: name 'Metadata' is not defined

#### Filter the feature tables to exclude mitochondria and chloroplasts

Note that it is critical to use the taxonomies supplemented with Metaxa2 organelle sequences (from the organelle removal step) to avoid high numbers of misannotated mitochondria, which typically show up as 'Unclassified bacteria'.

In [59]:
filtered_feature_tables = {}
for compartment,table in feature_tables.items():
    print("Removing mitochondria from:", compartment,table)
    #NOTE: the QIIME2 API does NOT return a single object (as I had assumed from the documentation),
    #but a NamedTuple-like structure that holds each output by name
    filter_table_results = filter_table(table,taxonomy,exclude="mitochondria,chloroplast",mode="contains")
    filtered_table = filter_table_results.filtered_table
    filtered_feature_tables[compartment]=filtered_table
    
    #Save the resulting feature table to disk
    output_filename = f"feature_table_{compartment}.qza"
    output_filepath = join(output_dir,output_filename)
    print(f"Saving results to:{output_filepath}")
    filtered_table.save(output_filepath)


    
    

Removing mitochondria from: mucus <artifact: FeatureTable[Frequency] uuid: 8045253c-8a06-4ba5-9188-36ae7ca39531>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/feature_table_mucus.qza
Removing mitochondria from: tissue <artifact: FeatureTable[Frequency] uuid: 14e5fea4-6aee-4dec-9065-33a307fb3140>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/feature_table_tissue.qza
Removing mitochondria from: skeleton <artifact: FeatureTable[Frequency] uuid: f7e6592c-affd-418c-aaf3-876a1294691e>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/feature_table_skeleton.qza
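
To make the effect of `exclude="mitochondria,chloroplast"` with `mode="contains"` concrete, here is a plain-Python sketch of substring-based exclusion on taxonomy strings. It is an illustration only, not the q2-taxa implementation: the taxonomy strings are made-up SILVA-style examples, the function names are hypothetical, and the case-insensitive matching is an assumption of this sketch.

```python
def keep_feature(taxon, exclude_terms):
    """Return True if none of the exclude terms occurs anywhere in the taxon string."""
    taxon_lower = taxon.lower()  # assumption: match without regard to case
    return not any(term.lower() in taxon_lower for term in exclude_terms)


def filter_taxonomy(taxonomy_by_feature, exclude):
    """Drop every feature whose annotation contains one of the comma-separated terms."""
    exclude_terms = [term.strip() for term in exclude.split(",") if term.strip()]
    return {
        feature_id: taxon
        for feature_id, taxon in taxonomy_by_feature.items()
        if keep_feature(taxon, exclude_terms)
    }


# Made-up annotations for illustration only
toy_taxonomy = {
    "feature_1": "D_0__Bacteria;D_1__Proteobacteria;D_4__Mitochondria",
    "feature_2": "D_0__Bacteria;D_1__Cyanobacteria;D_3__Chloroplast",
    "feature_3": "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria",
}

kept = filter_taxonomy(toy_taxonomy, exclude="mitochondria,chloroplast")
print(sorted(kept))  # ['feature_3']
```

Because matching is by substring, an organelle sequence that is annotated only as unclassified bacteria would pass through this filter, which is why the supplemented taxonomy matters.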


#### Output Summaries of the filtered feature tables

Note that this will produce some warnings (regarding headers in the .csv files), but these are safe to ignore. 

In [66]:
for compartment,table in filtered_feature_tables.items():
    #Output a sample summary
    #Summarize the table for the current compartment (not the leftover `filtered_table` from the previous cell)
    summary_visualization = summarize(table,sample_metadata=metadata)
    vis = summary_visualization.visualization
    output_filename = f"feature_table_{compartment}.qzv"
    output_filepath = join(output_dir,output_filename)
    print(f"Saving summary file to:{output_filepath}")
    vis.save(output_filepath)

  os.path.join(output_dir, 'sample-frequency-detail.csv'))
  os.path.join(output_dir, 'feature-frequency-detail.csv'))


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/feature_table_mucus.qzv


  os.path.join(output_dir, 'sample-frequency-detail.csv'))
  os.path.join(output_dir, 'feature-frequency-detail.csv'))


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/feature_table_tissue.qzv


  os.path.join(output_dir, 'sample-frequency-detail.csv'))
  os.path.join(output_dir, 'feature-frequency-detail.csv'))


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/feature_table_skeleton.qzv


#### Rarefy the filtered feature tables to equal depth

Removing mitochondria and chloroplasts changes the sequencing depth of each sample in each compartment, so rarefaction is performed after filtering.

In [58]:
rarefaction_depth = 1000
rarefied_feature_tables = {}
for compartment,table in filtered_feature_tables.items():
    print(f"Rarefying: {compartment} feature table {table} to {rarefaction_depth} sequences/sample")
    rarefy_results = rarefy(table=table, sampling_depth=rarefaction_depth)
    #Get the rarefied table out of the NamedTuple of results
    rarefied_filtered_table = rarefy_results.rarefied_table
    
    
    #Save the resulting feature table to disk
    output_filename = f"feature_table_{compartment}_{rarefaction_depth}.qza"
    output_filepath = join(output_dir,output_filename)
    print(f"Saving results to:{output_filepath}")
    rarefied_filtered_table.save(output_filepath)
    
    #Store rarefied feature table in a dict so we don't have to reload
    rarefied_feature_tables[compartment]=rarefied_filtered_table
    

Rarefying: mucus feature table <artifact: FeatureTable[Frequency] uuid: 1281ec4b-6260-4801-8009-0a96098914ca> to 1000 sequences/sample
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/feature_table_mucus_1000.qza
Rarefying: tissue feature table <artifact: FeatureTable[Frequency] uuid: bcbed849-610a-4f33-b244-7e000196fed3> to 1000 sequences/sample
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/feature_table_tissue_1000.qza
Rarefying: skeleton feature table <artifact: FeatureTable[Frequency] uuid: dbe1f42f-7130-4a4d-8f60-17f47bd5c643> to 1000 sequences/sample
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/feature_table_skeleton_1000.qza
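
As a rough illustration of what rarefying to an even depth means, the sketch below subsamples each sample's reads without replacement to a fixed depth and drops samples that are too shallow. It uses plain dictionaries rather than QIIME2 artifacts, the function name `rarefy_table` and the toy counts are hypothetical, and it is not the code QIIME2 runs internally.

```python
import random
from collections import Counter


def rarefy_table(table, depth, seed=None):
    """Subsample every sample to exactly `depth` reads without replacement.

    `table` maps sample id -> {feature id: count}. Samples with fewer than
    `depth` reads are left out of the result.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, counts in table.items():
        if sum(counts.values()) < depth:
            continue  # too shallow to rarefy, so the sample is dropped
        # Expand counts into one entry per read, then draw `depth` of them
        reads = [feature for feature, n in counts.items() for _ in range(n)]
        rarefied[sample_id] = dict(Counter(rng.sample(reads, depth)))
    return rarefied


# Made-up counts for illustration only
toy_table = {
    "deep_sample": {"asv_a": 600, "asv_b": 500, "asv_c": 400},
    "shallow_sample": {"asv_a": 300, "asv_b": 200},
}
toy_rarefied = rarefy_table(toy_table, depth=1000, seed=0)
```

Here `shallow_sample` has only 500 reads and is therefore absent from `toy_rarefied`, which mirrors how samples below the chosen depth drop out of the rarefied tables.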


#### Calculate non-phylogenetic alpha diversity for each rarefied, filtered sample

In [69]:
metrics = ['observed_otus','gini_index','dominance','simpson_e']
alpha_diversities = {}
for compartment,table in rarefied_feature_tables.items():
    for metric in metrics:
        print(f"Calculating alpha diversity for {compartment} using {metric}")
        alpha_results = alpha(table=table,metric = metric)
        alpha_diversity = alpha_results.alpha_diversity
        alpha_diversities[f"{compartment}_{metric}_{rarefaction_depth}"] = alpha_diversity
        
        #Save the resulting alpha diversity vector to disk
        output_filename = f"adiv_{compartment}_{metric}_{rarefaction_depth}.qza"
        output_filepath = join(output_dir,output_filename)
        print(f"Saving results to:{output_filepath}")
        alpha_diversity.save(output_filepath)
        
        #Calculate alpha group significance for categorical variables
        alpha_group_sig_results = alpha_group_significance(alpha_diversity=alpha_diversity,metadata=metadata)
        alpha_group_sig_visualization = alpha_group_sig_results.visualization
        output_filename = f"adiv_{compartment}_{metric}_{rarefaction_depth}_group_sig.qzv"       
        output_filepath = join(output_dir,output_filename)
        print(f"Saving significance results to:{output_filepath}")
        #Save to the full output path so the file lands in output_dir, not the working directory
        alpha_group_sig_visualization.save(output_filepath)
    
        

Calculating alpha diversity for mucus using observed_otus
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/adiv_mucus_observed_otus_1000.qza
Calculating alpha diversity for mucus using gini_index
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/adiv_mucus_gini_index_1000.qza
Calculating alpha diversity for mucus using dominance
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/adiv_mucus_dominance_1000.qza
Calculating alpha diversity for mucus using simpson_e
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/core_analysis/output/adiv_mucus_simpson_e_1000.qza
Calculating alpha diversity for tissue using observed_otus
Sav
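
For orientation, three of the metrics used above can be written out directly from a vector of per-feature counts. This is a sketch of the standard definitions (observed features, Simpson's dominance, and Simpson's evenness), not the code QIIME2 calls; the function names are hypothetical and the Gini index is not covered.

```python
def observed_features(counts):
    """Number of features with a nonzero count (the 'observed_otus' metric)."""
    return sum(1 for count in counts if count > 0)


def dominance(counts):
    """Sum of squared relative abundances; near 1 when one feature dominates."""
    total = sum(counts)
    return sum((count / total) ** 2 for count in counts)


def simpson_evenness(counts):
    """Inverse dominance divided by the number of observed features.

    Equals 1 when all observed features are equally abundant.
    """
    return (1.0 / dominance(counts)) / observed_features(counts)


# Two made-up samples, each rarefied to 1000 reads
even_sample = [250, 250, 250, 250]
skewed_sample = [997, 1, 1, 1]
```

Both toy samples have four observed features, but the skewed sample has a dominance close to 1 and a much lower evenness than the even sample.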

#### Infer a phylogenetic tree and calculate phylogenetic alpha diversity

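Because short reads carry too little signal to infer deep phylogenetic relationships reliably, we will insert our short reads into an existing reference tree (fragment insertion with SEPP) rather than build a tree from the reads alone.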

In [None]:
from qiime2.plugins.fragment_insertion.methods import sepp

In [None]:
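#As with alpha above, fall back to the pipelines module if alpha_phylogenetic is not found among the methods
try:
    from qiime2.plugins.diversity.methods import alpha_phylogenetic
except ImportError:
    from qiime2.plugins.diversity.pipelines import alpha_phylogenetic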
#Reuse the existing alpha_diversities dict so the non-phylogenetic results above are not discarded
metrics = ['faith_pd']
for compartment,table in rarefied_feature_tables.items():
    for metric in metrics:
        print(f"Calculating alpha diversity for {compartment} using {metric}")
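        #NOTE: alpha_phylogenetic requires a rooted phylogeny. `tree` is assumed to hold the
        #Phylogeny[Rooted] artifact from the fragment insertion (SEPP) step, which is not yet run in this notebook
        alpha_results = alpha_phylogenetic(table=table,phylogeny=tree,metric = metric)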
        alpha_diversity = alpha_results.alpha_diversity
        alpha_diversities[f"{compartment}_{metric}_{rarefaction_depth}"] = alpha_diversity
        
        #Save the resulting alpha diversity vector to disk
        output_filename = f"adiv_{compartment}_{metric}_{rarefaction_depth}.qza"
        output_filepath = join(output_dir,output_filename)
        print(f"Saving results to:{output_filepath}")
        alpha_diversity.save(output_filepath)
        
        #Calculate alpha group significance for categorical variables
        alpha_group_sig_results = alpha_group_significance(alpha_diversity=alpha_diversity,metadata=metadata)
        alpha_group_sig_visualization = alpha_group_sig_results.visualization
        output_filename = f"adiv_{compartment}_{metric}_{rarefaction_depth}_group_sig.qzv"
        output_filepath = join(output_dir,output_filename)
        print(f"Saving significance results to:{output_filepath}")
        #Save to the full output path so the file lands in output_dir, not the working directory
        alpha_group_sig_visualization.save(output_filepath)
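
To show what the `faith_pd` metric measures, here is a toy sketch that sums the branch lengths connecting a sample's observed tips to the root of a tree. The tree encoding (a dict of node -> (parent, branch length)), the function name, and the branch lengths are all made up for illustration; this is not how QIIME2 stores or traverses phylogenies.

```python
def faith_pd(tree, observed_tips):
    """Sum the lengths of all branches on a path from any observed tip to the root.

    `tree` maps each node to (parent, branch_length); the root's parent is None.
    Branches shared by several tips are counted once.
    """
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk toward the root, stopping once we reach an already counted branch
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Toy tree: ((A:0.5, B:0.25)AB:1.0, C:2.0)root
toy_tree = {
    "root": (None, 0.0),
    "AB": ("root", 1.0),
    "A": ("AB", 0.5),
    "B": ("AB", 0.25),
    "C": ("root", 2.0),
}

print(faith_pd(toy_tree, ["A", "B"]))  # 1.75
```

A sample containing only the close relatives A and B scores 1.75, while adding the distant tip C raises the total to 3.75, which is the sense in which the metric rewards phylogenetic breadth rather than feature count alone.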