## Effects of Mitochondrial Removal Protocol on Coral Microbiome Alpha and Beta Diversity

This notebook tests how the choice of mitochondrial annotation and removal method influences coral microbiome alpha and beta diversity. The strategy is to run alpha and beta diversity analyses on coral mucus, tissue, and skeleton samples after removing mitochondria using either the standard Greengenes_13_8 or SILVA annotations, or expanded versions of these references.

#### Set up

We'll import QIIME2 Artifact API functions and objects for the analysis, as well as some basic Python functions for working with the file system (e.g. from os.path).

In [1]:
from qiime2 import Artifact
from qiime2.plugins.feature_table.methods import filter_samples
from qiime2.plugins.taxa.methods import filter_table
#The try/except block below is unsightly, but the alpha function moved between recent versions of QIIME2,
#and it's nice if the notebook is compatible with either
try:
    from qiime2.plugins.diversity.methods import alpha,beta
except ImportError:
    from qiime2.plugins.diversity.pipelines import alpha,beta
from qiime2.plugins.diversity.visualizers import alpha_group_significance,beta_group_significance

from qiime2.plugins.feature_table.methods import rarefy
from qiime2.plugins.feature_table.visualizers import summarize

from qiime2.metadata import Metadata

from os.path import abspath,exists,join
import shutil

#### Set up input filenames

We'll set up input filenames all at once so we can refer to them later.

In [2]:
#Define the input and output paths used throughout the notebook
mucus_feature_table = "../output/M_ft.qza"
tissue_feature_table = "../output/T_ft.qza"
skeleton_feature_table = "../output/S_ft.qza"
mapping_file = "../input/GCMP_EMP_map_r28_no_empty_samples.txt"
sequence_file = "../output/GCMP_seqs.qza"
output_dir = abspath("../output/")
input_directory = abspath("../input")

taxonomy_files = {"silva_metaxa2":"../output/silva_metaxa2_reference_taxonomy.qza",
                 "silva":"../output/silva_reference_taxonomy.qza",
                 "greengenes":"../output/greengenes_reference_taxonomy.qza",
                 "greengenes_metaxa2":"../output/greengenes_metaxa2_reference_taxonomy.qza"}

required_files = [mucus_feature_table,tissue_feature_table,skeleton_feature_table,mapping_file,sequence_file]
required_files.extend(taxonomy_files.values())



#### Check that all required files really exist and are named correctly

In [3]:
print("Verifying that all needed starting data files exist.")
for existing_file in required_files:
    if not exists(existing_file):
        raise IOError(f"Required file {existing_file} not found. Please ensure it is in that directory.")
print("Done.")


Verifying that all needed starting data files exist.
Done.
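
The loop above stops at the first missing file. As a minimal sketch, a variant could collect every missing path before reporting; the helper name `find_missing_files` and the placeholder filename are hypothetical and not part of the original notebook.

```python
from os.path import exists

def find_missing_files(paths):
    """Return the subset of paths that do not exist on disk (hypothetical helper)."""
    return [path for path in paths if not exists(path)]

# Placeholder path, assumed not to exist in the working directory
missing = find_missing_files(["definitely_not_a_real_file_12345.qza"])
if missing:
    print("Missing files:", ", ".join(missing))
```

In the notebook, the same helper could be called on `required_files` and followed by a single `IOError` listing everything that needs to be fixed.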


#### Check QIIME2 version

Do a quick check that the QIIME2 version is what's expected. If you get an error at this step due to a different QIIME2 version, the code may very well still work, but to reproduce the results exactly you'll want QIIME2 2020.8.0.


In [4]:
from qiime2 import __version__ as qiime_version

if qiime_version != "2020.8.0":
    raise ValueError("This code was developed with QIIME2 2020.8.0. It will *probably* work with related versions, but there are no guarantees as some functions may change in call signature.")



ValueError: This code was developed with QIIME2 2020.8.0. It will *probably* work with related versions, but there are no guarantees as some functions may change in call signature.
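
Because the check above raises an error, it halts a "Run All" on any other QIIME2 version even though the code may still work. One possible alternative, sketched here under the assumption that a warning is enough, is to warn and continue. The function name `check_version` is hypothetical, and the version strings are passed in directly so the sketch runs without QIIME2 installed.

```python
import warnings

EXPECTED_VERSION = "2020.8.0"

def check_version(found, expected=EXPECTED_VERSION):
    """Warn (rather than raise) on a version mismatch; return True if versions match."""
    if found != expected:
        warnings.warn(
            f"This code was developed with QIIME2 {expected}; found {found}. "
            "It will probably work, but results may differ."
        )
        return False
    return True

# In the notebook, qiime_version would be passed here instead of a literal
check_version("2020.8.0")
```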

#### Generate filtered tables using several sets of taxonomy annotations

We will filter mitochondria out of our feature tables using either the default taxonomies (greengenes_13_8 or SILVA), or our supplemented versions with additional metaxa2 mitochondrial 16S rRNA sequences.
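
To make the filtering step concrete, here is a minimal pure-Python sketch of the idea behind excluding features whose taxonomy string contains a search term. It is an illustration only, not QIIME2's implementation: the toy taxonomy strings and the `keep_feature` helper are hypothetical, and case-insensitive matching is an assumption of this sketch.

```python
def keep_feature(taxonomy_string, exclude_terms):
    """Return True if none of the exclude terms occurs in the taxonomy string.

    Matching is done on lowercased text (an assumption of this sketch).
    """
    lowered = taxonomy_string.lower()
    return not any(term.lower() in lowered for term in exclude_terms)

# Hypothetical feature annotations, for illustration only
toy_taxonomy = {
    "feature_1": "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__mitochondria",
    "feature_2": "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
    "feature_3": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
}

# Same comma-delimited term list that is passed to filter_table in the cell below
exclude_terms = "mitochondria,chloroplast".split(",")

kept = [fid for fid, tax in toy_taxonomy.items() if keep_feature(tax, exclude_terms)]
print(kept)  # ['feature_3']
```

The key point is that a sequence can only be removed if the reference taxonomy labels it as an organelle, which is why supplementing the reference with additional mitochondrial sequences can change which features survive.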

In [5]:
from qiime2.plugins.feature_table.methods import filter_features
from collections import defaultdict

filtered_feature_tables_by_taxonomy = defaultdict(dict)


for label,taxonomy_file in taxonomy_files.items():
    print(f"Analyzing data using the {label} taxonomy ({taxonomy_file})")
    taxonomy = Artifact.load(taxonomy_file)
    metadata = Metadata.load(mapping_file)
    seqs = Artifact.load(sequence_file)
    mucus_features = Artifact.load(mucus_feature_table)
    tissue_features = Artifact.load(tissue_feature_table)
    skeleton_features = Artifact.load(skeleton_feature_table)

    feature_tables = {"mucus":mucus_features,"tissue": tissue_features,"skeleton":skeleton_features}

    
    for compartment,table in feature_tables.items():
        print("Removing mitochondria from:", compartment,table)
        #NOTE: the QIIME2 API does NOT return a single object (as I thought based on the documentation), but a NamedTuple
        #structure with each output in it
        filter_table_results = filter_table(table,taxonomy,exclude="mitochondria,chloroplast",mode="contains")
        filtered_table = filter_table_results.filtered_table
    
        #Save the resulting feature table to disk
        output_filename = f"feature_table_{label}_{compartment}.qza"
        output_filepath = join(output_dir,output_filename)
        print(f"Saving results to:{output_filepath}")
        filtered_table.save(output_filepath)
        
        #Output a sample summary
        summary_visualization = summarize(filtered_table,sample_metadata=metadata)
        vis = summary_visualization.visualization
        output_filename = f"feature_table_{label}_{compartment}.qzv"
        output_filepath = join(output_dir,output_filename)
        print(f"Saving summary file to:{output_filepath}")
        vis.save(output_filepath)
        
        filtered_feature_tables_by_taxonomy[label][compartment]=filtered_table
    
    print(f"Done with processing {label} taxonomy annotations!\n\n")

Analyzing data using the silva_metaxa2 taxonomy (../output/silva_metaxa2_reference_taxonomy.qza)
Removing mitochondria from: mucus <artifact: FeatureTable[Frequency] uuid: 80b538d6-2ca4-40aa-8d6c-5ac8ee7bc869>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_metaxa2_mucus.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_metaxa2_mucus.qzv
Removing mitochondria from: tissue <artifact: FeatureTable[Frequency] uuid: e3ff2a67-e8c6-4cc1-9d9a-1181721024cc>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_metaxa2_tissue.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_metaxa2_tissue.qzv
Removing mitochondria from: skeleton <artifact: FeatureTable[Frequency] uuid: 219a3d7d-72d5-4c8f-b109-0888294262ef>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_metaxa2_skeleton.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_metaxa2_skeleton.qzv
Done with processing silva_metaxa2 taxonomy annotations!


Analyzing data using the silva taxonomy (../output/silva_reference_taxonomy.qza)
Removing mitochondria from: mucus <artifact: FeatureTable[Frequency] uuid: 80b538d6-2ca4-40aa-8d6c-5ac8ee7bc869>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_mucus.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_mucus.qzv
Removing mitochondria from: tissue <artifact: FeatureTable[Frequency] uuid: e3ff2a67-e8c6-4cc1-9d9a-1181721024cc>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_tissue.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_tissue.qzv
Removing mitochondria from: skeleton <artifact: FeatureTable[Frequency] uuid: 219a3d7d-72d5-4c8f-b109-0888294262ef>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_skeleton.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_skeleton.qzv
Done with processing silva taxonomy annotations!


Analyzing data using the greengenes taxonomy (../output/greengenes_reference_taxonomy.qza)
Removing mitochondria from: mucus <artifact: FeatureTable[Frequency] uuid: 80b538d6-2ca4-40aa-8d6c-5ac8ee7bc869>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_mucus.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_mucus.qzv
Removing mitochondria from: tissue <artifact: FeatureTable[Frequency] uuid: e3ff2a67-e8c6-4cc1-9d9a-1181721024cc>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_tissue.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_tissue.qzv
Removing mitochondria from: skeleton <artifact: FeatureTable[Frequency] uuid: 219a3d7d-72d5-4c8f-b109-0888294262ef>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_skeleton.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_skeleton.qzv
Done with processing greengenes taxonomy annotations!


Analyzing data using the greengenes_metaxa2 taxonomy (../output/greengenes_metaxa2_reference_taxonomy.qza)
Removing mitochondria from: mucus <artifact: FeatureTable[Frequency] uuid: 80b538d6-2ca4-40aa-8d6c-5ac8ee7bc869>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_metaxa2_mucus.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_metaxa2_mucus.qzv
Removing mitochondria from: tissue <artifact: FeatureTable[Frequency] uuid: e3ff2a67-e8c6-4cc1-9d9a-1181721024cc>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_metaxa2_tissue.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_metaxa2_tissue.qzv
Removing mitochondria from: skeleton <artifact: FeatureTable[Frequency] uuid: 219a3d7d-72d5-4c8f-b109-0888294262ef>
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_metaxa2_skeleton.qza




Saving summary file to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_greengenes_metaxa2_skeleton.qzv
Done with processing greengenes_metaxa2 taxonomy annotations!




#### Rarefy tables to even depth
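
As a rough illustration of what rarefying to an even depth means, the NumPy sketch below subsamples one sample's counts without replacement and drops samples that fall below the target depth. It is not the QIIME2 implementation; the function name `rarefy_counts` and the toy counts are hypothetical.

```python
import numpy as np

def rarefy_counts(counts, depth, rng):
    """Subsample a vector of feature counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, mirroring the
    usual practice of dropping such samples from the rarefied table.
    """
    counts = np.asarray(counts, dtype=int)
    if counts.sum() < depth:
        return None
    # Expand counts to one entry per read, labeled by feature index
    reads = np.repeat(np.arange(counts.size), counts)
    chosen = rng.choice(reads, size=depth, replace=False)
    # Tally the sampled reads back into a per-feature count vector
    return np.bincount(chosen, minlength=counts.size)

rng = np.random.default_rng(0)
sample = [500, 300, 200, 50]  # toy counts for four features (1050 reads)
rarefied = rarefy_counts(sample, 1000, rng)
print(rarefied, rarefied.sum())
```

Because organelle removal lowers the read count of every sample that contained organelle reads, the set of samples that reach 1000 reads can differ between taxonomic schemes.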

In [6]:
from collections import defaultdict
rarefaction_depth = 1000

rarefied_feature_tables_by_taxonomy = defaultdict(dict)

for label,filtered_feature_tables in filtered_feature_tables_by_taxonomy.items():

    for compartment,table in filtered_feature_tables.items():
        print(f"Rarefying: {compartment} feature table {table} to {rarefaction_depth} sequences/sample")
        rarefy_results = rarefy(table=table, sampling_depth=rarefaction_depth)
        #Get the rarefied table out of the NamedTuple of results
        rarefied_filtered_table = rarefy_results.rarefied_table

        #Save the resulting feature table to disk
        output_filename = f"feature_table_{label}_{compartment}_{rarefaction_depth}.qza"
        output_filepath = join(output_dir,output_filename)
        print(f"Saving results to:{output_filepath}")
        rarefied_filtered_table.save(output_filepath)

        #Store rarefied feature table in a dict so we don't have to reload
        rarefied_feature_tables_by_taxonomy[label][compartment]=rarefied_filtered_table


Rarefying: mucus feature table <artifact: FeatureTable[Frequency] uuid: 201843ce-95b3-44c2-97db-60d569daeffb> to 1000 sequences/sample
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_metaxa2_mucus_1000.qza
Rarefying: tissue feature table <artifact: FeatureTable[Frequency] uuid: d729f850-c4a0-4e16-b48a-c758c418b07d> to 1000 sequences/sample
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_metaxa2_tissue_1000.qza
Rarefying: skeleton feature table <artifact: FeatureTable[Frequency] uuid: 62cdaf26-aab1-4d00-a0c0-fbdb0bb3607a> to 1000 sequences/sample
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/feature_table_silva_metaxa2_skeleton_1000.qza
Rarefying: mucus feature table <artifact: FeatureTable[Frequency] uuid: ce5329c6-c659-4ea6-b556-8a

#### Calculate alpha diversity for each combination of taxonomic scheme and compartment
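
For orientation, three of the metrics used below can be sketched from their textbook definitions with NumPy. This is a minimal sketch rather than the code QIIME2 runs, and library implementations may differ in edge-case handling; the toy count vectors are hypothetical, and the Gini index is omitted because its calculation depends on the integration method chosen.

```python
import numpy as np

def observed_features(counts):
    """Number of features with a nonzero count."""
    return int(np.count_nonzero(counts))

def dominance(counts):
    """Sum of squared relative abundances (1.0 means one feature holds every read)."""
    proportions = counts / counts.sum()
    return float(np.sum(proportions ** 2))

def simpson_e(counts):
    """Simpson's evenness: (1 / dominance) divided by the number of observed features."""
    return (1.0 / dominance(counts)) / observed_features(counts)

even = np.array([250, 250, 250, 250])   # toy sample, perfectly even
skewed = np.array([970, 10, 10, 10])    # toy sample dominated by one feature

for name, counts in [("even", even), ("skewed", skewed)]:
    print(name, observed_features(counts), dominance(counts), simpson_e(counts))
```

A sample dominated by a single unremoved mitochondrial feature behaves like the skewed case: richness is unchanged, but dominance rises and evenness falls.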

In [7]:
metrics = ['observed_features','gini_index','dominance','simpson_e']
alpha_diversities = {}
for label, rarefied_feature_tables in rarefied_feature_tables_by_taxonomy.items():
    for compartment,table in rarefied_feature_tables.items():
        for metric in metrics:
            print(f"Calculating alpha diversity for {compartment} using {metric}")
            alpha_results = alpha(table=table,metric = metric)
            alpha_diversity = alpha_results.alpha_diversity
            alpha_diversities[f"{label}_{compartment}_{metric}_{rarefaction_depth}"] = alpha_diversity

            #Save the resulting alpha diversity vector to disk
            output_filename = f"adiv_{label}_{compartment}_{metric}_{rarefaction_depth}.qza"
            output_filepath = join(output_dir,output_filename)
            print(f"Saving results to:{output_filepath}")
            alpha_diversity.save(output_filepath)

            #Calculate alpha group significance for categorical variables
            alpha_group_sig_results = alpha_group_significance(alpha_diversity=alpha_diversity,metadata=metadata)
            alpha_group_sig_visualization = alpha_group_sig_results.visualization
            output_filename = f"adiv_{label}_{compartment}_{metric}_{rarefaction_depth}_group_sig.qzv"
            output_filepath = join(output_dir,output_filename)
            print(f"Saving significance results to:{output_filepath}")
            alpha_group_sig_visualization.save(output_filepath)


Calculating alpha diversity for mucus using observed_features
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_silva_metaxa2_mucus_observed_features_1000.qza
Saving significance results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_silva_metaxa2_mucus_observed_features_1000_group_sig.qzv
Calculating alpha diversity for mucus using gini_index
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_silva_metaxa2_mucus_gini_index_1000.qza
Saving significance results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_silva_metaxa2_mucus_gini_index_1000_group_sig.qzv
Calculating alpha diversity for mucus using dominance
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_remo

Saving significance results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_silva_skeleton_observed_features_1000_group_sig.qzv
Calculating alpha diversity for skeleton using gini_index
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_silva_skeleton_gini_index_1000.qza
Saving significance results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_silva_skeleton_gini_index_1000_group_sig.qzv
Calculating alpha diversity for skeleton using dominance
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_silva_skeleton_dominance_1000.qza
Saving significance results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_silva_skeleton_dominance_1000_group_sig.qzv
Calculating

Saving significance results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_greengenes_metaxa2_tissue_observed_features_1000_group_sig.qzv
Calculating alpha diversity for tissue using gini_index
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_greengenes_metaxa2_tissue_gini_index_1000.qza
Saving significance results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_greengenes_metaxa2_tissue_gini_index_1000_group_sig.qzv
Calculating alpha diversity for tissue using dominance
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_greengenes_metaxa2_tissue_dominance_1000.qza
Saving significance results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/adiv_greengenes_meta

#### Test the effects of mitochondrial removal on between-family beta diversity

If mitochondria are misannotated at different rates between coral families, this could artificially inflate inter-family beta diversity. Alternatively, *removal* of mitochondria may reduce intra-family variability, shrinking the observed variance within each family and thereby increasing the significance of inter-family differences in beta diversity. The code below runs PERMANOVA between coral families under each taxonomic scheme to test these ideas.
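
To show what the test measures, here is a NumPy-only sketch of Bray-Curtis dissimilarity and a permutation test on the PERMANOVA pseudo-F statistic. It is a simplified re-implementation for illustration, not the QIIME2 code used in the cell below; the function names, the toy counts, and the family labels "A" and "B" are all hypothetical.

```python
import numpy as np

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / (u + v).sum()

def pseudo_f(dm, groups):
    """PERMANOVA pseudo-F: among-group vs. within-group sums of squared distances."""
    groups = np.asarray(groups)
    n, labels = len(groups), np.unique(groups)
    squared = dm ** 2
    ss_total = squared[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for label in labels:
        idx = np.flatnonzero(groups == label)
        sub = squared[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))

def permanova(dm, groups, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(dm, groups)
    hits = sum(pseudo_f(dm, rng.permutation(groups)) >= observed
               for _ in range(permutations))
    return observed, (hits + 1) / (permutations + 1)

# Toy counts: three samples from each of two hypothetical families
samples = np.array([[90, 10, 0], [80, 20, 0], [85, 15, 0],
                    [0, 10, 90], [0, 20, 80], [0, 15, 85]])
families = ["A", "A", "A", "B", "B", "B"]
n = len(samples)
dm = np.array([[bray_curtis(samples[i], samples[j]) for j in range(n)]
               for i in range(n)])
f_stat, p_value = permanova(dm, families)
print(f_stat, p_value)
```

Both hypotheses in the paragraph above act through this ratio: differential misannotation would raise the among-family term, while removing shared mitochondrial reads would lower the within-family term. Either change increases pseudo-F.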

In [8]:
metrics = ['braycurtis']
beta_diversities = {}
for label, rarefied_feature_tables in rarefied_feature_tables_by_taxonomy.items():
    for compartment,table in rarefied_feature_tables.items():
        for metric in metrics:
            print(f"Calculating beta diversity for {compartment} using {metric}")
            beta_results = beta(table=table,metric = metric)
            beta_dm = beta_results.distance_matrix
            beta_diversities[f"{label}_{compartment}_{metric}_{rarefaction_depth}"] = beta_dm

            #Save the resulting distance matrix to disk
            output_filename = f"bdiv_{label}_{compartment}_{metric}_{rarefaction_depth}.qza"
            output_filepath = join(output_dir,output_filename)
            print(f"Saving results to:{output_filepath}")
            beta_dm.save(output_filepath)

            #Calculate beta group significance for categorical variables
            sig_method = 'permanova'
            metadata_column = 'taxonomy_string_to_family'
            
            beta_group_sig_results =\
              beta_group_significance(distance_matrix=beta_dm,method=sig_method,metadata=metadata.get_column(metadata_column))
            
            beta_group_sig_visualization = beta_group_sig_results.visualization
            
            output_filename = f"bdiv_{label}_{compartment}_{metric}_{rarefaction_depth}_{sig_method}_group_sig.qzv"
            output_filepath = join(output_dir,output_filename)
            print(f"Saving significance results to:{output_filepath}")
            beta_group_sig_visualization.save(output_filepath)

Calculating beta diversity for mucus using braycurtis
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/bdiv_silva_metaxa2_mucus_braycurtis_1000.qza




Saving significance results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/bdiv_silva_metaxa2_mucus_braycurtis_1000_permanova_group_sig.qzv
Calculating beta diversity for tissue using braycurtis
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/bdiv_silva_metaxa2_tissue_braycurtis_1000.qza
Saving significance results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/bdiv_silva_metaxa2_tissue_braycurtis_1000_permanova_group_sig.qzv
Calculating beta diversity for skeleton using braycurtis
Saving results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/bdiv_silva_metaxa2_skeleton_braycurtis_1000.qza
Saving significance results to:/mnt/c/Users/Dylan/Documents/zaneveld/2_14_gcmp/GCMP_Global_Disease/analysis/organelle_removal/output/bdiv_silva_metaxa2_skele