## Effects of Mitochondrial Removal Protocol on Coral Microbiome Alpha and Beta Diversity, accounting for rarefaction

This notebook tests how the choice of mitochondrial annotation and removal method influences coral alpha and beta diversity, accounting for rarefaction. The strategy is to perform alpha and beta diversity analysis on coral mucus, tissue, and skeleton samples using either standard Greengenes_13_8 or SILVA annotations, or to do the same with expanded versions of these references. 

**However**, the tables annotated with standard Greengenes_13_8 and SILVA will be filtered to include only the samples that survive rarefaction in the analysis that uses the versions of these references supplemented with mitochondrial sequences.

#### How this notebook is different from `adiv_and_bdiv_effects_of_mitochondrial_removal.ipynb`

An initial notebook tested the effects of improved mitochondrial removal (`adiv_and_bdiv_effects_of_mitochondrial_removal.ipynb`). However, while mitochondrial removal may improve accuracy, it also reduces the number of sequences per sample. During rarefaction, samples that fall below the rarefaction threshold are dropped from the analysis. This loss of samples may cause some trends to appear non-significant in the dataset that has mitochondria removed correctly.

In the other analysis, it was not possible to tell whether a loss of significance was due to better mitochondrial removal eliminating artifactual effects, or to a smaller sample size reducing statistical power.
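To make the concern concrete, here is a small sketch that counts which samples survive a rarefaction threshold before and after organelle reads are removed. The read counts are made-up illustrative numbers (not GCMP data), and `surviving_samples` is a hypothetical helper, not part of QIIME2.

```python
def surviving_samples(reads_per_sample, threshold):
    """Return the sorted ids of samples with at least `threshold` reads."""
    return sorted(s for s, n in reads_per_sample.items() if n >= threshold)

# Hypothetical total reads per sample, and hypothetical mitochondrial reads in each
total_reads = {"s1": 5000, "s2": 1200, "s3": 900}
mito_reads = {"s1": 100, "s2": 400, "s3": 0}

# Reads remaining once mitochondrial reads are filtered out
after_removal = {s: total_reads[s] - mito_reads[s] for s in total_reads}

before = surviving_samples(total_reads, 1000)
after = surviving_samples(after_removal, 1000)
lost = sorted(set(before) - set(after))

print("Survive before removal:", before)
print("Survive after removal:", after)
print("Lost only because of removal:", lost)
```

In this toy case one sample drops out purely because its mitochondrial reads were removed, which is the kind of sample loss this notebook controls for.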

#### Set up

We'll import QIIME2 Artifact API functions and objects to do the analysis, as well as some basic Python functions for working with the file system (e.g. from `os.path`).

In [2]:
from qiime2 import Artifact,Metadata
from qiime2.plugins.feature_table.methods import filter_samples
from qiime2.plugins.taxa.methods import filter_table
#The try/except block below is unsightly, but the alpha function was moved between recent versions of QIIME2
#and it's nice if the notebook is compatible with either
try:
    from qiime2.plugins.diversity.methods import alpha,beta
except ImportError:
    from qiime2.plugins.diversity.pipelines import alpha,beta
from qiime2.plugins.diversity.visualizers import alpha_group_significance,beta_group_significance

from qiime2.plugins.feature_table.methods import rarefy
from qiime2.plugins.feature_table.visualizers import summarize

from os.path import abspath,exists,join
from os import mkdir

import shutil

import pandas as pd

#### Set up input filenames

We'll set up input filenames all at once so we can refer to them later.

In [6]:
#Paths to the input files (their existence is checked in the next cell)
mucus_feature_table = "../output/M_ft.qza"
tissue_feature_table = "../output/T_ft.qza"
skeleton_feature_table = "../output/S_ft.qza"
overall_feature_table = "../output/gcmp_raw_overall_ft_all_compartments.qza"

mapping_file = "../input/GCMP_EMP_map_r28_no_empty_samples.txt"
sequence_file = "../output/GCMP_seqs.qza"

output_dir = abspath("../output/effects_of_rarefaction_analysis")
input_directory = abspath("../input")

taxonomy_files = {"silva_metaxa2":"../output/silva_metaxa2_reference_taxonomy.qza",
                  "silva":"../output/silva_reference_taxonomy.qza",
                  "greengenes":"../output/greengenes_reference_taxonomy.qza",
                  "greengenes_metaxa2":"../output/greengenes_metaxa2_reference_taxonomy.qza"}

required_files = [mucus_feature_table,tissue_feature_table,skeleton_feature_table,overall_feature_table,mapping_file,sequence_file]
required_files.extend(taxonomy_files.values())



#### Check that all required files really exist and are named correctly

In [8]:
print("Verifying that all needed starting data files exist.")
for existing_file in required_files:
    if not exists(existing_file):
        raise IOError(f"Required file {existing_file} not found. Please ensure it is in that directory.")
print("Done.")

if not exists(output_dir):
    print(f"Output directory {output_dir} does not yet exist, creating it...")
    mkdir(output_dir)
    print("Done.")

Verifying that all needed starting data files exist.
Done.
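The check above stops at the first missing file. As an optional variation, the sketch below collects every missing path first and reports them together. `find_missing` and `require_files` are hypothetical helper names introduced here for illustration.

```python
from os.path import exists

def find_missing(paths):
    """Return the paths that do not exist on disk, in input order."""
    return [p for p in paths if not exists(p)]

def require_files(paths):
    """Raise a single IOError that names every missing file."""
    missing = find_missing(paths)
    if missing:
        raise IOError("Required files not found: " + ", ".join(missing))
```

In this notebook the call would be `require_files(required_files)`.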


#### Check QIIME2 version

Do a quick check that the QIIME2 version is what's expected. If you get an error at this step because of a different QIIME2 version, the code may very well still work, but if you want to reproduce the results exactly, you'll want QIIME2 2020.8.0.


In [9]:
from qiime2 import __version__ as qiime_version

if qiime_version != "2020.8.0":
    raise ValueError("This code was developed with QIIME2 2020.8.0. It will *probably* work with related versions, but there are no guarantees as some functions may change in call signature.")
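Since the code may still work on other versions, a softer check that warns instead of stopping the notebook might be preferable for exploratory runs. The sketch below is one way to do that. `check_version` and its `strict` flag are hypothetical names, and the function only compares version strings, so it doesn't need QIIME2 to be importable.

```python
import warnings

def check_version(found, expected="2020.8.0", strict=False):
    """Return True if the versions match; otherwise warn, or raise if strict."""
    if found == expected:
        return True
    message = (f"This code was developed with QIIME2 {expected}, "
               f"but found {found}; results may differ.")
    if strict:
        raise ValueError(message)
    warnings.warn(message)
    return False
```

In this notebook the call would be `check_version(qiime_version)`, with `strict=True` when exact reproduction matters.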



#### Generate filtered tables using several sets of taxonomy annotations

We will filter mitochondria (and chloroplasts) out of our feature tables using either the default taxonomies (Greengenes_13_8 or SILVA), or our supplemented versions with additional Metaxa2 mitochondrial 16S rRNA sequences.
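As a rough illustration of what taxonomy-based filtering with `mode="contains"` amounts to, the sketch below keeps only the features whose annotation does not contain any excluded term. It assumes case-insensitive substring matching, which is our understanding of the QIIME2 behaviour rather than something verified here. The feature ids, the taxonomy strings, and the `ids_to_keep` helper are all hypothetical.

```python
def ids_to_keep(taxonomy, exclude):
    """taxonomy: dict of feature id -> taxonomy string.
    exclude: comma-separated search terms, as passed to filter_table."""
    terms = [t.strip().lower() for t in exclude.split(",") if t.strip()]
    return [fid for fid, taxon in taxonomy.items()
            if not any(term in taxon.lower() for term in terms)]

# Hypothetical annotations for three features
toy_taxonomy = {
    "f1": "k__Bacteria; p__Proteobacteria; o__Rickettsiales; f__mitochondria",
    "f2": "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
    "f3": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
}

print(ids_to_keep(toy_taxonomy, "mitochondria,chloroplast"))
```

The point of the sketch is that filtering can only remove what the reference taxonomy has labelled, so a mitochondrial sequence that the reference fails to label as mitochondrial stays in the table.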

In [12]:
from qiime2.plugins.feature_table.methods import filter_features
from collections import defaultdict

filtered_feature_tables_by_taxonomy = defaultdict(dict)

metadata = Metadata.load(mapping_file)
seqs = Artifact.load(sequence_file)

for label,taxonomy_file in taxonomy_files.items():
    
    print(f"Analyzing data using the {label} taxonomy ({taxonomy_file})")
    taxonomy = Artifact.load(taxonomy_file)  
    
    mucus_features = Artifact.load(mucus_feature_table)
    tissue_features = Artifact.load(tissue_feature_table)
    skeleton_features = Artifact.load(skeleton_feature_table)
    overall_features = Artifact.load(overall_feature_table)
    feature_tables = {"all":overall_features,"mucus":mucus_features,
                      "tissue":tissue_features,"skeleton":skeleton_features}
    
    for compartment,table in feature_tables.items():
        print("Removing mitochondria from:", compartment,table)
        #NOTE: the QIIME2 API does NOT return a single object (as I had assumed from the documentation),
        #but a NamedTuple-like structure with each output in it
        filter_table_results = filter_table(table,taxonomy,exclude="mitochondria,chloroplast",mode="contains")
        filtered_table = filter_table_results.filtered_table
    
        #Save the resulting feature table to disk
        output_filename = f"feature_table_{label}_{compartment}.qza"
        output_filepath = join(output_dir,output_filename)
        print(f"Saving results to:{output_filepath}")
        filtered_table.save(output_filepath)
        
        #Output a sample summary
        summary_visualization = summarize(filtered_table,sample_metadata=metadata)
        vis = summary_visualization.visualization
        output_filename = f"feature_table_{label}_{compartment}.qzv"
        output_filepath = join(output_dir,output_filename)
        print(f"Saving summary file to:{output_filepath}")
        vis.save(output_filepath)
        
        filtered_feature_tables_by_taxonomy[label][compartment]=filtered_table
    
    print(f"Done with processing {label} taxonomy annotations!\n\n")

Analyzing data using the silva_metaxa2 taxonomy (../output/silva_metaxa2_reference_taxonomy.qza)
Removing mitochondria from: all <artifact: FeatureTable[Frequency] uuid: ca2f992d-79c2-49bb-9145-3cdae1c09977>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_metaxa2_all.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_metaxa2_all.qzv
Removing mitochondria from: mucus <artifact: FeatureTable[Frequency] uuid: 8045253c-8a06-4ba5-9188-36ae7ca39531>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_metaxa2_mucus.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_metaxa2_mucus.qzv
Removing mitochondria from: tissue <artifact: FeatureTable[Frequency] uuid: 14e5fea4-6aee-4dec-9065-33a307fb3140>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_metaxa2_tissue.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_metaxa2_tissue.qzv
Removing mitochondria from: skeleton <artifact: FeatureTable[Frequency] uuid: f7e6592c-affd-418c-aaf3-876a1294691e>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_metaxa2_skeleton.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_metaxa2_skeleton.qzv
Done with processing silva_metaxa2 taxonomy annotations!


Analyzing data using the silva taxonomy (../output/silva_reference_taxonomy.qza)
Removing mitochondria from: all <artifact: FeatureTable[Frequency] uuid: ca2f992d-79c2-49bb-9145-3cdae1c09977>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_all.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_all.qzv
Removing mitochondria from: mucus <artifact: FeatureTable[Frequency] uuid: 8045253c-8a06-4ba5-9188-36ae7ca39531>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_mucus.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_mucus.qzv
Removing mitochondria from: tissue <artifact: FeatureTable[Frequency] uuid: 14e5fea4-6aee-4dec-9065-33a307fb3140>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_tissue.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_tissue.qzv
Removing mitochondria from: skeleton <artifact: FeatureTable[Frequency] uuid: f7e6592c-affd-418c-aaf3-876a1294691e>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_skeleton.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_skeleton.qzv
Done with processing silva taxonomy annotations!


Analyzing data using the greengenes taxonomy (../output/greengenes_reference_taxonomy.qza)
Removing mitochondria from: all <artifact: FeatureTable[Frequency] uuid: ca2f992d-79c2-49bb-9145-3cdae1c09977>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_all.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_all.qzv
Removing mitochondria from: mucus <artifact: FeatureTable[Frequency] uuid: 8045253c-8a06-4ba5-9188-36ae7ca39531>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_mucus.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_mucus.qzv
Removing mitochondria from: tissue <artifact: FeatureTable[Frequency] uuid: 14e5fea4-6aee-4dec-9065-33a307fb3140>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_tissue.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_tissue.qzv
Removing mitochondria from: skeleton <artifact: FeatureTable[Frequency] uuid: f7e6592c-affd-418c-aaf3-876a1294691e>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_skeleton.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_skeleton.qzv
Done with processing greengenes taxonomy annotations!


Analyzing data using the greengenes_metaxa2 taxonomy (../output/greengenes_metaxa2_reference_taxonomy.qza)
Removing mitochondria from: all <artifact: FeatureTable[Frequency] uuid: ca2f992d-79c2-49bb-9145-3cdae1c09977>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_metaxa2_all.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_metaxa2_all.qzv
Removing mitochondria from: mucus <artifact: FeatureTable[Frequency] uuid: 8045253c-8a06-4ba5-9188-36ae7ca39531>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_metaxa2_mucus.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_metaxa2_mucus.qzv
Removing mitochondria from: tissue <artifact: FeatureTable[Frequency] uuid: 14e5fea4-6aee-4dec-9065-33a307fb3140>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_metaxa2_tissue.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_metaxa2_tissue.qzv
Removing mitochondria from: skeleton <artifact: FeatureTable[Frequency] uuid: f7e6592c-affd-418c-aaf3-876a1294691e>
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_metaxa2_skeleton.qza


Saving summary file to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_greengenes_metaxa2_skeleton.qzv
Done with processing greengenes_metaxa2 taxonomy annotations!




#### Rarefy tables to even depth

In [13]:
from collections import defaultdict
rarefaction_depth = 1000

rarefied_feature_tables_by_taxonomy = defaultdict(dict)

for label,filtered_feature_tables in filtered_feature_tables_by_taxonomy.items():

    for compartment,table in filtered_feature_tables.items():
        print(f"Rarefying: {compartment} feature table {table} to {rarefaction_depth} sequences/sample")
        rarefy_results = rarefy(table=table, sampling_depth=rarefaction_depth)
        #Get the rarefied table out of the NamedTuple of results
        rarefied_filtered_table = rarefy_results.rarefied_table

        #Save the resulting feature table to disk
        output_filename = f"feature_table_{label}_{compartment}_{rarefaction_depth}.qza"
        output_filepath = join(output_dir,output_filename)
        print(f"Saving results to:{output_filepath}")
        rarefied_filtered_table.save(output_filepath)

        #Store rarefied feature table in a dict so we don't have to reload
        rarefied_feature_tables_by_taxonomy[label][compartment]=rarefied_filtered_table


Rarefying: all feature table <artifact: FeatureTable[Frequency] uuid: 2bb82c5f-0195-4da4-a532-879a73aad7e4> to 1000 sequences/sample
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_metaxa2_all_1000.qza
Rarefying: mucus feature table <artifact: FeatureTable[Frequency] uuid: f92e8cf5-d582-4bc0-85a4-ef6611b659ab> to 1000 sequences/sample
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/feature_table_silva_metaxa2_mucus_1000.qza
Rarefying: tissue feature table <artifact: FeatureTable[Frequency] uuid: 35a8b0a7-e99e-4f27-84a4-2a9d41978fa6> to 1000 sequences/sample
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal
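For readers unfamiliar with rarefaction, the sketch below shows the idea on a single sample: reads are subsampled without replacement down to a fixed depth, and a sample with fewer reads than that depth is dropped. This is a plain-Python illustration under those assumptions, not the QIIME2 implementation. `rarefy_counts` and the toy counts are hypothetical.

```python
import random

def rarefy_counts(counts, depth, seed=0):
    """counts: dict of feature id -> read count for one sample.
    Returns a subsampled dict totalling `depth` reads, or None if the
    sample has fewer than `depth` reads (i.e. it would be dropped)."""
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one entry per read, then draw without replacement
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for feature in drawn:
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied

deep_sample = {"f1": 1500, "f2": 400, "f3": 100}   # hypothetical counts
shallow_sample = {"f1": 700, "f2": 200}            # hypothetical counts

print(rarefy_counts(deep_sample, 1000))
print(rarefy_counts(shallow_sample, 1000))
```

The second call returns `None`, which mirrors how samples below `rarefaction_depth` disappear from the rarefied tables and why the per-table sample counts differ between taxonomies.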

#### New step: harmonize sample sets between treatments

Next we will filter the samples in the rarefied Greengenes and SILVA tables so that they match the samples in the corresponding Greengenes+Metaxa2 or SILVA+Metaxa2 tables.

In [14]:


#Build a two-level dict to hold the sample ids for every feature table,
#keyed by taxonomic scheme and then by compartment
sample_ids_by_taxonomy = defaultdict(dict)

#Iterate over all the feature tables from the last step
#and collect their sample ids

for taxonomy_name,data_dict in rarefied_feature_tables_by_taxonomy.items():
    print("Taxonomy scheme:",taxonomy_name)
    for compartment,table in data_dict.items():
        print("Compartment:",compartment)
        
        # View as Pandas dataframe
        df = table.view(pd.DataFrame)
        # Extract ids from this dataframe
        n_samples,n_features = df.shape
        sample_ids = list(df.index)
        print("This table has ",len(sample_ids),"samples")
        sample_ids_by_taxonomy[taxonomy_name][compartment] = sample_ids
        


Taxonomy scheme: silva_metaxa2
Compartment: all
This table has  1099 samples
Compartment: mucus
This table has  312 samples
Compartment: tissue
This table has  360 samples
Compartment: skeleton
This table has  364 samples
Taxonomy scheme: silva
Compartment: all
This table has  1155 samples
Compartment: mucus
This table has  318 samples
Compartment: tissue
This table has  389 samples
Compartment: skeleton
This table has  382 samples
Taxonomy scheme: greengenes
Compartment: all
This table has  1156 samples
Compartment: mucus
This table has  318 samples
Compartment: tissue
This table has  390 samples
Compartment: skeleton
This table has  382 samples
Taxonomy scheme: greengenes_metaxa2
Compartment: all
This table has  1100 samples
Compartment: mucus
This table has  312 samples
Compartment: tissue
This table has  361 samples
Compartment: skeleton
This table has  364 samples


Now that we have a data structure holding the sample ids for every feature table, we need to specify that we will filter certain tables based on others. 

In [15]:
pairings = {"silva":"silva_metaxa2","greengenes":"greengenes_metaxa2"}
filtered_tables = defaultdict(dict)
for target_tables,source_tables in pairings.items():
    print(f"Filtering {target_tables} to have same samples as {source_tables}")
    for compartment,table in rarefied_feature_tables_by_taxonomy[target_tables].items():
        print("Compartment:",compartment)
        
        
        # Extract ids from this table
        df = table.view(pd.DataFrame)
        n_samples,n_features = df.shape
        sample_ids = list(df.index)
        
        print("Pre-filtering, this table has ",len(sample_ids),"samples")
        
        #Convert the list of ids to QIIME2 metadata
        #this requires going list --> DataFrame --> Metadata
        id_list = sample_ids_by_taxonomy[source_tables][compartment]
        id_df = pd.DataFrame(id_list,columns=['#SampleID'])
        id_df = id_df.set_index('#SampleID')
        id_md = Metadata(id_df)
        
        filter_results = filter_samples(table,metadata=id_md)
        filtered_table = filter_results.filtered_table
        
        # View as Pandas dataframe
        df = filtered_table.view(pd.DataFrame)
        # Extract ids from this dataframe
        n_samples,n_features = df.shape
        sample_ids = list(df.index)
        print("Post-filtering, this table has ",len(sample_ids),"samples")

        #Update the feature_table dict with this new version
        rarefied_feature_tables_by_taxonomy[target_tables][compartment] = filtered_table
        
#The loop above is the Artifact API equivalent of running qiime feature-table filter-samples
#on the standard greengenes and silva tables, keeping only the sample ids that are
#present in the matching metaxa2-supplemented table

Filtering silva to have same samples as silva_metaxa2
Compartment: all
Pre-filtering, this table has  1155 samples
Post-filtering, this table has  1099 samples
Compartment: mucus
Pre-filtering, this table has  318 samples
Post-filtering, this table has  312 samples
Compartment: tissue
Pre-filtering, this table has  389 samples
Post-filtering, this table has  360 samples
Compartment: skeleton
Pre-filtering, this table has  382 samples
Post-filtering, this table has  364 samples
Filtering greengenes to have same samples as greengenes_metaxa2
Compartment: all
Pre-filtering, this table has  1156 samples
Post-filtering, this table has  1100 samples
Compartment: mucus
Pre-filtering, this table has  318 samples
Post-filtering, this table has  312 samples
Compartment: tissue
Pre-filtering, this table has  390 samples
Post-filtering, this table has  361 samples
Compartment: skeleton
Pre-filtering, this table has  382 samples
Post-filtering, this table has  364 samples
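The post-filtering counts above match the sample counts of the Metaxa2-supplemented tables, but equal counts alone don't prove that the ids are identical. A more explicit check could compare the id sets directly, as in the sketch below. `compare_sample_sets` is a hypothetical helper and the ids are toy values.

```python
def compare_sample_sets(target_ids, source_ids):
    """Summarize how two collections of sample ids overlap."""
    target, source = set(target_ids), set(source_ids)
    return {
        "shared": sorted(target & source),
        "only_in_target": sorted(target - source),
        "only_in_source": sorted(source - target),
    }

# Toy example: the target table still holds one sample the source lacks
report = compare_sample_sets(["s1", "s2", "s3"], ["s1", "s2"])
print(report)
```

After harmonization we would expect both `only_in_target` and `only_in_source` to be empty for every pairing and compartment.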


#### Calculate alpha diversity for each combination of taxonomic scheme and compartment

In [16]:
metrics = ['observed_features','gini_index','dominance','simpson_e']
alpha_diversities = {}
for label, rarefied_feature_tables in rarefied_feature_tables_by_taxonomy.items():
    for compartment,table in rarefied_feature_tables.items():
        for metric in metrics:
            print(f"Calculating alpha diversity for {compartment} using {metric}")
            alpha_results = alpha(table=table,metric = metric)
            alpha_diversity = alpha_results.alpha_diversity
            alpha_diversities[f"{label}_{compartment}_{metric}_{rarefaction_depth}"] = alpha_diversity

            #Save the resulting alpha diversity vector to disk
            output_filename = f"adiv_{label}_{compartment}_{metric}_{rarefaction_depth}_samples_harmonized.qza"
            output_filepath = join(output_dir,output_filename)
            print(f"Saving results to:{output_filepath}")
            alpha_diversity.save(output_filepath)

            #Calculate alpha group significance for categorical variables
            alpha_group_sig_results = alpha_group_significance(alpha_diversity=alpha_diversity,metadata=metadata)
            alpha_group_sig_visualization = alpha_group_sig_results.visualization
            output_filename = f"adiv_{label}_{compartment}_{metric}_{rarefaction_depth}_group_sig_samples_harmonized.qzv"
            output_filepath = join(output_dir,output_filename)
            print(f"Saving significance results to:{output_filepath}")
            alpha_group_sig_visualization.save(output_filepath)


Calculating alpha diversity for all using observed_features
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_silva_metaxa2_all_observed_features_1000_samples_harmonized.qza
Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_silva_metaxa2_all_observed_features_1000_group_sig_samples_harmonized.qzv
Calculating alpha diversity for all using gini_index
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_silva_metaxa2_all_gini_index_1000_samples_harmonized.qza
Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/g

Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_silva_metaxa2_skeleton_dominance_1000_group_sig_samples_harmonized.qzv
Calculating alpha diversity for skeleton using simpson_e
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_silva_metaxa2_skeleton_simpson_e_1000_samples_harmonized.qza
Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_silva_metaxa2_skeleton_simpson_e_1000_group_sig_samples_harmonized.qzv
Calculating alpha diversity for all using observed_features
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Glob

Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_silva_skeleton_gini_index_1000_samples_harmonized.qza
Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_silva_skeleton_gini_index_1000_group_sig_samples_harmonized.qzv
Calculating alpha diversity for skeleton using dominance
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_silva_skeleton_dominance_1000_samples_harmonized.qza
Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_anal

Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_greengenes_tissue_simpson_e_1000_group_sig_samples_harmonized.qzv
Calculating alpha diversity for skeleton using observed_features
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_greengenes_skeleton_observed_features_1000_samples_harmonized.qza
Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_greengenes_skeleton_observed_features_1000_group_sig_samples_harmonized.qzv
Calculating alpha diversity for skeleton using gini_index
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Project

Calculating alpha diversity for tissue using dominance
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_greengenes_metaxa2_tissue_dominance_1000_samples_harmonized.qza
Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_greengenes_metaxa2_tissue_dominance_1000_group_sig_samples_harmonized.qzv
Calculating alpha diversity for tissue using simpson_e
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/adiv_greengenes_metaxa2_tissue_simpson_e_1000_samples_harmonized.qza
Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disea
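As a reminder of what three of these metrics measure, the sketch below computes them for a single sample using the usual textbook definitions as we understand them: observed features is the number of features with non-zero counts, dominance is the sum of squared relative abundances, and Simpson's evenness is (1 / dominance) divided by the number of observed features. QIIME2 delegates the real calculation to its own libraries, so treat this as an illustration only. `gini_index` is omitted, and `alpha_metrics` is a hypothetical helper.

```python
def alpha_metrics(counts):
    """counts: list of per-feature read counts for one sample."""
    nonzero = [c for c in counts if c > 0]
    total = sum(nonzero)
    observed = len(nonzero)
    dominance = sum((c / total) ** 2 for c in nonzero)
    simpson_e = (1.0 / dominance) / observed
    return {"observed_features": observed,
            "dominance": dominance,
            "simpson_e": simpson_e}

even_sample = alpha_metrics([250, 250, 250, 250])   # perfectly even community
skewed_sample = alpha_metrics([970, 10, 10, 10])    # one dominant feature

print(even_sample)
print(skewed_sample)
```

Both toy samples have four observed features at a depth of 1000 reads, but the skewed one has much higher dominance and much lower evenness.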

#### Test the effects of mitochondrial removal on between-family beta diversity

If mitochondria are misannotated at different rates in different coral families, this could artificially inflate inter-family beta diversity. Alternatively, *removal* of mitochondria may reduce intra-family variability, shrinking the observed variance within each family and thereby increasing the significance of inter-family differences in beta diversity. The code below runs PERMANOVA between coral families under each taxonomic scheme to test these ideas.
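To see why shrinking within-family variability can increase significance, it helps to look at the PERMANOVA pseudo-F statistic, which is the among-group sum of squares divided by the within-group sum of squares, each scaled by its degrees of freedom. The sketch below computes pseudo-F from a distance matrix following that standard formulation as we understand it. It leaves out the permutation step that produces the p-value, and `pseudo_f`, `toy_matrix`, and the distances are hypothetical.

```python
from itertools import combinations

def pseudo_f(dm, groups):
    """dm: square list-of-lists distance matrix; groups: one label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares from all pairwise distances
    ss_total = sum(dm[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, one group at a time
    ss_within = 0.0
    for label in labels:
        members = [i for i, g in enumerate(groups) if g == label]
        ss_within += sum(dm[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def toy_matrix(within, between):
    """Two families (A, B) with two samples each."""
    groups = ["A", "A", "B", "B"]
    dm = [[0.0 if i == j else (within if groups[i] == groups[j] else between)
           for j in range(4)] for i in range(4)]
    return dm, groups

f_loose = pseudo_f(*toy_matrix(within=1.0, between=2.0))
f_tight = pseudo_f(*toy_matrix(within=0.5, between=2.0))
print(f_loose, f_tight)
```

With the between-family distance held fixed, halving the within-family distance raises pseudo-F, which is the second scenario described above.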

In [17]:
metrics = ['braycurtis']
beta_diversities = {}
for label, rarefied_feature_tables in rarefied_feature_tables_by_taxonomy.items():
    for compartment,table in rarefied_feature_tables.items():
        for metric in metrics:
            print(f"Calculating beta diversity for {compartment} using {metric}")
            beta_results = beta(table=table,metric = metric)
            beta_dm = beta_results.distance_matrix
            beta_diversities[f"{label}_{compartment}_{metric}_{rarefaction_depth}"] = beta_dm

            #Save the resulting distance matrix to disk
            output_filename = f"bdiv_{label}_{compartment}_{metric}_{rarefaction_depth}.qza"
            output_filepath = join(output_dir,output_filename)
            print(f"Saving results to:{output_filepath}")
            beta_dm.save(output_filepath)

            #Calculate beta group significance for categorical variables
            sig_method = 'permanova'
            metadata_column = 'taxonomy_string_to_family'
            
            beta_group_sig_results =\
              beta_group_significance(distance_matrix=beta_dm,method=sig_method,metadata=metadata.get_column(metadata_column))
            
            beta_group_sig_visualization = beta_group_sig_results.visualization
            
            output_filename = f"bdiv_{label}_{compartment}_{metric}_{rarefaction_depth}_{sig_method}_group_sig.qzv"
            output_filepath = join(output_dir,output_filename)
            print(f"Saving significance results to:{output_filepath}")
            beta_group_sig_visualization.save(output_filepath)

Calculating beta diversity for all using braycurtis
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/bdiv_silva_metaxa2_all_braycurtis_1000.qza


Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/bdiv_silva_metaxa2_all_braycurtis_1000_permanova_group_sig.qzv
Calculating beta diversity for mucus using braycurtis
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/bdiv_silva_metaxa2_mucus_braycurtis_1000.qza
Saving significance results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_removal/output/effects_of_rarefaction_analysis/bdiv_silva_metaxa2_mucus_braycurtis_1000_permanova_group_sig.qzv
Calculating beta diversity for tissue using braycurtis
Saving results to:/Users/jzaneveld/Dropbox/Zaneveld_Lab_Organization/Projects/GCMP_Global_Disease/gcmp_global_disease/analysis/organelle_rem
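For reference, the Bray-Curtis dissimilarity used above can be written in a few lines: it is the sum of absolute differences in counts divided by the sum of all counts in the two samples. The sketch below uses toy count vectors that each total 1000 reads, matching the rarefaction depth. `bray_curtis` is a hypothetical stand-alone function, not the QIIME2 implementation.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

identical = bray_curtis([600, 400], [600, 400])
partial = bray_curtis([600, 400], [400, 600])
disjoint = bray_curtis([1000, 0], [0, 1000])
print(identical, partial, disjoint)
```

The value ranges from 0 for identical samples to 1 for samples that share no features.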