# Run all Phylogenetic Independent Contrast analyses

## Overview
The purpose of this notebook is to run all Phylogenetic Independent Contrasts (PIC) analyses comparing coral disease, growth rate, and microbiome composition.

Descent with modification causes species to have correlated traits. Therefore, correlations between traits across species cannot safely be tested using standard statistical methods: the observations (species) are not independent of one another, which violates an assumption of, e.g., Pearson correlation or ordinary least-squares regression.

We use two methods to address this: Phylogenetic Independent Contrasts (PICs) and Phylogenetic Generalized Least Squares regression (PGLS), both of which regress traits against one another while taking into account the structure of the tree. 
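
As a rough illustration of what a PIC analysis computes, the sketch below implements Felsenstein's contrast algorithm in plain Python on a made-up three-tip tree. It is not the R code this notebook calls; the tree, the trait values, and the function name `independent_contrasts` are all assumptions for illustration.

```python
from math import sqrt

def independent_contrasts(node, traits):
    """Compute standardized contrasts for one trait on a binary tree.

    A node is either a tip name (str) or a pair of (child, branch_length)
    tuples. Returns (node value, branch-length extension, contrasts).
    """
    if isinstance(node, str):
        return traits[node], 0.0, []
    (left, v_left), (right, v_right) = node
    x_left, extra_left, c_left = independent_contrasts(left, traits)
    x_right, extra_right, c_right = independent_contrasts(right, traits)
    # Lengthen each branch to reflect uncertainty in the estimated node values
    v_left += extra_left
    v_right += extra_right
    # Standardized contrast: difference scaled by its expected standard deviation
    contrast = (x_left - x_right) / sqrt(v_left + v_right)
    # Ancestral value: average of the children weighted by inverse branch length
    value = (x_left / v_left + x_right / v_right) / (1 / v_left + 1 / v_right)
    extension = v_left * v_right / (v_left + v_right)
    return value, extension, c_left + c_right + [contrast]

# Hypothetical tree ((A:1,B:1):1,C:2) with two made-up traits
tree_sketch = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0))
x_traits = {"A": 1.0, "B": 3.0, "C": 5.0}
y_traits = {"A": 2.0, "B": 6.0, "C": 10.0}

_, _, pic_x = independent_contrasts(tree_sketch, x_traits)
_, _, pic_y = independent_contrasts(tree_sketch, y_traits)

# Contrasts are regressed through the origin, so the slope has a closed form
slope = sum(x * y for x, y in zip(pic_x, pic_y)) / sum(x * x for x in pic_x)
print(pic_x, slope)
```

Each internal node contributes one contrast, so a tree with n tips yields n - 1 independent data points for the regression.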

## Running this notebook

This notebook will run the PICs. It requires a tree and a trait table, in our case both at the genus level. 
The notebook expects to sit within a `core_analysis` folder that has `input`, `output`, and `procedure` subfolders, so that the relative path from the notebook to every data file has the form `../output/name_of_some_file.tsv`.

The notebook also requires R to be installed, along with the ggplot2 and phytools packages.
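
A quick way to confirm that R is reachable before running anything is to look for the `Rscript` executable on the PATH. This is a minimal sketch; the helper name `rscript_available` is an assumption, and it does not check that the ggplot2 and phytools packages are installed.

```python
import shutil

def rscript_available():
    """Return True if an Rscript executable can be found on the PATH."""
    return shutil.which("Rscript") is not None

if not rscript_available():
    print("Rscript not found on PATH; install R before running the analyses below.")
```

Checking the packages themselves needs a call into R, for example `Rscript -e 'library(phytools)'` from a terminal.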

# Import all required Python libraries

We'll import all required Python libraries now so there aren't surprises later.

In [250]:
from os.path import join,exists
from os import listdir
import subprocess
from pandas import DataFrame
from statsmodels.stats.multitest import fdrcorrection
from numpy import array

## Check for all required files

Before starting in earnest, we'll also check that all required files are present.

In [261]:

results_dir = join("..","output")
data_files = listdir(results_dir)

#List all files used in the analysis

trait_table = join(results_dir,"GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv")
trait_table_growth_data = join(results_dir,'GCMP_trait_table_with_abundances_and_adiv_and_metadata_and_growth_data.tsv')
trait_table_australia = join(results_dir,'GCMP_trait_table_genus_australia_only.tsv')
trait_table_beta_diversity = join(results_dir,"GCMP_trait_table_with_abundances_and_adiv_and_metadata_and_growth_data_pcoa.tsv")

tree = join(results_dir,'huang_roy_genus_tree.newick')

required_files = [trait_table,trait_table_growth_data,trait_table_australia,tree]

#Check that each required file is present
for required_file in required_files:
    
    if not exists(required_file):
        raise ValueError(f"Required file {required_file} is not in {results_dir}")
        
    print(f"File {required_file} ..... OK!")

File ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ..... OK!
File ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_and_growth_data.tsv ..... OK!
File ../output/GCMP_trait_table_genus_australia_only.tsv ..... OK!
File ../output/huang_roy_genus_tree.newick ..... OK!


## Background on the phylomorphospace R script

Next we will run a custom R script (`phylomorphospace_r14.r`) to run PIC analysis and generate phylomorphospaces.

The general interface for the script is as follows:

`Rscript phylomorphospace_r14.r {path_to_trait_table} {path_to_tree} {x_trait} {y_trait} {filter_column} {filter_value}`

- path_to_trait_table -- this is the path to a .tsv format trait table saying which species have which traits
- path_to_tree -- a path to a .newick format phylogeny for the species
- x_trait -- the x-axis trait for PIC analysis (independent variable)
- y_trait -- the y-axis trait for PIC analysis (response variable)
- filter_column -- if provided, a column in the trait table that will be used to filter results
- filter_value -- if provided, keep only data rows where the filter column has this value
- suffix -- if provided, add an extra suffix to the output folder (useful to distinguish special analyses)

Example:
`Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata.tsv ../output/huang_roy_genus_tree.newick perc_dis dominance_tissue Weedy 0`

This example correlates disease prevalence (perc_dis) against microbiome dominance in tissue (dominance_tissue) for just corals whose functional group is not Weedy (filter column `Weedy`, filter_value `0`) using the standard GCMP trait table and phylogeny.

The script will generate an output folder for each analysis, containing graphics and statistical results. By default these folders are saved as subfolders of `../output/PIC_results`, named after the x and y traits, the filter column, and the filter value.
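
The interface above can be sketched as a small Python helper that assembles the call as an argument list. The helper name `build_pic_command` and its defaults are assumptions for illustration; only the six positional arguments shown in the interface line are included.

```python
def build_pic_command(trait_table, tree, x_trait, y_trait,
                      filter_column="None", filter_value="None",
                      script="phylomorphospace_r14.r"):
    """Assemble the Rscript call as an argument list (hypothetical helper)."""
    return ["Rscript", script, trait_table, tree, x_trait, y_trait,
            str(filter_column), str(filter_value)]

# Rebuild the example command: non-Weedy corals only
cmd = build_pic_command(
    "../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata.tsv",
    "../output/huang_roy_genus_tree.newick",
    "perc_dis", "dominance_tissue", "Weedy", 0)
print(" ".join(cmd))
```

Passing a list like this to `subprocess` avoids splitting a command string on whitespace, which would break any path that contains spaces.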

# Define utility functions for running the R script and parsing output

To allow for a summary Supplementary Data file containing all results, we will parse the output and save key stats in a dataframe. Before beginning the actual analysis, we'll define functions to a) run the phylomorphospace R script, b) parse the results, and c) apply a false discovery rate (FDR) correction to the resulting p-values.
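
The parsing step can be sketched on its own with two sample lines. The line layout here is an assumption inferred from the prefixes the parser looks for (`pic.X` and `Multiple R-squared:`), and the R-squared line is trimmed to the part that is read; the numbers are those reported for `dominance_all` vs. `perc_dis` in Analysis 1. The function name `parse_sketch` is hypothetical.

```python
sample_lines = [
    "pic.X     9.537      4.162   2.292    0.027 *",
    "Multiple R-squared:  0.1112,",
]

def parse_sketch(lines):
    """Pull the slope statistics and R-squared out of the output lines."""
    stats = {}
    for line in lines:
        if line.startswith("pic.X"):
            fields = line.split()[1:]
            # The first four fields are slope, standard error, T and p (kept as strings)
            stats["slope"], stats["slope_std_error"], stats["T_stat"], stats["p"] = fields[:4]
            # A fifth field, when present, holds the significance stars
            stats["sig_marker"] = fields[4] if len(fields) > 4 else "n.s"
        elif line.startswith("Multiple R-squared:"):
            stats["R2"] = float(line.split(":")[1].split(",")[0])
    return stats

parsed = parse_sketch(sample_lines)
print(parsed)
```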

In [252]:
def phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,pic_filter_value,
  pic_suffix,output_dir="../PIC_results/",verbose=True):
    """Run the phylomorphospace R script and return its parsed results.

    Returns a dict of key statistics plus the input parameters. If the R
    script exits with an error, its raw output is printed and returned instead.
    """
    #Build up the command we want to run
    pic_cmd = f"Rscript phylomorphospace_r14.r {pic_trait_table} {pic_tree} {pic_x_trait} {pic_y_trait} {pic_filter_column} {pic_filter_value} {output_dir} {pic_suffix}"
    print(pic_cmd)

    try:
        #Note: str.split() drops empty arguments and assumes no path contains spaces
        pic_output = subprocess.check_output(pic_cmd.split(),stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as exc:
        print(exc.output)
        return exc.output

    #Decode the raw bytes so the output can be split into real lines
    result_lines = pic_output.decode("utf-8",errors="replace").splitlines()
    if verbose:
        for line in result_lines:
            print(line)

    #Record the inputs alongside the parsed statistics
    results = parse_pic_result_lines(result_lines)
    results['trait_table'] = pic_trait_table
    results['tree'] = pic_tree
    results['pic_x_trait'] = pic_x_trait
    results['pic_y_trait'] = pic_y_trait
    results['pic_filter_column'] = pic_filter_column
    results['pic_filter_value'] = pic_filter_value
    results['pic_suffix'] = pic_suffix

    return results

def parse_pic_result_lines(lines):
    """Extract slope statistics, R2 and the output folder from the R script's output lines"""
    results = {}
    for line in lines:
        #Row holding the slope statistics for the x-trait contrasts
        if line.startswith("pic.X"):
            fields = line.split()[1:]
            if len(fields)==4:
                #No significance marker was printed for this row
                slope,std_error,T,p = fields
                sig_marker = 'n.s'
            else:
                slope,std_error,T,p,sig_marker = fields

            results['slope'] = slope
            results['slope_std_error'] = std_error
            results['T_stat'] = T
            results['p'] = p
            results['sig_marker'] = sig_marker

        if line.startswith('[1] "Outputting results to: '):
            results['results_dir'] = line.split(":")[1].rstrip(",")
        if line.startswith("Multiple R-squared:"):
            R2 = float(line.split(":")[1].split(",")[0])
            results['R2'] = R2

    #Use .get so that output with missing statistics prints None instead of raising
    print("R2:",results.get('R2'))
    print("p:",results.get('p'))
    return results

def get_FDR(df,p_value_column_name = "p"):
    """Return Benjamini-Hochberg FDR-adjusted q-values for a column of p-values"""
    #p-values are stored as strings and may carry a leading "<"
    p_values = array([float(p.strip("<")) for p in df[p_value_column_name]])
    rejected,fdr_values = fdrcorrection(p_values,alpha=0.05,method='indep',is_sorted=False)
    return fdr_values
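
To make the FDR step concrete, here is a minimal pure-Python sketch of the Benjamini-Hochberg procedure, which is what `fdrcorrection` applies with `method='indep'`. The function name `benjamini_hochberg` is an assumption for illustration; the p-values are the four dominance tests reported in Analysis 2.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values

# p-values for dominance_all, dominance_mucus, dominance_tissue, dominance_skeleton
p_values = [0.027, 0.118, 0.0101, 0.957]
q_values = benjamini_hochberg(p_values)
print([round(q, 6) for q in q_values])
```

The rounded q-values (0.054, 0.157333, 0.0404, 0.957) agree with the `FDR_q` column of the Analysis 2 summary table.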
    

# Analysis 1. Compare multiple alpha diversity metrics against disease in each coral compartment 
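
This analysis crosses three alpha diversity metrics with four compartments, so twelve x-traits are each tested against disease. The sketch below only enumerates the trait column names in the order the loop visits them; the variable name `x_traits` is an assumption for illustration.

```python
from itertools import product

compartments = ["all", "mucus", "tissue", "skeleton"]
metrics = ["observed_features", "gini_index", "dominance"]

# Same nesting as the analysis loop: compartments outermost, metrics innermost
x_traits = [f"{metric}_{compartment}"
            for compartment, metric in product(compartments, metrics)]
print(len(x_traits), x_traits[:3])
```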

In [254]:
#Set output directory
analysis_label = "alpha_diversity_vs_disease"
analysis_output_dir = join(results_dir,"PIC_results",f"A1_{analysis_label}")

compartments = ["all","mucus","tissue","skeleton"]
metrics = ["observed_features","gini_index","dominance"]

# Make a dataframe to hold all the results
from pandas import DataFrame,concat
results_df = DataFrame({},columns = ["analysis_label","pic_x_trait","pic_y_trait","R2","p","sig_marker","FDR_q","slope","pic_filter_column","pic_filter_value","results_dir","slope_std_error","T_stat"])

for compartment in compartments:
    for metric in metrics:

        pic_trait_table = trait_table
        pic_tree = tree
        pic_x_trait = f'{metric}_{compartment}'
        pic_y_trait = 'perc_dis'
        pic_filter_column = 'None'
        pic_filter_value = 'None'
        pic_suffix = ''

        result = phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,
          pic_filter_value,analysis_output_dir,pic_suffix,verbose = False)
        result["analysis_label"] = analysis_label
        #Add this result as a new row (DataFrame.append is not available in pandas >= 2.0)
        results_df = concat([results_df,DataFrame([result])],ignore_index=True)

#Apply the FDR correction once, after all tests have run
results_df["FDR_q"] = get_FDR(results_df)


print("Done!")

Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick observed_features_all perc_dis None None  ../output/PIC_results/A1_alpha_diversity_vs_disease
R2: 0.0001051
p: 0.947
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick gini_index_all perc_dis None None  ../output/PIC_results/A1_alpha_diversity_vs_disease
R2: 0.002423
p: 0.751
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick dominance_all perc_dis None None  ../output/PIC_results/A1_alpha_diversity_vs_disease
R2: 0.1112
p: 0.027
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick observed_features_mucus perc_dis None None  ../output/PIC_results/A1_alpha_diversity_vs_dise

In [256]:
results_df.to_csv(join(analysis_output_dir,"PIC_results_summary.tsv"),sep="\t")
results_df

Unnamed: 0,analysis_label,pic_x_trait,pic_y_trait,R2,p,sig_marker,FDR_q,slope,pic_filter_column,pic_filter_value,results_dir,slope_std_error,T_stat,pic_suffix,trait_table,tree
0,alpha_diversity_vs_disease,observed_features_all,perc_dis,0.000105,0.947,n.s,0.957,0.0006098,,,../output/PIC_results/A1_alpha_diversity_vs_d...,0.0091792,0.066,../output/PIC_results/A1_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
1,alpha_diversity_vs_disease,gini_index_all,perc_dis,0.002423,0.751,n.s,0.957,2.044,,,../output/PIC_results/A1_alpha_diversity_vs_d...,6.4,0.319,../output/PIC_results/A1_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
2,alpha_diversity_vs_disease,dominance_all,perc_dis,0.1112,0.027,*,0.162,9.537,,,../output/PIC_results/A1_alpha_diversity_vs_d...,4.162,2.292,../output/PIC_results/A1_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
3,alpha_diversity_vs_disease,observed_features_mucus,perc_dis,0.01819,0.407,n.s,0.814,-0.008576,,,../output/PIC_results/A1_alpha_diversity_vs_d...,0.010219,-0.839,../output/PIC_results/A1_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
4,alpha_diversity_vs_disease,gini_index_mucus,perc_dis,0.03169,0.272,n.s,0.6528,8.268,,,../output/PIC_results/A1_alpha_diversity_vs_d...,7.413,1.115,../output/PIC_results/A1_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
5,alpha_diversity_vs_disease,dominance_mucus,perc_dis,0.06299,0.118,n.s,0.472,5.467,,,../output/PIC_results/A1_alpha_diversity_vs_d...,3.42,1.598,../output/PIC_results/A1_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
6,alpha_diversity_vs_disease,observed_features_tissue,perc_dis,0.04115,0.209,n.s,0.627,0.010093,,,../output/PIC_results/A1_alpha_diversity_vs_d...,0.007904,1.277,../output/PIC_results/A1_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
7,alpha_diversity_vs_disease,gini_index_tissue,perc_dis,0.004043,0.697,n.s,0.957,-1.47,,,../output/PIC_results/A1_alpha_diversity_vs_d...,3.742,-0.393,../output/PIC_results/A1_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
8,alpha_diversity_vs_disease,dominance_tissue,perc_dis,0.1618,0.0101,*,0.1212,11.469,,,../output/PIC_results/A1_alpha_diversity_vs_d...,4.235,2.708,../output/PIC_results/A1_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
9,alpha_diversity_vs_disease,observed_features_skeleton,perc_dis,0.006031,0.625,n.s,0.957,0.003632,,,../output/PIC_results/A1_alpha_diversity_vs_d...,0.007371,0.493,../output/PIC_results/A1_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick


# Analysis 2. Compare dominance against disease in Australia-only data

In [258]:
analysis_label = "alpha_diversity_vs_disease_australia_only"
analysis_output_dir = join(results_dir,"PIC_results",f"A2_{analysis_label}")

compartments = ["all","mucus","tissue","skeleton"]
metrics = ["dominance"]

# Make a dataframe to hold all the results
from pandas import DataFrame,concat
results_df = DataFrame({},columns = ["analysis_label","pic_x_trait","pic_y_trait","R2","p","sig_marker","FDR_q","slope","pic_filter_column","pic_filter_value","results_dir","slope_std_error","T_stat"])

for compartment in compartments:
    for metric in metrics:
        #NOTE: this passes the full trait table, so the results match Analysis 1;
        #trait_table_australia (checked above) looks like the intended input
        pic_trait_table = trait_table
        pic_tree = tree
        pic_x_trait = f'{metric}_{compartment}'
        pic_y_trait = 'perc_dis'
        pic_filter_column = 'None'
        pic_filter_value = 'None'
        pic_suffix = ''

        result = phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,
              pic_filter_value,analysis_output_dir,pic_suffix,verbose = False)
        result["analysis_label"] = analysis_label
        #Add this result as a new row (DataFrame.append is not available in pandas >= 2.0)
        results_df = concat([results_df,DataFrame([result])],ignore_index=True)

#Apply the FDR correction once, after all tests have run
results_df["FDR_q"] = get_FDR(results_df)


print("Done!")

Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick dominance_all perc_dis None None  ../output/PIC_results/A2_alpha_diversity_vs_disease_australia_only
R2: 0.1112
p: 0.027
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick dominance_mucus perc_dis None None  ../output/PIC_results/A2_alpha_diversity_vs_disease_australia_only
R2: 0.06299
p: 0.118
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick dominance_tissue perc_dis None None  ../output/PIC_results/A2_alpha_diversity_vs_disease_australia_only
R2: 0.1618
p: 0.0101
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick dominance_skeleton perc_dis None None  ../output/PIC_r

In [259]:
results_df.to_csv(join(analysis_output_dir,"PIC_results_summary.tsv"),sep="\t")
results_df


Unnamed: 0,analysis_label,pic_x_trait,pic_y_trait,R2,p,sig_marker,FDR_q,slope,pic_filter_column,pic_filter_value,results_dir,slope_std_error,T_stat,pic_suffix,trait_table,tree
0,alpha_diversity_vs_disease_australia_only,dominance_all,perc_dis,0.1112,0.027,*,0.054,9.537,,,../output/PIC_results/A2_alpha_diversity_vs_d...,4.162,2.292,../output/PIC_results/A2_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
1,alpha_diversity_vs_disease_australia_only,dominance_mucus,perc_dis,0.06299,0.118,n.s,0.157333,5.467,,,../output/PIC_results/A2_alpha_diversity_vs_d...,3.42,1.598,../output/PIC_results/A2_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
2,alpha_diversity_vs_disease_australia_only,dominance_tissue,perc_dis,0.1618,0.0101,*,0.0404,11.469,,,../output/PIC_results/A2_alpha_diversity_vs_d...,4.235,2.708,../output/PIC_results/A2_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
3,alpha_diversity_vs_disease_australia_only,dominance_skeleton,perc_dis,7.2e-05,0.957,n.s,0.957,0.2221,,,../output/PIC_results/A2_alpha_diversity_vs_d...,4.1367,0.054,../output/PIC_results/A2_alpha_diversity_vs_di...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick


# Analysis 3. Compare Beta Diversity vs. Disease



In [272]:
from pandas import DataFrame,concat
results_df = DataFrame({},columns = ["analysis_label","pic_x_trait","pic_y_trait","R2","p","sig_marker","FDR_q","slope","pic_filter_column","pic_filter_value","results_dir","slope_std_error","T_stat"])
analysis_label = "beta_diversity_vs_disease"
analysis_output_dir = join(results_dir,"PIC_results",f"A3_{analysis_label}")

for metric in ["unweighted_unifrac","weighted_unifrac"]:
    for PC_axis in [1,2,3]:
        for compartment in ["all","mucus","tissue","skeleton"]:

            pic_trait_table = trait_table_beta_diversity
            pic_tree = tree
            pic_x_trait = f"{compartment}_{metric}_ordination_PC{PC_axis}"
            pic_y_trait = 'perc_dis'

            pic_filter_column = 'None'
            pic_filter_value = 'None'

            pic_suffix = ''

            result = phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,
              pic_filter_value,analysis_output_dir,pic_suffix,verbose = False)
            if not isinstance(result,dict):
                #The R script failed, so we got its raw output rather than parsed results
                print(result)
                continue
            result["analysis_label"] = analysis_label
            #Add this result as a new row (DataFrame.append is not available in pandas >= 2.0)
            results_df = concat([results_df,DataFrame([result])],ignore_index=True)

#Apply the FDR correction once, after all tests have run
results_df["FDR_q"] = get_FDR(results_df)
            


print("Done!")

Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_and_growth_data_pcoa.tsv ../output/huang_roy_genus_tree.newick all_unweighted_unifrac_ordination_PC1 perc_dis None None  ../output/PIC_results/A3_beta_diversity_vs_disease
R2: 0.07612
p: 0.284
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_and_growth_data_pcoa.tsv ../output/huang_roy_genus_tree.newick mucus_unweighted_unifrac_ordination_PC1 perc_dis None None  ../output/PIC_results/A3_beta_diversity_vs_disease
R2: 0.004515
p: 0.805
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_and_growth_data_pcoa.tsv ../output/huang_roy_genus_tree.newick tissue_unweighted_unifrac_ordination_PC1 perc_dis None None  ../output/PIC_results/A3_beta_diversity_vs_disease
R2: 0.007053
p: 0.749
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_and_growth_data_pcoa.tsv ../ou

In [273]:
results_df.to_csv(join(analysis_output_dir,"PIC_results_summary.tsv"),sep="\t")
results_df

Unnamed: 0,analysis_label,pic_x_trait,pic_y_trait,R2,p,sig_marker,FDR_q,slope,pic_filter_column,pic_filter_value,results_dir,slope_std_error,T_stat,pic_suffix,trait_table,tree
0,beta_diversity_vs_disease,all_unweighted_unifrac_ordination_PC1,perc_dis,0.07612,0.284,n.s,0.7608,-6.974,,,../output/PIC_results/A3_beta_diversity_vs_di...,6.273,-1.112,../output/PIC_results/A3_beta_diversity_vs_dis...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
1,beta_diversity_vs_disease,mucus_unweighted_unifrac_ordination_PC1,perc_dis,0.004515,0.805,n.s,0.924571,-2.133,,,../output/PIC_results/A3_beta_diversity_vs_di...,8.464,-0.252,../output/PIC_results/A3_beta_diversity_vs_dis...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
2,beta_diversity_vs_disease,tissue_unweighted_unifrac_ordination_PC1,perc_dis,0.007053,0.749,n.s,0.924571,-2.02,,,../output/PIC_results/A3_beta_diversity_vs_di...,6.188,-0.326,../output/PIC_results/A3_beta_diversity_vs_dis...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
3,beta_diversity_vs_disease,skeleton_unweighted_unifrac_ordination_PC1,perc_dis,0.03474,0.474,n.s,0.808,4.144,,,../output/PIC_results/A3_beta_diversity_vs_di...,5.64,0.735,../output/PIC_results/A3_beta_diversity_vs_dis...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
4,beta_diversity_vs_disease,all_unweighted_unifrac_ordination_PC2,perc_dis,0.08682,0.251,n.s,0.7608,19.04,,,../output/PIC_results/A3_beta_diversity_vs_di...,15.94,1.194,../output/PIC_results/A3_beta_diversity_vs_dis...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
5,beta_diversity_vs_disease,mucus_unweighted_unifrac_ordination_PC2,perc_dis,0.02183,0.585,n.s,0.808,-4.645,,,../output/PIC_results/A3_beta_diversity_vs_di...,8.31,-0.559,../output/PIC_results/A3_beta_diversity_vs_dis...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
6,beta_diversity_vs_disease,tissue_unweighted_unifrac_ordination_PC2,perc_dis,0.02326,0.559,n.s,0.808,-6.049,,,../output/PIC_results/A3_beta_diversity_vs_di...,10.121,-0.598,../output/PIC_results/A3_beta_diversity_vs_dis...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
7,beta_diversity_vs_disease,skeleton_unweighted_unifrac_ordination_PC2,perc_dis,0.134,0.148,n.s,0.7608,-16.14,,,../output/PIC_results/A3_beta_diversity_vs_di...,10.6,-1.523,../output/PIC_results/A3_beta_diversity_vs_dis...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
8,beta_diversity_vs_disease,all_unweighted_unifrac_ordination_PC3,perc_dis,0.1126,0.188,n.s,0.7608,-27.38,,,../output/PIC_results/A3_beta_diversity_vs_di...,19.84,-1.38,../output/PIC_results/A3_beta_diversity_vs_dis...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
9,beta_diversity_vs_disease,mucus_unweighted_unifrac_ordination_PC3,perc_dis,0.0475,0.417,n.s,0.808,6.325,,,../output/PIC_results/A3_beta_diversity_vs_di...,7.57,0.836,../output/PIC_results/A3_beta_diversity_vs_dis...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick


# Analysis 4. Test dominance vs. disease in Alpha- vs. Gammaproteobacteria-dominated microbiomes

In [280]:
from pandas import DataFrame,concat

results_df = DataFrame({},columns = ["analysis_label","pic_x_trait","pic_y_trait","R2","p","sig_marker","FDR_q","slope","pic_filter_column","pic_filter_value","results_dir","slope_std_error","T_stat"])

analysis_label = "gamma_proteobacteria_dominance_vs_disease"
analysis_output_dir = join(results_dir,"PIC_results",f"A4_{analysis_label}")

compartments = ["all","mucus","tissue","skeleton"]
metrics = ["dominance","observed_features","gini_index"]

for compartment in compartments:
    for metric in metrics:
        for microbial_taxon in ["D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria",
                                "D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria"]:

            pic_trait_table = trait_table
            pic_tree = tree
            pic_x_trait = f'{metric}_{compartment}'
            pic_y_trait = 'perc_dis'

            #Keep only corals whose most abundant bacterial class in this compartment matches
            pic_filter_column = f'most_abundant_class_{compartment}'
            pic_filter_value = microbial_taxon

            pic_suffix = ''

            result = phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,
              pic_filter_value,analysis_output_dir,pic_suffix,verbose = False)

            result["analysis_label"] = analysis_label
            #Add this result as a new row (DataFrame.append is not available in pandas >= 2.0)
            results_df = concat([results_df,DataFrame([result])],ignore_index=True)

#Apply the FDR correction once, after all tests have run
results_df["FDR_q"] = get_FDR(results_df)


print("Done!")

Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick dominance_all perc_dis most_abundant_class_all D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria  ../output/PIC_results/A4_gamma_proteobacteria_dominance_vs_disease
R2: 0.6683
p: 1.08e-06
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick dominance_all perc_dis most_abundant_class_all D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria  ../output/PIC_results/A4_gamma_proteobacteria_dominance_vs_disease
R2: 0.1471
p: 0.116
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick observed_features_all perc_dis most_abundant_class_all D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria  ../output/PIC_results/A4_gamma_proteobacteria_dominance_vs_dis

R2: 0.09309
p: 0.191
Done!


In [281]:
results_df.to_csv(join(analysis_output_dir,"PIC_results_summary.tsv"),sep="\t")
results_df

Unnamed: 0,analysis_label,pic_x_trait,pic_y_trait,R2,p,sig_marker,FDR_q,slope,pic_filter_column,pic_filter_value,results_dir,slope_std_error,T_stat,pic_suffix,trait_table,tree
0,gamma_proteobacteria_dominance_vs_disease,dominance_all,perc_dis,0.6683,1.08e-06,***,2.6e-05,60.45,most_abundant_class_all,D_0__Bacteria;D_1__Proteobacteria;D_2__Gammapr...,../output/PIC_results/A4_gamma_proteobacteria...,9.08,6.658,../output/PIC_results/A4_gamma_proteobacteria_...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
1,gamma_proteobacteria_dominance_vs_disease,dominance_all,perc_dis,0.1471,0.116,n.s,0.4776,4.912,most_abundant_class_all,D_0__Bacteria;D_1__Proteobacteria;D_2__Alphapr...,../output/PIC_results/A4_gamma_proteobacteria...,2.957,1.661,../output/PIC_results/A4_gamma_proteobacteria_...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
2,gamma_proteobacteria_dominance_vs_disease,observed_features_all,perc_dis,0.03323,0.394,n.s,0.675429,0.02403,most_abundant_class_all,D_0__Bacteria;D_1__Proteobacteria;D_2__Gammapr...,../output/PIC_results/A4_gamma_proteobacteria...,0.02763,0.87,../output/PIC_results/A4_gamma_proteobacteria_...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
3,gamma_proteobacteria_dominance_vs_disease,observed_features_all,perc_dis,0.2123,0.0543,.,0.4344,-0.015302,most_abundant_class_all,D_0__Bacteria;D_1__Proteobacteria;D_2__Alphapr...,../output/PIC_results/A4_gamma_proteobacteria...,0.007368,-2.077,../output/PIC_results/A4_gamma_proteobacteria_...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
4,gamma_proteobacteria_dominance_vs_disease,gini_index_all,perc_dis,0.0121,0.609,n.s,0.9135,8.776,most_abundant_class_all,D_0__Bacteria;D_1__Proteobacteria;D_2__Gammapr...,../output/PIC_results/A4_gamma_proteobacteria...,16.908,0.519,../output/PIC_results/A4_gamma_proteobacteria_...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
5,gamma_proteobacteria_dominance_vs_disease,gini_index_all,perc_dis,0.000487,0.931,n.s,0.962,-0.4473,most_abundant_class_all,D_0__Bacteria;D_1__Proteobacteria;D_2__Alphapr...,../output/PIC_results/A4_gamma_proteobacteria...,5.0648,-0.088,../output/PIC_results/A4_gamma_proteobacteria_...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
6,gamma_proteobacteria_dominance_vs_disease,dominance_mucus,perc_dis,0.004694,0.719,n.s,0.962,1.438,most_abundant_class_mucus,D_0__Bacteria;D_1__Proteobacteria;D_2__Gammapr...,../output/PIC_results/A4_gamma_proteobacteria...,3.956,0.363,../output/PIC_results/A4_gamma_proteobacteria_...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
7,gamma_proteobacteria_dominance_vs_disease,dominance_mucus,perc_dis,0.000299,0.962,n.s,0.962,-0.7054,most_abundant_class_mucus,D_0__Bacteria;D_1__Proteobacteria;D_2__Alphapr...,../output/PIC_results/A4_gamma_proteobacteria...,14.4148,-0.049,../output/PIC_results/A4_gamma_proteobacteria_...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
8,gamma_proteobacteria_dominance_vs_disease,observed_features_mucus,perc_dis,0.01168,0.57,n.s,0.912,-0.004811,most_abundant_class_mucus,D_0__Bacteria;D_1__Proteobacteria;D_2__Gammapr...,../output/PIC_results/A4_gamma_proteobacteria...,0.008362,-0.575,../output/PIC_results/A4_gamma_proteobacteria_...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
9,gamma_proteobacteria_dominance_vs_disease,observed_features_mucus,perc_dis,0.09333,0.391,n.s,0.675429,-0.02716,most_abundant_class_mucus,D_0__Bacteria;D_1__Proteobacteria;D_2__Alphapr...,../output/PIC_results/A4_gamma_proteobacteria...,0.02993,-0.907,../output/PIC_results/A4_gamma_proteobacteria_...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick


# Analysis 5. Test Endozoicomonas vs. dominance in tissue microbiomes

In [284]:

metrics = ["dominance_tissue","observed_features_tissue","gini_index_tissue","perc_dis"]
analysis_label = "Endozoicomonas_vs_dominance"   
analysis_output_dir = join(results_dir,"PIC_results",f"A5_{analysis_label}")
# Make a dataframe to hold all the results
from pandas import DataFrame,concat
results_df = DataFrame({},columns = ["analysis_label","pic_x_trait","pic_y_trait","R2","p","sig_marker","FDR_q","slope","pic_filter_column","pic_filter_value","results_dir","slope_std_error","T_stat"])

#The label recorded in the results table differs from the one used for the output folder
analysis_label = "Endozoicomonas_vs_dominance_and_disease"

for metric in metrics:
    pic_trait_table = trait_table
    pic_tree = tree
    pic_x_trait = 'tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas'
    pic_y_trait = metric
    pic_filter_column = 'None'
    pic_filter_value = 'None'
    pic_suffix = ''

    result = phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,
          pic_filter_value,analysis_output_dir,pic_suffix,verbose = False)
    result["analysis_label"] = analysis_label
    #Add this result as a new row (DataFrame.append is not available in pandas >= 2.0)
    results_df = concat([results_df,DataFrame([result])],ignore_index=True)

#Apply the FDR correction once, after all tests have run
results_df["FDR_q"] = get_FDR(results_df)


print("Done!")

Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas dominance_tissue None None  ../output/PIC_results/A5_Endozoicomonas_vs_dominance
R2: 0.5177
p: 2.56e-08
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas observed_features_tissue None None  ../output/PIC_results/A5_Endozoicomonas_vs_dominance
R2: 0.006276
p: 0.605
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobact

In [285]:
results_df.to_csv(join(analysis_output_dir,"PIC_results_summary.tsv"),sep="\t")
results_df

Unnamed: 0,analysis_label,pic_x_trait,pic_y_trait,R2,p,sig_marker,FDR_q,slope,pic_filter_column,pic_filter_value,results_dir,slope_std_error,T_stat,pic_suffix,trait_table,tree
0,Endozoicomonas_vs_dominance_and_disease,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,dominance_tissue,0.5177,2.56e-08,***,1.024e-07,0.00063,,,../output/PIC_results/A5_Endozoicomonas_vs_do...,9.274e-05,6.794,../output/PIC_results/A5_Endozoicomonas_vs_dom...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
1,Endozoicomonas_vs_dominance_and_disease,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,observed_features_tissue,0.006276,0.605,n.s,0.8066667,-0.04392,,,../output/PIC_results/A5_Endozoicomonas_vs_do...,0.08428,-0.521,../output/PIC_results/A5_Endozoicomonas_vs_dom...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
2,Endozoicomonas_vs_dominance_and_disease,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,gini_index_tissue,0.000386,0.898,n.s,0.898,-2.104e-05,,,../output/PIC_results/A5_Endozoicomonas_vs_do...,0.0001633,-0.129,../output/PIC_results/A5_Endozoicomonas_vs_dom...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
3,Endozoicomonas_vs_dominance_and_disease,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,perc_dis,0.2415,0.00128,**,0.00256,0.011994,,,../output/PIC_results/A5_Endozoicomonas_vs_do...,0.003448,3.479,../output/PIC_results/A5_Endozoicomonas_vs_dom...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
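
The `FDR_q` column above is consistent with a Benjamini-Hochberg correction applied across the four tests in this analysis. The `get_FDR` helper is not shown in this section, so the function below is only a minimal pure-Python sketch of that procedure (the name `benjamini_hochberg` is hypothetical), checked against the p-values recorded in the table.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p to the smallest, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# p-values copied from the Analysis 5 summary table, in row order
p = [2.56e-08, 0.605, 0.898, 0.00128]
q = benjamini_hochberg(p)
print(q)
```

Because every q-value depends on the number of tests `m`, recomputing `FDR_q` after each appended row only yields final values once the last row has been added.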


# Analysis 6. Test for correlations between pathogen abundance in healthy corals and disease susceptibility

In [290]:
from pandas import DataFrame
results_df = DataFrame({},columns = ["analysis_label","pic_x_trait","pic_y_trait","R2","p","sig_marker","FDR_q","slope","pic_filter_column","pic_filter_value","results_dir","slope_std_error","T_stat"])

analysis_label = "opportunists_vs_disease"   
analysis_output_dir = join(results_dir,"PIC_results",f"A6_{analysis_label}")

compartments = ["tissue"]
metrics = ["perc_disease","dominance","observed_features","gini_index"]
putative_pathogens =\
  ["D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Vibrionales",
   "D_0__Bacteria___D_1__Cyanobacteria___D_2__Oxyphotobacteria___D_3__Nostocales",
   "D_0__Bacteria___D_1__Proteobacteria___D_2__Alphaproteobacteria___D_3__Rickettsiales___D_4__Midichloriaceae___D_5__MD3_55"]

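for compartment in compartments:
    # NOTE: `metric` is never used in the loop body (pic_y_trait is fixed to
    # 'perc_dis'), so each taxon is tested once per entry in `metrics`. This is
    # why the summary table below repeats the same three results.
    for metric in metrics:
        for microbial_taxon in putative_pathogens: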
            
            pic_trait_table = trait_table
            pic_tree = tree
            pic_x_trait = f'{compartment}_{microbial_taxon}'
            pic_y_trait = 'perc_dis'
            
            pic_filter_column = 'None'
            pic_filter_value = 'None'
      
            pic_suffix = ''

            result = phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,\
              pic_filter_value,analysis_output_dir,pic_suffix,verbose = False)
            
            result["analysis_label"] = analysis_label
            results_df = results_df.append(result,ignore_index=True)   
            results_df["FDR_q"] = get_FDR(results_df)


print("Done!")

Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Vibrionales perc_dis None None  ../output/PIC_results/A6_opportunists_vs_disease
R2: 0.003987
p: 0.699
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick tissue_D_0__Bacteria___D_1__Cyanobacteria___D_2__Oxyphotobacteria___D_3__Nostocales perc_dis None None  ../output/PIC_results/A6_opportunists_vs_disease
R2: 0.007824
p: 0.587
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Alphaproteobacteria___D_3__Rickettsiales___D_4__Midichloriaceae___D_5__MD3_55 perc_dis None None  ../output/PIC_results/A6_opportunists_vs_disease
R2: 0.001065
p: 

In [291]:
results_df.to_csv(join(analysis_output_dir,"PIC_results_summary.tsv"),sep="\t")
results_df

Unnamed: 0,analysis_label,pic_x_trait,pic_y_trait,R2,p,sig_marker,FDR_q,slope,pic_filter_column,pic_filter_value,results_dir,slope_std_error,T_stat,pic_suffix,trait_table,tree
0,opportunists_vs_disease,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,perc_dis,0.003987,0.699,n.s,0.842,-0.01385,,,../output/PIC_results/A6_opportunists_vs_dise...,0.03552,-0.39,../output/PIC_results/A6_opportunists_vs_disease,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
1,opportunists_vs_disease,tissue_D_0__Bacteria___D_1__Cyanobacteria___D_...,perc_dis,0.007824,0.587,n.s,0.842,0.01792,,,../output/PIC_results/A6_opportunists_vs_dise...,0.03274,0.547,../output/PIC_results/A6_opportunists_vs_disease,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
2,opportunists_vs_disease,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,perc_dis,0.001065,0.842,n.s,0.842,-0.002881,,,../output/PIC_results/A6_opportunists_vs_dise...,0.014316,-0.201,../output/PIC_results/A6_opportunists_vs_disease,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
3,opportunists_vs_disease,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,perc_dis,0.003987,0.699,n.s,0.842,-0.01385,,,../output/PIC_results/A6_opportunists_vs_dise...,0.03552,-0.39,../output/PIC_results/A6_opportunists_vs_disease,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
4,opportunists_vs_disease,tissue_D_0__Bacteria___D_1__Cyanobacteria___D_...,perc_dis,0.007824,0.587,n.s,0.842,0.01792,,,../output/PIC_results/A6_opportunists_vs_dise...,0.03274,0.547,../output/PIC_results/A6_opportunists_vs_disease,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
5,opportunists_vs_disease,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,perc_dis,0.001065,0.842,n.s,0.842,-0.002881,,,../output/PIC_results/A6_opportunists_vs_dise...,0.014316,-0.201,../output/PIC_results/A6_opportunists_vs_disease,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
6,opportunists_vs_disease,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,perc_dis,0.003987,0.699,n.s,0.842,-0.01385,,,../output/PIC_results/A6_opportunists_vs_dise...,0.03552,-0.39,../output/PIC_results/A6_opportunists_vs_disease,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
7,opportunists_vs_disease,tissue_D_0__Bacteria___D_1__Cyanobacteria___D_...,perc_dis,0.007824,0.587,n.s,0.842,0.01792,,,../output/PIC_results/A6_opportunists_vs_dise...,0.03274,0.547,../output/PIC_results/A6_opportunists_vs_disease,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
8,opportunists_vs_disease,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,perc_dis,0.001065,0.842,n.s,0.842,-0.002881,,,../output/PIC_results/A6_opportunists_vs_dise...,0.014316,-0.201,../output/PIC_results/A6_opportunists_vs_disease,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
9,opportunists_vs_disease,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,perc_dis,0.003987,0.699,n.s,0.842,-0.01385,,,../output/PIC_results/A6_opportunists_vs_dise...,0.03552,-0.39,../output/PIC_results/A6_opportunists_vs_disease,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
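
The summary table above repeats the same three taxon-versus-`perc_dis` tests, because the cell loops over `metrics` without using the loop variable. The sketch below shows one way the trait pairs could be enumerated so that each pair is tested once. The `y_traits` list and the `pairs` variable are assumptions introduced for illustration; the taxon strings are copied from the cell above.

```python
from itertools import product

compartments = ["tissue"]
putative_pathogens = [
    "D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Vibrionales",
    "D_0__Bacteria___D_1__Cyanobacteria___D_2__Oxyphotobacteria___D_3__Nostocales",
    "D_0__Bacteria___D_1__Proteobacteria___D_2__Alphaproteobacteria___D_3__Rickettsiales___D_4__Midichloriaceae___D_5__MD3_55",
]
# Hypothetical list of response traits; Analysis 6 only uses 'perc_dis'
y_traits = ["perc_dis"]

# dict.fromkeys drops repeated (x, y) pairs while preserving their order
pairs = list(dict.fromkeys(
    (f"{compartment}_{taxon}", y_trait)
    for compartment, taxon, y_trait in product(compartments, putative_pathogens, y_traits)
))

for pic_x_trait, pic_y_trait in pairs:
    print(pic_x_trait, pic_y_trait)
```

Iterating over `pairs` instead of the nested loops would run three contrasts rather than twelve, and the FDR correction would then be computed over the tests actually performed.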


# Analysis 7. Test *Endozoicomonas* abundance by life history strategy

In [292]:
from pandas import DataFrame
results_df = DataFrame({},columns = ["analysis_label","pic_x_trait","pic_y_trait","R2","p","sig_marker","FDR_q","slope","pic_filter_column","pic_filter_value","results_dir","slope_std_error","T_stat"])

analysis_label = "Life_history_strategy_vs_Endozoicomonas"
analysis_output_dir = join(results_dir,"PIC_results",f"A7_{analysis_label}")

compartments = ["tissue"]
life_history_strategy = ["Weedy","Stress_tolerant","Generalist"]

for compartment in compartments:
    for strategy in life_history_strategy:
            
            
            pic_trait_table = trait_table
            pic_tree = tree
            pic_x_trait = strategy
            pic_y_trait = f'{compartment}_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas'
            
            
            pic_filter_column = 'None'
            pic_filter_value = 'None'
      
            pic_suffix = ''

            result = phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,\
              pic_filter_value,analysis_output_dir,pic_suffix,verbose = False)
            
            result["analysis_label"] = analysis_label
            results_df = results_df.append(result,ignore_index=True)   
            results_df["FDR_q"] = get_FDR(results_df)


print("Done!")

Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick Weedy tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas None None  ../output/PIC_results/A7_Life_history_strategy_vs_Endozoicomonas
R2: 0.9452
p: <2e-16
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick Stress_tolerant tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas None None  ../output/PIC_results/A7_Life_history_strategy_vs_Endozoicomonas
R2: 0.04187
p: 0.178
Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick Generalist tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Gam

In [293]:
results_df.to_csv(join(analysis_output_dir,"PIC_results_summary.tsv"),sep="\t")
results_df

Unnamed: 0,analysis_label,pic_x_trait,pic_y_trait,R2,p,sig_marker,FDR_q,slope,pic_filter_column,pic_filter_value,results_dir,slope_std_error,T_stat,pic_suffix,trait_table,tree
0,Life_history_strategy_vs_Endozoicomonas,Weedy,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,0.9452,<2e-16,***,6e-16,278.47,,,../output/PIC_results/A7_Life_history_strateg...,10.23,27.23,../output/PIC_results/A7_Life_history_strategy...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
1,Life_history_strategy_vs_Endozoicomonas,Stress_tolerant,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,0.04187,0.178,n.s,0.178,-51.59,,,../output/PIC_results/A7_Life_history_strateg...,37.63,-1.371,../output/PIC_results/A7_Life_history_strategy...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
2,Life_history_strategy_vs_Endozoicomonas,Generalist,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,0.1133,0.0238,*,0.0357,98.78,,,../output/PIC_results/A7_Life_history_strateg...,42.15,2.344,../output/PIC_results/A7_Life_history_strategy...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
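
The cells in this notebook grow `results_df` one row at a time with `DataFrame.append`, which was removed in pandas 2.0. A minimal sketch of an equivalent pattern follows, assuming each result behaves like a plain dict (the return type of `phylogenetic_independent_contrasts` is not shown in this section). The `recorded` list is a stand-in built from values copied out of the Analysis 7 table above.

```python
from pandas import DataFrame

# Stand-in for the per-test results; values copied from the Analysis 7 table.
# The p-values are kept as strings because one of them is reported as '<2e-16'.
recorded = [
    {"pic_x_trait": "Weedy", "R2": 0.9452, "p": "<2e-16"},
    {"pic_x_trait": "Stress_tolerant", "R2": 0.04187, "p": "0.178"},
    {"pic_x_trait": "Generalist", "R2": 0.1133, "p": "0.0238"},
]

rows = []
for result in recorded:
    result = dict(result)  # copy so the stand-in data is not modified
    result["analysis_label"] = "Life_history_strategy_vs_Endozoicomonas"
    rows.append(result)

# Build the summary table once, after all tests have run
summary = DataFrame(rows)
print(summary)
```

Collecting dicts in a list and constructing the `DataFrame` at the end avoids copying the table on every iteration and works on both older and current pandas releases.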


# Analysis 8. Test *Endozoicomonas* vs. disease correlation within Stress-tolerant corals

In [296]:
from pandas import DataFrame
results_df = DataFrame({},columns = ["analysis_label","pic_x_trait","pic_y_trait","R2","p","sig_marker","FDR_q","slope","pic_filter_column","pic_filter_value","results_dir","slope_std_error","T_stat"])

analysis_label = "Endozoicomonas_vs_disease_in_stress_tolerant_corals"
analysis_output_dir = join(results_dir,"PIC_results",f"A8_{analysis_label}")

compartments = ["tissue"]
life_history_strategy = ['Stress_tolerant']


for compartment in compartments:
    for strategy in life_history_strategy:
            
            
            pic_trait_table = trait_table
            pic_tree = tree
            
            pic_x_trait = f'{compartment}_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas'
            pic_y_trait = 'perc_dis'
            
            pic_filter_column = strategy
            pic_filter_value = '1'
      
            pic_suffix = ''

            result = phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,\
              pic_filter_value,analysis_output_dir,pic_suffix,verbose = False)
            
            result["analysis_label"] = analysis_label
            results_df = results_df.append(result,ignore_index=True)   
            results_df["FDR_q"] = get_FDR(results_df)


print("Done!")

Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_zeros.tsv ../output/huang_roy_genus_tree.newick tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas perc_dis Stress_tolerant 1  ../output/PIC_results/A8_Endozoicomonas_vs_disease_in_stress_tolerant_corals
R2: 0.5995
p: 0.000264
Done!


In [297]:
results_df.to_csv(join(analysis_output_dir,"PIC_results_summary.tsv"),sep="\t")
results_df

Unnamed: 0,analysis_label,pic_x_trait,pic_y_trait,R2,p,sig_marker,FDR_q,slope,pic_filter_column,pic_filter_value,results_dir,slope_std_error,T_stat,pic_suffix,trait_table,tree
0,Endozoicomonas_vs_disease_in_stress_tolerant_c...,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,perc_dis,0.5995,0.000264,***,0.000264,0.021018,Stress_tolerant,1,../output/PIC_results/A8_Endozoicomonas_vs_di...,0.004435,4.739,../output/PIC_results/A8_Endozoicomonas_vs_dis...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick


# Analysis 9. Test *Endozoicomonas* vs. Growth Rate

In [299]:

from pandas import DataFrame
results_df = DataFrame({},columns = ["analysis_label","pic_x_trait","pic_y_trait","R2","p","sig_marker","FDR_q","slope","pic_filter_column","pic_filter_value","results_dir","slope_std_error","T_stat"])

analysis_label = "Endozoicomonas_vs_Growth_Rate"
analysis_output_dir = join(results_dir,"PIC_results",f"A9_{analysis_label}")


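compartments = ["tissue"]
# NOTE: `strategy` is not used in the loop body below. No filter is applied
# (pic_filter_column is 'None'), so this one-element list only makes the loop run once.
life_history_strategy = ['Stress_tolerant']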

for compartment in compartments:
    for strategy in life_history_strategy:
            
            
            pic_trait_table = trait_table_growth_data
            pic_tree = tree
            
            pic_x_trait = f'{compartment}_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas'
            pic_y_trait = 'growth_rate_mm_per_year'
            
            pic_filter_column = 'None'
            pic_filter_value = 'None'
      
            pic_suffix = ''
            try:
                result = phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,\
                  pic_filter_value,analysis_output_dir,pic_suffix,verbose = False)
            
                result["analysis_label"] = analysis_label
            except TypeError:
                raise ValueError(f"Underlying R code errored out with x trait {pic_x_trait}. Likely bad column name.")
            results_df = results_df.append(result,ignore_index=True)   
            results_df["FDR_q"] = get_FDR(results_df)


print("Done!")

Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_and_growth_data.tsv ../output/huang_roy_genus_tree.newick tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas growth_rate_mm_per_year None None  ../output/PIC_results/A9_Endozoicomonas_vs_Growth_Rate
R2: 0.402
p: 0.00835
Done!


In [300]:
results_df.to_csv(join(analysis_output_dir,"PIC_results_summary.tsv"),sep="\t")
results_df

Unnamed: 0,analysis_label,pic_x_trait,pic_y_trait,R2,p,sig_marker,FDR_q,slope,pic_filter_column,pic_filter_value,results_dir,slope_std_error,T_stat,pic_suffix,trait_table,tree
0,Endozoicomonas_vs_Growth_Rate,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,growth_rate_mm_per_year,0.402,0.00835,**,0.00835,0.0001684,,,../output/PIC_results/A9_Endozoicomonas_vs_Gr...,5.489e-05,3.067,../output/PIC_results/A9_Endozoicomonas_vs_Gro...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick


# Analysis 10. *Endozoicomonas* vs. Growth Rate in Non-Weedy Corals

In [301]:
from pandas import DataFrame
results_df = DataFrame({},columns = ["analysis_label","pic_x_trait","pic_y_trait","R2","p","sig_marker","FDR_q","slope","pic_filter_column","pic_filter_value","results_dir","slope_std_error","T_stat"])

analysis_label = "Endozoicomonas_vs_Growth_Rate_in_Non_Weedy_Corals"
analysis_output_dir = join(results_dir,"PIC_results",f"A10_{analysis_label}")

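compartments = ["tissue"]
# NOTE: `strategy` is not used in the loop body below. The filter is Weedy == 0
# (non-weedy corals), so this one-element list only makes the loop run once.
life_history_strategy = ['Stress_tolerant']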

for compartment in compartments:
    for strategy in life_history_strategy:
            
            
            pic_trait_table = trait_table_growth_data
            pic_tree = tree
            
            pic_x_trait = f'{compartment}_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas'
            pic_y_trait = 'growth_rate_mm_per_year'
            
            pic_filter_column = 'Weedy'
            pic_filter_value = '0'
      
            pic_suffix = ''
            try:
                result = phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,\
                  pic_filter_value,analysis_output_dir,pic_suffix,verbose = False)
            
                result["analysis_label"] = analysis_label
            except TypeError:
                raise ValueError(f"Underlying R code errored out with x trait {pic_x_trait}. Likely bad column name.")
            results_df = results_df.append(result,ignore_index=True)   
            results_df["FDR_q"] = get_FDR(results_df)


print("Done!")

Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_and_growth_data.tsv ../output/huang_roy_genus_tree.newick tissue_D_0__Bacteria___D_1__Proteobacteria___D_2__Gammaproteobacteria___D_3__Oceanospirillales___D_4__Endozoicomonadaceae___D_5__Endozoicomonas growth_rate_mm_per_year Weedy 0  ../output/PIC_results/A10_Endozoicomonas_vs_Growth_Rate_in_Non_Weedy_Corals
R2: 0.5425
p: 0.00629
Done!


In [302]:
results_df.to_csv(join(analysis_output_dir,"PIC_results_summary.tsv"),sep="\t")
results_df

Unnamed: 0,analysis_label,pic_x_trait,pic_y_trait,R2,p,sig_marker,FDR_q,slope,pic_filter_column,pic_filter_value,results_dir,slope_std_error,T_stat,pic_suffix,trait_table,tree
0,Endozoicomonas_vs_Growth_Rate_in_Non_Weedy_Corals,tissue_D_0__Bacteria___D_1__Proteobacteria___D...,growth_rate_mm_per_year,0.5425,0.00629,**,0.00629,0.000289,Weedy,0,../output/PIC_results/A10_Endozoicomonas_vs_G...,8.392e-05,3.444,../output/PIC_results/A10_Endozoicomonas_vs_Gr...,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick


# Analysis 11. Growth Rate vs. Disease

In [303]:
from pandas import DataFrame
results_df = DataFrame({},columns = ["analysis_label","pic_x_trait","pic_y_trait","R2","p","sig_marker","FDR_q","slope","pic_filter_column","pic_filter_value","results_dir","slope_std_error","T_stat"])


analysis_label = "growth_rate_vs_disease"
analysis_output_dir = join(results_dir,"PIC_results",f"A11_{analysis_label}")

pic_trait_table = trait_table_growth_data
pic_tree = tree
            
pic_x_trait = 'growth_rate_mm_per_year'
pic_y_trait = 'perc_dis'
            
pic_filter_column = 'None'
pic_filter_value = 'None'
      
pic_suffix = ''
try:
    result = phylogenetic_independent_contrasts(pic_trait_table,pic_tree,pic_x_trait,pic_y_trait,pic_filter_column,\
      pic_filter_value,analysis_output_dir,pic_suffix,verbose = False)
            
    result["analysis_label"] = analysis_label
except TypeError:
    raise ValueError(f"Underlying R code errored out with x trait {pic_x_trait}. Likely bad column name.")

results_df = results_df.append(result,ignore_index=True)   
results_df["FDR_q"] = get_FDR(results_df)


print("Done!")

Rscript phylomorphospace_r14.r ../output/GCMP_trait_table_with_abundances_and_adiv_and_metadata_and_growth_data.tsv ../output/huang_roy_genus_tree.newick growth_rate_mm_per_year perc_dis None None  ../output/PIC_results/A11_growth_rate_vs_disease
R2: 0.175
p: 0.0948
Done!


In [304]:
results_df.to_csv(join(analysis_output_dir,"PIC_results_summary.tsv"),sep="\t")
results_df

Unnamed: 0,analysis_label,pic_x_trait,pic_y_trait,R2,p,sig_marker,FDR_q,slope,pic_filter_column,pic_filter_value,results_dir,slope_std_error,T_stat,pic_suffix,trait_table,tree
0,growth_rate_vs_disease,growth_rate_mm_per_year,perc_dis,0.175,0.0948,.,0.0948,20.75,,,../output/PIC_results/A11_growth_rate_vs_dise...,11.63,1.783,../output/PIC_results/A11_growth_rate_vs_disease,../output/GCMP_trait_table_with_abundances_and...,../output/huang_roy_genus_tree.newick
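
The `sig_marker` column in the summary tables appears to follow R's conventional significance codes. The markers are produced by code that is not shown in this section, so the thresholds below are an assumption inferred from the recorded rows, and the helper name `sig_marker_for` is hypothetical.

```python
def sig_marker_for(p):
    """Map a numeric p-value to a significance marker (assumed thresholds)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "."
    return "n.s"


# (p, marker) pairs copied from the summary tables in this notebook
recorded = [
    (0.000264, "***"),
    (0.00128, "**"),
    (0.0238, "*"),
    (0.0948, "."),
    (0.178, "n.s"),
]
for p, marker in recorded:
    print(p, sig_marker_for(p), marker)
```

Under these thresholds, the growth rate vs. disease result (p = 0.0948) is marked `.`, meaning it falls below 0.1 but not below the usual 0.05 cutoff.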
