# Build a distance matrix from the tree of coral species

To extract a cophenetic (tip-to-tip) distance matrix from the tree, we need a way to read tree files. DendroPy is one convenient, lightweight Python library for this.
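The cophenetic distance between two tips is the sum of branch lengths on the path joining them through their most recent common ancestor (MRCA). The idea can be illustrated without DendroPy. This is a minimal sketch on a small, hypothetical hard-coded tree (not the coral tree), stored as a mapping from each node to its parent and branch length; the helper names are illustrative only.

```python
# Hypothetical toy tree (not the coral tree): node -> (parent, branch length to parent).
# The root has parent None.
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "A": ("n1", 2.0),
    "B": ("n1", 3.0),
    "C": ("root", 4.0),
}

def path_to_root(node, tree):
    """Map each ancestor of node (including node itself) to its distance from node."""
    dists = {}
    total = 0.0
    while node is not None:
        dists[node] = total
        parent, length = tree[node]
        total += length
        node = parent
    return dists

def cophenetic_distance(a, b, tree):
    """Sum of branch lengths on the path from tip a to tip b via their MRCA."""
    up_a = path_to_root(a, tree)
    # Walk up from b until reaching a node that is also an ancestor of a: the MRCA.
    node, dist_b = b, 0.0
    while node not in up_a:
        parent, length = tree[node]
        dist_b += length
        node = parent
    return up_a[node] + dist_b

tips = ["A", "B", "C"]
matrix = {a: {b: cophenetic_distance(a, b, TREE) for b in tips} for a in tips}
for a in tips:
    print(a, [matrix[a][b] for b in tips])
# A [0.0, 5.0, 7.0]
# B [5.0, 0.0, 8.0]
# C [7.0, 8.0, 0.0]
```

The resulting matrix is symmetric with a zero diagonal, which is the shape expected from the DendroPy-generated matrices in this notebook.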

If DendroPy is not already installed, uncomment the line below to install it.

In [1]:
#!python3 -m pip install git+https://github.com/jeetsukumaran/DendroPy.git
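To check whether DendroPy is already importable before installing it, the standard library's `importlib.util.find_spec` can be used. This is a minimal sketch; the helper name `is_installed` is hypothetical and not part of the original notebook.

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None

if not is_installed("dendropy"):
    print("DendroPy not found; uncomment and run the install line above.")
```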

#### Set up input and output paths

In [49]:
from os.path import exists, join

# Input and output directories, relative to this notebook
input_dir = join("..", "input")
output_dir = join("..", "output")

# Full Huang and Roy tree, and the path for its distance matrix
huang_roy_tree_path = join(input_dir, "huang_roy_molecular.newick")
output_dm_filename = "huang_roy_coral_tree_distance_matrix.csv"
huang_roy_output_dm_path = join(output_dir, output_dm_filename)

# Pruned tree (harmonized with the trait table), and the path for its distance matrix
pruned_tree_dir = join(output_dir, "harmonized_tree_and_trait_table")
pruned_tree_path = join(pruned_tree_dir, "pruned_tree.newick")
pruned_tree_output_dm_path = join(pruned_tree_dir, "pruned_tree_distance_matrix.csv")

# Text file listing every species in the full tree
huang_roy_species_list_output_path = join(output_dir, "huang_roy_coral_tree_full_species_list.txt")


#### Check that all required files are present

In [50]:
required = [output_dir, huang_roy_tree_path, pruned_tree_path]

for r in required:
    if not exists(r):
        raise FileNotFoundError(f"The required file or directory {r} was not found at the specified path.")
    print(f"Confirmed that required file or directory {r} exists.")

Confirmed that required file or directory ../output exists.
Confirmed that required file or directory ../input/huang_roy_molecular.newick exists.
Confirmed that required file or directory ../output/harmonized_tree_and_trait_table/pruned_tree.newick exists.
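The loop above stops at the first missing path. If you would rather see every missing path in a single error, a variant like the following could be used. This is a sketch; `find_missing` and `check_required` are hypothetical helper names, not part of the original notebook.

```python
from os.path import exists

def find_missing(paths):
    """Return the paths that do not exist, in their original order."""
    return [p for p in paths if not exists(p)]

def check_required(paths):
    """Raise one FileNotFoundError listing every missing path, else confirm each path."""
    missing = find_missing(paths)
    if missing:
        raise FileNotFoundError("Missing required files or directories: " + ", ".join(missing))
    for p in paths:
        print(f"Confirmed that required file or directory {p} exists.")
```

In this notebook it would be called as `check_required(required)` in place of the loop.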


#### Calculate the host cophenetic distance matrix

In [51]:
import dendropy

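def get_dm_from_coral_tree(tree_path, output_filepath):
    """Read a tree, compute its tip-to-tip (cophenetic) distance matrix, and write it to CSV."""
    # Change schema if the tree is not in Newick format (e.g. "nexus").
    # Note: DendroPy's Newick reader converts underscores in unquoted labels to spaces
    # unless preserve_underscores=True is passed.
    tree = dendropy.Tree.get(path=tree_path, schema="newick")
    pdm = tree.phylogenetic_distance_matrix()
    pdm.write_csv(output_filepath)

# Distance matrices for the pruned tree and for the full Huang and Roy tree
get_dm_from_coral_tree(pruned_tree_path, pruned_tree_output_dm_path)
get_dm_from_coral_tree(huang_roy_tree_path, huang_roy_output_dm_path)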
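A cophenetic distance matrix should be square, symmetric, and zero on the diagonal, so those properties make a cheap sanity check on the written CSV. This sketch uses only the standard library and assumes the file has a header row of taxon labels (after one corner cell) and the taxon labels in its first column, which is what the pandas cell's renaming of the first column suggests; the exact layout of DendroPy's CSV output is an assumption here. The three-taxon matrix is hypothetical.

```python
import csv
import io

def read_distance_matrix(handle):
    """Parse a square CSV distance matrix into a dict of dicts.

    Assumes row 0 holds column labels after a corner cell, and column 0 holds row labels.
    """
    rows = list(csv.reader(handle))
    col_labels = rows[0][1:]
    matrix = {}
    for row in rows[1:]:
        label, values = row[0], row[1:]
        matrix[label] = {col: float(v) for col, v in zip(col_labels, values)}
    return matrix

def check_distance_matrix(matrix, tol=1e-9):
    """Return a list of problems: mismatched labels, non-zero diagonal, or asymmetry."""
    labels = set(matrix)
    bad_rows = [a for a in matrix if set(matrix[a]) != labels]
    if bad_rows:
        return [f"row {a!r} does not cover exactly the row-label taxa" for a in bad_rows]
    problems = []
    ordered = list(matrix)
    for i, a in enumerate(ordered):
        if abs(matrix[a][a]) > tol:
            problems.append(f"non-zero diagonal for {a!r}")
        # Compare each unordered pair once
        for b in ordered[i + 1:]:
            if abs(matrix[a][b] - matrix[b][a]) > tol:
                problems.append(f"asymmetric entry for {a!r}, {b!r}")
    return problems

# Hypothetical three-taxon matrix standing in for the CSV written above
example_csv = io.StringIO(",A,B,C\nA,0,5,7\nB,5,0,8\nC,7,8,0\n")
dm = read_distance_matrix(example_csv)
print(check_distance_matrix(dm))  # []
```

With the real files, the same check could be run by opening `huang_roy_output_dm_path` and passing the file handle to `read_distance_matrix`.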

#### Load the host cophenetic distance matrix in pandas

In [52]:
import pandas as pd

# Load the full Huang and Roy distance matrix written above
cophenetic_dm = pd.read_csv(huang_roy_output_dm_path)
# The first column holds the species labels; give it a descriptive name
curr_species_col_name = cophenetic_dm.columns[0]
cophenetic_dm = cophenetic_dm.rename(columns={curr_species_col_name: "species_name"})
# set_index returns a new DataFrame, so assign the result back
cophenetic_dm = cophenetic_dm.set_index("species_name")

species_list = list(cophenetic_dm.index)

#### Write the Huang and Roy species list to a file

Next, we write the Huang and Roy species list to a text file, one species name per line.

In [55]:
# Write one species name per line; the with-block closes the file when done
with open(huang_roy_species_list_output_path, "w") as f:
    for species in species_list:
        f.write(species + "\n")
print(f"Done writing species list file: {huang_roy_species_list_output_path}")

Done writing species list file: ../output/huang_roy_coral_tree_full_species_list.txt
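To confirm that the list file round-trips cleanly, it can be read back and compared with the original list. This is a minimal sketch using a temporary file; `write_species_list` and `read_species_list` are hypothetical helpers, and the species names are illustrative, not read from the tree. Because DendroPy's Newick reader turns underscores into spaces by default, names may contain spaces; that is harmless in this format since each name occupies a whole line.

```python
import os
import tempfile

def write_species_list(species, path):
    """Write one species name per line, closing the file even if a write fails."""
    with open(path, "w") as handle:
        for name in species:
            handle.write(name + "\n")

def read_species_list(path):
    """Read names back, stripping trailing newlines and skipping blank lines."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]

# Illustrative names only
species = ["Acropora millepora", "Porites lobata"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "species.txt")
    write_species_list(species, path)
    round_trip = read_species_list(path)
print(round_trip == species)  # True
```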
