## Representing Traits on Trees

Now that we have a handle on phylogenetic trees and how to represent them, we can add traits to those trees.

## Species can share traits due to homology or convergence

When species share a trait due to inheritance from a common ancestor, that is called **homology** and the shared trait is called a **homologous trait**. When species instead share a trait because it evolved more than once on the tree, that is called **convergent evolution** and the shared trait is known as a **homoplasy**. 

An example of homologous traits vs. convergently evolved traits is shown below:

<figure caption-side="bottom">
<img src="./resources/homology_vs_convergence.png" width="500" style="margin:5px 25px" description="A tree showing a homologous trait arising in an ancestor and being inherited by several of its descendants, and another tree showing the same trait evolving separately in two totally different parts of the tree." > 
<figcaption align="left"><b>Fig. 1</b> - Homologous vs. convergently evolved traits. <br></figcaption>
</figure>


Whether a particular trait is a homologous trait or a homoplasy depends on the tree. Indeed, many modern methods for inferring trees test millions of possible trees to find the ones that do the best job of explaining as many shared traits as possible by homology.

## Simulating Trait Evolution

We can add traits to our birth-death model of tree evolution by giving each tree node a dict of binary (True/False) traits, stored as BinaryTraits, and then modifying it as necessary over time.


In [8]:
from copy import deepcopy
from random import random,choice

class PhyloNode():
    """A node on a phylogenetic tree"""

    def __init__(self,children=None,parent=None,\
      name = None,binary_traits=None):
      """Initiate a node on a phylogenetic tree
      children -- a list of PhyloNode objects
        descending immediately from this node
      name -- a string for the name of the node
      parent -- a single PhyloNode object for the parent
      binary_traits -- a dict of True/False traits
      """
      self.Name = name
      self.Children = []
      if children:
        self.Children.extend(children)
      self.Parent = parent
      self.BinaryTraits = deepcopy(binary_traits) or {}
      self.Extinct = False

    def isTip(self):
      """Return True if the node is a tip"""
      if not self.Children: #capture None or []
        return True
      else:
        return False

    def isRoot(self):
      """Return True if the node is the root of the whole tree"""
      if not self.Parent:
        return True
      else:
        return False

    def getDescendants(self):
        """Return a list of PhyloNodes descending from the current node"""

        if self.isTip():
            print(self.Name," is a tip ... returning []")
            return []

        #Copy the list of children so that extending
        #it below doesn't modify self.Children in place
        descendants = list(self.Children)
        for c in self.Children:
            #The set of descendants is described
            #by the descendants of all the nodes
            #immediate children.
            if not c.isTip():
                child_descendants = c.getDescendants()
                descendants.extend([c for c in child_descendants if c not in descendants])

            #Side note: this will fail on enormous trees
            #due to the recursion limit. Not normally a problem though.
        return descendants

    def getAncestors(self):
        """Return the ancestors of the given node"""
        if self.isRoot():
            #The root has no ancestors. Return an empty list
            #so callers can safely test membership or loop
            return []

        ancestors = [self.Parent]
        parents_ancestors = self.Parent.getAncestors()
        if parents_ancestors:
            ancestors.extend(parents_ancestors)
        return ancestors

    def getRoot(self):
        """Return the root node"""
        curr_node = self
        while not curr_node.isRoot():
            curr_node = curr_node.Parent
        return curr_node

    def addChild(self,child):
        """Attach a child node"""
        if child not in self.Children:
            self.Children.append(child)
        child.Parent = self
 
    def addParent(self,parent):
        """Attach a parent node"""
        self.Parent = parent
        #Avoid listing the same child twice
        if self not in parent.Children:
            parent.Children.append(self)
    
    def getMRCA(self,other):
        """Return PhyloNode for the most recent common ancestor with other
        other -- another PhyloNode object in the same tree
        """
        most_recent_common_ancestor = None
        #Cache the ancestors of other since
        #we'll refer to it often
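        #Include each node itself, so that if one node is
        #an ancestor of the other it is returned as the MRCA
        other_ancestors = [other] + (other.getAncestors() or [])
        self_ancestors = [self] + (self.getAncestors() or [])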
        #First check for the trivial case
        #where one node is root (but be sure they're on the same tree)
        if self.isRoot() and self in other_ancestors:
            return self

        if other.isRoot() and other in self_ancestors:
            return other

        for a in self_ancestors:
            #Note these will be in order of relatedness
            if a in other_ancestors:
                #End the loop the first time this happens
                #since we want the most
                #recent common ancestor
                most_recent_common_ancestor = a
                break

        if not most_recent_common_ancestor:
            #Build one readable message rather than passing several arguments
            message = "No common ancestor found for {a} and {b}. Are they on the same tree?"
            raise ValueError(message.format(a=self.Name,b=other.Name))

        return most_recent_common_ancestor
    
    def speciate(self):
        """Add two children descending from this node"""
        if self.Children:
            raise ValueError("Internal nodes can't speciate")
        if self.Extinct:
            raise ValueError("Extinct nodes can't speciate")

        child1_name = self.Name + "A"
        child1=PhyloNode(name=child1_name,binary_traits=self.BinaryTraits)
        self.addChild(child1)
        child2_name = self.Name + "B"
        child2=PhyloNode(name=child2_name,binary_traits=self.BinaryTraits)
        self.addChild(child2)


    def update(self,speciation_chance=0.25,trait_change_chance=0.25,\
        extinction_chance=0.25):
        """Update the node by speciation, extinction or trait gain or loss"""
        if not self.isTip():
            print("Not updating node {name} - it's not a tip".format(name=self.Name))
            return None

        if random() < extinction_chance:
            print(self.Name," goes extinct!")
            self.Extinct = True

        if self.Extinct:
            print("Not updating node {name} - it's extinct".format(name=self.Name))
            return None
        print("Updating node:{name}".format(name=self.Name))


        for trait,value in self.BinaryTraits.items():
            if random() < trait_change_chance:
                self.BinaryTraits[trait] = not self.BinaryTraits[trait]
                print(self.Name," has a new trait value for ",\
                  trait,": ",self.BinaryTraits[trait],"!")

        if random() < speciation_chance:
            #Speciate
            print("Node {name} speciates!".format(name=self.Name))
            self.speciate()

In [9]:
#Define traits
binary_traits = {"Blubber":False,"Red coat":True,"Big Ears":False,\
  "Horn":False,"Wings":False}
#Build the tree!
root = PhyloNode(name="root",binary_traits=binary_traits)
A = PhyloNode(name="A",binary_traits=binary_traits)
B = PhyloNode(name="B",binary_traits=binary_traits)
root.addChild(A)
root.addChild(B)

time = 5
#Demo our tree functions
print("Root node name:",root.Name)

for t in range(1,time):
    print("="*80)
    print("Beginning simulation of time:",t)
    print("="*80)
    #I want to keep the original set of nodes to update
    #so that new species don't immediately get simulated
    nodes_to_update = [n for n in root.getDescendants()]
    for node in nodes_to_update:

        if not node.isTip():
            continue
        print("About to update node:",node.Name)
        node.update()
    print("Nodes in tree at time ",t,":",len([n.Name for n in root.getDescendants()]))
    print("Tree tips:",[n.Name for n in root.getDescendants() if n.isTip()])


Root node name: root
Beginning simulation of time: 1
About to update node: A
Updating node:A
A  has a new trait value for  Horn :  True !
A  has a new trait value for  Wings :  True !
Node A speciates!
About to update node: B
Updating node:B
Nodes in tree at time  1 : 4
Tree tips: ['B', 'AA', 'AB']
Beginning simulation of time: 2
About to update node: B
Updating node:B
B  has a new trait value for  Blubber :  True !
B  has a new trait value for  Big Ears :  True !
About to update node: AA
Updating node:AA
AA  has a new trait value for  Big Ears :  True !
About to update node: AB
Updating node:AB
AB  has a new trait value for  Red coat :  False !
AB  has a new trait value for  Big Ears :  True !
AB  has a new trait value for  Horn :  False !
AB  has a new trait value for  Wings :  False !
Nodes in tree at time  2 : 4
Tree tips: ['B', 'AA', 'AB']
Beginning simulation of time: 3
About to update node: B
Updating node:B
B  has a new trait value for  Red coat :  False !
Node B speciates!
Abo

## Ancestral State Reconstruction

The goal of ancestral state reconstruction (ASR) is to determine the traits of ancestral organisms by comparative analysis of modern organisms. More formally, given annotations for the traits of tip nodes, ancestral state reconstruction attempts to reconstruct the traits at internal nodes.


#### The Fitch Parsimony algorithm for Ancestral State Reconstruction
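
As a rough illustration of the idea, Fitch parsimony for a single binary trait might be sketched as follows. This is a minimal, standalone sketch and not the notebook's own implementation: the `FitchNode` class, the `fitch_bottom_up`, `fitch_top_down` and `build_tree` helpers, and the four-tip example tree are all hypothetical names introduced here. It assumes a strictly bifurcating tree, and it breaks ties arbitrarily when more than one state is equally parsimonious.

```python
class FitchNode:
    """Minimal tree node for this sketch (hypothetical, separate from PhyloNode)."""

    def __init__(self, name, children=None, state=None):
        self.name = name
        self.children = children or []
        self.state = state      # observed trait value; only set on tips
        self.state_set = None   # candidate states from the bottom-up pass
        self.assigned = None    # reconstructed state from the top-down pass


def fitch_bottom_up(node):
    """Fill in state_set for every node; return the minimum number of changes."""
    if not node.children:
        node.state_set = {node.state}
        return 0
    changes = sum(fitch_bottom_up(child) for child in node.children)
    # Assumes exactly two children per internal node
    left, right = (child.state_set for child in node.children)
    if left & right:
        # The children agree on at least one state: no change needed here
        node.state_set = left & right
    else:
        # The children disagree: keep both options and count one change
        node.state_set = left | right
        changes += 1
    return changes


def fitch_top_down(node, parent_state=None):
    """Pick one state per node, keeping the parent's state when it is allowed."""
    if parent_state in node.state_set:
        node.assigned = parent_state
    else:
        # Arbitrary tie-break: min() prefers False over True
        node.assigned = min(node.state_set)
    for child in node.children:
        fitch_top_down(child, node.assigned)


def build_tree(states):
    """Build the four-tip tree ((A,B),(C,D)) with the given tip states."""
    a, b, c, d = (FitchNode(n, state=s) for n, s in zip("ABCD", states))
    left = FitchNode("AB", children=[a, b])
    right = FitchNode("CD", children=[c, d])
    return FitchNode("root", children=[left, right])


# Trait present in A, B and D: one change explains the data
homology_tree = build_tree([True, True, False, True])
homology_changes = fitch_bottom_up(homology_tree)
fitch_top_down(homology_tree)
print("Changes needed:", homology_changes, "root state:", homology_tree.assigned)

# Trait present in A and C only: two changes are needed
convergence_tree = build_tree([True, False, True, False])
convergence_changes = fitch_bottom_up(convergence_tree)
fitch_top_down(convergence_tree)
print("Changes needed:", convergence_changes, "root state:", convergence_tree.assigned)
```

In the first example the trait is reconstructed as present at the root and shared by homology. In the second, this particular tie-break reconstructs the trait as absent at the root, so its presence in A and C is explained as two separate gains, which is the convergent pattern described earlier.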

## Molecular Clock Analysis

Molecular clock analysis uses the rate at which traits change over time as a sort of clock. For example, if we compare the same gene across many species, we might expect that a pair of species that are 99% identical in that gene's DNA sequence probably diverged much more recently than a pair that are only 76% identical. However, such methods must be careful to account for several factors.

First, some changes in DNA sequence will be hidden. For example, imagine a given nucleotide mutated from A to T and then, some years later, back from T to A. These were two changes in the DNA sequence, but a later comparative analysis would see 0 changes! For similar reasons, even two random sequences with equal base frequencies will share about 25% sequence identity on average.
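
To make the effect of hidden changes concrete, the sketch below checks the 25% figure by comparing two random sequences, and then applies the Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), where p is the observed proportion of differing sites. The Jukes-Cantor model is not part of this notebook; it is brought in here only as one standard, simple correction that assumes equal base frequencies and equal substitution rates. The function names and the sequence length of 10,000 are illustrative choices.

```python
import math
import random


def proportion_different(seq1, seq2):
    """Return the fraction of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("Sequences must be the same length")
    mismatches = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return mismatches / len(seq1)


def jukes_cantor_distance(p):
    """Estimate substitutions per site from the observed proportion of differences.

    Under JC69, observed differences level off at 0.75, so values at or
    above that are treated as saturated (infinite distance).
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


# Two unrelated random sequences still match at roughly 1 site in 4
rng = random.Random(0)  # fixed seed so the example is repeatable
random_a = "".join(rng.choice("ACGT") for _ in range(10000))
random_b = "".join(rng.choice("ACGT") for _ in range(10000))
random_identity = 1 - proportion_different(random_a, random_b)
print("Identity between random sequences:", round(random_identity, 3))

# The corrected distance is always at least as large as the observed difference
for identity in (0.99, 0.76):
    p = 1 - identity
    print("Observed difference:", round(p, 2),
          "corrected distance:", round(jukes_cantor_distance(p), 4))
```

The correction matters little for very similar sequences but grows as sequences diverge, since more of the changes that occurred are expected to be hidden.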

A second point where caution is warranted is that the rate of evolution may change in different parts of the tree. For example, if a mutation renders the proof-reading machinery that tries to correct mutations more or less strict, it can decrease or increase the mutation rate in that part of the tree. **Relaxed molecular clocks** try to account for this by allowing different rates of evolution to occur in different parts of the tree.

## TODO:


#### Different styles of phylogenetic tree


#### The definition of relatedness on a phylogenetic tree

The key piece of information that trees convey is the *phylogenetic relatedness* of different organisms. Evolutionary biologists define relatedness in relative terms based on recency of common ancestry. Two organisms A and B are *more related* to one another than they are to another organism C if they share a *more recent common ancestor* with one another than either shares with C. This is a somewhat long definition, but in practice it allows relative relatedness to be tested using a simple procedure.

#### Finding the most recent common ancestor (MRCA) of two species by hand
To find the most recent common ancestor of A and B, place a finger on each of those two nodes on the tree. Then trace backwards along the tree from each toward its ancestors. Stop at the first node that both fingers reach (i.e., the first node that is an ancestor of both A and B). Mark that node as the most recent common ancestor (MRCA) of A and B.

##### A simple test for relatedness on a tree
This test determines whether a species A is more closely related to B or to C. Start with a tree that includes A, B and C. Trace back along the tree with your fingers, starting at A and B. Stop at the first node that both fingers reach (i.e., the first node that is an ancestor of both A and B). Mark that node as the *most recent common ancestor (MRCA) of A and B*. Now repeat the procedure for A and C and mark that node as the *most recent common ancestor (MRCA) of A and C*. The MRCA of A and B will be one of three things:

- the same node as the MRCA of A and C. In that case A is equally related to B and to C.
- a descendant of the MRCA of A and C. In that case A and B have a more recent common ancestor and therefore A and B are more closely related than A and C.
- an ancestor of the MRCA of A and C. In that case A and C have a more recent common ancestor and therefore A and C are more closely related than A and B.
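
The three cases above can be sketched in code. The following is a minimal, standalone illustration: the `Node` class and the `lineage`, `mrca` and `closer_relative` helpers are hypothetical names introduced for this sketch, and it assumes each node stores only a reference to its parent. Unlike the finger-tracing description, `lineage` counts a node as part of its own lineage, so a node that is itself an ancestor of the other is returned as the MRCA.

```python
class Node:
    """Minimal tree node for this sketch; stores only a name and a parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def lineage(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def mrca(a, b):
    """Return the first node on a's lineage that is also on b's lineage."""
    b_lineage = lineage(b)
    for candidate in lineage(a):
        if candidate in b_lineage:
            return candidate
    raise ValueError("Nodes are not on the same tree")


def closer_relative(a, b, c):
    """Return whichever of b or c is more closely related to a, or None if tied."""
    mrca_ab = mrca(a, b)
    mrca_ac = mrca(a, c)
    if mrca_ab is mrca_ac:
        return None  # same MRCA: a is equally related to b and c
    if mrca_ac in lineage(mrca_ab):
        return b     # MRCA(a, b) descends from MRCA(a, c)
    return c         # MRCA(a, b) is an ancestor of MRCA(a, c)


# Example tree: ((A, B), C)
root = Node("root")
ab_ancestor = Node("AB", parent=root)
A = Node("A", parent=ab_ancestor)
B = Node("B", parent=ab_ancestor)
C = Node("C", parent=root)

result = closer_relative(A, B, C)
print("A is more closely related to:", result.name if result else "neither (tie)")
```

Because both MRCAs always lie on A's own lineage, one must be an ancestor of the other unless they are the same node, which is why three cases are enough.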

## Monophyly and Tree Testing

##### The Snip Test
Have some dogs become unicellular obligate parasites?

Hypothesis testing using trees: Canine Transmissible Venereal Tumors 

##### Tracing zoonotic pathogen outbreaks

Zoonotic transmission occurs when a pathogen moves from another species into human hosts.