### Week 5 - Building a Phylogenetic Tree from a Multiple Sequence Alignment
- October 2023
- [https://github.com/tisimpson/bioinformatics1](https://github.com/tisimpson/bioinformatics1)
- [ian.simpson@ed.ac.uk](mailto:ian.simpson@ed.ac.uk)

In this notebook we walk through an experiment in which we retrieve sequences, run BLAST searches, and then use the results to prepare and run a multiple sequence alignment with the MUSCLE software package. Finally, we build a basic phylogenetic tree from the alignment and visualise it.

- To set up the MUSCLE aligner using conda, use the command ``conda install -c bioconda muscle``
- To set up the RAxML phylogenetic inference tool using conda, use the command ``conda install -c bioconda raxml``
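
Before running the rest of the notebook it can help to confirm that both tools are actually on the `PATH`. The following is a minimal sketch using only the standard library; the helper `missing_tools` is hypothetical, and the executable names are assumptions (conda builds of RAxML may install `raxmlHPC` or a variant such as `raxmlHPC-PTHREADS`).

```python
import shutil


def missing_tools(names):
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Executable names are assumptions; adjust them to match your installation.
missing = missing_tools(["muscle", "raxmlHPC"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All tools found")
```

If a tool is reported as missing, check that the conda environment you installed it into is the one running this notebook.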

In [None]:
# %pip install biopython

### Part 1 - Fetch Distantly Related Protein Sequences from NCBI

In [None]:
# import required Biopython functions
from Bio import Entrez
from Bio import SeqIO

# distant globin accession numbers
globinAccessions = ['NP_000509','NP_005359','NP_067080','NP_001049476','NP_001235928']

# set email address for NCBI
Entrez.email = 'ian.simpson@ed.ac.uk'

# fetch the sequences from NCBI
handle = Entrez.efetch(db="protein", id=globinAccessions, rettype="gb", retmode="text")
records = SeqIO.parse(handle, "gb")

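# # print out the sequence names
# # NB records is a one-pass iterator: if you run this loop, nothing is left to write below
# for record in records:
#     print(record.name)

# write to fasta file, then close the connection to NCBI
SeqIO.write(records, "globins.fa", "fasta")
handle.close()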

### Part 2 - Perform a MUSCLE Multiple Sequence Alignment

In [None]:
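# module to manipulate alignments
from Bio import AlignIO
# modules to allow command line calls and file handling
import os
import subprocess

# run MUSCLE MSA and wait for it to finish, so the output file exists before the next cell reads it
cmdLine = ['muscle', '-align', 'globins.fa', '-output', 'distant_globins.aln']
subprocess.run(cmdLine, check=True)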

In [None]:
# read in the MUSCLE alignment
alignment = AlignIO.read('distant_globins.aln','fasta')
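
A quick sanity check on an alignment is to see how many columns contain gaps, since distantly related sequences often align with many of them. The sketch below works on plain aligned strings so that it is self-contained; the function `gap_fraction_per_column` and the toy sequences are hypothetical and are not part of the globin data.

```python
def gap_fraction_per_column(aligned_seqs, gap="-"):
    """Return, for each alignment column, the fraction of sequences with a gap."""
    lengths = {len(seq) for seq in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n_seqs = len(aligned_seqs)
    n_cols = lengths.pop()
    return [
        sum(seq[i] == gap for seq in aligned_seqs) / n_seqs
        for i in range(n_cols)
    ]


# toy aligned sequences (made up for illustration)
toy = ["MV-LSPADK", "MVHLTPEEK", "M--LSEGEW"]
fractions = gap_fraction_per_column(toy)
print(len(toy), "sequences,", len(fractions), "columns")
print("columns with at least one gap:", sum(f > 0 for f in fractions))
```

With the alignment read above, the same function could be called as `gap_fraction_per_column([str(rec.seq) for rec in alignment])`.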

### Part 3 - Calculate a Basic UPGMA Tree (NB this is not a true phylogenetic tree)

In [None]:
# import modules for tree building
from Bio import Phylo
from Bio.Phylo.TreeConstruction import DistanceTreeConstructor
from Bio.Phylo.TreeConstruction import DistanceCalculator

# select the distance matrix and tree building method
calculator = DistanceCalculator('dayhoff')
constructor = DistanceTreeConstructor(calculator, 'upgma')

# build the tree
tree = constructor.build_tree(alignment)

# plot the tree
Phylo.draw(tree)
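
To see why a UPGMA tree is a clustering rather than a true phylogenetic tree, it helps to look at what the algorithm does: it repeatedly merges the two closest clusters and places the new node at half the distance between them, which implicitly assumes every lineage changes at the same rate. The following is a minimal pure-Python sketch of that procedure on a made-up distance matrix; it is an illustration, not the Biopython implementation, and the function name `upgma` and the toy distances are assumptions.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster `names` by UPGMA.

    `dist` maps frozenset({a, b}) -> distance for every pair of names.
    Returns the root as a nested tuple (left, right, node_height).
    """
    # each cluster label maps to (subtree, number of leaves it contains)
    clusters = {name: (name, 1) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # find the closest pair among the current clusters
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        # the new node sits at half the distance between the merged clusters
        height = d[frozenset((a, b))] / 2
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # distance to every other cluster is the size-weighted average
        for other in clusters:
            d[frozenset((merged, other))] = (
                n_a * d[frozenset((a, other))] + n_b * d[frozenset((b, other))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b, height), n_a + n_b)
    (root, _), = clusters.values()
    return root


# toy distance matrix (made up for illustration)
toy_dist = {
    frozenset(("A", "B")): 2,
    frozenset(("A", "C")): 6,
    frozenset(("B", "C")): 6,
    frozenset(("A", "D")): 10,
    frozenset(("B", "D")): 10,
    frozenset(("C", "D")): 10,
}
upgma_tree = upgma(["A", "B", "C", "D"], toy_dist)
print(upgma_tree)
```

Because every leaf ends up at the same distance from the root, lineages that have evolved at different rates can be grouped incorrectly, which is one reason to prefer a model-based method for the final tree.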

### Part 4 - Use RAxML to Build a True Phylogenetic Tree from the MSA

In [None]:
# import the Phylo applications module to run the RAxML command line
# https://cme.h-its.org/exelixis/web/software/raxml/
# https://biopython.org/docs/1.75/api/Bio.Phylo.Applications.html

# NB create a directory for the RAxML output called 'raxml' in the same directory as this script

from Bio.Phylo.Applications import RaxmlCommandline

# convert the alignment to phylip format
AlignIO.write(alignment, 'distant_globins.phy', 'phylip-relaxed')

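# set the working directory and create it if it is missing
current_dir = os.getcwd()
working_dir = current_dir+'/raxml/'
os.makedirs(working_dir, exist_ok=True)
print(working_dir)

# NB RAxML will not overwrite existing files, so delete any old info files and it will create a new one
# (os.path.exists and os.remove do not expand wildcards, so use glob to find the matching files)
import glob
for old_info in glob.glob(working_dir+'RAxML_info.*'):
    os.remove(old_info)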

# set up the RAxML command line call
raxml_cline = RaxmlCommandline(sequences='distant_globins.phy', model="PROTCATWAG", name="distant_globins", working_dir=working_dir)

# run RAxML
raxml_cline()

In [None]:
# read in the resulting tree
tree = Phylo.read(os.path.join(working_dir, 'RAxML_result.distant_globins'), 'newick')

# plot the tree
Phylo.draw(tree)
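
RAxML writes its result as a Newick string, which `Bio.Phylo` parses into a tree object that can be inspected as well as plotted. The sketch below shows this on a toy Newick string so that it runs without the RAxML output; the taxon names and branch lengths are made up for illustration and are not results from the globin alignment.

```python
from io import StringIO

from Bio import Phylo

# toy Newick string (hypothetical names and branch lengths)
newick = "((HBB:0.1,HBA:0.2):0.3,(MB:0.4,NGB:0.5):0.1);"
toy_tree = Phylo.read(StringIO(newick), "newick")

# list the leaves (terminal clades) of the tree
leaf_names = [leaf.name for leaf in toy_tree.get_terminals()]
print(leaf_names)

# sum of all branch lengths in the tree
total_length = toy_tree.total_branch_length()
print(total_length)

# plain-text rendering, useful when no plotting backend is available
Phylo.draw_ascii(toy_tree)
```

The same calls can be applied to the `tree` object read from the RAxML result. Note that a maximum likelihood tree from RAxML is unrooted, so the root shown in the plot is a drawing convention unless you re-root the tree, for example on an outgroup.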