Our goal is to take a Newick-format tree file, in which each leaf node carries a unique accession ID, and replace those accession IDs with more human-readable names.

In [1]:
from pathlib import Path

from Bio import Phylo

In [2]:
# Our initial tree
treefile = Path("03_bootstrap.raxml.support")

# Load the tree into Bio.Phylo
tree = Phylo.read(treefile, "newick")

Read in the human-readable names from dictionary.tsv, a tab-separated file that maps each sequence ID to a descriptive name.

In [3]:
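with open("dictionary.tsv") as ifh:
    # Each line is "<sequence ID>\t<readable name>"
    rows = (line.strip().split("\t") for line in ifh)
    # Key: the ID minus its last character (the ID column presumably has one extra trailing character).
    # Value: the readable name plus "|<accession>" (2nd '|' field of the ID), which keeps leaf names unique.
    namedict = {k[:-1]: f"{v.strip()}|{k.split('|')[1]}" for k, v in rows}

list(namedict.items())[:10]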

[('tr|A0A258M961|A0A258M961_9BURK',
  'Ammonium transporter Polynucleobacter sp. 35-46-11|A0A258M961'),
 ('tr|A0A2U9T2W9|A0A2U9T2W9_9GAMM',
  'Ammonium transporter Lysobacter maris|A0A2U9T2W9'),
 ('tr|A0A259IH62|A0A259IH62_9BURK',
  'Ammonium transporter Polynucleobacter sp. 39-46-10|A0A259IH62'),
 ('tr|A0A1H3EZE1|A0A1H3EZE1_LYSEN',
  'Ammonium transporter Lysobacter enzymogenes|A0A1H3EZE1'),
 ('tr|A0A1G3CU56|A0A1G3CU56_9BURK',
  'Ammonium transporter Polynucleobacter sp. GWA2_45_21|A0A1G3CU56'),
 ('tr|A0A0A0EY24|A0A0A0EY24_9GAMM',
  'Ammonium transporter Lysobacter daejeonensis GH1-9|A0A0A0EY24'),
 ('tr|A0A4R8F1T3|A0A4R8F1T3_9ENTR',
  'Ammonium transporter Buttiauxella sp. BIGb0552|A0A4R8F1T3'),
 ('tr|A0A1B7HUF3|A0A1B7HUF3_9ENTR',
  'Ammonium transporter Buttiauxella gaviniae ATCC 51604|A0A1B7HUF3'),
 ('tr|A0A1B7I1P1|A0A1B7I1P1_9ENTR',
  'Ammonium transporter Buttiauxella noackiae ATCC 51607|A0A1B7I1P1'),
 ('tr|A0A3S6EVI2|A0A3S6EVI2_YERET',
  'Ammonium transporter Yersinia entomophaga
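The dictionary keys follow the UniProt FASTA header convention db|UniqueIdentifier|EntryName, where db is sp for Swiss-Prot or tr for TrEMBL entries; this is why k.split('|')[1] picks out the accession. A small sketch of that split, where parse_uniprot_id and UniProtID are hypothetical names (not part of Biopython):

```python
from typing import NamedTuple


class UniProtID(NamedTuple):
    db: str          # "sp" (Swiss-Prot) or "tr" (TrEMBL)
    accession: str   # e.g. "A0A258M961"
    entry_name: str  # e.g. "A0A258M961_9BURK"


def parse_uniprot_id(seq_id: str) -> UniProtID:
    """Split a 'db|accession|entry_name' ID into its three fields (hypothetical helper)."""
    # Unpacking into exactly three names raises ValueError on malformed IDs
    db, accession, entry_name = seq_id.split("|")
    return UniProtID(db, accession, entry_name)


parsed = parse_uniprot_id("tr|A0A258M961|A0A258M961_9BURK")
print(parsed.accession)  # A0A258M961
```

Unpacking into exactly three fields makes a malformed ID fail loudly instead of silently yielding the wrong field.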

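We need to make sure all leaf node names are unique: several sequences can share the same protein and species name, so the readable names alone may collide. One way to guarantee uniqueness is to append an arbitrary index (1, 2, 3, ...) to the end of each name.

Another approach, used here, is to append the unique accession to the leaf name in an unobtrusive way: each value in namedict ends with |<accession>, as in "Ammonium transporter Lysobacter maris|A0A2U9T2W9".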
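For comparison, a minimal sketch of the first (indexing) approach on a plain list of names. add_index_suffixes is a hypothetical helper; it numbers only the names that actually repeat and leaves unique names untouched.

```python
from collections import Counter


def add_index_suffixes(names: list[str]) -> list[str]:
    """Append _1, _2, ... to every name that occurs more than once (hypothetical helper)."""
    totals = Counter(names)  # how often each name occurs overall
    seen = Counter()         # how many copies of each name we have numbered so far
    unique_names = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            unique_names.append(f"{name}_{seen[name]}")
        else:
            unique_names.append(name)
    return unique_names


# Toy input, not taken from dictionary.tsv
print(add_index_suffixes(["A", "B", "A"]))  # ['A_1', 'B', 'A_2']
```

One caveat: a suffixed label such as A_1 could still collide with a name that already ends in _1, so checking len(set(names)) == len(names) afterwards is cheap insurance. Appending the accession avoids this, since accessions are unique by construction.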

Now let's swap out the accession IDs for the human-readable names. Leaves whose ID is not in the dictionary keep their original name.

In [4]:
for leaf in tree.get_terminals():
    # Leaves whose ID is not in namedict keep their original name
    leaf.name = namedict.get(leaf.name, leaf.name)

tree.get_terminals()[:5]

[Clade(branch_length=0.001254, name='Ammonium transporter Saccharomyces cerevisiae (strain Lal...'),
 Clade(branch_length=1e-06, name='Ammonium transporter Saccharomyces cerevisiae x Saccharom...'),
 Clade(branch_length=1e-06, name='Ammonium transporter Saccharomyces boulardii (nom. inval....'),
 Clade(branch_length=0.07042, name='Ammonium transporter Saccharomyces eubayanus|A0A0L8RCS7'),
 Clade(branch_length=0.057576, name='Ammonium transporter Saccharomyces arboricola (strain H-6...')]
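The rename loop silently skips IDs that are missing from dictionary.tsv, so a leaf may end up still carrying its accession. A minimal pure-Python sketch that does the same mapping but also reports the misses; rename_leaves is a hypothetical helper working on a plain list of names.

```python
def rename_leaves(
    leaf_names: list[str], namedict: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Map leaf names through namedict (hypothetical helper).

    Returns (new_names, unmatched). IDs missing from namedict are kept
    unchanged, as in the rename loop above, and also collected in unmatched.
    """
    new_names: list[str] = []
    unmatched: list[str] = []
    for name in leaf_names:
        if name in namedict:
            new_names.append(namedict[name])
        else:
            new_names.append(name)
            unmatched.append(name)
    return new_names, unmatched


# Toy input: one ID/name pair from the dictionary output above, one made-up ID
toy_dict = {
    "tr|A0A2U9T2W9|A0A2U9T2W9_9GAMM": "Ammonium transporter Lysobacter maris|A0A2U9T2W9",
}
new_names, unmatched = rename_leaves(
    ["tr|A0A2U9T2W9|A0A2U9T2W9_9GAMM", "made_up_id"], toy_dict
)
print(unmatched)  # ['made_up_id']
```

In the notebook, leaf_names would be [leaf.name for leaf in tree.get_terminals()], and a non-empty unmatched list shows which accessions dictionary.tsv does not cover.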

In [5]:
Phylo.draw_ascii(tree)

 , Ammonium transporter Saccharomyces ce...
 |
 | Ammonium transporter Saccharomyces ce...
 |
 | Ammonium transporter Saccharomyces bo...
 |
 , Ammonium transporter Saccharomyces eu...
 |
 | Ammonium transporter Saccharomyces ar...
 |
 | Ammonium transporter Saccharomyces ce...
 |
 , Ammonium transporter Torulaspora sp. ...
 |
 , Ammonium transporter Zygosaccharomyce...
 |
 | Ammonium transporter Zygosaccharomyce...
 |
 | Ammonium transporter Zygosaccharomyce...
 |
 |, Ammonium transporter Debaryomyces han...
 ||
 || Ammonium transporter Debaryomyces fab...
 ||
 |, Ammonium transporter Meyerozyma guill...
 ||
 || Ammonium transporter Meyerozyma sp. J...
 ||
 || Ammonium transporter Suhomyces tanzaw...
 ||
 |, Ammonium transporter Lodderomyces elo...
 ||
 |, Ammonium transporter Candida orthopsi...
 ||
 || Ammonium transporter Candida parapsil...
 ||
 |, Ammonium transporter Candida maltosa ...
 ||
 || Ammonium transporter Candida albicans...
 ,|
 || Ammonium transporter Candida tropica

Write out the new, renamed tree in Newick format. Phylo.write returns the number of trees written, hence the 1 below.

In [6]:
Phylo.write(tree, "transporter_tree_renamed.new", "newick")

1
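The new leaf names contain spaces and, in some cases, parentheses (e.g. "(strain Lal..."), characters that the Newick convention only allows inside single-quoted labels. It is worth opening transporter_tree_renamed.new to confirm how the labels were written. As a reference for that check, here is a minimal stdlib sketch of the quoting rule from the PHYLIP Newick description; quote_newick_label is a hypothetical helper, not part of Bio.Phylo.

```python
import re

# Characters not allowed in an unquoted Newick label:
# blanks, parentheses, square brackets, single quotes, colons, semicolons, commas
NEWICK_SPECIAL = re.compile(r"[\s()\[\]':;,]")


def quote_newick_label(label: str) -> str:
    """Quote a label for Newick if it contains special characters (hypothetical helper).

    Quoted labels are wrapped in single quotes, with any internal single quote doubled.
    """
    if NEWICK_SPECIAL.search(label):
        return "'" + label.replace("'", "''") + "'"
    return label


print(quote_newick_label("Ammonium transporter Lysobacter maris|A0A2U9T2W9"))
# 'Ammonium transporter Lysobacter maris|A0A2U9T2W9'
```

The same description also says underscores in unquoted labels are read as blanks, so a strict parser would read an unquoted A0A258M961_9BURK as "A0A258M961 9BURK"; this sketch does not quote on underscores.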

In [7]:
!ls

03_bootstrap.raxml.bestTree
03_bootstrap.raxml.bootstraps
03_bootstrap.raxml.support
5-8-22-Harris-agenda_LP.docx
RAxML_bipartitionsBranchLabels.output-3_bootstrap.tre
dictionary.tsv
rename_tree_leaves.ipynb
transporter_tree_renamed.new
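!ls is an IPython shell escape, so the cell above only works inside Jupyter/IPython. A plain-Python sketch of the same check using pathlib (already imported in the first cell); list_dir is a hypothetical helper name.

```python
from pathlib import Path


def list_dir(directory: str = ".") -> list[str]:
    """Return the sorted names of the entries in directory (hypothetical helper)."""
    return sorted(p.name for p in Path(directory).iterdir())


# True once Phylo.write has created the renamed tree in the working directory
print("transporter_tree_renamed.new" in list_dir())
```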
