Add average amino acid identity (AAI) #16

ninjatacoshell · 2016-03-11T23:14:19Z

An alternative to ANI for more distantly related genomes is average amino acid identity (AAI; see Konstantinidis and Tiedje 2005 and Rodrigues and Konstantinidis 2014). Instead of DNA FASTA files the user would need to supply protein FASTA files.

This web tool only lets you calculate AAI for two genomes at a time.

This web tool lets you calculate AAI for multiple genomes, but only the ones that are stored in its database (i.e. no user-generated genomes). And the database doesn't appear to have been updated since around 2012.

This web tool lets you calculate AAI for up to 10 genomes at a time, but you have to run them through RAST first, which is inconvenient.

So being able to run AAI on your own machine, like pyani already does for ANI and tetranucleotide regression, would be very useful.

widdowquinn · 2016-03-30T15:13:14Z

I like the idea, but I'm inclined to leave this to a later version of pyani that integrates with pyrbbh.

For AAI we need to define equivalent proteins for comparison. That's something which can be done in several ways, and I'd like to hand that method choice off to the user's preference. I'm not sure that there's a data standard for specifying such equivalence for pairs of proteins, or for whole groups - I'll have to put some time into looking around for one (suggestions welcome) or devising one that works here.

ninjatacoshell · 2016-03-30T20:46:53Z

In the original paper (Konstantinidis and Tiedje 2005), they calculated AAI by searching all protein-coding sequences from the query genome against the reference genome using TBLASTN, with cut-offs of at least 30% identity and at least 70% coverage. They called this one-way BLAST. Then they took the top matching segment and performed the reverse search using BLASTX (presumably with the same cut-offs). They called this two-way BLAST. In their analysis the two-way BLAST was slightly more reliable.
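To make the one-way calculation concrete, here is a minimal Python sketch of the filtering and averaging step. It is not the authors' code: the `Hit` record, the function names, and the definition of coverage as alignment length divided by query length are all assumptions for illustration, and the hits are assumed to have already been parsed from BLAST output.

```python
from collections import namedtuple

# Hypothetical record for one parsed BLAST hit (field names are assumptions).
Hit = namedtuple("Hit", "query subject identity aln_length query_length bitscore")


def filter_hits(hits, min_identity=30.0, min_coverage=0.7):
    """Keep hits meeting the identity and coverage cut-offs.

    Coverage is taken here as alignment length / query length (an assumption).
    """
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.aln_length / h.query_length >= min_coverage
    ]


def best_hits(hits):
    """Return the highest-bitscore hit for each query sequence."""
    best = {}
    for h in hits:
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


def one_way_aai(hits, min_identity=30.0, min_coverage=0.7):
    """Mean percent identity over the best passing hit per query, or None."""
    best = best_hits(filter_hits(hits, min_identity, min_coverage))
    if not best:
        return None
    return sum(h.identity for h in best.values()) / len(best)


example_hits = [
    Hit("q1", "s1", 80.0, 90, 100, 150.0),   # passes, best hit for q1
    Hit("q1", "s2", 60.0, 95, 100, 100.0),   # passes, but lower bitscore
    Hit("q2", "s3", 25.0, 100, 100, 40.0),   # fails identity cut-off
    Hit("q3", "s4", 50.0, 50, 100, 60.0),    # fails coverage cut-off
    Hit("q4", "s5", 40.0, 80, 100, 70.0),    # passes
]
print(one_way_aai(example_hits))
```

With the example hits, only q1 and q4 contribute, so the result is the mean of 80.0 and 40.0.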

How would BBH compare to their two-way BLAST in terms of computation time? And would it be invulnerable to inconsistencies in the annotation between different genomes the way their two-way BLAST is?

widdowquinn · 2016-03-30T22:09:45Z

The method from Konstantinidis and Tiedje is one of several ways to define 'equivalent proteins/CDS'. It happens to be one that doesn't require a prior protein annotation on the 'reference', but it does require one on the query.

The two-way BLAST search is likely to be more reliable than the one-way analysis for the same reasons RBH/BBH matches are more reliable than one-way BLAST matches, in general (as described in, e.g. https://github.com/widdowquinn/Teaching-Dundee-BS32010/blob/master/workshop_2/06-RBBH.ipynb and https://github.com/widdowquinn/Teaching-Dundee-BS32010/blob/master/lecture/2016-03-21_BS32010_Pritchard.pdf).

In terms of differences in computation time, I don't know off-hand how it would work out. I'd expect reciprocal BLASTP of a query protein complement against a protein database built from a reference protein complement to be faster than BLASTX of the query against the untranslated genome, but I wouldn't be upset if that wasn't true ;) As for inconsistencies in annotation: given that the K&T method already relies on one protein annotation, I wouldn't consider it invulnerable to "annotation inconsistency". You could try two-way TBLASTX if you want to ignore annotation altogether (but although you're then invulnerable to annotation inconsistency, you also do not gain any of its many advantages…).
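For reference, the reciprocal best hit idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not pyani or pyrbbh code: the hits are assumed to be already-parsed `(query, subject, bitscore)` tuples from a forward and a reverse search, and the function names are hypothetical.

```python
def best_subject(hits):
    """Map each query to its highest-bitscore subject.

    `hits` is an iterable of (query, subject, bitscore) tuples (assumed format).
    """
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return sorted (a, b) pairs where b is a's best hit and a is b's best hit."""
    fwd = best_subject(forward_hits)
    rev = best_subject(reverse_hits)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


forward = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 180), ("a3", "b3", 90)]
reverse = [("b1", "a1", 195), ("b2", "a1", 160), ("b2", "a2", 140), ("b3", "a3", 88)]
print(reciprocal_best_hits(forward, reverse))
```

In this example a2's best hit is b2, but b2's best hit is a1, so the a2/b2 pair is dropped; that asymmetry is what the one-way search cannot detect.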

ninjatacoshell · 2016-09-26T21:03:33Z

I don't know if it will help, but they've put their script for calculating AAI (using Ruby) on GitHub: https://github.com/lmrodriguezr/enveomics/blob/master/Scripts/aai.rb. Perhaps it (or part of it) can be rewritten for Python?

sbridel · 2017-03-29T12:20:35Z

Suggestion: https://github.com/dparks1134/CompareM, which uses DIAMOND and Prodigal to find equivalent proteins.

An AAI feature would be very nice to have in pyani.

widdowquinn self-assigned this Mar 30, 2016

widdowquinn added the enhancement label (something we'd like pyani to do that it doesn't already) Mar 30, 2016

widdowquinn mentioned this issue May 1, 2020

Is it possible to add AAI? #185

Closed

widdowquinn added this to the 0.3.1 milestone May 28, 2020

widdowquinn added the method label (the issue relates to how results are calculated) May 29, 2020
