# Analyze Accuracy of the Autocuration Pipeline

This notebook analyzes the performance of the Influenza Autocuration pipeline on sequences that were already flagged in IRD. Sequences from species A-D were downloaded from legacy IRD, and the pipeline was applied to each species' set of sequences. The results for each set were saved as a table with the columns 'Accession', 'Actual Flag', 'My Flag', and 'Profile'. Comparing the 'Actual Flag' and 'My Flag' columns gives a measure of performance. Each sequence falls into one of five possible outcomes:

(1) actual = Pass | mine = Pass

(2) actual = Pass | mine = Flag_A

(3) actual = Flag_A | mine = Pass

(4) actual = Flag_A | mine = Flag_A

(5) actual = Flag_A | mine = Flag_B

To account for these five possibilities, this performance analysis measures (1) the precision of labeling a sequence 'pass' (pass precision), (2) the precision of labeling a sequence 'flag' (flag precision), (3) the rate at which actual pass sequences are labeled pass (pass recall), (4) the rate at which actual flag sequences are labeled with a flag (flag recall), (5) the rate at which actual flag sequences are assigned the correct annotation(s) (flag accuracy), and (6) the overall accuracy in assigning the correct annotation.
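
As a minimal sketch of how the five outcomes feed the six metrics, the snippet below labels each (actual, mine) pair with its case number and derives the metrics from the case counts. The flag lists are hypothetical toy data, and the `outcome` helper is not part of the pipeline; it is introduced here only for illustration.

```python
from collections import Counter

def outcome(actual, mine):
    """Map one (actual, mine) pair onto cases (1)-(5) listed above."""
    if actual == 'Pass':
        return 1 if mine == 'Pass' else 2
    if mine == 'Pass':
        return 3
    return 4 if actual == mine else 5

# Hypothetical toy data: one sequence per case.
actual = ['Pass', 'Pass', 'Flag_A', 'Flag_A', 'Flag_A']
mine = ['Pass', 'Flag_A', 'Pass', 'Flag_A', 'Flag_B']
c = Counter(outcome(a, m) for a, m in zip(actual, mine))

toy_metrics = {
    'pass_precision': c[1] / (c[1] + c[3]),
    'pass_recall': c[1] / (c[1] + c[2]),
    'flag_precision': (c[4] + c[5]) / (c[2] + c[4] + c[5]),
    'flag_recall': (c[4] + c[5]) / (c[3] + c[4] + c[5]),
    'flag_accuracy': c[4] / (c[3] + c[4] + c[5]),
    'overall_accuracy': (c[1] + c[4]) / sum(c.values()),
}
```

Under these definitions, case (3) lowers flag accuracy as well as flag recall, because a missed flag is also a mismatched annotation.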

MAFFT and MUSCLE are the alignment options in this pipeline. Performance is therefore analyzed in three stages: two for MAFFT (one with a gap penalty and one without) and one for MUSCLE.

In [1]:
import os
import sys
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")

# Functions for Performance

In [2]:
def pass_flag_confusion(theirs, mine):
    """Count Pass/Flag agreement; any value other than 'Pass' counts as a flag."""
    
    mat = pd.DataFrame([[0,0],[0,0]], index=['Pass_IRD', 'Flag_IRD'], columns=['Pass_Mine', 'Flag_Mine'])
    
    for actual, pred in zip(theirs, mine):
        # Row = IRD annotation, column = pipeline annotation
        row = 'Pass_IRD' if actual == 'Pass' else 'Flag_IRD'
        col = 'Pass_Mine' if pred == 'Pass' else 'Flag_Mine'
        mat.at[row, col] += 1
    
    return mat


def corr_incorr_confusion(theirs, mine):
    """Count exact annotation matches, split by whether IRD says Pass or Flag."""
    
    mat = pd.DataFrame([[0,0],[0,0]], index=['Pass_IRD', 'Flag_IRD'], columns=['Match', 'No Match'])
    
    for actual, pred in zip(theirs, mine):
        # A flagged sequence only matches if the exact same flag was assigned
        row = 'Pass_IRD' if actual == 'Pass' else 'Flag_IRD'
        col = 'Match' if actual == pred else 'No Match'
        mat.at[row, col] += 1
    
    return mat
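
For larger result tables, the same Pass/Flag matrix could be built without an explicit Python loop. The following is a sketch using `pd.crosstab`, not the method used in this notebook; the function name `pass_flag_confusion_vectorized` is hypothetical. The `reindex` call keeps all four cells present even when a category never occurs, as happens for datasets with no IRD-flagged sequences.

```python
import pandas as pd

def pass_flag_confusion_vectorized(theirs, mine):
    """Build the Pass/Flag confusion matrix with pd.crosstab instead of a loop."""
    theirs = pd.Series(list(theirs))
    mine = pd.Series(list(mine))
    # Collapse every non-'Pass' annotation into a single 'Flag' category.
    rows = theirs.eq('Pass').map({True: 'Pass_IRD', False: 'Flag_IRD'})
    cols = mine.eq('Pass').map({True: 'Pass_Mine', False: 'Flag_Mine'})
    mat = pd.crosstab(rows, cols)
    # crosstab drops categories that never occur, so restore them as zeros.
    return mat.reindex(index=['Pass_IRD', 'Flag_IRD'],
                       columns=['Pass_Mine', 'Flag_Mine'],
                       fill_value=0)

# Hypothetical toy input with no IRD-flagged sequences.
no_flags = pass_flag_confusion_vectorized(['Pass', 'Pass', 'Pass'],
                                          ['Pass', 'Flag_A', 'Pass'])
```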

In [3]:
# True Pass / (True Pass + False Pass)
def pass_precision(mat):
    
    return((mat['Pass_Mine']['Pass_IRD'])/(mat['Pass_Mine']['Pass_IRD'] + mat['Pass_Mine']['Flag_IRD']))

# True Pass / (True Pass + False Flag) (Also 'True Pass Rate' and 'Pass Accuracy')
def pass_recall(mat):
    
    return((mat['Pass_Mine']['Pass_IRD'])/(mat['Pass_Mine']['Pass_IRD'] + mat['Flag_Mine']['Pass_IRD']))

# True Flag / (True Flag + False Flag)
def flag_precision(mat):
    
    return((mat['Flag_Mine']['Flag_IRD'])/(mat['Flag_Mine']['Flag_IRD'] + mat['Flag_Mine']['Pass_IRD']))

# True Flag / (True Flag + False Pass) (Also 'True Flag Rate')
def flag_recall(mat):
    
    return((mat['Flag_Mine']['Flag_IRD'])/(mat['Flag_Mine']['Flag_IRD'] + mat['Pass_Mine']['Flag_IRD'])) 

# Fraction of IRD-flagged sequences that received the exact same flag (missed flags count as incorrect)
def flag_accuracy(mat):
    
    return((mat['Match']['Flag_IRD'])/(mat['Match']['Flag_IRD'] + mat['No Match']['Flag_IRD']))

# How well 'My Flag' matches 'Actual Flag'
def overall_accuracy(mat):
    
    return((mat['Match']['Pass_IRD'] + mat['Match']['Flag_IRD'])/(mat['Match']['Pass_IRD'] + mat['Match']['Flag_IRD'] + mat['No Match']['Pass_IRD'] + mat['No Match']['Flag_IRD']))
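
Several of these ratios are undefined when their denominator is zero, for example flag recall on a dataset with no IRD-flagged sequences; the notebook reports `nan` in that case. A minimal sketch of an explicit guard follows. `safe_ratio` is a hypothetical helper, not part of the notebook, and the counts are illustrative.

```python
import math

def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    if denominator == 0:
        return math.nan
    return numerator / denominator

# Illustrative counts: no IRD-flagged sequences, one sequence flagged by the pipeline.
true_flag, false_flag, false_pass = 0, 1, 0
flag_precision_value = safe_ratio(true_flag, true_flag + false_flag)  # defined: 0 / 1
flag_recall_value = safe_ratio(true_flag, true_flag + false_pass)     # undefined: 0 / 0
```

Making the zero-denominator case explicit keeps the undefined metrics distinguishable from a genuine score of 0.0 without relying on suppressed warnings.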

In [4]:
def performance(flu_results):
    flu_results = flu_results[(flu_results['Profile'] != 'Unknown')]
    actual = list(flu_results['Actual Flag'])
    pred = list(flu_results['My Flag'])
    confusion1 = pass_flag_confusion(actual, pred)
    confusion2 = corr_incorr_confusion(actual, pred)
    
    print("Pass precision:", round(pass_precision(confusion1), 3))
    print("Pass recall (Pass accuracy):", round(pass_recall(confusion1), 3))
    print('\n')
    print("Flag precision:", round(flag_precision(confusion1), 3))
    print("Flag recall:", round(flag_recall(confusion1), 3))
    print('\n')
    print("Flag accuracy:", round(flag_accuracy(confusion2), 3))
    print("Overall accuracy:", round(overall_accuracy(confusion2), 3))
    
    return(confusion1, confusion2)
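
`performance` prints its metrics, which means callers that only want the numbers have to silence or recompute them. One possible alternative, sketched here under the assumption that the two confusion matrices have the layouts defined above, is to return all six metrics as a `pd.Series`. The name `metrics_from_confusions` is hypothetical; the example matrices are the Influenza A results for MAFFT with gap penalty reported in this notebook.

```python
import pandas as pd

def metrics_from_confusions(confusion1, confusion2):
    """Return the six metrics as a Series instead of printing them."""
    true_pass = confusion1.at['Pass_IRD', 'Pass_Mine']
    false_flag = confusion1.at['Pass_IRD', 'Flag_Mine']
    false_pass = confusion1.at['Flag_IRD', 'Pass_Mine']
    true_flag = confusion1.at['Flag_IRD', 'Flag_Mine']
    return pd.Series({
        'Pass precision': true_pass / (true_pass + false_pass),
        'Pass recall': true_pass / (true_pass + false_flag),
        'Flag precision': true_flag / (true_flag + false_flag),
        'Flag recall': true_flag / (true_flag + false_pass),
        'Flag accuracy': confusion2.at['Flag_IRD', 'Match'] / confusion2.loc['Flag_IRD'].sum(),
        'Overall accuracy': confusion2['Match'].sum() / confusion2.to_numpy().sum(),
    })

# Influenza A, MAFFT with gap penalty (values copied from the notebook output).
fluA_c1 = pd.DataFrame([[989, 21], [63, 714]],
                       index=['Pass_IRD', 'Flag_IRD'], columns=['Pass_Mine', 'Flag_Mine'])
fluA_c2 = pd.DataFrame([[989, 21], [655, 122]],
                       index=['Pass_IRD', 'Flag_IRD'], columns=['Match', 'No Match'])
fluA_metrics = metrics_from_confusions(fluA_c1, fluA_c2).round(3)
```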

In [5]:
def overall_results_df(mafft_p, mafft_np, muscle):
    
    # One column per alignment setting, one row per metric
    runs = {'MAFFT Penalized': mafft_p, 'MAFFT No Penalty': mafft_np, 'MUSCLE': muscle}
    metrics = ['Pass Precision','Pass Recall','Flag Precision','Flag Recall','Flag Accuracy','Overall Accuracy']
    mat = pd.DataFrame(index=metrics, columns=list(runs), dtype=float)
    
    for name, flu_results in runs.items():
        confusion1, confusion2 = performance(flu_results)
        # Values are listed in the same order as the `metrics` index
        mat[name] = [
            float(round(pass_precision(confusion1), 3)),
            float(round(pass_recall(confusion1), 3)),
            float(round(flag_precision(confusion1), 3)),
            float(round(flag_recall(confusion1), 3)),
            float(round(flag_accuracy(confusion2), 3)),
            float(round(overall_accuracy(confusion2), 3)),
        ]
    
    return mat
    

# Data loading

In [6]:
fluA_mafft_p = pd.read_csv("InfluenzaA_mafft_1.csv")
fluB_mafft_p = pd.read_csv("InfluenzaB_mafft_1.csv")
fluC_mafft_p = pd.read_csv("InfluenzaC_mafft_1.csv")
fluD_mafft_p = pd.read_csv("InfluenzaD_mafft_1.csv")
fluA_mafft_np = pd.read_csv("InfluenzaA_mafft_2.csv")
fluB_mafft_np = pd.read_csv("InfluenzaB_mafft_2.csv")
fluC_mafft_np = pd.read_csv("InfluenzaC_mafft_2.csv")
fluD_mafft_np = pd.read_csv("InfluenzaD_mafft_2.csv")
fluA_muscle = pd.read_csv("InfluenzaA_muscle.csv")
fluB_muscle = pd.read_csv("InfluenzaB_muscle.csv")
fluC_muscle = pd.read_csv("InfluenzaC_muscle.csv")
fluD_muscle = pd.read_csv("InfluenzaD_muscle.csv")
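
The analysis below relies on the 'Actual Flag', 'My Flag', and 'Profile' columns being present in every table. A minimal sketch of a check that could run right after loading is shown here; `check_results_table` and `REQUIRED_COLUMNS` are hypothetical names, and the toy row (accession and profile values included) is made up for illustration.

```python
import pandas as pd

# Columns the results tables are described as containing.
REQUIRED_COLUMNS = ['Accession', 'Actual Flag', 'My Flag', 'Profile']

def check_results_table(df, name='results'):
    """Raise a clear error if a results table lacks an expected column."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"{name} is missing columns: {missing}")
    return df

# Hypothetical one-row table standing in for a loaded CSV.
toy_table = pd.DataFrame({
    'Accession': ['ACC0001'],
    'Actual Flag': ['Pass'],
    'My Flag': ['Pass'],
    'Profile': ['example_profile'],
})
checked = check_results_table(toy_table, name='toy_table')
```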

# (1) Performance of MAFFT With Gap Penalty 

## Influenza A Results

In [7]:
confusion1, confusion2 = performance(fluA_mafft_p)

Pass precision: 0.94
Pass recall (Pass accuracy): 0.979


Flag precision: 0.971
Flag recall: 0.919


Flag accuracy: 0.843
Overall accuracy: 0.92


In [8]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,989,21
Flag_IRD,63,714


In [9]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,989,21
Flag_IRD,655,122


## Influenza B Results

In [10]:
confusion1, confusion2 = performance(fluB_mafft_p)

Pass precision: 0.955
Pass recall (Pass accuracy): 0.802


Flag precision: 0.666
Flag recall: 0.912


Flag accuracy: 0.875
Overall accuracy: 0.824


In [11]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,401,99
Flag_IRD,19,197


In [12]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,401,99
Flag_IRD,189,27


## Influenza C Results

In [13]:
confusion1, confusion2 = performance(fluC_mafft_p)

Pass precision: 1.0
Pass recall (Pass accuracy): 0.986


Flag precision: 0.0
Flag recall: nan


Flag accuracy: nan
Overall accuracy: 0.986


In [14]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,69,1
Flag_IRD,0,0


In [15]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,69,1
Flag_IRD,0,0


## Influenza D Results

In [16]:
confusion1, confusion2 = performance(fluD_mafft_p)

Pass precision: 1.0
Pass recall (Pass accuracy): 1.0


Flag precision: 1.0
Flag recall: 1.0


Flag accuracy: 0.565
Overall accuracy: 0.964


In [17]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,253,0
Flag_IRD,0,23


In [18]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,253,0
Flag_IRD,13,10
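
Here flag recall is 1.0 while flag accuracy is only 0.565: all 23 IRD-flagged sequences were flagged by the pipeline, but 10 of them received a different flag. To inspect such cases, the mismatched rows can be pulled out of a results table. The sketch below uses a hypothetical toy table (accessions and flags are made up) with the same column names as the real results.

```python
import pandas as pd

# Hypothetical toy results table.
toy_results = pd.DataFrame({
    'Accession': ['ACC1', 'ACC2', 'ACC3', 'ACC4'],
    'Actual Flag': ['Pass', 'Flag_A', 'Flag_A', 'Flag_B'],
    'My Flag': ['Pass', 'Flag_A', 'Flag_B', 'Pass'],
})

is_flagged = toy_results['Actual Flag'] != 'Pass'
is_mismatch = toy_results['Actual Flag'] != toy_results['My Flag']

# Every IRD-flagged row that counts against flag accuracy.
mismatched = toy_results[is_flagged & is_mismatch]
# The subset that was flagged by the pipeline, but with a different flag.
wrong_flag_only = mismatched[mismatched['My Flag'] != 'Pass']
```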


## Overall Results

In [19]:
overall_mafft_p = pd.concat([fluA_mafft_p, fluB_mafft_p, fluC_mafft_p, fluD_mafft_p], axis = 0)
confusion1, confusion2 = performance(overall_mafft_p)

Pass precision: 0.954
Pass recall (Pass accuracy): 0.934


Flag precision: 0.885
Flag recall: 0.919


Flag accuracy: 0.844
Overall accuracy: 0.902


In [20]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,1712,121
Flag_IRD,82,934


In [21]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,1712,121
Flag_IRD,857,159
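
Because the overall results come from concatenating the four species tables, the pooled confusion matrix is simply the element-wise sum of the per-species matrices. The sketch below checks this with the Pass/Flag matrices reported above for MAFFT with gap penalty; the `confusion` helper is introduced only for this illustration.

```python
import pandas as pd

def confusion(rows):
    """Wrap a 2x2 list of counts in the Pass/Flag confusion matrix layout."""
    return pd.DataFrame(rows, index=['Pass_IRD', 'Flag_IRD'],
                        columns=['Pass_Mine', 'Flag_Mine'])

# Per-species matrices copied from the outputs above.
per_species = {
    'A': confusion([[989, 21], [63, 714]]),
    'B': confusion([[401, 99], [19, 197]]),
    'C': confusion([[69, 1], [0, 0]]),
    'D': confusion([[253, 0], [0, 23]]),
}

pooled = sum(per_species.values())
```

The pooled metrics are therefore weighted by sample size, so Influenza A, the largest set, has the most influence on the overall numbers.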


# (2) Performance of MAFFT Without Gap Penalty

## Influenza A Results

In [22]:
confusion1, confusion2 = performance(fluA_mafft_np)

Pass precision: 0.975
Pass recall (Pass accuracy): 0.936


Flag precision: 0.921
Flag recall: 0.969


Flag accuracy: 0.896
Overall accuracy: 0.918


In [23]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,945,65
Flag_IRD,24,753


In [24]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,945,65
Flag_IRD,696,81


## Influenza B Results

In [25]:
confusion1, confusion2 = performance(fluB_mafft_np)

Pass precision: 0.997
Pass recall (Pass accuracy): 0.77


Flag precision: 0.652
Flag recall: 0.995


Flag accuracy: 0.954
Overall accuracy: 0.825


In [26]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,385,115
Flag_IRD,1,215


In [27]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,385,115
Flag_IRD,206,10


## Influenza C Results

In [28]:
confusion1, confusion2 = performance(fluC_mafft_np)

Pass precision: 1.0
Pass recall (Pass accuracy): 0.971


Flag precision: 0.0
Flag recall: nan


Flag accuracy: nan
Overall accuracy: 0.971


In [29]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,68,2
Flag_IRD,0,0


In [30]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,68,2
Flag_IRD,0,0


## Influenza D Results

In [31]:
confusion1, confusion2 = performance(fluD_mafft_np)

Pass precision: 1.0
Pass recall (Pass accuracy): 0.996


Flag precision: 0.958
Flag recall: 1.0


Flag accuracy: 0.565
Overall accuracy: 0.96


In [32]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,252,1
Flag_IRD,0,23


In [33]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,252,1
Flag_IRD,13,10


## Overall Results

In [34]:
overall_mafft_np = pd.concat([fluA_mafft_np, fluB_mafft_np, fluC_mafft_np, fluD_mafft_np], axis = 0)
confusion1, confusion2 = performance(overall_mafft_np)

Pass precision: 0.985
Pass recall (Pass accuracy): 0.9


Flag precision: 0.844
Flag recall: 0.975


Flag accuracy: 0.901
Overall accuracy: 0.9


In [35]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,1650,183
Flag_IRD,25,991


In [36]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,1650,183
Flag_IRD,915,101


# (3) Performance of MUSCLE

## Influenza A Results

In [37]:
confusion1, confusion2 = performance(fluA_muscle)

Pass precision: 0.996
Pass recall (Pass accuracy): 0.992


Flag precision: 0.99
Flag recall: 0.995


Flag accuracy: 0.955
Overall accuracy: 0.976


In [38]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,1002,8
Flag_IRD,4,773


In [39]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,1002,8
Flag_IRD,742,35


## Influenza B Results

In [40]:
confusion1, confusion2 = performance(fluB_muscle)

Pass precision: 0.963
Pass recall (Pass accuracy): 0.992


Flag precision: 0.98
Flag recall: 0.912


Flag accuracy: 0.889
Overall accuracy: 0.961


In [41]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,496,4
Flag_IRD,19,197


In [42]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,496,4
Flag_IRD,192,24


## Influenza C Results

In [43]:
confusion1, confusion2 = performance(fluC_muscle)

Pass precision: 1.0
Pass recall (Pass accuracy): 0.971


Flag precision: 0.0
Flag recall: nan


Flag accuracy: nan
Overall accuracy: 0.971


In [44]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,68,2
Flag_IRD,0,0


In [45]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,68,2
Flag_IRD,0,0


## Influenza D Results

In [46]:
confusion1, confusion2 = performance(fluD_muscle)

Pass precision: 1.0
Pass recall (Pass accuracy): 1.0


Flag precision: 1.0
Flag recall: 1.0


Flag accuracy: 1.0
Overall accuracy: 1.0


In [47]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,253,0
Flag_IRD,0,23


In [48]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,253,0
Flag_IRD,23,0


## Overall Results 

In [49]:
overall_muscle = pd.concat([fluA_muscle, fluB_muscle, fluC_muscle, fluD_muscle], axis = 0)
confusion1, confusion2 = performance(overall_muscle)

Pass precision: 0.988
Pass recall (Pass accuracy): 0.992


Flag precision: 0.986
Flag recall: 0.977


Flag accuracy: 0.942
Overall accuracy: 0.974


In [50]:
confusion1

Unnamed: 0,Pass_Mine,Flag_Mine
Pass_IRD,1819,14
Flag_IRD,23,993


In [51]:
confusion2

Unnamed: 0,Match,No Match
Pass_IRD,1819,14
Flag_IRD,957,59


# EXTRA: View Overall Results in Single Table

In [52]:
import contextlib

# Silence the per-run printouts from performance() while building the summary table
with open(os.devnull, "w") as devnull, contextlib.redirect_stdout(devnull):
    overall_df = overall_results_df(overall_mafft_p, overall_mafft_np, overall_muscle)
overall_df.to_csv("overall_performance.csv")
overall_df

Unnamed: 0,MAFFT Penalized,MAFFT No Penalty,MUSCLE
Pass Precision,0.954,0.985,0.988
Pass Recall,0.934,0.9,0.992
Flag Precision,0.885,0.844,0.986
Flag Recall,0.919,0.975,0.977
Flag Accuracy,0.844,0.901,0.942
Overall Accuracy,0.902,0.9,0.974
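
To read the summary table at a glance, the best-scoring alignment setting per metric can be extracted with `idxmax`. This is a small sketch that rebuilds the table from the values shown above rather than from `overall_performance.csv`; the variable names are illustrative.

```python
import pandas as pd

# Summary values copied from the table above.
overall = pd.DataFrame(
    {
        'MAFFT Penalized': [0.954, 0.934, 0.885, 0.919, 0.844, 0.902],
        'MAFFT No Penalty': [0.985, 0.9, 0.844, 0.975, 0.901, 0.9],
        'MUSCLE': [0.988, 0.992, 0.986, 0.977, 0.942, 0.974],
    },
    index=['Pass Precision', 'Pass Recall', 'Flag Precision',
           'Flag Recall', 'Flag Accuracy', 'Overall Accuracy'],
)

# Best setting for each metric (row-wise maximum).
best_setting = overall.idxmax(axis=1)
# How far MUSCLE is ahead of the better of the two MAFFT runs, per metric.
muscle_margin = overall['MUSCLE'] - overall[['MAFFT Penalized', 'MAFFT No Penalty']].max(axis=1)
```

On these numbers MUSCLE scores highest on every metric, although its lead on flag recall and pass precision over MAFFT without gap penalty is only a few thousandths.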
