## In order to evaluate the performance of the model, there needs to be a definitive way to measure accuracy.

One possible evaluation metric is "intersection over union" (IoU). It takes the area where two bounding boxes overlap and divides it by the area of their union, that is, the total area covered by either box. This produces a score that measures how closely the two boxes match. A score of 1.0 reflects a perfect match, while scores closer to 0 are likely incorrect matches.
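
Written as a formula, with $A$ and $B$ standing for the two boxes (symbols introduced here only for illustration) and $|\cdot|$ for area, the measurement is:

```latex
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
```

The second form is the one that is convenient to compute, since the union area is the two box areas added together minus the overlap that would otherwise be counted twice.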

However, it is also necessary to consider the case where the number of guessed bounding boxes is wrong. There are two ways this can happen. If there are fewer guesses than real boxes, each guess should be matched to its closest real box to score it, while the real boxes left without a partner are automatically counted as errors (false negatives). If there are more guesses than real boxes, each real box should be matched to its closest guess that is not closer to some other real box. The leftover guesses are counted in the error total as false positives.

This is actually not as complex as it sounds. To implement it, the distance is taken between the starting (top-left) corner of each bounding box in the true set and each one in the test set. The distances are sorted from smallest to largest, and boxes that have not yet been paired are matched in that order. If there are unmatched true values, the model missed some faces that we know are there. If there are unmatched test values, the model found too many faces.
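
The pairing step on its own can be sketched as follows. This is a minimal illustration, not the evaluation code itself: `greedy_match` is a hypothetical helper name, and it assumes each box is represented only by its `(x, y)` starting corner.

```python
def greedy_match(true_points, guess_points):
    """Pair points greedily by ascending squared distance (illustrative helper)."""
    # Every (true index, guess index, squared distance) combination.
    candidates = [
        (i, j, (tx - gx) ** 2 + (ty - gy) ** 2)
        for i, (tx, ty) in enumerate(true_points)
        for j, (gx, gy) in enumerate(guess_points)
    ]
    candidates.sort(key=lambda c: c[2])

    used_true, used_guess, pairs = set(), set(), []
    for i, j, _ in candidates:
        # Skip any candidate whose true or guessed point is already paired.
        if i in used_true or j in used_guess:
            continue
        used_true.add(i)
        used_guess.add(j)
        pairs.append((i, j))

    # Unpaired true points are false negatives, unpaired guesses are false positives.
    missed = [i for i in range(len(true_points)) if i not in used_true]
    extra = [j for j in range(len(guess_points)) if j not in used_guess]
    return pairs, missed, extra


# Two true corners, one guess: the guess pairs with the nearer true corner.
pairs, missed, extra = greedy_match([(304, 265), (328, 295)], [(323, 292)])
print(pairs, missed, extra)  # prints [(1, 0)] [0] []
```

Because the closest candidates are claimed first, the result is a greedy pairing rather than a globally optimal assignment, which keeps the logic simple.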

For the resulting pairings, the intersection over union calculation is performed to determine an accuracy score. It's possible to tune the model to increase these scores and to reduce the overall false positive and false negative counts.



In [185]:
import math
import sqlite3
import pandas as pd
import cv2
import os
from matplotlib import pyplot as plt

# Consider box1, box2 as [x, y, width, height]
def iou(box1, box2):
    xa = max(box1[0], box2[0])
    ya = max(box1[1], box2[1])
    xb = min(box1[0] + box1[2], box2[0] + box2[2])
    yb = min(box1[1] + box1[3], box2[1] + box2[3])
    
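    # NOTE: the "+ 1" treats coordinates as inclusive pixel indices, but a_area and
    # b_area below do not, so identical boxes score slightly above 1.0.
    i_area = max(0, xb - xa + 1) * max(0, yb - ya + 1)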
    
    a_area = box1[2] * box1[3]
    b_area = box2[2] * box2[3]
    
    return i_area / float(a_area + b_area - i_area)

def box_distance(box1, box2):
    # Squared Euclidean distance between the boxes' (x, y) origins. The sqrt is skipped
    # because only the ordering matters here, and it would just cost performance.
    return (box1[0] - box2[0]) ** 2 + (box1[1] - box2[1]) ** 2
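
One thing to watch in `iou` is that the intersection adds 1 to each side while the two box areas do not. A minimal sketch of a variant that measures every area the same way (treating widths and heights as plain lengths) might look like this. `iou_consistent` is a hypothetical name and is not used by the rest of this notebook; under this assumption identical boxes score exactly 1.0 and disjoint boxes score 0.0.

```python
def iou_consistent(box1, box2):
    """IoU for [x, y, width, height] boxes, measuring every area the same way."""
    xa = max(box1[0], box2[0])
    ya = max(box1[1], box2[1])
    xb = min(box1[0] + box1[2], box2[0] + box2[2])
    yb = min(box1[1] + box1[3], box2[1] + box2[3])

    # Overlap is zero when the boxes do not intersect on either axis.
    i_area = max(0, xb - xa) * max(0, yb - ya)
    union = box1[2] * box1[3] + box2[2] * box2[3] - i_area
    # Guard against degenerate zero-area boxes.
    return i_area / union if union > 0 else 0.0


print(iou_consistent([449, 330, 122, 149], [449, 330, 122, 149]))  # 1.0
print(iou_consistent([0, 0, 10, 10], [20, 20, 10, 10]))            # 0.0
```

Scores from this variant would differ slightly from the recorded outputs in this notebook, which were produced with `iou` as defined above.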


In [136]:
# ytrue and yhat are dicts of the form { image_path: [category, [bbox1, bbox2, ...]], ... }
def evaluate_performance(ytrue, yhat):
    details_structure = {
        'image_path': [],
        'true_count':[],
        'guessed_count':[],
        'matches':[],
        'false_positives': [],
        'false_negatives': [],
        'category': [],
        'avg_score': [],
        
    }
    results = {
        'true_boxes': 0,
        'guessed_boxes': 0,
        'false_positives': 0,
        'false_negatives': 0,
        'bad_guesses': 0,        # < .05
        'unlikely_guesses': 0,   # < .3
        'okay_guesses': 0,       # < .5
        'good_guesses': 0,       # < .75
        'great_guesses': 0,      # else
    }
    
    for image_path in yhat.keys():
        real_boxes = ytrue[image_path][1]
        guessed_boxes = yhat[image_path][1]
        category = ytrue[image_path][0]
        count_real = len(real_boxes)
        count_guess = len(guessed_boxes)
        distances = []
        i = 0
        results['true_boxes'] += count_real
        results['guessed_boxes'] += count_guess
        for b1 in real_boxes:
            j = 0
            for b2 in guessed_boxes:
                distances.append((i, j, box_distance(b1, b2)))
                j += 1
            i += 1
        distances.sort(key = lambda x: x[2])
        
        assigned_r = [ False for _ in range(count_real)]
        assigned_g = [ False for _ in range(count_guess)]
        assignments = []
        
        for d in distances:
            if assigned_r[d[0]] or assigned_g[d[1]]:
                pass
            else:
                assigned_r[d[0]] = True
                assigned_g[d[1]] = True
                assignments.append(d)
        
        fns = count_real - sum(assigned_r)
        fps = count_guess - sum(assigned_g)
        scores = [ iou(real_boxes[a[0]], guessed_boxes[a[1]]) for a in assignments ]
        score_count = len(scores)
        avg_score = sum(scores) / score_count if score_count > 0 else 0
        details_structure['image_path'].append(image_path)
        details_structure['matches'].append(score_count)
        details_structure['true_count'].append(count_real)
        details_structure['guessed_count'].append(count_guess)
        details_structure['false_positives'].append(fps)
        details_structure['false_negatives'].append(fns)
        details_structure['category'].append(category)
        details_structure['avg_score'].append(avg_score)
        results['false_positives'] += fps
        results['false_negatives'] += fns
        for s in scores:
            if s < .05:
                results['bad_guesses'] += 1
            elif s < .3:
                results['unlikely_guesses'] +=1
            elif s < .5:
                results['okay_guesses'] += 1
            elif s < .75:
                results['good_guesses'] += 1
            else:
                results['great_guesses'] += 1
    results['details'] = pd.DataFrame(data = details_structure)
    return results
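
The rating buckets used inside `evaluate_performance` can be read as a simple mapping from score to label. The sketch below restates those thresholds as a standalone function for illustration; `rate_score` is a hypothetical helper and is not called anywhere in this notebook.

```python
def rate_score(score):
    """Map an IoU score to the rating bucket names used in the results dict (sketch)."""
    if score < .05:
        return 'bad_guesses'
    elif score < .3:
        return 'unlikely_guesses'
    elif score < .5:
        return 'okay_guesses'
    elif score < .75:
        return 'good_guesses'
    return 'great_guesses'


for s in (0.0, 0.42, 0.93):
    print(s, rate_score(s))
```

Each threshold is an exclusive upper bound, so a score of exactly 0.5 lands in `good_guesses` rather than `okay_guesses`.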

### Here are some dummy data structures for testing and illustration purposes.

In [137]:
fake_ytrue = {
    'a': [1, [[449, 330, 122, 149]]],
    'b': [2, [[361, 98, 263, 339]]],
    'c': [3, [[304, 265, 16, 17],[328, 295, 160, 200]]]
}

fake_yhat = {
    'a': [1, [[448, 330, 122, 149],[449, 330, 122, 149]]],
    'b': [2, [[361, 98, 263, 339]]],
    'c': [3, [[323, 292, 160, 200]]]
}

In [138]:
evaluate_performance(fake_ytrue, fake_yhat)

{'true_boxes': 4,
 'guessed_boxes': 4,
 'false_positives': 1,
 'false_negatives': 1,
 'bad_guesses': 0,
 'unlikely_guesses': 0,
 'okay_guesses': 0,
 'good_guesses': 0,
 'great_guesses': 3,
 'details':   image_path  true_count  guessed_count  matches  false_positives  \
 0          a           1              2        1                1   
 1          b           1              1        1                0   
 2          c           2              1        1                0   
 
    false_negatives  category  avg_score  
 0                0         1   1.030381  
 1                0         2   1.013619  
 2                1         3   0.932834  }

In [139]:
def transform_results(info, boxes):
    results = {}
    for f in info.keys():
        results[f] = [ int(f.split('--')[0]), [ [r.x, r.y, r.width, r.height] for r in 
            boxes[boxes['image_path'] == f].itertuples() ] ]
    return results

def get_trial_notes(trial):
    con = sqlite3.connect('results.db')
    res = pd.read_sql_query('select notes, model_name from trials where id = ?', con, params = (trial,))
    con.close()
    return (res.iloc[0].model_name, res.iloc[0].notes)

def fetch_results(trial):
    con = sqlite3.connect('results.db')
    bbox = pd.read_sql_query('select * from trials_bbx where trial_id = ?', con, params = (trial,))
    counts = pd.read_sql_query('select * from trials_counts where trial_id = ?', con, params = (trial,))
    counts_dict = pd.Series(counts.box_count.values, index = counts.image_path).to_dict()
    con.close()
    return transform_results(counts_dict, bbox)
    
def fetch_true(use_val = False, sample_pct = 1, sample_seed = 1):
    con = sqlite3.connect('widerface.db')
    db_str = ['val','train']
    bbox = pd.read_sql_query(f'select * from bbx_{use_val and db_str[0] or db_str[1]}  where invalid = 0 and width * height > 25', con)
    counts = bbox.image_path.value_counts()
    con.close()
    return transform_results(counts, bbox)
    

## Draws bounding boxes for a specific image given a path, true boxes, and guessed boxes. True boxes are green while guessed boxes are red.

In [213]:
def draw_boxes(image_path, true_boxes, guessed_boxes, path_prefix, out_dir):
    i = cv2.imread(f'{path_prefix}/{image_path}', cv2.IMREAD_COLOR)
    for b in true_boxes:
        x1 = b[0]
        y1 = b[1]
        x2 = b[0] + b[2]
        y2 = b[1] + b[3]
        cv2.rectangle(i, (x1, y1), (x2, y2), (0, 255, 0), 2)
    for b in guessed_boxes:
        x1 = b[0]
        y1 = b[1]
        x2 = b[0] + b[2]
        y2 = b[1] + b[3]
        cv2.rectangle(i, (x1, y1), (x2, y2), (0, 0, 255), 2)
    i2 = i[:,:,::-1]
    img_name = image_path.split('/')[1]
    save_path = f'{out_dir}/{img_name}'
    cv2.imwrite(save_path, i)

### Get real bounding box data in the correct format. This may take a bit. It is necessary to run this before doing any comparison.

In [42]:
ytrue = fetch_true() # Training boxes
# ytrue = fetch_true(True) # Validation boxes

### Get data for a given trial. Again, this may take a bit. You can get information about a trial number with get_trial_notes(#).

In [216]:
print(get_trial_notes(11))
yhat = fetch_results(11)

('haar', 'scale: 1.1, neighbors: 3, 5% sample')


### Note that the difference between guessed boxes and false positives should equal the sum of every guess rating. The sum of this and the false negatives (missed boxes) should match the true total.

In [122]:
def basic_results_analysis(res):
    predicted = res['guessed_boxes'] - res['false_positives']
    guess_sum = res['bad_guesses'] + res['unlikely_guesses'] + res['okay_guesses'] + res['good_guesses'] + res['great_guesses']
    good_guesses = res['okay_guesses'] + res['good_guesses'] + res['great_guesses']
    assert(predicted == guess_sum)
    assert(predicted + res['false_negatives'] == res['true_boxes'])
    print(f'This model identified {res["guessed_boxes"]} total bounding boxes, though {res["false_positives"]} of them did not correspond to a real bounding box.\n'
    + f'{res["false_negatives"]} known boxes failed to be identified ({round(res["false_negatives"] / res["true_boxes"] * 100, 2)}%).\n'
    + f'Of the guessed boxes, {res["bad_guesses"]} were very unlikely to be a match and {res["unlikely_guesses"]} are probably not accurate.\n'
    + f'{good_guesses} ({round(good_guesses / res["true_boxes"] * 100, 2)}% of total) were identified with reasonably high confidence.')
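
As a worked check of the two assertions, the totals from the dummy-data run earlier in this notebook can be plugged in by hand. The dict below copies only those recorded totals; nothing here queries the databases.

```python
# Totals copied from the dummy-data run of evaluate_performance.
res = {
    'true_boxes': 4, 'guessed_boxes': 4,
    'false_positives': 1, 'false_negatives': 1,
    'bad_guesses': 0, 'unlikely_guesses': 0, 'okay_guesses': 0,
    'good_guesses': 0, 'great_guesses': 3,
}
ratings = ('bad_guesses', 'unlikely_guesses', 'okay_guesses',
           'good_guesses', 'great_guesses')

matched = res['guessed_boxes'] - res['false_positives']  # 4 - 1 = 3
rated = sum(res[k] for k in ratings)                      # 0 + 0 + 0 + 0 + 3 = 3

print(matched == rated)                                        # True
print(matched + res['false_negatives'] == res['true_boxes'])  # 3 + 1 == 4, True
```

Every matched guess receives exactly one rating, and every true box is either matched or missed, which is why both equalities hold.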

## This will draw all bounding boxes for a test set. If a positive limit is set, it will stop after that many images. The *out_dir* must already exist on your computer.

In [218]:
def draw_test_true_comparison(yhat, limit = -1, path_prefix = './data/train/images/', out_dir = './data/results'):
    count = 0
    for i in yhat.items():
        image_path = i[0]
        guessed_boxes = i[1][1]
        true_boxes = ytrue[image_path][1]
        draw_boxes(image_path, true_boxes, guessed_boxes, path_prefix, out_dir)
        count += 1
        if limit > 0 and count >= limit:
            break

## Basic performance analysis method:

In [219]:
res = evaluate_performance(ytrue, yhat)
basic_results_analysis(res)
res['details']

This model identified 2678 total bounding boxes, though 772 of them did not correspond to a real bounding box.
4914 known boxes failed to be identified (72.05%).
Of the guessed boxes, 750 were very unlikely to be a match and 85 are probably not accurate.
1071 (15.7% of total) were identified with reasonably high confidence.


| | image_path | true_count | guessed_count | matches | false_positives | false_negatives | category | avg_score |
|---|---|---|---|---|---|---|---|---|
| 0 | 35--Basketball/35_Basketball_basketballgame_ba... | 3 | 4 | 3 | 1 | 0 | 35 | 0.000000 |
| 1 | 3--Riot/3_Riot_Riot_3_634.jpg | 4 | 8 | 4 | 4 | 0 | 3 | 0.708929 |
| 2 | 43--Row_Boat/43_Row_Boat_Canoe_43_400.jpg | 2 | 3 | 2 | 1 | 0 | 43 | 0.734443 |
| 3 | 40--Gymnastics/40_Gymnastics_Gymnastics_40_439... | 1 | 7 | 1 | 6 | 0 | 40 | 0.732838 |
| 4 | 16--Award_Ceremony/16_Award_Ceremony_Awards_Ce... | 1 | 5 | 1 | 4 | 0 | 16 | 0.742258 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 639 | 35--Basketball/35_Basketball_basketballgame_ba... | 8 | 2 | 2 | 0 | 6 | 35 | 0.000000 |
| 640 | 39--Ice_Skating/39_Ice_Skating_Ice_Skating_39_... | 1 | 4 | 1 | 3 | 0 | 39 | 0.420409 |
| 641 | 5--Car_Accident/5_Car_Accident_Accident_5_332.jpg | 1 | 0 | 0 | 0 | 1 | 5 | 0.000000 |
| 642 | 5--Car_Accident/5_Car_Accident_Accident_5_810.jpg | 1 | 5 | 1 | 4 | 0 | 5 | 0.000000 |


## Drawing boxes example:

In [220]:
draw_test_true_comparison(yhat)