## Introduction
In this post, I explore whether Perimeter Defense has been a market inefficiency in NBA sports betting in recent years. The idea was inspired by Ethan Sherwood Strauss' [Strauss vs. The House](https://theathletic.com/745690/2019/01/02/strauss-vs-the-house-a-2019-hiatus/) posts and podcasts, in which he argues that Perimeter Defense is a market inefficiency.

## Data Aggregation

To test his theory, I aggregated the Westgate odds for every regular season from 2013-2014 through 2017-2018 and identified each game in which Westgate **incorrectly** predicted the winner.

Next, I gave each team a regular-season *Perimeter Defensive Score*, calculated from the Player Grades data at [bball-index](https://www.bball-index.com/). For example, the 2013-2014 player grades can be found [here](https://www.bball-index.com/2013-14-player-grades/), along with an explanation of how the values are calculated.

## Quantifying Team Perimeter Defense

A *team's* regular-season *Perimeter Defensive Score* was calculated as the average Perimeter Defense grade of all players on that team classified as a *wing* or *guard*, weighted by minutes played.
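
Written out in our own notation (not bball-index's), if $m_i$ is the minutes played and $g_i$ the Perimeter Defense grade of wing or guard $i$ on team $T$, the definition above is:

$$
\text{score}_T = \frac{\sum_{i \in T} m_i \, g_i}{\sum_{i \in T} m_i}
$$

Players who log more minutes therefore pull the team score further toward their own grade.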

For example, here are the inputs for the 2017-2018 Golden State Warriors:


|Player|Minutes Played|Perimeter Defense Grade|
|:-|:-|:-|
|Klay Thompson|3300|0.447|
|Kevin Durant|3132|0.351|
|Stephen Curry|2186|0.76|
|Andre Iguodala|2023|0.718|
|Nick Young|1598|0.5|
|Shaun Livingston|1491|0.419|
|Patrick McCaw|977|0.639|
|Quinn Cook|915|0.382|
|Omri Casspi|740|0.424|
|Chris Boucher|1|0.0|


Let's start with the 2017-18 season by grouping the minutes and grades of every wing and guard by team.

```python
import csv

# Map each team abbreviation to the wings and guards on its roster.
d = {}
YEAR = '2017_18'
with open(f'{YEAR}.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        players = d.setdefault(row['TEAM'], [])
        # Only wings and guards count toward a team's perimeter defense.
        if row['ADVANCED_POSITION'] in ('Wing', 'Guard'):
            players.append({'minutes': int(row['NUMDATA']),
                            'grade': float(row['PERIMETER_DEFENSE'])})

print(d)
```

```
{'CLE': [{'minutes': 3948, 'grade': 0.5429999999999999}, {'minutes': 2950, 'grade': 0.445}, {'minutes': 2079, 'grade': 0.524}, {'minutes': 1018, 'grade': 0.6759999999999999}, {'minutes': 734, 'grade': 0.7170000000000001}, {'minutes': 276, 'grade': 0.605}, {'minutes': 174, 'grade': 0.564}, {'minutes': 66, 'grade': 0.542}], 'GSW': [{'minutes': 3300, 'grade': 0.447}, {'minutes': 3132, 'grade': 0.35100000000000003}, {'minutes': 2186, 'grade': 0.76}, {'minutes': 2023, 'grade': 0.718}, {'minutes': 1598, 'grade': 0.5}, {'minutes': 1491, 'grade': 0.419}, {'minutes': 977, 'grade': 0.639}, {'minutes': 915, 'grade': 0.382}, {'minutes': 740, 'grade': 0.424}, {'minutes': 1, 'grade': 0.0}], 'NOP': [{'minutes': 3275, 'grade': 0.835}, {'minutes': 2870, 'grade': 0.439}, {'minutes': 2106, 'grade': 0.33899999999999997}, {'minutes': 2007, 'grade': 0.7559999999999999}, {'minutes': 1645, 'grade': 0.433}, {'minutes': 301, 'grade': 0.33799999999999997}, {'minutes': 273, 'grade': 0.583}, {'minutes': 68, 'grade
```

So the *Perimeter Defensive Score* for the 2017-18 Warriors, where 16363 is the total minutes played by their wings and guards, was:
```
score = (3300*0.447 + 3132*0.351 + 2186*0.76 + 2023*0.718 + 1598*0.5 + 1491*0.419 + 977*0.639 + 915*0.382 + 740*0.424 + 1*0.0) / 16363 ≈ 0.5133
```

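To compare teams across the two data sources, we first build a lookup from the team names used in the Westgate odds files to team abbreviations, and then perform the same computation for *all* teams.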

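```python
# Lookup from odds-file team names to abbreviations. abbr.csv holds full
# names such as 'Golden State Warriors'; dropping the nickname leaves 'GoldenState'.
abbr = {}
with open('abbr.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        key = ''.join(row['FULL_NAME'].split(' ')[:-1])
        abbr[key] = row['ABBR']

# Dropping the last word fails for the two Los Angeles teams (both become
# 'LosAngeles') and for the Trail Blazers ('PortlandTrail'), so add them by hand.
abbr['LALakers'] = 'LAL'
abbr['LAClippers'] = 'LAC'
abbr['Portland'] = 'POR'
print(abbr)
```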

```
{'Atlanta': 'ATL', 'Brooklyn': 'BKN', 'Boston': 'BOS', 'Charlotte': 'CHA', 'Chicago': 'CHI', 'Cleveland': 'CLE', 'Dallas': 'DAL', 'Denver': 'DEN', 'Detroit': 'DET', 'GoldenState': 'GSW', 'Houston': 'HOU', 'Indiana': 'IND', 'LosAngeles': 'LAL', 'Memphis': 'MEM', 'Miami': 'MIA', 'Milwaukee': 'MIL', 'Minnesota': 'MIN', 'NewOrleans': 'NOP', 'NewYork': 'NYK', 'OklahomaCity': 'OKC', 'Orlando': 'ORL', 'Philadelphia': 'PHI', 'Phoenix': 'PHX', 'PortlandTrail': 'POR', 'Sacramento': 'SAC', 'SanAntonio': 'SAS', 'Toronto': 'TOR', 'Utah': 'UTA', 'Washington': 'WAS', 'LALakers': 'LAL', 'LAClippers': 'LAC', 'Portland': 'POR'}
```


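```python
# Each team's score is the minutes-weighted average grade of its wings and guards.
d2 = {}
for t in d:
    minutes = [e['minutes'] for e in d[t]]
    grades = [e['grade'] for e in d[t]]
    total = sum(minutes)
    d2[t] = sum(m * g / total for m, g in zip(minutes, grades))

# Some seasons' player data use BRK, PHO, and CHO for these teams;
# alias them to the abbreviations used in abbr.csv.
for new, old in [('BKN', 'BRK'), ('PHX', 'PHO'), ('CHA', 'CHO')]:
    if new not in d2:
        d2[new] = d2[old]
print(d2)
```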

```
{'CLE': 0.5390168074699865, 'GSW': 0.5133305628552222, 'NOP': 0.5740940719606911, 'MIL': 0.5635191657271702, 'WAS': 0.6554382834390208, 'HOU': 0.6132173179118474, 'OKC': 0.674630264717503, 'MIN': 0.6258844447843378, 'BOS': 0.6228089082081968, 'PHI': 0.6389277889214628, 'POR': 0.5267293605624374, 'TOR': 0.6292524014336919, 'UTA': 0.6788820360683895, 'IND': 0.668143035465159, 'MIA': 0.568429169249845, 'CHA': 0.5336705177879535, 'SAS': 0.5954422674529198, 'DEN': 0.5239761555075593, 'TOT': 0.5705035499457647, 'DAL': 0.4517409798708697, 'DET': 0.632708410067526, 'LAC': 0.6419989164086688, 'ATL': 0.5618165103879389, 'LAL': 0.5261833304602654, 'MEM': 0.6142030394640086, 'NYK': 0.5721603765993403, 'BKN': 0.43546427986236275, 'CHI': 0.6694730009096169, 'SAC': 0.5789433333333334, 'PHX': 0.5236721281795963, 'ORL': 0.5711635006221484}
```


Next, let's load the Westgate odds for the same season and find every game in which Westgate **incorrectly** predicted the winner, i.e., every game not won by the team with the negative moneyline (the favorite).

```python
import datetime

# Games Westgate got wrong, keyed by date.
# Note: keying by date keeps only the last upset recorded on any given night.
data = {}
# 'utf-8-sig' strips the byte-order mark, so the first column is plain 'Date'.
with open(f'odds_{YEAR}.csv', 'r', newline='\r', encoding='utf-8-sig') as f:
    reader = csv.DictReader(f)
    r1 = None
    r2 = None
    for row in reader:
        # Each game spans two consecutive rows, one per team.
        if r1 is None:
            r1 = row
        elif r2 is None:
            r2 = row
        if r1 is not None and r2 is not None:
            t1 = r1['Team']
            t2 = r2['Team']
            # A negative moneyline marks the favorite, i.e. Westgate's predicted winner.
            exp_winner = t1 if float(r1['ML']) < 0 else t2
            if float(r1['Final']) > float(r2['Final']):
                winner, loser = t1, t2
            else:
                winner, loser = t2, t1
            if exp_winner != winner:
                # Strip spaces so the names match the keys in abbr.
                w_key = abbr[winner.replace(' ', '')]
                l_key = abbr[loser.replace(' ', '')]
                # Dates are MMDD for Oct-Dec (first calendar year of the season)
                # and MDD for Jan-Apr (second calendar year).
                date_str = r1['Date']
                if len(date_str) == 4:
                    month, day = int(date_str[:2]), int(date_str[2:])
                    year = int(YEAR[:4])
                else:
                    month, day = int(date_str[:1]), int(date_str[1:])
                    year = int('20' + YEAR[5:])
                date = datetime.date(year, month, day)
                data[date] = {'winner': w_key, 'loser': l_key,
                              'winner_val': d2[w_key], 'loser_val': d2[l_key]}
            r1 = None
            r2 = None
print(data)
```

```
{datetime.date(2017, 10, 17): {'winner': 'HOU', 'loser': 'GSW', 'winner_val': 0.6132173179118474, 'loser_val': 0.5133305628552222}, datetime.date(2017, 10, 18): {'winner': 'ATL', 'loser': 'DAL', 'winner_val': 0.5618165103879389, 'loser_val': 0.4517409798708697}, datetime.date(2017, 10, 20): {'winner': 'LAL', 'loser': 'PHX', 'winner_val': 0.5261833304602654, 'loser_val': 0.5236721281795963}, datetime.date(2017, 10, 21): {'winner': 'UTA', 'loser': 'OKC', 'winner_val': 0.6788820360683895, 'loser_val': 0.674630264717503}, datetime.date(2017, 10, 22): {'winner': 'MIN', 'loser': 'OKC', 'winner_val': 0.6258844447843378, 'loser_val': 0.674630264717503}, datetime.date(2017, 10, 23): {'winner': 'PHX', 'loser': 'SAC', 'winner_val': 0.5236721281795963, 'loser_val': 0.5789433333333334}, datetime.date(2017, 10, 24): {'winner': 'IND', 'loser': 'MIN', 'winner_val': 0.668143035465159, 'loser_val': 0.6258844447843378}, datetime.date(2017, 10, 25): {'winner': 'LAL', 'loser': 'WAS', 'winner_val': 0.526183
```
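
Because `data` is keyed by date alone, a second upset on the same night overwrites the first, so at most one upset per day is kept. A minimal sketch of one alternative, assuming each upset is stored as its own record in a list; `add_upset` and the sample night are hypothetical, and the charting code below would need to loop over records instead of dates.

```python
import datetime

def add_upset(upsets, date, winner, loser, scores):
    """Append one upset as its own record, so same-night upsets never collide."""
    upsets.append({'date': date, 'winner': winner, 'loser': loser,
                   'winner_val': scores[winner], 'loser_val': scores[loser]})

# Hypothetical night with two upsets (scores rounded from the 2017-18 output).
scores = {'HOU': 0.6132, 'GSW': 0.5133, 'ATL': 0.5618, 'DAL': 0.4517}
night = datetime.date(2017, 10, 17)
upsets = []
add_upset(upsets, night, 'HOU', 'GSW', scores)
add_upset(upsets, night, 'ATL', 'DAL', scores)

# Re-keying the same records by date, as the cell above does, keeps only the last one.
by_date = {u['date']: u for u in upsets}
print(len(upsets), len(by_date))  # 2 1
```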

Let's save this data for the future.

```python
# One row per incorrectly predicted game.
with open(f'perimeter_defense_{YEAR}.csv', 'w') as f:
    f.write('date,winner,loser,winner_score,loser_score\n')
    for game_date, game in data.items():
        date = game_date.strftime('%Y-%m-%d')
        f.write(f"{date},{game['winner']},{game['loser']},{game['winner_val']},{game['loser_val']}\n")
```
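
With each season's upsets saved to `perimeter_defense_{YEAR}.csv`, they can be tallied later without re-running the whole notebook. A minimal sketch, assuming the column layout written above; `count_differences` is a hypothetical helper, and like the chart below it counts a zero difference as negative. The sample rows are two of the 2017-18 upsets printed earlier.

```python
import csv
import io

def count_differences(csv_file):
    """Return (positive, negative) counts of winner-minus-loser score differences."""
    pos = neg = 0
    for row in csv.DictReader(csv_file):
        if float(row['winner_score']) > float(row['loser_score']):
            pos += 1
        else:
            neg += 1
    return pos, neg

# Two of the 2017-18 upsets, in the saved file's format.
sample = io.StringIO(
    'date,winner,loser,winner_score,loser_score\n'
    '2017-10-17,HOU,GSW,0.6132173179118474,0.5133305628552222\n'
    '2017-10-22,MIN,OKC,0.6258844447843378,0.674630264717503\n'
)
print(count_differences(sample))  # (1, 1)
```

For a real season, pass an open file instead, e.g. `with open('perimeter_defense_2017_18.csv') as f: count_differences(f)`.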

## Visualization

I graphed the difference between the winning team's score and the losing team's score for each season and counted the *positive* and *negative* differences. A *positive* difference means the winning team had the better Perimeter Defensive Score even though Westgate predicted the losing team would win. Hence, a greater number of *positive* differences would suggest that Perimeter Defense is a market inefficiency; conversely, a greater number of *negative* differences would suggest the opposite.
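
A simple majority of *positive* differences could still arise by chance. One way to gauge that, which is not part of the analysis in this post, is an exact two-sided sign test: if Perimeter Defense carried no information about upsets, each nonzero difference would be equally likely to be positive or negative. The sketch assumes the games are independent and drops zero differences, as a sign test normally does; `sign_test_p_value` is a hypothetical helper and needs Python 3.8+ for `math.comb`.

```python
from math import comb

def sign_test_p_value(pos_count, neg_count):
    """Exact two-sided sign test against a 50/50 split of nonzero differences."""
    n = pos_count + neg_count
    k = max(pos_count, neg_count)
    # P(X >= k) for X ~ Binomial(n, 0.5): a split at least this lopsided one way.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double it for the two-sided test; cap at 1 when the split is near even.
    return min(1.0, 2 * tail)

print(sign_test_p_value(7, 3))  # 0.34375
```

Plugging in a season's positive and negative counts gives the probability of a split at least that lopsided under the 50/50 assumption; a small value would suggest the imbalance is unlikely to be chance alone.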

```python
from bokeh.plotting import figure, output_notebook, show
from bokeh.io import export_png
from bokeh.models import Label, Title
import math

output_notebook()

def draw_chart(data, fname):
    """Plot both teams' scores for each upset: green if the winner had the higher score."""
    x = []
    seg1 = []  # higher of the two teams' scores
    seg2 = []  # lower of the two teams' scores
    colors = []
    aboveName = []
    belowName = []
    for game_date, game in data.items():
        date = game_date.strftime('%m/%d/%y')
        x.append(f"{date} ({game['winner']} vs. {game['loser']})")
        if game['winner_val'] > game['loser_val']:
            colors.append('green')
            aboveName.append(game['winner'])
            belowName.append(game['loser'])
            seg1.append(game['winner_val'])
            seg2.append(game['loser_val'])
        else:
            colors.append('red')
            aboveName.append(game['loser'])
            belowName.append(game['winner'])
            seg1.append(game['loser_val'])
            seg2.append(game['winner_val'])

    # width/height replace the plot_width/plot_height arguments removed in Bokeh 3.
    p = figure(x_range=x, width=800, height=400, title='')
    p.xaxis.major_label_orientation = math.pi/2
    p.xaxis.axis_label = 'Westgate Incorrectly Predicted Game'
    p.yaxis.axis_label = 'Average Perimeter Defensive Score'
    # Categorical factor s is centered at x = s + 0.5.
    for s in range(len(x)):
        # Horizontal ticks at each team's score, joined by a vertical bar.
        p.segment(x0=s+0.25, x1=s+0.75, y0=seg1[s], y1=seg1[s], color=colors[s])
        p.segment(x0=s+0.25, x1=s+0.75, y0=seg2[s], y1=seg2[s], color=colors[s])
        p.segment(x0=s+0.5, x1=s+0.5, y0=seg1[s], y1=seg2[s], color=colors[s])
        p.add_layout(Label(x=s+0.25, y=seg1[s], text=aboveName[s]))
        p.add_layout(Label(x=s+0.25, y=seg2[s]-0.02, text=belowName[s]))
    export_png(p, filename=f"{YEAR}_{fname}.png")
    show(p)

# Draw the upsets in batches of 11 games per chart
# (a separate name keeps the team scores in d2 intact).
chunk = {}
fname = 1
for game_date in data:
    chunk[game_date] = data[game_date]
    if len(chunk) > 10:
        draw_chart(chunk, fname)
        fname += 1
        chunk = {}
# Draw any games left over after the last full batch.
if chunk:
    draw_chart(chunk, fname)
```

```python
def draw_chart2(data):
    """Plot the winner-minus-loser score difference for every upset in the season."""
    x = []
    seg = []
    pos_count = 0
    neg_count = 0
    for game_date, game in data.items():
        date = game_date.strftime('%m/%d/%y')
        x.append(f"{date} ({game['winner']} vs. {game['loser']})")
        seg.append(game['winner_val'] - game['loser_val'])
        # Note: a difference of exactly zero is counted as negative.
        if seg[-1] > 0:
            pos_count += 1
        else:
            neg_count += 1
    p = figure(x_range=x, width=800, height=400,
               title=f'Difference in Perimeter Defense in Incorrectly Predicted Westgate {YEAR} NBA Regular Season Games')
    p.xaxis.major_label_text_font_size = '0pt'
    p.xaxis.major_tick_line_color = None  # turn off x-axis major ticks
    p.xaxis.minor_tick_line_color = None  # turn off x-axis minor ticks
    p.xaxis.major_label_text_color = None  # turn off x-axis tick labels leaving space
    p.xaxis.major_label_orientation = math.pi/2
    p.xaxis.axis_label = 'Westgate Incorrectly Predicted Games'
    p.yaxis.axis_label = 'Difference in Average Perimeter Defensive Score'
    for s in range(len(seg)):
        # Red bars: the winner had the lower perimeter defense score.
        color = 'red' if seg[s] < 0 else 'green'
        p.segment(x0=s+0.5, x1=s+0.5, y0=0, y1=seg[s], color=color)
    p.add_layout(Title(text=f"Positive Differences: {pos_count}, Negative Differences: {neg_count}", align="center"), "below")
    export_png(p, filename=f"{YEAR}.png")
    show(p)

draw_chart2(data)
```