# NBA Isolation Forests

I was curious how outlier detection methods from machine learning would do when applied to NBA rate stats using data from Basketball Reference.

The method we're going to use is called an isolation forest. It builds many random trees and measures how few random splits it takes to separate a point from the rest of the data. Points that are easier to separate are more likely to be outliers. Our training set is the 2013 to 2016 seasons, and we test on the 2017 data.
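
To make "easy to separate" concrete, here is a minimal standard-library sketch of the idea, not scikit-learn's implementation. The helpers `isolation_depth` and `mean_depth` and the Gaussian toy data are made up for illustration. The sketch counts how many random axis-aligned splits it takes before a point sits alone in its partition, averaged over many random subsamples.

```python
import random

def isolation_depth(point, sample, rng, max_depth=50):
    """Count random axis-aligned splits until `point` is alone in its partition."""
    depth = 0
    while len(sample) > 1 and depth < max_depth:
        dim = rng.randrange(len(point))            # pick a random feature
        lo = min(row[dim] for row in sample)
        hi = max(row[dim] for row in sample)
        if lo == hi:
            break                                  # nothing left to split on this feature
        split = rng.uniform(lo, hi)                # pick a random split value
        # keep only the rows on the same side of the split as `point`
        if point[dim] < split:
            sample = [row for row in sample if row[dim] < split]
        else:
            sample = [row for row in sample if row[dim] >= split]
        depth += 1
    return depth

def mean_depth(point, data, n_trees=200, sample_size=64, seed=0):
    """Average isolation depth of `point` over many random subsamples of `data`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, sample_size) + [point]
        total += isolation_depth(point, sample, rng)
    return total / float(n_trees)

# Toy data (an assumption for illustration): 500 points from a 2-D standard normal.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]

typical_depth = mean_depth((0.0, 0.0), data)   # a point in the middle of the cloud
outlier_depth = mean_depth((6.0, 6.0), data)   # a point far away from the cloud
print("typical point:", typical_depth)
print("outlier point:", outlier_depth)
```

The far-away point should need noticeably fewer splits on average than the central one, and that gap is what an isolation forest turns into an anomaly score.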

In [1]:
from requests import get
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

In [2]:
columns = [
    'Player','Pos','Age','Tm','G','GS','MP',
    'FG','FGA','FG%','3P','3PA','3P%','2P','2PA','2P%',
    'FT','FTA','FT%','ORB','DRB','TRB','AST','STL','BLK',
    'TOV','PF','PTS','Year'
]

In [3]:
per36 = []
for year in range(2013, 2018):
    soup = BeautifulSoup(get("http://www.basketball-reference.com/leagues/NBA_{}_per_minute.html".format(year)).text,'lxml')
    data = map(lambda x: x+[year], filter(lambda x: bool(x), [[i.text for i in row.findAll('td')][:28] for row in soup.findAll('tr')][1:]))
    per36 += data

In [4]:
per100 = []
for year in range(2013, 2018):
    soup = BeautifulSoup(get("http://www.basketball-reference.com/leagues/NBA_{}_per_poss.html".format(year)).text,'lxml')
    data = map(lambda x: x+[year], filter(lambda x: bool(x), [[i.text for i in row.findAll('td')][:28] for row in soup.findAll('tr')][1:]))
    per100 += data

In [5]:
pergame = []
for year in range(2013, 2018):
    soup = BeautifulSoup(get("http://www.basketball-reference.com/leagues/NBA_{}_per_game.html".format(year)).text,'lxml')
    data = map(lambda x: x+[year], filter(lambda x: bool(x), [[i.text for i in row.findAll('td')][:16] + [i.text for i in row.findAll('td')][17:29] for row in soup.findAll('tr')][1:]))
    pergame += data

In [6]:
# convert_objects was removed from pandas; coerce every non-text column to numbers instead
numeric = [c for c in columns if c not in ('Player', 'Pos', 'Tm')]
pd36, pd100, pdgame = [pd.DataFrame(rows, columns = columns) for rows in (per36, per100, pergame)]
for df in (pd36, pd100, pdgame):
    df[numeric] = df[numeric].apply(pd.to_numeric, errors='coerce')


In [41]:
train36 = pd36[(pd36['Year'] <= 2016) & (pd36['G']>=10)]
del train36['Player'], train36['Pos'], train36['Tm'], train36['G'], train36['GS']
del train36['MP'], train36['Year']
test36 = pd36[(pd36['Year'] > 2016) & (pd36['G']>=10)]
del test36['Player'], test36['Pos'], test36['Tm'], test36['G'], test36['GS'], test36['MP'], test36['Year']
test36.head()

train100 = pd100[(pd100['Year'] <= 2016) & (pd100['G']>=10)]
del train100['Player'], train100['Pos'], train100['Tm'], train100['G']
del train100['GS'], train100['MP'], train100['Year']
test100 = pd100[(pd100['Year'] > 2016) & (pd100['G']>=10)]
del test100['Player'], test100['Pos'], test100['Tm']
del test100['G'], test100['GS'], test100['MP'], test100['Year']
test100.head()

traingame = pdgame[(pdgame['Year'] <= 2016) & (pdgame['G']>=10)]
del traingame['Player'], traingame['Pos'], traingame['Tm'], traingame['G']
del traingame['GS'], traingame['MP'], traingame['Year']
testgame = pdgame[(pdgame['Year'] > 2016) & (pdgame['G']>=10)]
del testgame['Player'], testgame['Pos'], testgame['Tm']
del testgame['G'], testgame['GS'], testgame['MP'], testgame['Year']
testgame.head()

Unnamed: 0,Age,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
2413,23,1.8,4.6,0.379,1.1,3.4,0.333,0.6,1.3,0.5,...,0.875,0.2,0.8,1.1,0.4,0.3,0.1,0.6,1.2,5.2
2415,23,4.8,8.1,0.596,0.0,0.0,0.0,4.8,8.1,0.598,...,0.723,3.0,4.6,7.5,0.9,1.3,0.9,2.0,2.4,12.0
2416,31,2.6,6.4,0.407,0.7,2.0,0.321,2.0,4.4,0.447,...,0.897,0.2,1.9,2.2,1.1,0.3,0.0,0.3,1.7,7.2
2417,28,2.0,4.3,0.465,0.0,0.2,0.0,2.0,4.2,0.482,...,0.714,1.1,3.4,4.5,0.4,0.4,0.8,0.8,1.8,4.5
2418,28,1.1,2.0,0.544,0.0,0.0,,1.1,2.0,0.544,...,0.7,1.3,2.4,3.7,0.5,0.6,0.6,0.3,1.8,2.7


The isolation forest has a tunable parameter called contamination: the expected fraction of outliers in the training set. Smaller values flag fewer points as outliers; larger values flag more.
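
Roughly speaking, contamination turns the forest's anomaly scores into a cutoff: the cutoff is placed so that about that fraction of the training points fall on the outlier side, and the same cutoff is then applied to new data. The sketch below shows that idea in plain Python. It is an illustration under assumptions, not scikit-learn's code: `score_cutoff` is a hypothetical helper, and the scores are random numbers standing in for real anomaly scores (lower meaning more anomalous).

```python
import random

def score_cutoff(train_scores, contamination):
    """Score at or below which a `contamination` fraction of training points fall."""
    n_flag = max(1, int(round(contamination * len(train_scores))))
    return sorted(train_scores)[n_flag - 1]

rng = random.Random(0)
# Made-up anomaly scores (an assumption for illustration); lower = more anomalous.
train_scores = [rng.gauss(0, 1) for _ in range(1000)]
test_scores = [rng.gauss(0, 1) for _ in range(400)]

flagged = {}
for contamination in (0.001, 0.01, 0.015):
    cutoff = score_cutoff(train_scores, contamination)
    n_train = sum(s <= cutoff for s in train_scores)   # fixed by the contamination
    n_test = sum(s <= cutoff for s in test_scores)     # depends on the new data
    flagged[contamination] = (n_train, n_test)
    print(contamination, n_train, n_test)
```

The number of flagged training points is fixed by the contamination, while the number of flagged test points depends on how the new data compares with the training set. That is why the 2017 tables in this notebook can contain more or fewer players than the contamination alone would suggest.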

With a small value (0.001), Westbrook is the most outlier-y player: he is the only one flagged in all three data sets.

In [42]:
g36 = IsolationForest(contamination=0.001, bootstrap = True, n_jobs = -1).fit(train36.fillna(0))
g100 = IsolationForest(contamination=0.001, bootstrap = True, n_jobs = -1).fit(train100.fillna(0))
gg = IsolationForest(contamination=0.001, bootstrap = True, n_jobs = -1).fit(traingame.fillna(0))

In [43]:
pd36[(pd36['Year'] > 2016) & (pd36['G']>=10)][[True if i < 0 else False for i in g36.predict(test36.fillna(0))]]

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
2506,Anthony Davis,PF,23,NOP,32,32,1192,10.1,20.4,0.495,...,2.1,9.0,11.1,2.1,1.4,2.6,2.4,2.3,28.4,2017
2529,Joel Embiid,C,22,PHI,21,21,518,9.2,19.6,0.468,...,2.8,8.0,10.8,2.5,1.0,3.5,5.4,5.0,27.3,2017
2680,Thon Maker,PF,19,MIL,12,0,48,6.8,12.0,0.563,...,2.3,11.3,13.5,0.0,0.0,2.3,1.5,5.2,20.3,2017
2837,Russell Westbrook,PG,28,OKC,32,32,1123,10.7,24.8,0.433,...,2.0,8.9,10.9,11.2,1.3,0.3,5.6,2.4,32.5,2017


In [44]:
pd100[(pd100['Year'] > 2016) & (pd100['G']>=10)][[True if i < 0 else False for i in g100.predict(test100.fillna(0))]]

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
2837,Russell Westbrook,PG,28,OKC,32,32,1123,14.6,33.8,0.433,...,2.8,12.1,14.9,15.3,1.7,0.4,7.6,3.2,44.3,2017


In [45]:
pdgame[(pdgame['Year'] > 2016) & (pdgame['G']>=10)][[True if i < 0 else False for i in gg.predict(testgame.fillna(0))]]

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
2497,DeMarcus Cousins,C,26,SAC,31,31,34.7,9.9,21.6,0.457,...,2.1,8.2,10.4,3.5,1.3,1.5,3.1,3.7,29.1,2017
2837,Russell Westbrook,PG,28,OKC,32,32,35.1,10.5,24.2,0.433,...,2.0,8.7,10.6,10.9,1.3,0.3,5.5,2.3,31.7,2017


With a slightly larger value (0.01), we find more outliers.

In [46]:
g36 = IsolationForest(contamination=0.01, bootstrap = True, n_jobs = -1).fit(train36.fillna(0))
g100 = IsolationForest(contamination=0.01, bootstrap = True, n_jobs = -1).fit(train100.fillna(0))
gg = IsolationForest(contamination=0.01, bootstrap = True, n_jobs = -1).fit(traingame.fillna(0))

In [47]:
pd36[(pd36['Year'] > 2016) & (pd36['G']>=10)][[True if i < 0 else False for i in g36.predict(test36.fillna(0))]]

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
2497,DeMarcus Cousins,C,26,SAC,31,31,1076,10.2,22.4,0.457,...,2.2,8.5,10.7,3.7,1.3,1.5,3.2,3.9,30.1,2017
2506,Anthony Davis,PF,23,NOP,32,32,1192,10.1,20.4,0.495,...,2.1,9.0,11.1,2.1,1.4,2.6,2.4,2.3,28.4,2017
2526,Henry Ellenson,PF,20,DET,10,0,37,3.9,13.6,0.286,...,1.0,6.8,7.8,1.0,0.0,0.0,2.9,0.0,10.7,2017
2529,Joel Embiid,C,22,PHI,21,21,518,9.2,19.6,0.468,...,2.8,8.0,10.8,2.5,1.0,3.5,5.4,5.0,27.3,2017
2577,James Harden,PG,27,HOU,33,33,1198,8.0,18.1,0.443,...,1.3,6.4,7.7,11.8,1.3,0.3,5.5,2.6,27.5,2017
2680,Thon Maker,PF,19,MIL,12,0,48,6.8,12.0,0.563,...,2.3,11.3,13.5,0.0,0.0,2.3,1.5,5.2,20.3,2017
2693,JaVale McGee,C,29,GSW,29,2,241,10.5,16.9,0.619,...,3.9,7.0,10.9,1.2,0.9,2.7,3.0,5.5,24.2,2017
2757,Tim Quarterman,SG,22,POR,11,0,37,6.8,7.8,0.875,...,1.9,2.9,4.9,4.9,1.0,1.9,2.9,2.9,15.6,2017
2837,Russell Westbrook,PG,28,OKC,32,32,1123,10.7,24.8,0.433,...,2.0,8.9,10.9,11.2,1.3,0.3,5.6,2.4,32.5,2017


In [48]:
pd100[(pd100['Year'] > 2016) & (pd100['G']>=10)][[True if i < 0 else False for i in g100.predict(test100.fillna(0))]]

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
2497,DeMarcus Cousins,C,26,SAC,31,31,1076,14.4,31.6,0.457,...,3.1,12.0,15.1,5.2,1.9,2.2,4.5,5.5,42.5,2017
2506,Anthony Davis,PF,23,NOP,32,32,1192,13.8,27.8,0.495,...,2.9,12.3,15.2,2.8,1.9,3.5,3.3,3.1,38.7,2017
2526,Henry Ellenson,PF,20,DET,10,0,37,5.5,19.4,0.286,...,1.4,9.7,11.1,1.4,0.0,0.0,4.1,0.0,15.2,2017
2529,Joel Embiid,C,22,PHI,21,21,518,12.6,27.0,0.468,...,3.8,11.0,14.8,3.4,1.4,4.8,7.5,6.9,37.6,2017
2577,James Harden,PG,27,HOU,33,33,1198,10.9,24.6,0.443,...,1.8,8.7,10.5,16.0,1.8,0.4,7.5,3.5,37.4,2017
2680,Thon Maker,PF,19,MIL,12,0,48,9.4,16.7,0.563,...,3.1,15.6,18.8,0.0,0.0,3.1,2.1,7.3,28.1,2017
2757,Tim Quarterman,SG,22,POR,11,0,37,9.3,10.6,0.875,...,2.7,4.0,6.7,6.7,1.3,2.7,4.0,4.0,21.3,2017
2837,Russell Westbrook,PG,28,OKC,32,32,1123,14.6,33.8,0.433,...,2.8,12.1,14.9,15.3,1.7,0.4,7.6,3.2,44.3,2017


In [49]:
pdgame[(pdgame['Year'] > 2016) & (pdgame['G']>=10)][[True if i < 0 else False for i in gg.predict(testgame.fillna(0))]]

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
2428,Giannis Antetokounmpo,SG,22,MIL,30,30,34.9,8.3,15.9,0.523,...,1.9,7.1,9.0,5.9,2.0,1.8,3.0,3.6,23.4,2017
2497,DeMarcus Cousins,C,26,SAC,31,31,34.7,9.9,21.6,0.457,...,2.1,8.2,10.4,3.5,1.3,1.5,3.1,3.7,29.1,2017
2506,Anthony Davis,PF,23,NOP,32,32,37.3,10.4,21.1,0.495,...,2.2,9.3,11.5,2.2,1.5,2.7,2.5,2.3,29.3,2017
2514,DeMar DeRozan,SG,27,TOR,31,31,35.4,9.9,21.0,0.474,...,1.0,4.3,5.3,4.1,1.4,0.1,2.5,1.8,27.5,2017
2525,Kevin Durant,SF,28,GSW,33,33,34.2,9.2,17.1,0.538,...,0.6,8.2,8.8,4.6,1.2,1.6,2.2,2.2,26.1,2017
2577,James Harden,PG,27,HOU,33,33,36.3,8.1,18.3,0.443,...,1.3,6.5,7.8,11.9,1.3,0.3,5.5,2.6,27.7,2017
2608,Dwight Howard,C,31,ATL,28,28,29.3,5.8,8.9,0.647,...,4.8,8.5,13.3,1.3,1.0,1.4,2.5,3.2,14.5,2017
2623,LeBron James,SF,32,CLE,27,27,36.9,9.4,18.4,0.513,...,1.4,6.6,7.9,8.6,1.4,0.5,3.8,1.6,25.5,2017
2837,Russell Westbrook,PG,28,OKC,32,32,35.1,10.5,24.2,0.433,...,2.0,8.7,10.6,10.9,1.3,0.3,5.5,2.3,31.7,2017


In [50]:
g36 = IsolationForest(contamination=0.015).fit(train36.fillna(0))
g100 = IsolationForest(contamination=0.015).fit(train100.fillna(0))
gg = IsolationForest(contamination=0.015).fit(traingame.fillna(0))

In [51]:
pd36[(pd36['Year'] > 2016) & (pd36['G']>=10)][[True if i < 0 else False for i in g36.predict(test36.fillna(0))]]

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
2497,DeMarcus Cousins,C,26,SAC,31,31,1076,10.2,22.4,0.457,...,2.2,8.5,10.7,3.7,1.3,1.5,3.2,3.9,30.1,2017
2506,Anthony Davis,PF,23,NOP,32,32,1192,10.1,20.4,0.495,...,2.1,9.0,11.1,2.1,1.4,2.6,2.4,2.3,28.4,2017
2514,DeMar DeRozan,SG,27,TOR,31,31,1098,10.1,21.3,0.474,...,1.0,4.3,5.3,4.2,1.4,0.1,2.6,1.9,28.0,2017
2525,Kevin Durant,SF,28,GSW,33,33,1127,9.7,18.0,0.538,...,0.6,8.7,9.3,4.9,1.2,1.7,2.4,2.3,27.5,2017
2526,Henry Ellenson,PF,20,DET,10,0,37,3.9,13.6,0.286,...,1.0,6.8,7.8,1.0,0.0,0.0,2.9,0.0,10.7,2017
2529,Joel Embiid,C,22,PHI,21,21,518,9.2,19.6,0.468,...,2.8,8.0,10.8,2.5,1.0,3.5,5.4,5.0,27.3,2017
2577,James Harden,PG,27,HOU,33,33,1198,8.0,18.1,0.443,...,1.3,6.4,7.7,11.8,1.3,0.3,5.5,2.6,27.5,2017
2638,James Jones,SF,36,CLE,18,0,126,4.9,9.1,0.531,...,0.3,2.0,2.3,0.9,0.9,1.4,1.1,3.1,15.1,2017
2680,Thon Maker,PF,19,MIL,12,0,48,6.8,12.0,0.563,...,2.3,11.3,13.5,0.0,0.0,2.3,1.5,5.2,20.3,2017
2693,JaVale McGee,C,29,GSW,29,2,241,10.5,16.9,0.619,...,3.9,7.0,10.9,1.2,0.9,2.7,3.0,5.5,24.2,2017


In [52]:
pd100[(pd100['Year'] > 2016) & (pd100['G']>=10)][[True if i < 0 else False for i in g100.predict(test100.fillna(0))]]

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
2497,DeMarcus Cousins,C,26,SAC,31,31,1076,14.4,31.6,0.457,...,3.1,12.0,15.1,5.2,1.9,2.2,4.5,5.5,42.5,2017
2506,Anthony Davis,PF,23,NOP,32,32,1192,13.8,27.8,0.495,...,2.9,12.3,15.2,2.8,1.9,3.5,3.3,3.1,38.7,2017
2514,DeMar DeRozan,SG,27,TOR,31,31,1098,14.1,29.8,0.474,...,1.4,6.1,7.5,5.9,2.0,0.1,3.6,2.6,39.2,2017
2526,Henry Ellenson,PF,20,DET,10,0,37,5.5,19.4,0.286,...,1.4,9.7,11.1,1.4,0.0,0.0,4.1,0.0,15.2,2017
2529,Joel Embiid,C,22,PHI,21,21,518,12.6,27.0,0.468,...,3.8,11.0,14.8,3.4,1.4,4.8,7.5,6.9,37.6,2017
2575,A.J. Hammons,C,24,DAL,16,0,70,6.0,15.8,0.381,...,3.8,11.3,15.0,1.5,0.0,4.5,2.3,7.5,16.5,2017
2680,Thon Maker,PF,19,MIL,12,0,48,9.4,16.7,0.563,...,3.1,15.6,18.8,0.0,0.0,3.1,2.1,7.3,28.1,2017
2757,Tim Quarterman,SG,22,POR,11,0,37,9.3,10.6,0.875,...,2.7,4.0,6.7,6.7,1.3,2.7,4.0,4.0,21.3,2017
2759,Zach Randolph,PF,35,MEM,27,1,606,12.9,29.2,0.442,...,5.0,12.8,17.8,3.9,1.2,0.4,2.6,4.4,30.0,2017
2837,Russell Westbrook,PG,28,OKC,32,32,1123,14.6,33.8,0.433,...,2.8,12.1,14.9,15.3,1.7,0.4,7.6,3.2,44.3,2017


In [53]:
pdgame[(pdgame['Year'] > 2016) & (pdgame['G']>=10)][[True if i < 0 else False for i in gg.predict(testgame.fillna(0))]]

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
2428,Giannis Antetokounmpo,SG,22,MIL,30,30,34.9,8.3,15.9,0.523,...,1.9,7.1,9.0,5.9,2.0,1.8,3.0,3.6,23.4,2017
2497,DeMarcus Cousins,C,26,SAC,31,31,34.7,9.9,21.6,0.457,...,2.1,8.2,10.4,3.5,1.3,1.5,3.1,3.7,29.1,2017
2506,Anthony Davis,PF,23,NOP,32,32,37.3,10.4,21.1,0.495,...,2.2,9.3,11.5,2.2,1.5,2.7,2.5,2.3,29.3,2017
2514,DeMar DeRozan,SG,27,TOR,31,31,35.4,9.9,21.0,0.474,...,1.0,4.3,5.3,4.1,1.4,0.1,2.5,1.8,27.5,2017
2525,Kevin Durant,SF,28,GSW,33,33,34.2,9.2,17.1,0.538,...,0.6,8.2,8.8,4.6,1.2,1.6,2.2,2.2,26.1,2017
2577,James Harden,PG,27,HOU,33,33,36.3,8.1,18.3,0.443,...,1.3,6.5,7.8,11.9,1.3,0.3,5.5,2.6,27.7,2017
2608,Dwight Howard,C,31,ATL,28,28,29.3,5.8,8.9,0.647,...,4.8,8.5,13.3,1.3,1.0,1.4,2.5,3.2,14.5,2017
2623,LeBron James,SF,32,CLE,27,27,36.9,9.4,18.4,0.513,...,1.4,6.6,7.9,8.6,1.4,0.5,3.8,1.6,25.5,2017
2673,Kevin Love,PF,28,CLE,27,27,31.7,6.9,15.0,0.458,...,2.4,8.3,10.8,1.7,1.1,0.5,2.0,1.9,21.7,2017
2807,Isaiah Thomas,PG,27,BOS,28,28,33.3,8.3,18.9,0.439,...,0.5,2.0,2.5,6.3,0.8,0.1,2.2,2.3,26.8,2017


In [54]:
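def ranked(train, test, show):
    # Attach each flagged player's rank to the matching rows of `show`.
    a = rankings(train, test, c0 = 0.02)
    # After reset_index the labels are 0..n-1, so positional .iloc replaces the removed .ix.
    s = show.reset_index().iloc[[i[0] for i in a]].copy()
    s['Rank'] = [i[1] for i in a]
    del s['index']
    del s['Year']
    return s

def ranking(train, test, c0 = 0.015):
    # Fit at contamination c0; return the number of flagged test rows and their positions.
    model = IsolationForest(contamination = c0, random_state=42, n_jobs=-1, bootstrap=True).fit(train.fillna(0))
    outliers = [True if i < 0 else False for i in model.predict(test.fillna(0))]
    n_outliers = sum(1 for i in outliers if i)
    return n_outliers, np.argsort(outliers)[::-1][:n_outliers]

def rankings(train, test, c0 = 0.015):
    # Lower the contamination in steps of 0.001 until nothing is flagged.
    # Players still flagged at the smallest contamination get the best rank;
    # players first flagged at the same level share a rank.
    counts, i, players, ranked_players = [], 1, set(), []
    while c0 > 0:
        c, d = ranking(train, test, c0 = c0)
        if not c:
            break
        counts.append(d)
        c0 -= 0.001
    counts.reverse()
    for l in counts:
        for player in l:
            if player not in players:
                ranked_players.append((player, i))
                players.add(player)
        i = len(players) +1
    # Python 3 dropped tuple-unpacking lambdas, so index the (position, rank) pair instead.
    return sorted(ranked_players, key = lambda pair: (pair[1], pair[0]))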

In [55]:
a = rankings(train36, test36, c0 = 0.02)
b = rankings(train100, test100, c0 = 0.02)
c = rankings(traingame, testgame, c0 = 0.02)

In [56]:
pd36[(pd36['Year'] > 2016) & (pd36['G']>=10)].reset_index().iloc[[i[0] for i in a]]

Unnamed: 0,index,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
102,2529,Joel Embiid,C,22,PHI,21,21,518,9.2,19.6,...,2.8,8.0,10.8,2.5,1.0,3.5,5.4,5.0,27.3,2017
368,2837,Russell Westbrook,PG,28,OKC,32,32,1123,10.7,24.8,...,2.0,8.9,10.9,11.2,1.3,0.3,5.6,2.4,32.5,2017
82,2506,Anthony Davis,PF,23,NOP,32,32,1192,10.1,20.4,...,2.1,9.0,11.1,2.1,1.4,2.6,2.4,2.3,28.4,2017
73,2497,DeMarcus Cousins,C,26,SAC,31,31,1076,10.2,22.4,...,2.2,8.5,10.7,3.7,1.3,1.5,3.2,3.9,30.1,2017
245,2693,JaVale McGee,C,29,GSW,29,2,241,10.5,16.9,...,3.9,7.0,10.9,1.2,0.9,2.7,3.0,5.5,24.2,2017
99,2526,Henry Ellenson,PF,20,DET,10,0,37,3.9,13.6,...,1.0,6.8,7.8,1.0,0.0,0.0,2.9,0.0,10.7,2017
98,2525,Kevin Durant,SF,28,GSW,33,33,1127,9.7,18.0,...,0.6,8.7,9.3,4.9,1.2,1.7,2.4,2.3,27.5,2017
144,2577,James Harden,PG,27,HOU,33,33,1198,8.0,18.1,...,1.3,6.4,7.7,11.8,1.3,0.3,5.5,2.6,27.5,2017
298,2757,Tim Quarterman,SG,22,POR,11,0,37,6.8,7.8,...,1.9,2.9,4.9,4.9,1.0,1.9,2.9,2.9,15.6,2017
232,2680,Thon Maker,PF,19,MIL,12,0,48,6.8,12.0,...,2.3,11.3,13.5,0.0,0.0,2.3,1.5,5.2,20.3,2017


In [57]:
pd100[(pd100['Year'] > 2016) & (pd100['G']>=10)].reset_index().iloc[[i[0] for i in b]]

Unnamed: 0,index,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
368,2837,Russell Westbrook,PG,28,OKC,32,32,1123,14.6,33.8,...,2.8,12.1,14.9,15.3,1.7,0.4,7.6,3.2,44.3,2017
102,2529,Joel Embiid,C,22,PHI,21,21,518,12.6,27.0,...,3.8,11.0,14.8,3.4,1.4,4.8,7.5,6.9,37.6,2017
99,2526,Henry Ellenson,PF,20,DET,10,0,37,5.5,19.4,...,1.4,9.7,11.1,1.4,0.0,0.0,4.1,0.0,15.2,2017
144,2577,James Harden,PG,27,HOU,33,33,1198,10.9,24.6,...,1.8,8.7,10.5,16.0,1.8,0.4,7.5,3.5,37.4,2017
45,2461,Andrew Bogut,C,32,DAL,17,17,410,3.7,7.8,...,5.5,16.6,22.1,4.7,1.3,2.1,3.9,7.4,7.8,2017
73,2497,DeMarcus Cousins,C,26,SAC,31,31,1076,14.4,31.6,...,3.1,12.0,15.1,5.2,1.9,2.2,4.5,5.5,42.5,2017
298,2757,Tim Quarterman,SG,22,POR,11,0,37,9.3,10.6,...,2.7,4.0,6.7,6.7,1.3,2.7,4.0,4.0,21.3,2017
82,2506,Anthony Davis,PF,23,NOP,32,32,1192,13.8,27.8,...,2.9,12.3,15.2,2.8,1.9,3.5,3.3,3.1,38.7,2017
232,2680,Thon Maker,PF,19,MIL,12,0,48,9.4,16.7,...,3.1,15.6,18.8,0.0,0.0,3.1,2.1,7.3,28.1,2017
251,2699,Salah Mejri,C,30,DAL,28,11,397,4.2,6.5,...,4.8,11.8,16.6,1.2,2.3,3.7,2.3,8.9,9.8,2017


In [58]:
pdgame[(pdgame['Year'] > 2016) & (pdgame['G']>=10)].reset_index().iloc[[i[0] for i in c]]

Unnamed: 0,index,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
82,2506,Anthony Davis,PF,23,NOP,32,32,37.3,10.4,21.1,...,2.2,9.3,11.5,2.2,1.5,2.7,2.5,2.3,29.3,2017
368,2837,Russell Westbrook,PG,28,OKC,32,32,35.1,10.5,24.2,...,2.0,8.7,10.6,10.9,1.3,0.3,5.5,2.3,31.7,2017
144,2577,James Harden,PG,27,HOU,33,33,36.3,8.1,18.3,...,1.3,6.5,7.8,11.9,1.3,0.3,5.5,2.6,27.7,2017
73,2497,DeMarcus Cousins,C,26,SAC,31,31,34.7,9.9,21.6,...,2.1,8.2,10.4,3.5,1.3,1.5,3.1,3.7,29.1,2017
14,2428,Giannis Antetokounmpo,SG,22,MIL,30,30,34.9,8.3,15.9,...,1.9,7.1,9.0,5.9,2.0,1.8,3.0,3.6,23.4,2017
182,2623,LeBron James,SF,32,CLE,27,27,36.9,9.4,18.4,...,1.4,6.6,7.9,8.6,1.4,0.5,3.8,1.6,25.5,2017
348,2814,Karl-Anthony Towns,C,21,MIN,32,32,35.2,8.4,17.4,...,3.4,7.8,11.3,2.8,0.6,1.4,2.6,3.0,22.0,2017
370,2839,Hassan Whiteside,C,27,MIA,32,32,33.9,7.4,13.6,...,4.2,10.4,14.7,0.7,0.8,2.3,2.0,3.3,17.8,2017
98,2525,Kevin Durant,SF,28,GSW,33,33,34.2,9.2,17.1,...,0.6,8.2,8.8,4.6,1.2,1.6,2.2,2.2,26.1,2017
171,2608,Dwight Howard,C,31,ATL,28,28,29.3,5.8,8.9,...,4.8,8.5,13.3,1.3,1.0,1.4,2.5,3.2,14.5,2017


In [76]:
w = ranked(train100, test100, pd100[(pd100['Year'] > 2016) & (pd100['G']>=10)])

In [68]:
w.columns

Index([u'Player', u'Pos', u'Age', u'Tm', u'G', u'GS', u'MP', u'FG', u'FGA',
       u'FG%', u'3P', u'3PA', u'3P%', u'2P', u'2PA', u'2P%', u'FT', u'FTA',
       u'FT%', u'ORB', u'DRB', u'TRB', u'AST', u'STL', u'BLK', u'TOV', u'PF',
       u'PTS', u'Rank'],
      dtype='object')

In [77]:
w[['Rank', u'Player', u'Pos', u'Age', u'Tm', u'G', u'GS', u'MP', 'PTS', # u'2P%', u'3P%', 'FT%',
    u'3P', u'3PA', u'2P', u'2PA', u'FT', u'FTA',
       u'ORB', u'DRB', u'TRB', u'AST', u'STL', u'BLK', u'TOV', u'PF']][w['Rank']<=10].fillna(0)

Unnamed: 0,Rank,Player,Pos,Age,Tm,G,GS,MP,PTS,3P,...,FT,FTA,ORB,DRB,TRB,AST,STL,BLK,TOV,PF
368,1,Russell Westbrook,PG,28,OKC,32,32,1123,44.3,2.6,...,12.4,15.1,2.8,12.1,14.9,15.3,1.7,0.4,7.6,3.2
102,2,Joel Embiid,C,22,PHI,21,21,518,37.6,2.4,...,9.9,13.0,3.8,11.0,14.8,3.4,1.4,4.8,7.5,6.9
99,3,Henry Ellenson,PF,20,DET,10,0,37,15.2,4.1,...,0.0,0.0,1.4,9.7,11.1,1.4,0.0,0.0,4.1,0.0
144,4,James Harden,PG,27,HOU,33,33,1198,37.4,4.0,...,11.5,13.7,1.8,8.7,10.5,16.0,1.8,0.4,7.5,3.5
45,5,Andrew Bogut,C,32,DAL,17,17,410,7.8,0.0,...,0.4,1.4,5.5,16.6,22.1,4.7,1.3,2.1,3.9,7.4
73,5,DeMarcus Cousins,C,26,SAC,31,31,1076,42.5,2.7,...,10.9,14.1,3.1,12.0,15.1,5.2,1.9,2.2,4.5,5.5
298,5,Tim Quarterman,SG,22,POR,11,0,37,21.3,2.7,...,0.0,0.0,2.7,4.0,6.7,6.7,1.3,2.7,4.0,4.0
82,8,Anthony Davis,PF,23,NOP,32,32,1192,38.7,0.8,...,10.4,13.0,2.9,12.3,15.2,2.8,1.9,3.5,3.3,3.1
232,9,Thon Maker,PF,19,MIL,12,0,48,28.1,3.1,...,6.3,10.4,3.1,15.6,18.8,0.0,0.0,3.1,2.1,7.3
251,9,Salah Mejri,C,30,DAL,28,11,397,9.8,0.0,...,1.3,2.4,4.8,11.8,16.6,1.2,2.3,3.7,2.3,8.9


In [39]:
def Reddit_Print(df):
    # Print a DataFrame as a Reddit-style markdown table.
    col = df.columns
    print(' | '.join(col))
    print(' | '.join(['---' for _ in col]))
    for row in df.values:  # as_matrix() was removed from pandas
        print(' | '.join(map(str, row)))

In [78]:
Reddit_Print(Out[77])

Rank | Player | Pos | Age | Tm | G | GS | MP | PTS | 3P | 3PA | 2P | 2PA | FT | FTA | ORB | DRB | TRB | AST | STL | BLK | TOV | PF
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | Russell Westbrook | PG | 28 | OKC | 32 | 32 | 1123 | 44.3 | 2.6 | 7.7 | 12.1 | 26.1 | 12.4 | 15.1 | 2.8 | 12.1 | 14.9 | 15.3 | 1.7 | 0.4 | 7.6 | 3.2
2 | Joel Embiid | C | 22 | PHI | 21 | 21 | 518 | 37.6 | 2.4 | 5.9 | 10.2 | 21.0 | 9.9 | 13.0 | 3.8 | 11.0 | 14.8 | 3.4 | 1.4 | 4.8 | 7.5 | 6.9
3 | Henry Ellenson | PF | 20 | DET | 10 | 0 | 37 | 15.2 | 4.1 | 12.4 | 1.4 | 6.9 | 0.0 | 0.0 | 1.4 | 9.7 | 11.1 | 1.4 | 0.0 | 0.0 | 4.1 | 0.0
4 | James Harden | PG | 27 | HOU | 33 | 33 | 1198 | 37.4 | 4.0 | 11.6 | 6.9 | 13.1 | 11.5 | 13.7 | 1.8 | 8.7 | 10.5 | 16.0 | 1.8 | 0.4 | 7.5 | 3.5
5 | Andrew Bogut | C | 32 | DAL | 17 | 17 | 410 | 7.8 | 0.0 | 0.1 | 3.7 | 7.7 | 0.4 | 1.4 | 5.5 | 16.6 | 22.1 | 4.7 | 1.3 | 2.1 | 3.9 | 7.4
5 | DeMa

In [None]:
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from lightning import Lightning
lgn = Lightning(ipython=True, local=True)

In [None]:
dist = TSNE(random_state=50).fit_transform(test36.fillna(0))
# labels must use the same G>=10 filter as test36, otherwise they misalign with the points
lgn.scatter(dist[:,0], dist[:,1], labels = pd36[(pd36['Year'] > 2016) & (pd36['G']>=10)]['Player'], alpha=0.5)

In [None]:
dist = TSNE(random_state=20).fit_transform(test100.fillna(0))
# labels must use the same G>=10 filter as test100, otherwise they misalign with the points
lgn.scatter(dist[:,0], dist[:,1], labels = pd100[(pd100['Year'] > 2016) & (pd100['G']>=10)]['Player'], alpha=0.5)

In [None]:
dist = TSNE(random_state=20).fit_transform(testgame.fillna(0))
# labels must use the same G>=10 filter as testgame, otherwise they misalign with the points
lgn.scatter(dist[:,0], dist[:,1], labels = pdgame[(pdgame['Year'] > 2016) & (pdgame['G']>=10)]['Player'], alpha=0.5)