<a href="https://colab.research.google.com/github/teellis/UnsupervisedLearning-Tutorial/blob/main/Unsupervised_Lesson3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lesson 3 - NBA Dataset

Now, let's look at a real-life complex dataset.

We have play-by-play NBA data for the 2019-2020 season, meaning each play, including every shot attempt, is recorded individually.

We want to predict how successful a shot will be, given other features, but do not know what features are best for this. So, we will create an Association Model to predict what factors play into making a successful shot!

In [None]:
# Handy imports
import pandas as pd
import numpy as np

In [None]:
# Load the NBA dataset from GitHub.
# dataset source: https://sports-statistics.com/sports-data/nba-basketball-datasets-csv-files/
df_nba_raw = pd.read_csv("https://raw.githubusercontent.com/teellis/UnsupervisedLearning-Tutorial/main/2019-20_pbp_data.csv")
df_nba_raw.drop(df_nba_raw.filter(regex="Unnamed"),axis=1, inplace=True)
df_nba_raw.head()

Unnamed: 0,GameType,Location,Date,Time,WinningTeam,Quarter,SecLeft,AwayTeam,AwayPlay,AwayScore,HomeTeam,HomePlay,HomeScore,Shooter,ShotType,ShotOutcome,ShotDist,Assister,Blocker,FoulType,Fouler,Fouled,Rebounder,ReboundType
0,regular,Scotiabank Arena Toronto Canada,October 22 2019,8:00 PM,TOR,1,720,NOP,Jump ball: D. Favors vs. M. Gasol (L. Ball gai...,0,TOR,,0,,,,,,,,,,,
1,regular,Scotiabank Arena Toronto Canada,October 22 2019,8:00 PM,TOR,1,708,NOP,L. Ball misses 2-pt jump shot from 11 ft,0,TOR,,0,L. Ball - balllo01,2-pt jump shot,miss,11.0,,,,,,,
2,regular,Scotiabank Arena Toronto Canada,October 22 2019,8:00 PM,TOR,1,707,NOP,Offensive rebound by D. Favors,0,TOR,,0,,,,,,,,,,D. Favors - favorde01,offensive
3,regular,Scotiabank Arena Toronto Canada,October 22 2019,8:00 PM,TOR,1,707,NOP,D. Favors makes 2-pt layup at rim,2,TOR,,0,D. Favors - favorde01,2-pt layup,make,0.0,,,,,,,
4,regular,Scotiabank Arena Toronto Canada,October 22 2019,8:00 PM,TOR,1,689,NOP,,2,TOR,O. Anunoby misses 2-pt layup from 3 ft,0,O. Anunoby - anunoog01,2-pt layup,miss,3.0,,,,,,,


In [None]:
df_nba_raw.shape

(451192, 24)

In this NBA play-by-play dataset, we have 451192 rows and 24 columns!

---
# Step 1: Data Cleanup
Currently, our dataset has these fields:
- *GameType, Location, Date, Time*
- **Fouls** (*FoulType, Fouler, Fouled*) 
- *WinningTeam, Quarter, SecLeft*
- *AwayTeam, AwayPlay, AwayScore*
- *HomeTeam, HomePlay, HomeScore*
- **Shots** (*Shooter, ShotType, ShotOutcome, ShotDist*)
- *Assister, Blocker*
- **Rebounds** (*Rebounder, ReboundType*)

#### **Stop and think:**
#### We want to predict what factors play into making a successful shot. So, considering that, what fields could we remove to make our prediction?



#### **Type the fields you think we can or should remove here:**
> 

> Enter Solution here.

##### **Hint #1**

Think, are any of these features irrelevant? Or could they actually mess with our results if not removed?

##### **Hint #2**

Do we care about fouls? And will features such as the location or date make any difference? What else?

#### ***Once ready, click to reveal the answer:***

Actually, we can remove these fields:

***GameType, Location, Date, Time***
> These can be removed, as they are often constant across a game and do not need to be factored in for our specific problem. 
> Remember, we want to determine what features may determine a hit or missed shot; removing these features allows us to focus more on the features that actually change across shots.

***WinningTeam, AwayTeam, AwayScore, HomeTeam, HomeScore***
> These can also be removed as they aren't particularly helpful to know in the context of a single shot. Perhaps WinningTeam could be factored into the model (maybe if the team making the play is the winning team?) but for the purposes of this demo we will not dive into that. You are free to try it out, though!

***AwayPlay, HomePlay, SecLeft***
> These fields are already summarized nicely by the ShotType and Quarter fields.

***FoulType, Fouler, Fouled***

> Again, we want to determine what features make a successful shot. Therefore, we should isolate only the columns where a shot attempt is recorded, and can remove any other recorded plays, including fouls.

> **Warning:** These entries require not only the removal of the columns, but **first the removal of the rows where exclusively fouls are recorded.**

**What about Rebounds (*Rebounder, ReboundType*)?**
> We could remove rebounds... but there might be a way to use them for our analysis!
> We'll see how in a bit, so **let's leave them in for now.**

#### **Now, let's actually remove the fields!**

##### First, we need to remove the unnecessary rows, as explained in the previous solution. 
##### But, how do we find and remove them?

In [None]:
## *Your code here.*


##### **Hint #1**

We are trying to remove all **foul** entries. 

**Think:** if there is no foul in the row, will there be any information in any of the foul columns (*FoulType, Fouler, Fouled*)?

##### **Hint #2**

This sample code I used to remove violations might help:

```
df_nba_raw = df_nba_raw.loc[df_nba_raw['ViolationType'].isna()]
```
This code finds and keeps only the entries where the ViolationType field is NaN, meaning no violation is recorded in that row.

How could you apply this to fouls?


##### **Solution:**

In [None]:
df_nba_raw = df_nba_raw.loc[df_nba_raw['FoulType'].isna()]

You could use any of the three features about fouls here (*FoulType, Fouler, Fouled*), but we used *FoulType* for demonstration.
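To see how this filter behaves, here is a minimal sketch on a made-up three-row frame (the player names and values are invented for illustration). It also shows a stricter variant that requires all three foul columns to be empty. Whether the three columns are always filled together in the real dataset is an assumption worth checking before relying on just one of them.

```python
import numpy as np
import pandas as pd

# Toy play-by-play frame (made-up values): one shot, one foul, one other play.
toy = pd.DataFrame({
    "Shooter":  ["A. Player", np.nan, np.nan],
    "FoulType": [np.nan, "shooting", np.nan],
    "Fouler":   [np.nan, "B. Player", np.nan],
    "Fouled":   [np.nan, "A. Player", np.nan],
})

# Keep only the rows where no foul type is recorded.
no_fouls = toy.loc[toy["FoulType"].isna()]

# Stricter variant: keep a row only if all three foul columns are empty.
foul_cols = ["FoulType", "Fouler", "Fouled"]
no_fouls_strict = toy.loc[toy[foul_cols].isna().all(axis=1)]

print(no_fouls.index.tolist())
print(no_fouls_strict.index.tolist())
```

On this toy frame both filters keep the same rows; on the real data, comparing the two row counts is a quick sanity check.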



##### Next, we need to remove all the columns we do not need anymore.

In [None]:
## *Your code here.*


##### **Hint #1**

What about making a list of all the column names we want to remove, and then dropping that list from the dataframe?

##### **Solution:**

In [None]:
colNames = ['GameType', 'Location', 'Date', 'Time', 'FoulType', 'Fouler', 'Fouled', 'WinningTeam', 'SecLeft', 'AwayTeam', 'AwayPlay', 'AwayScore', 'HomeTeam', 'HomePlay', 'HomeScore']
df_nba_raw = df_nba_raw.drop(colNames, axis=1)


**Note:** `axis=1` drops columns, while `axis=0` would attempt to drop rows.
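As a small illustration of the axis argument, the sketch below uses a made-up frame (the values are placeholders) and compares `axis=1` with the equivalent `columns=` keyword, then drops a row by index label with `axis=0`.

```python
import pandas as pd

# Toy frame with placeholder values.
toy = pd.DataFrame({"Quarter": [1, 2], "Date": ["d1", "d2"], "Time": ["t1", "t2"]})

# Two equivalent ways to drop columns.
by_axis = toy.drop(["Date", "Time"], axis=1)
by_keyword = toy.drop(columns=["Date", "Time"])

# axis=0 interprets the labels as row index labels instead.
rows_dropped = toy.drop([0], axis=0)

print(by_axis.columns.tolist())
print(rows_dropped.index.tolist())
```

The `columns=` form is often easier to read because it does not rely on remembering which axis number means what.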



#### **Great job! Let's see, what does the dataframe look like now?**

In [None]:
df_nba_raw.head()

Unnamed: 0,Quarter,Shooter,ShotType,ShotOutcome,ShotDist,Assister,Blocker,Rebounder,ReboundType
0,1,,,,,,,,
1,1,L. Ball - balllo01,2-pt jump shot,miss,11.0,,,,
2,1,,,,,,,D. Favors - favorde01,offensive
3,1,D. Favors - favorde01,2-pt layup,make,0.0,,,,
4,1,O. Anunoby - anunoog01,2-pt layup,miss,3.0,,,,


### Considering Assisters and Blockers

Whether a shot had an assister or a blocker may be related to its outcome. To make these columns work with our model, let's convert NaNs to False, and any other value to True.

#### Try it yourself:

In [None]:
# YOUR CODE HERE
#df_nba_raw['Assister'] = 
#df_nba_raw['Blocker'] = 

#### Hint


Try using [pandas.isnull()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.isnull.html) and [pandas.DataFrame.apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html).

#### Solution


In [None]:
df_nba_raw['Assister_bool'] = df_nba_raw['Assister'].apply(lambda x: not pd.isnull(x))
df_nba_raw['Blocker_bool'] = df_nba_raw['Blocker'].apply(lambda x: not pd.isnull(x))
df_nba_raw.drop(columns=['Assister', 'Blocker'], inplace=True)
df_nba_raw.head()

Unnamed: 0,Quarter,Shooter,ShotType,ShotOutcome,ShotDist,Rebounder,ReboundType,Assister_bool,Blocker_bool
0,1,,,,,,,False,False
1,1,L. Ball - balllo01,2-pt jump shot,miss,11.0,,,False,False
2,1,,,,,D. Favors - favorde01,offensive,False,False
3,1,D. Favors - favorde01,2-pt layup,make,0.0,,,False,False
4,1,O. Anunoby - anunoog01,2-pt layup,miss,3.0,,,False,False
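The solution above follows the hint and uses `apply()` with `pd.isnull()`. As an alternative sketch, the vectorized `Series.notna()` method gives the same result without a Python-level lambda. The toy column below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy column: one assisted play and two plays without an assister.
toy = pd.DataFrame({"Assister": ["K. Lowry - lowryky01", np.nan, np.nan]})

# Element-wise version, as in the solution.
via_apply = toy["Assister"].apply(lambda x: not pd.isnull(x))

# Vectorized version: True where a value is present.
via_notna = toy["Assister"].notna()

print(via_notna.tolist())
```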


### Considering Rebounds

Now, you may be thinking that rebounds are pretty important, so how do we factor those in? Currently, we have access to 'Rebounder' and 'ReboundType'. In this play-by-play data, an offensive rebound means the shooting team recovered the ball after a missed shot, while a defensive rebound means the opposing team gained possession. So, a row in this dataset can be treated as a rebound shot when the previous row has *ReboundType=offensive*; our solution additionally requires that the previous row's Rebounder is the current row's Shooter. Let's make a new column that is True for rebound shots and False otherwise.

Try it yourself or take a look at our solution!

You can also view the note below for a hint.

In [None]:
# YOUR CODE HERE
df_nba = None

#### Hint #1


Pandas has a method called [shift()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html), which lets you see a shifted view of a dataframe extremely easily.

#### Solution

In [None]:
df_nba_raw['Rebound'] = (df_nba_raw['ReboundType'].shift() == 'offensive') & (df_nba_raw['Rebounder'].shift() == df_nba_raw['Shooter'])
df_nba = df_nba_raw.drop(columns=['ReboundType', 'Rebounder'])
df_nba.head()

Unnamed: 0,Quarter,Shooter,ShotType,ShotOutcome,ShotDist,Assister_bool,Blocker_bool,Rebound
0,1,,,,,False,False,False
1,1,L. Ball - balllo01,2-pt jump shot,miss,11.0,False,False,False
2,1,,,,,False,False,False
3,1,D. Favors - favorde01,2-pt layup,make,0.0,False,False,True
4,1,O. Anunoby - anunoog01,2-pt layup,miss,3.0,False,False,False
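To make the role of `shift()` concrete, here is a minimal sketch on a made-up four-row frame modeled on the first rows of the dataset (names shortened for readability). `shift()` moves each value down one row, so every row can be compared with the row before it.

```python
import numpy as np
import pandas as pd

# Toy frame: a miss, an offensive rebound, then a shot by the rebounder.
toy = pd.DataFrame({
    "Shooter":     ["L. Ball", np.nan, "D. Favors", "O. Anunoby"],
    "Rebounder":   [np.nan, "D. Favors", np.nan, np.nan],
    "ReboundType": [np.nan, "offensive", np.nan, np.nan],
})

# shift() aligns each row with the values from the previous row.
prev_type = toy["ReboundType"].shift()
prev_rebounder = toy["Rebounder"].shift()

# A rebound shot: previous row was an offensive rebound by this row's shooter.
toy["Rebound"] = (prev_type == "offensive") & (prev_rebounder == toy["Shooter"])

print(toy["Rebound"].tolist())
```

Only the third row is flagged, because it is the only shot that directly follows an offensive rebound by the same player.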


#### Finally, let's remove all rows that aren't shots.

In [None]:
df_nba.dropna(subset=['Shooter', 'ShotType'], inplace=True)
df_nba.head()

Unnamed: 0,Quarter,Shooter,ShotType,ShotOutcome,ShotDist,Assister_bool,Blocker_bool,Rebound
1,1,L. Ball - balllo01,2-pt jump shot,miss,11.0,False,False,False
3,1,D. Favors - favorde01,2-pt layup,make,0.0,False,False,True
4,1,O. Anunoby - anunoog01,2-pt layup,miss,3.0,False,False,False
6,1,J. Holiday - holidjr01,2-pt layup,miss,8.0,False,False,False
8,1,K. Lowry - lowryky01,3-pt jump shot,miss,25.0,False,False,False
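As a quick check of what `dropna(subset=...)` does, the sketch below uses a made-up frame. A row is dropped if any of the listed columns is missing, while missing values in other columns are ignored.

```python
import numpy as np
import pandas as pd

# Toy frame: rows 0 and 2 are shots, row 1 is some other play.
toy = pd.DataFrame({
    "Shooter":  ["L. Ball", np.nan, "D. Favors"],
    "ShotType": ["2-pt jump shot", np.nan, "2-pt layup"],
    "Rebound":  [False, False, True],
})

# Keep only rows where both Shooter and ShotType are present.
shots_only = toy.dropna(subset=["Shooter", "ShotType"])

print(shots_only.index.tolist())
```

Note that the original index labels are kept, which is why the shot rows in the output above are numbered 1, 3, 4, 6, 8 rather than 0 to 4.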


#### Now, let's convert the remaining variables!

Quantitative variables that we want to use (such as shot distance) get converted to qualitative variables.

#### Converting ShotDist to a qualitative variable via binning

ShotDist is currently recorded in feet, which is a little too granular for our uses. Let's bin it into bins of width 10 using [pandas.cut()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html). Try it yourself first, then look at the solution to check!

Hint: use -np.inf and np.inf to handle edge cases.

In [None]:
# YOUR CODE HERE
#df_nba['ShotDist_qual'] = 

#### Solution

In [None]:
df_nba['ShotDist_qual'] = pd.cut(df_nba['ShotDist'], [-np.inf, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, np.inf])
df_nba.head()

Unnamed: 0,Quarter,Shooter,ShotType,ShotOutcome,ShotDist,Assister_bool,Blocker_bool,Rebound,ShotDist_qual
1,1,L. Ball - balllo01,2-pt jump shot,miss,11.0,False,False,False,"(10.0, 20.0]"
3,1,D. Favors - favorde01,2-pt layup,make,0.0,False,False,True,"(-inf, 10.0]"
4,1,O. Anunoby - anunoog01,2-pt layup,miss,3.0,False,False,False,"(-inf, 10.0]"
6,1,J. Holiday - holidjr01,2-pt layup,miss,8.0,False,False,False,"(-inf, 10.0]"
8,1,K. Lowry - lowryky01,3-pt jump shot,miss,25.0,False,False,False,"(20.0, 30.0]"
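One detail worth noticing in the output is that the bins are closed on the right, so a distance of exactly 10 lands in the first bin. The sketch below demonstrates this on a few made-up distances with a shortened list of edges, and shows the `right=False` option for left-closed bins.

```python
import numpy as np
import pandas as pd

# Made-up distances, including values that sit exactly on a bin edge.
distances = pd.Series([0.0, 10.0, 11.0, 25.0, 120.0])
edges = [-np.inf, 10, 20, 30, np.inf]

# Default: intervals are closed on the right, e.g. (10, 20].
binned = pd.cut(distances, edges)

# Alternative: intervals closed on the left, e.g. [10, 20).
binned_left = pd.cut(distances, edges, right=False)

print(binned.tolist())
print(binned_left.tolist())
```

With the default, 0.0 and 10.0 share a bin; with `right=False`, 10.0 moves into the same bin as 11.0. Either choice works for this lesson as long as it is applied consistently.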


### Let's drop the old ShotDist column and take another look at our dataset.

In [None]:
df_nba.drop(columns=['ShotDist'], inplace=True)
df_nba.head()

Unnamed: 0,Quarter,Shooter,ShotType,ShotOutcome,Assister_bool,Blocker_bool,Rebound,ShotDist_qual
1,1,L. Ball - balllo01,2-pt jump shot,miss,False,False,False,"(10.0, 20.0]"
3,1,D. Favors - favorde01,2-pt layup,make,False,False,True,"(-inf, 10.0]"
4,1,O. Anunoby - anunoog01,2-pt layup,miss,False,False,False,"(-inf, 10.0]"
6,1,J. Holiday - holidjr01,2-pt layup,miss,False,False,False,"(-inf, 10.0]"
8,1,K. Lowry - lowryky01,3-pt jump shot,miss,False,False,False,"(20.0, 30.0]"


### Now, convert any variable that isn't a true/false variable into one using [pandas.get_dummies()](https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html).

Hint: pandas.get_dummies() lets you specify multiple columns at a time.

#### Try it yourself:

In [None]:
# YOUR CODE HERE
#df_nba_dummy = 

#### Solution:

In [None]:
df_nba_dummy = pd.get_dummies(df_nba, columns=['Quarter', 'Shooter', 'ShotType', 'ShotOutcome', 'ShotDist_qual'])
df_nba_dummy.head()

Unnamed: 0,Assister_bool,Blocker_bool,Rebound,Quarter_1,Quarter_2,Quarter_3,Quarter_4,Quarter_5,Quarter_6,Shooter_A. Aminu - aminual01,Shooter_A. Baynes - baynear01,Shooter_A. Bradley - bradlav01,Shooter_A. Burks - burksal01,Shooter_A. Caruso - carusal01,Shooter_A. Cleveland - clevean01,Shooter_A. Coffey - coffeam01,Shooter_A. Crabbe - crabbal01,Shooter_A. Davis - davisan02,Shooter_A. Drummond - drumman01,Shooter_A. Gordon - gordoaa01,Shooter_A. Holiday - holidaa01,Shooter_A. Horford - horfoal01,Shooter_A. Iguodala - iguodan01,Shooter_A. Jefferson - jeffeam01,Shooter_A. Johnson - johnsal02,Shooter_A. Len - lenal01,Shooter_A. McKinnie - mckinal01,Shooter_A. Mokoka - mokokad01,Shooter_A. Nader - naderab01,Shooter_A. Pasečņiks - pasecan01,Shooter_A. Rivers - riverau01,Shooter_A. Roberson - roberan03,Shooter_A. Schofield - schofad01,Shooter_A. Simons - simonan01,Shooter_A. Smailagić - smailal01,Shooter_A. Tolliver - tollian01,Shooter_A. Trier - trieral01,Shooter_A. Wiggins - wiggian01,Shooter_A. Žižić - zizican01,Shooter_B. Adebayo - adebaba01,...,Shooter_V. Law - lawvi01,Shooter_V. Oladipo - oladivi01,Shooter_V. Poirier - poirivi01,Shooter_V. Čančar - cancavl01,Shooter_W. Barton - bartowi01,Shooter_W. Carter - cartewe01,Shooter_W. Cauley-Stein - caulewi01,Shooter_W. Chandler - chandwi01,Shooter_W. Ellington - ellinwa01,Shooter_W. Gabriel - gabriwe01,Shooter_W. Hernangómez - hernawi01,Shooter_W. Howard - howarwi01,Shooter_W. Iwundu - iwundwe01,Shooter_W. Matthews - matthwe02,Shooter_Y. Ferrell - ferreyo01,Shooter_Y. Watanabe - watanyu01,Shooter_Z. Cheatham - cheatzy01,Shooter_Z. Collins - colliza01,Shooter_Z. LaVine - lavinza01,Shooter_Z. Norvell - norveza01,Shooter_Z. Smith - smithzh01,Shooter_Z. Williamson - willizi01,ShotType_2-pt dunk,ShotType_2-pt hook shot,ShotType_2-pt jump shot,ShotType_2-pt layup,ShotType_3-pt jump shot,ShotOutcome_make,ShotOutcome_miss,"ShotDist_qual_(-inf, 10.0]","ShotDist_qual_(10.0, 20.0]","ShotDist_qual_(20.0, 30.0]","ShotDist_qual_(30.0, 40.0]","ShotDist_qual_(40.0, 50.0]","ShotDist_qual_(50.0, 60.0]","ShotDist_qual_(60.0, 70.0]","ShotDist_qual_(70.0, 80.0]","ShotDist_qual_(80.0, 90.0]","ShotDist_qual_(90.0, 100.0]","ShotDist_qual_(100.0, inf]"
1,False,False,False,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0
3,False,False,True,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0
4,False,False,False,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0
6,False,False,False,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0
8,False,False,False,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0


### Lastly, convert the integers to booleans. This is simple enough so we will do it for you!

In [None]:
df_nba_dummy = df_nba_dummy.astype(bool)
df_nba_dummy.head()

Unnamed: 0,Assister_bool,Blocker_bool,Rebound,Quarter_1,Quarter_2,Quarter_3,Quarter_4,Quarter_5,Quarter_6,Shooter_A. Aminu - aminual01,Shooter_A. Baynes - baynear01,Shooter_A. Bradley - bradlav01,Shooter_A. Burks - burksal01,Shooter_A. Caruso - carusal01,Shooter_A. Cleveland - clevean01,Shooter_A. Coffey - coffeam01,Shooter_A. Crabbe - crabbal01,Shooter_A. Davis - davisan02,Shooter_A. Drummond - drumman01,Shooter_A. Gordon - gordoaa01,Shooter_A. Holiday - holidaa01,Shooter_A. Horford - horfoal01,Shooter_A. Iguodala - iguodan01,Shooter_A. Jefferson - jeffeam01,Shooter_A. Johnson - johnsal02,Shooter_A. Len - lenal01,Shooter_A. McKinnie - mckinal01,Shooter_A. Mokoka - mokokad01,Shooter_A. Nader - naderab01,Shooter_A. Pasečņiks - pasecan01,Shooter_A. Rivers - riverau01,Shooter_A. Roberson - roberan03,Shooter_A. Schofield - schofad01,Shooter_A. Simons - simonan01,Shooter_A. Smailagić - smailal01,Shooter_A. Tolliver - tollian01,Shooter_A. Trier - trieral01,Shooter_A. Wiggins - wiggian01,Shooter_A. Žižić - zizican01,Shooter_B. Adebayo - adebaba01,...,Shooter_V. Law - lawvi01,Shooter_V. Oladipo - oladivi01,Shooter_V. Poirier - poirivi01,Shooter_V. Čančar - cancavl01,Shooter_W. Barton - bartowi01,Shooter_W. Carter - cartewe01,Shooter_W. Cauley-Stein - caulewi01,Shooter_W. Chandler - chandwi01,Shooter_W. Ellington - ellinwa01,Shooter_W. Gabriel - gabriwe01,Shooter_W. Hernangómez - hernawi01,Shooter_W. Howard - howarwi01,Shooter_W. Iwundu - iwundwe01,Shooter_W. Matthews - matthwe02,Shooter_Y. Ferrell - ferreyo01,Shooter_Y. Watanabe - watanyu01,Shooter_Z. Cheatham - cheatzy01,Shooter_Z. Collins - colliza01,Shooter_Z. LaVine - lavinza01,Shooter_Z. Norvell - norveza01,Shooter_Z. Smith - smithzh01,Shooter_Z. Williamson - willizi01,ShotType_2-pt dunk,ShotType_2-pt hook shot,ShotType_2-pt jump shot,ShotType_2-pt layup,ShotType_3-pt jump shot,ShotOutcome_make,ShotOutcome_miss,"ShotDist_qual_(-inf, 10.0]","ShotDist_qual_(10.0, 20.0]","ShotDist_qual_(20.0, 30.0]","ShotDist_qual_(30.0, 40.0]","ShotDist_qual_(40.0, 50.0]","ShotDist_qual_(50.0, 60.0]","ShotDist_qual_(60.0, 70.0]","ShotDist_qual_(70.0, 80.0]","ShotDist_qual_(80.0, 90.0]","ShotDist_qual_(90.0, 100.0]","ShotDist_qual_(100.0, inf]"
1,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False
3,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,True,False,False,False,False,False,False,False,False,False,False
4,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,True,True,False,False,False,False,False,False,False,False,False,False
6,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,True,True,False,False,False,False,False,False,False,False,False,False
8,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,True,False,False,False,False,False,False,False,False
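The two encoding steps above can be tried on a small made-up frame to see how the column names are formed. One version note: in pandas 2.0 and later `get_dummies()` returns boolean columns by default, while older versions return integers as in the earlier output. The `astype(bool)` call gives the same result either way.

```python
import pandas as pd

# Toy frame with one numeric category, one string category, one boolean.
toy = pd.DataFrame({
    "Quarter": [1, 2, 1],
    "ShotOutcome": ["miss", "make", "make"],
    "Rebound": [False, True, False],
})

# One new column per (column, value) pair, named "<column>_<value>".
dummies = pd.get_dummies(toy, columns=["Quarter", "ShotOutcome"])

# Make every column boolean, regardless of the pandas version.
dummies = dummies.astype(bool)

print(dummies.columns.tolist())
print(dummies["Quarter_1"].tolist())
```

Columns that are not listed in `columns=` (here `Rebound`) are passed through unchanged and appear first.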


---
# Step 2: Find frequent item sets

### Try using the MLXtend apriori library to generate frequent itemsets yourself! You can find the documentation [here](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/).

You are free to set min_support, but a low value of 0.001 is suggested at first. This will let you see nearly all of the generated itemsets (every itemset that appears in at least 0.1% of shots). You can narrow them down further later on if you'd like.

In [None]:
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [None]:
# YOUR CODE HERE
freq_items = None

#### Solution:

In [None]:
# MLXtend's apriori function takes in a one-hot encoded dataframe, which we just made ourselves
freq_items = apriori(df_nba_dummy, min_support=0.001, use_colnames=True)
freq_items

Unnamed: 0,support,itemsets
0,0.274189,(Assister_bool)
1,0.054833,(Blocker_bool)
2,0.057654,(Rebound)
3,0.257968,(Quarter_1)
4,0.250972,(Quarter_2)
...,...,...
3363,0.005316,"(ShotDist_qual_(-inf, 10.0], ShotType_2-pt lay..."
3364,0.004590,"(ShotDist_qual_(-inf, 10.0], ShotType_2-pt lay..."
3365,0.001571,"(ShotDist_qual_(-inf, 10.0], ShotType_2-pt dun..."
3366,0.005287,"(ShotDist_qual_(-inf, 10.0], ShotType_2-pt lay..."
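The support column can be understood without MLXtend: the support of an itemset is the fraction of rows in which every item in the set is True. The sketch below computes it by hand on a made-up four-row frame. The `support` helper is a hypothetical name for illustration, not part of MLXtend.

```python
import pandas as pd

# Toy one-hot frame with made-up values.
onehot = pd.DataFrame({
    "Rebound":          [True, False, True, False],
    "ShotOutcome_make": [True, True, False, False],
    "Quarter_1":        [True, True, True, False],
})

def support(df, items):
    """Fraction of rows in which every column in `items` is True."""
    return df[list(items)].all(axis=1).mean()

print(support(onehot, ["Rebound"]))
print(support(onehot, ["Rebound", "ShotOutcome_make"]))
print(support(onehot, ["Quarter_1", "ShotOutcome_make"]))
```

Adding items to a set can only keep its support the same or lower it, which is the property apriori uses to prune candidates that fall below min_support.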


---
# Step 3: Build Association Model

### Now, use the MLXtend association rules library to generate the association rules. You can find the documentation [here](http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/).

Once again, we recommend a min_threshold of 0.001 at first. The threshold applies to whichever metric you choose (confidence in our solution), so you can increase it later on to filter out low-confidence rules.

In [None]:
# YOUR CODE HERE
rules = None

#### Solution:

In [None]:
rules = association_rules(freq_items, metric="confidence", min_threshold=0.001)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Assister_bool),(Rebound),0.274189,0.057654,0.001275,0.004649,0.080637,-0.014533,0.946747
1,(Rebound),(Assister_bool),0.057654,0.274189,0.001275,0.022110,0.080637,-0.014533,0.742222
2,(Assister_bool),(Quarter_1),0.274189,0.257968,0.072659,0.264997,1.027246,0.001927,1.009563
3,(Quarter_1),(Assister_bool),0.257968,0.274189,0.072659,0.281659,1.027246,0.001927,1.010400
4,(Assister_bool),(Quarter_2),0.274189,0.250972,0.069873,0.254834,1.015387,0.001059,1.005182
...,...,...,...,...,...,...,...,...,...
15697,"(ShotDist_qual_(-inf, 10.0])","(ShotOutcome_miss, ShotType_2-pt layup, Quarte...",0.450130,0.004575,0.004565,0.010142,2.216781,0.002506,1.005624
15698,(ShotType_2-pt layup),"(ShotOutcome_miss, Quarter_4, ShotDist_qual_(-...",0.273532,0.005316,0.004565,0.016690,3.139439,0.003111,1.011567
15699,(Quarter_4),"(ShotOutcome_miss, ShotType_2-pt layup, ShotDi...",0.237830,0.018523,0.004565,0.019196,1.036312,0.000160,1.000686
15700,(ShotOutcome_miss),"(ShotType_2-pt layup, Quarter_4, ShotDist_qual...",0.540117,0.009852,0.004565,0.008452,0.857945,-0.000756,0.998589


### Now, let's analyze our rules!

For starters, we're really only interested in rules where the consequent is *only* ShotOutcome_make or ShotOutcome_miss. You can look at these specifically with this bit of code:

```
rules[rules['consequents'] == frozenset({'ShotOutcome_make'})]
```
or
```
rules[rules['consequents'] == frozenset({'ShotOutcome_miss'})]
```

In [None]:
rules[rules['consequents'] == frozenset({'ShotOutcome_make'})]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
196,(Assister_bool),(ShotOutcome_make),0.274189,0.459883,0.274189,1.000000,2.174465,0.148094,inf
251,(Rebound),(ShotOutcome_make),0.057654,0.459883,0.032204,0.558574,1.214599,0.005690,1.223572
447,(Quarter_1),(ShotOutcome_make),0.257968,0.459883,0.120110,0.465602,1.012435,0.001475,1.010701
613,(Quarter_2),(ShotOutcome_make),0.250972,0.459883,0.116262,0.463245,1.007310,0.000844,1.006263
777,(Quarter_3),(ShotOutcome_make),0.246219,0.459883,0.112818,0.458201,0.996342,-0.000414,0.996895
...,...,...,...,...,...,...,...,...,...
15463,"(ShotType_2-pt layup, Quarter_2, ShotDist_qual...",(ShotOutcome_make),0.010450,0.459883,0.005771,0.552246,1.200839,0.000965,1.206280
15524,"(Quarter_3, ShotDist_qual_(-inf, 10.0], ShotTy...",(ShotOutcome_make),0.001611,0.459883,0.001428,0.886503,1.927670,0.000687,4.758866
15554,"(ShotType_2-pt layup, Quarter_3, ShotDist_qual...",(ShotOutcome_make),0.009906,0.459883,0.005316,0.536658,1.166945,0.000761,1.165699
15613,"(Quarter_4, ShotDist_qual_(-inf, 10.0], ShotTy...",(ShotOutcome_make),0.001739,0.459883,0.001571,0.903409,1.964431,0.000771,5.591796


Right away we can see that the quarter doesn't seem to have much of an effect on whether a shot was made (lift is around 1). Maybe you should go back and remove the quarter columns and re-run the frequent itemset and association rules analysis!
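To interpret these columns, it can help to recompute them by hand. The sketch below takes the three support values from the (Rebound) → (ShotOutcome_make) row of the table above and applies the standard definitions of confidence, lift, leverage, and conviction. Because the table values are rounded, the results match only to about four decimal places.

```python
# Values copied from the (Rebound) -> (ShotOutcome_make) row above.
antecedent_support = 0.057654   # share of shots that follow an offensive rebound
consequent_support = 0.459883   # share of all shots that are made
rule_support = 0.032204         # share of shots that are both

# Confidence: how often the consequent holds when the antecedent holds.
confidence = rule_support / antecedent_support

# Lift: confidence relative to the baseline make rate (1 means no association).
lift = confidence / consequent_support

# Leverage: observed joint support minus what independence would predict.
leverage = rule_support - antecedent_support * consequent_support

# Conviction: ratio of the baseline miss rate to the rule's miss rate.
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 4), round(lift, 4), round(leverage, 4), round(conviction, 4))
```

A lift near 1, as for the quarter rules, means the make rate given the antecedent is about the same as the overall make rate.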

Now, take a look at the rules yourself and try to figure out what gets you the best or worst chance at making a shot!

#### Hint:

In [None]:
rules[rules['consequents'] == frozenset({'ShotOutcome_make'})].sort_values(by=['lift'], ascending=False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
196,(Assister_bool),(ShotOutcome_make),0.274189,0.459883,0.274189,1.000000,2.174465,0.148094,inf
3866,"(Assister_bool, Shooter_M. Turner - turnemy01)",(ShotOutcome_make),0.001156,0.459883,0.001156,1.000000,2.174465,0.000624,inf
3854,"(Assister_bool, Shooter_M. Kleber - klebima01)",(ShotOutcome_make),0.001131,0.459883,0.001131,1.000000,2.174465,0.000611,inf
3842,"(Assister_bool, Shooter_M. Harrell - harremo01)",(ShotOutcome_make),0.001581,0.459883,0.001581,1.000000,2.174465,0.000854,inf
3836,"(Assister_bool, Shooter_M. Bridges - bridgmi02)",(ShotOutcome_make),0.001018,0.459883,0.001018,1.000000,2.174465,0.000550,inf
...,...,...,...,...,...,...,...,...,...
4539,"(ShotDist_qual_(20.0, 30.0], Rebound)",(ShotOutcome_make),0.003533,0.459883,0.001215,0.344056,0.748138,-0.000409,0.823419
4516,"(ShotType_3-pt jump shot, Rebound)",(ShotOutcome_make),0.003459,0.459883,0.001181,0.341429,0.742424,-0.000410,0.820134
11660,"(ShotDist_qual_(20.0, 30.0], ShotType_3-pt jum...",(ShotOutcome_make),0.003389,0.459883,0.001156,0.341108,0.741727,-0.000403,0.819735
9592,"(ShotType_3-pt jump shot, ShotDist_qual_(30.0,...",(ShotOutcome_make),0.004812,0.459883,0.001294,0.268994,0.584918,-0.000919,0.738868
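In the sorted output, every rule containing Assister_bool has a confidence of exactly 1.0 and infinite conviction. That pattern is what you would expect if an assist is only logged when the shot goes in. This is an assumption about how the data was recorded, although it matches the usual basketball definition of an assist. If so, these rules describe how plays are recorded rather than what helps a shot succeed. The sketch below shows one way to set such rules aside, using a made-up rules frame with the same column layout. The lift values are invented, and including Blocker_bool in the excluded set is likewise an assumption to verify against the real rules.

```python
import pandas as pd

# Toy rules frame mimicking the layout of the association_rules output.
rules_toy = pd.DataFrame({
    "antecedents": [
        frozenset({"Assister_bool"}),
        frozenset({"Rebound"}),
        frozenset({"Assister_bool", "Quarter_1"}),
        frozenset({"ShotType_2-pt dunk"}),
    ],
    "consequents": [frozenset({"ShotOutcome_make"})] * 4,
    "lift": [2.17, 1.21, 2.17, 1.95],  # made-up values
})

# Items assumed to be recorded as a consequence of the outcome itself.
outcome_linked = frozenset({"Assister_bool", "Blocker_bool"})

# Keep rules whose antecedent shares no item with the excluded set.
keep = rules_toy["antecedents"].apply(lambda items: items.isdisjoint(outcome_linked))
filtered = rules_toy[keep].sort_values(by="lift", ascending=False)

print(filtered[["antecedents", "lift"]])
```

The same mask can be applied to the real `rules` dataframe before sorting by lift, so that the top of the list shows factors that are known before the shot is taken.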
