In this blog we will be predicting NBA 2017 winners with Decision Trees in scikit learn.The National Basketball Association (NBA) is the major men's professional basketball league in North America, and is widely considered to be the premier men's professional basketball league in the world. It has 30 teams (29 in the United States and 1 in Canada). The data is available at 
https://www.basketball-reference.com/leagues/NBA_2017_games-october.html. I have assembled the data in a csv file and available in my github folder. This blog is influenced by [this book](https://www.packtpub.com/big-data-and-business-intelligence/learning-data-mining-python).

During the [regular season](https://en.wikipedia.org/wiki/National_Basketball_Association#Regular_season), each team plays 82 games, 41 each home and away. A team faces opponents in its own division four times a year (16 games). Each team plays six of the teams from the other two divisions in its conference four times (24 games), and the remaining four teams three times (12 games). Finally, each team plays all the teams in the other conference twice apiece (30 games). 


In [1]:
import pandas as pd
dataset = pd.read_csv("NBA_2017_regularGames.csv",parse_dates=["Date"])

In [2]:
dataset.head(2)

Unnamed: 0,Date,Start (ET),Visitor/Neutral,PTS,Home/Neutral,PTS.1,Unnamed: 6,Unnamed: 7,Notes
0,2016-10-25,7:30 pm,New York Knicks,88,Cleveland Cavaliers,117,Box Score,,
1,2016-10-25,10:30 pm,San Antonio Spurs,129,Golden State Warriors,100,Box Score,,


In [3]:
#Rename the columns
dataset.columns = ["Date","Time","Visitor Team","Visitor Points","Home Team","Home Points","Score Type","OT?","Notes"]

In [4]:
dataset.head(2)

Unnamed: 0,Date,Time,Visitor Team,Visitor Points,Home Team,Home Points,Score Type,OT?,Notes
0,2016-10-25,7:30 pm,New York Knicks,88,Cleveland Cavaliers,117,Box Score,,
1,2016-10-25,10:30 pm,San Antonio Spurs,129,Golden State Warriors,100,Box Score,,


From the description of how the matches are played
Now that we have our dataset, we can compute a baseline. A baseline is an accuracy
that indicates an easy way to get a good accuracy. 

In each match, we have two teams: a home team and a visitor team. An obvious
baseline, called the chance rate, is 50 percent. Choosing randomly will (over time)
result in an accuracy of 50 percent.

###### Prediction Class
We need to specify our class value, which will give
our classification algorithm something to compare against to see if its prediction
is correct or not. This could be encoded in a number of ways; however, for this
application, we will specify our class as 1 if the home team wins and 0 if the visitor
team wins. In basketball, the team with the most points wins. So, while the data set
doesn't specify who wins, we can compute it easily.

In [5]:
dataset["HomeWin"] = dataset["Visitor Points"] < dataset["Home Points"]

In [6]:
print("Home Win percentage: {0:.1f}%".format(100 * dataset["HomeWin"].sum() / dataset["HomeWin"].count()))

Home Win percentage: 58.4%


In [7]:
y_true = dataset["HomeWin"].values

In [8]:
#The array now holds our class values in a format that scikit-learn can read.
y_true

array([ True, False,  True, ...,  True, False,  True], dtype=bool)

##### Feature Engineering

The first two features we want to create to help us predict which team will win
are whether either of those two teams won their last game. This would roughly
approximate which team is playing well.

We will compute this feature by iterating through the rows in order and recording
which team won. When we get to a new row, we look up whether the team won the
last time we saw them.

Currently, this gives a false value to all teams (including the previous year's
champion!) when they are first seen.

In [9]:
dataset["HomeLastWin"] = False
dataset["VisitorLastWin"] = False
# This creates two new columns, all set to False
dataset.ix[:5]

Unnamed: 0,Date,Time,Visitor Team,Visitor Points,Home Team,Home Points,Score Type,OT?,Notes,HomeWin,HomeLastWin,VisitorLastWin
0,2016-10-25,7:30 pm,New York Knicks,88,Cleveland Cavaliers,117,Box Score,,,True,False,False
1,2016-10-25,10:30 pm,San Antonio Spurs,129,Golden State Warriors,100,Box Score,,,False,False,False
2,2016-10-25,10:00 pm,Utah Jazz,104,Portland Trail Blazers,113,Box Score,,,True,False,False
3,2016-10-26,7:30 pm,Brooklyn Nets,117,Boston Celtics,122,Box Score,,,True,False,False
4,2016-10-26,7:00 pm,Dallas Mavericks,121,Indiana Pacers,130,Box Score,OT,,True,False,False
5,2016-10-26,10:30 pm,Houston Rockets,114,Los Angeles Lakers,120,Box Score,,,True,False,False


In [10]:
# Now compute the actual values for these
# Did the home and visitor teams win their last game?
# We first create a (default) dictionary to store the team's last result:
from collections import defaultdict
won_last = defaultdict(int)

The key of this dictionary will be the team and the value will be whether they won
their previous game. We can then iterate over all the rows and update the current
row with the team's last result. 

Note that the preceding code relies on our dataset being in chronological order. Our
dataset is in order; however, if you are using a dataset that is not in order, you will
need to replace dataset.iterrows() with dataset.sort("Date").iterrows().

In [11]:
for index, row in dataset.iterrows():
    home_team = row["Home Team"]
    visitor_team = row["Visitor Team"]
    row["HomeLastWin"] = won_last[home_team]
    row["VisitorLastWin"] = won_last[visitor_team]
    dataset.ix[index] = row
    #We then set our dictionary with the each team's result (from this row) for the next
    #time we see these teams.
    #Set current Win
    won_last[home_team] = row["HomeWin"]
    won_last[visitor_team] = not row["HomeWin"]
    

There isn't much point in
looking at the first five games though. Due to the way our code runs, we didn't have
data for them at that point. Therefore, until a team's second game of the season, we
won't know their current form. We can instead look at different places in the list.
The following code will show the 20th to the 25th games of the season:

In [12]:
dataset.ix[20:25]

Unnamed: 0,Date,Time,Visitor Team,Visitor Points,Home Team,Home Points,Score Type,OT?,Notes,HomeWin,HomeLastWin,VisitorLastWin
20,2016-10-28,8:00 pm,Charlotte Hornets,97,Miami Heat,91,Box Score,,,False,True,True
21,2016-10-28,9:30 pm,Golden State Warriors,122,New Orleans Pelicans,114,Box Score,,,False,False,False
22,2016-10-28,8:00 pm,Phoenix Suns,110,Oklahoma City Thunder,113,Box Score,OT,,True,True,False
23,2016-10-28,7:00 pm,Cleveland Cavaliers,94,Toronto Raptors,91,Box Score,,,False,True,True
24,2016-10-28,9:00 pm,Los Angeles Lakers,89,Utah Jazz,96,Box Score,,,True,False,True
25,2016-10-29,8:00 pm,Indiana Pacers,101,Chicago Bulls,118,Box Score,,,True,True,False


The scikit-learn package implements the CART (Classification and Regression
Trees) algorithm as its default decision tree class, which can use both categorical and
continuous features.

The decision tree implementation in scikit-learn provides a method to stop the
building of a tree using the following options:

    • min_samples_split: This specifies how many samples are needed in order
to create a new node in the decision tree

    • min_samples_leaf: This specifies how many samples must be resultingfrom a node for it to stay
    
The first dictates whether a decision node will be created, while the second dictates whether a decision node will be kept.

Another parameter for decision tress is the criterion for creating a decision.
Gini impurity and Information gain are two popular ones:

    • Gini impurity: This is a measure of how often a decision node would incorrectly predict a sample's class

    •`Information gain: This uses information-theory-based entropy to indicate how much extra information is gained by the decision node


In [13]:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=14)

We now need to extract the dataset from our pandas data frame in order to use
it with our scikit-learn classifier. We do this by specifying the columns we
wish to use and using the values parameter of a view of the data frame. The
following code creates a dataset using our last win values for both the home
team and the visitor team:

In [14]:
X_previouswins = dataset[["HomeLastWin", "VisitorLastWin"]].values

In [15]:
import numpy as np
from sklearn.model_selection import cross_val_score

clf = DecisionTreeClassifier(random_state=14)
scores = cross_val_score(clf, X_previouswins, y_true, scoring='accuracy')
print("Using just the last result from the home and visitor teams")
print("Accuracy: {0:.1f}%".format(np.mean(scores) * 100))

Using just the last result from the home and visitor teams
Accuracy: 58.4%


###### This scores 58.4% we are better than choosing randomly! 
We should be
able to do better

###### More Feature Engineering

We will try the following
questions:

    • Which team is considered better generally?
    • Which team won their last encounter?

We will also try putting the raw teams into the algorithm to check whether the
algorithm can learn a model that checks how different teams play against each other.

In [16]:
# What about win streaks?
dataset["HomeWinStreak"] = 0
dataset["VisitorWinStreak"] = 0
# Did the home and visitor teams win their last game?
from collections import defaultdict
win_streak = defaultdict(int)

for index, row in dataset.iterrows():  # Note that this is not efficient
    home_team = row["Home Team"]
    visitor_team = row["Visitor Team"]
    row["HomeWinStreak"] = win_streak[home_team]
    row["VisitorWinStreak"] = win_streak[visitor_team]
    dataset.ix[index] = row    
    # Set current win
    if row["HomeWin"]:
        win_streak[home_team] += 1
        win_streak[visitor_team] = 0
    else:
        win_streak[home_team] = 0
        win_streak[visitor_team] += 1

In [17]:
dataset.ix[100:105]

Unnamed: 0,Date,Time,Visitor Team,Visitor Points,Home Team,Home Points,Score Type,OT?,Notes,HomeWin,HomeLastWin,VisitorLastWin,HomeWinStreak,VisitorWinStreak
100,2016-11-08,7:00 pm,Atlanta Hawks,110,Cleveland Cavaliers,106,Box Score,,,False,True,True,6,1
101,2016-11-08,10:30 pm,Dallas Mavericks,109,Los Angeles Lakers,97,Box Score,,,False,True,True,3,1
102,2016-11-08,8:00 pm,Denver Nuggets,107,Memphis Grizzlies,108,Box Score,,,True,False,True,0,1
103,2016-11-08,10:00 pm,Phoenix Suns,121,Portland Trail Blazers,124,Box Score,,,True,True,False,2,0
104,2016-11-08,10:30 pm,New Orleans Pelicans,94,Sacramento Kings,102,Box Score,,,True,True,False,1,0
105,2016-11-09,7:30 pm,Chicago Bulls,107,Atlanta Hawks,115,Box Score,,,True,True,True,2,1


In [18]:
clf = DecisionTreeClassifier(random_state=14)
X_winstreak =  dataset[["HomeLastWin", "VisitorLastWin", "HomeWinStreak", "VisitorWinStreak"]].values
scores = cross_val_score(clf, X_winstreak, y_true, scoring='accuracy')
print("Using whether the home team is ranked higher")
print("Accuracy: {0:.1f}%".format(np.mean(scores) * 100))

Using whether the home team is ranked higher
Accuracy: 56.3%


In [19]:
# Let's try see which team is better on the ladder. Using the previous year's ladder
#https://www.basketball-reference.com/leagues/NBA_2016_standings.html
standing = pd.read_csv("ExapandedStanding.csv")


In [20]:
standing.head(3)

Unnamed: 0,Rk,Team,Overall,Home,Road,E,W,A,C,SE,...,Post,≤3,≥10,Oct,Nov,Dec,Jan,Feb,Mar,Apr
0,1,Golden State Warriors,73-9,39-2,34-7,27-Mar,46-6,09-Jan,08-Feb,10-0,...,25-May,07-Feb,44-5,3-0,16-0,11-Feb,14-Feb,09-Jan,15-Feb,05-Feb
1,2,San Antonio Spurs,67-15,40-1,27-14,24-Jun,43-9,09-Jan,07-Mar,08-Feb,...,22-Jul,04-Apr,44-6,01-Jan,13-Mar,14-Feb,11-Feb,11-Jan,13-Mar,04-Mar
2,3,Cleveland Cavaliers,57-25,33-8,24-17,35-17,22-Aug,14-Apr,08-Aug,13-May,...,19-Nov,04-Jul,32-8,02-Jan,11-Mar,08-May,13-Mar,08-May,11-May,04-Mar


In [21]:
row

Date                2017-04-12 00:00:00
Time                            9:00 pm
Visitor Team          San Antonio Spurs
Visitor Points                       97
Home Team                     Utah Jazz
Home Points                         101
Score Type                    Box Score
OT?                                 NaN
Notes                               NaN
HomeWin                            True
HomeLastWin                        True
VisitorLastWin                    False
HomeWinStreak                         1
VisitorWinStreak                      0
Name: 1229, dtype: object

In [22]:
# We can create a new feature -- HomeTeamRanksHigher\

dataset["HomeTeamRanksHigher"] = 0
for index , row in dataset.iterrows():
    home_team = row["Home Team"]
    visitor_team = row["Visitor Team"]
    home_rank = standing[standing["Team"] == home_team]["Rk"].values[0]
    visitor_rank = standing[standing["Team"] == visitor_team]["Rk"].values[0]
    row["HomeTeamRankHigher"] = int(home_rank > visitor_rank)
    dataset.ix[index] = row

In [23]:
standing['Rk'].values[1]

2

In [24]:
standing[standing["Team"] == "Utah Jazz"]["Rk"].values[0]

19

In [25]:
for index , row in dataset.iterrows():
    home_team = row["Home Team"]
    print(home_team)

Cleveland Cavaliers
Golden State Warriors
Portland Trail Blazers
Boston Celtics
Indiana Pacers
Los Angeles Lakers
Memphis Grizzlies
Milwaukee Bucks
New Orleans Pelicans
Orlando Magic
Philadelphia 76ers
Phoenix Suns
Toronto Raptors
Atlanta Hawks
Chicago Bulls
Portland Trail Blazers
Sacramento Kings
Brooklyn Nets
Dallas Mavericks
Detroit Pistons
Miami Heat
New Orleans Pelicans
Oklahoma City Thunder
Toronto Raptors
Utah Jazz
Chicago Bulls
Charlotte Hornets
Cleveland Cavaliers
Denver Nuggets
Milwaukee Bucks
New York Knicks
Philadelphia 76ers
Sacramento Kings
San Antonio Spurs
Detroit Pistons
Houston Rockets
Los Angeles Clippers
Memphis Grizzlies
Miami Heat
Oklahoma City Thunder
Phoenix Suns
Atlanta Hawks
Brooklyn Nets
Los Angeles Clippers
Toronto Raptors
Cleveland Cavaliers
Detroit Pistons
Indiana Pacers
Miami Heat
Minnesota Timberwolves
New Orleans Pelicans
Philadelphia 76ers
Portland Trail Blazers
San Antonio Spurs
Atlanta Hawks
Boston Celtics
Brooklyn Nets
Charlotte Hornets
Los Angeles 

In [26]:
dataset[:5]

Unnamed: 0,Date,Time,Visitor Team,Visitor Points,Home Team,Home Points,Score Type,OT?,Notes,HomeWin,HomeLastWin,VisitorLastWin,HomeWinStreak,VisitorWinStreak,HomeTeamRanksHigher
0,2016-10-25,7:30 pm,New York Knicks,88,Cleveland Cavaliers,117,Box Score,,,True,False,False,0,0,0
1,2016-10-25,10:30 pm,San Antonio Spurs,129,Golden State Warriors,100,Box Score,,,False,False,False,0,0,0
2,2016-10-25,10:00 pm,Utah Jazz,104,Portland Trail Blazers,113,Box Score,,,True,False,False,0,0,0
3,2016-10-26,7:30 pm,Brooklyn Nets,117,Boston Celtics,122,Box Score,,,True,False,False,0,0,0
4,2016-10-26,7:00 pm,Dallas Mavericks,121,Indiana Pacers,130,Box Score,OT,,True,False,False,0,0,0


In [27]:
dataset[500:505]

Unnamed: 0,Date,Time,Visitor Team,Visitor Points,Home Team,Home Points,Score Type,OT?,Notes,HomeWin,HomeLastWin,VisitorLastWin,HomeWinStreak,VisitorWinStreak,HomeTeamRanksHigher
500,2016-12-31,7:00 pm,Milwaukee Bucks,116,Chicago Bulls,96,Box Score,,,False,False,False,0,0,0
501,2016-12-31,7:00 pm,Cleveland Cavaliers,121,Charlotte Hornets,109,Box Score,,,False,True,True,2,1,0
502,2016-12-31,8:00 pm,New York Knicks,122,Houston Rockets,129,Box Score,,,True,True,False,3,0,0
503,2016-12-31,8:00 pm,Los Angeles Clippers,88,Oklahoma City Thunder,114,Box Score,,,True,False,False,0,0,0
504,2016-12-31,5:00 pm,Memphis Grizzlies,112,Sacramento Kings,98,Box Score,,,False,False,True,0,1,0


we use the cross_val_score function to test the result. First, we extract
the dataset.
we create a new DecisionTreeClassifier and run the evaluation

In [28]:
X_homehigher =  dataset[["HomeLastWin", "VisitorLastWin", "HomeTeamRanksHigher"]].values
clf = DecisionTreeClassifier(random_state=14)
scores = cross_val_score(clf, X_homehigher, y_true, scoring='accuracy')
print("Using whether the home team is ranked higher")
print("Accuracy: {0:.1f}%".format(np.mean(scores) * 100))

Using whether the home team is ranked higher
Accuracy: 58.4%


In [29]:
from sklearn.model_selection import GridSearchCV

parameter_space = {
                   "max_depth": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
                   }
clf = DecisionTreeClassifier(random_state=14)
grid = GridSearchCV(clf, parameter_space)
grid.fit(X_homehigher, y_true)
print("Accuracy: {0:.1f}%".format(grid.best_score_ * 100))

Accuracy: 58.4%


###### It is the same accuracy. Could we do any better?

Next, let's test which of the two teams won their last match. While rankings can give
some hints on who won (the higher ranked team is more likely to win), sometimes
teams play better against other teams. There are many reasons for this – for example,
some teams may have strategies that work against other teams really well. Following
our previous pattern, we create a dictionary to store the winner of the past game and
create a new feature in our data frame.

In [30]:
last_match_winner = defaultdict(int)
dataset["HomeTeamWonLast"] = 0

###### Then, we iterate over each row and get the home team and visitor team:

In [31]:
for index , row in dataset.iterrows():
    home_team = row["Home Team"]
    visitor_team = row["Visitor Team"]
#We want to see who won the last game between these two teams regardless of which
#team was playing at home. Therefore, we sort the team names alphabetically, giving
#us a consistent key for those two teams:
    teams = tuple(sorted([home_team, visitor_team]))  # Sort for a consistent ordering
    # Set in the row, who won the last encounter
    row["HomeTeamWonLast"] = 1 if last_match_winner[teams] == row["Home Team"] else 0
    dataset.ix[index] = row
    # Who won this one?
    winner = row["Home Team"] if row["HomeWin"] else row["Visitor Team"]
    last_match_winner[teams] = winner

In [32]:
dataset.ix[:5]

Unnamed: 0,Date,Time,Visitor Team,Visitor Points,Home Team,Home Points,Score Type,OT?,Notes,HomeWin,HomeLastWin,VisitorLastWin,HomeWinStreak,VisitorWinStreak,HomeTeamRanksHigher,HomeTeamWonLast
0,2016-10-25,7:30 pm,New York Knicks,88,Cleveland Cavaliers,117,Box Score,,,True,False,False,0,0,0,0
1,2016-10-25,10:30 pm,San Antonio Spurs,129,Golden State Warriors,100,Box Score,,,False,False,False,0,0,0,0
2,2016-10-25,10:00 pm,Utah Jazz,104,Portland Trail Blazers,113,Box Score,,,True,False,False,0,0,0,0
3,2016-10-26,7:30 pm,Brooklyn Nets,117,Boston Celtics,122,Box Score,,,True,False,False,0,0,0,0
4,2016-10-26,7:00 pm,Dallas Mavericks,121,Indiana Pacers,130,Box Score,OT,,True,False,False,0,0,0,0
5,2016-10-26,10:30 pm,Houston Rockets,114,Los Angeles Lakers,120,Box Score,,,True,False,False,0,0,0,0


In [33]:
X_home_higher =  dataset[["HomeTeamRanksHigher", "HomeTeamWonLast"]].values
clf = DecisionTreeClassifier(random_state=14)
scores = cross_val_score(clf, X_home_higher, y_true, scoring='accuracy')
print("Using whether the home team is ranked higher")
print("Accuracy: {0:.1f}%".format(np.mean(scores) * 100))

Using whether the home team is ranked higher
Accuracy: 58.4%


Finally, we will check what happens if we throw a lot of data at the decision tree, and
see if it can learn an effective model anyway.

In [34]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
encoding = LabelEncoder()
#We will fit this transformer to the home teams so that it learns an integer
#representation for each team
encoding.fit(dataset["Home Team"].values)

#We extract all of the labels for the home teams and visitor teams, and then join them
#(called stacking in NumPy) to create a matrix encoding both the home team and the
#visitor team for each game.
home_teams = encoding.transform(dataset["Home Team"].values)
visitor_teams = encoding.transform(dataset["Visitor Team"].values)
X_teams = np.vstack([home_teams, visitor_teams]).T

#we use the OneHotEncoder transformer to encode these
#integers into a number of binary features. Each binary feature will be a single value
#for the feature.

onehot = OneHotEncoder()
#We fit and transform on the same dataset, saving the results
X_teams = onehot.fit_transform(X_teams).todense()

#we run the decision tree as before on the new dataset
clf = DecisionTreeClassifier(random_state=14)
scores = cross_val_score(clf, X_teams, y_true, scoring='accuracy')
print("Accuracy: {0:.1f}%".format(np.mean(scores) * 100))

Accuracy: 58.0%


###### This scores an accuracy of 60 percent. The score is better than the baseline, but not as good as before.
It is possible that the larger number of features were not
handled properly by the decision trees. For this reason, we will try changing the
algorithm and see if that helps.

#### Random forests

The randomness inherent in Random forests may make it seem like we are leaving
the results of the algorithm up to chance. However, we apply the benefits of
averaging to nearly randomly built decision trees, resulting in an algorithm that
reduces the variance of the result.

As Random forests use many instances of DecisionTreeClassifier, they share many of the same
parameters such as the criterion (Gini Impurity or Entropy/Information Gain),
max_features, and min_samples_split.

Also, there are some new parameters that are used in the ensemble process:
    
    • n_estimators: This dictates how many decision trees should be built. A higher value will take longer to run, but will (probably) result in a higher accuracy.
    • oob_score: If true, the method is tested using samples that aren't in the random subsamples chosen for training the decision trees.
    • n_jobs: This specifies the number of cores to use when training the decision trees in parallel.

In [35]:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=14)
scores = cross_val_score(clf, X_teams, y_true, scoring='accuracy')
print("Using full team labels is ranked higher")
print("Accuracy: {0:.1f}%".format(np.mean(scores) * 100))

Using full team labels is ranked higher
Accuracy: 60.2%


##### This results in an immediate benefit of 60.6 percent, up by 0.6 points by just swapping the classifier.


Random forests, using subsets of the features, should be able to learn more
effectively with more features than normal decision trees. We can test this by
throwing more features at the algorithm and seeing how it goes:

In [36]:
X_all = np.hstack([X_home_higher, X_teams])

In [37]:
clf = RandomForestClassifier(random_state=14)
scores = cross_val_score(clf, X_all, y_true, scoring='accuracy')
print("Accuracy: {0:.1f}%".format(np.mean(scores) * 100))

Accuracy: 61.1%


###### This results in 61.1 percent —even better!

In [38]:
parameter_space = {
"max_features": [2, 10, 'auto'],
"n_estimators": [100,],
"criterion": ["gini", "entropy"],
"min_samples_leaf": [2, 4, 6],
}
clf = RandomForestClassifier(random_state=14)
grid = GridSearchCV(clf, parameter_space)
grid.fit(X_all, y_true)
print("Accuracy: {0:.1f}%".format(grid.best_score_ * 100))

Accuracy: 63.0%


###### This has a much better accuracy of 64.2 percent!
If we wanted to see the parameters used, we can print out the best model that was
found in the grid search.

In [39]:
print(grid.best_estimator_)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
            max_depth=None, max_features=2, max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=4,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=100, n_jobs=1, oob_score=False, random_state=14,
            verbose=0, warm_start=False)


In [42]:
dataset[dataset["Home Team"] == "Utah Jazz"]["Date"].values[0]

numpy.datetime64('2016-10-28T00:00:00.000000000')

In [53]:
if dataset.loc[dataset['Home Team'] == ("Utah Jazz")]:
    print(dataset['Date'])
    

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().