This notebook introduces supervised learning by solving a classification problem with **decision trees**.
It follows [this tutorial](https://youtu.be/7eh4d6sabA0). 

# **Classification Problem**
We will follow these steps to solve a machine learning problem:


1. Import the Data
2. Clean the Data
3. Split the Data into Training/Test Sets
4. Create a Model
5. Train the Model
6. Make Predictions
7. Evaluate and Improve


# Problem description
In the text cell below, state what you will predict in this classification problem (y) and which columns you will use to make the prediction (X).

In [75]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import joblib
from sklearn import tree
from pandas.api.types import is_numeric_dtype

1. Import the Data.

In [76]:
df = pd.read_csv('weatherAUS.csv').head(100) 

2. Display columns and describe the data set

In [77]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 23 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Date           100 non-null    object 
 1   Location       100 non-null    object 
 2   MinTemp        100 non-null    float64
 3   MaxTemp        100 non-null    float64
 4   Rainfall       99 non-null     float64
 5   Evaporation    0 non-null      float64
 6   Sunshine       0 non-null      float64
 7   WindGustDir    97 non-null     object 
 8   WindGustSpeed  97 non-null     float64
 9   WindDir9am     94 non-null     object 
 10  WindDir3pm     96 non-null     object 
 11  WindSpeed9am   99 non-null     float64
 12  WindSpeed3pm   99 non-null     float64
 13  Humidity9am    100 non-null    float64
 14  Humidity3pm    100 non-null    float64
 15  Pressure9am    100 non-null    float64
 16  Pressure3pm    100 non-null    float64
 17  Cloud9am       17 non-null     float64
 18  Cloud3pm   
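
The `df.info()` output above shows that some columns (for example Evaporation and Sunshine) contain no non-null values in these 100 rows. A minimal sketch of how missing values could be counted per column follows. The small `sample` frame is a hypothetical stand-in for `df`, and its Rainfall values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; the Rainfall values are invented.
sample = pd.DataFrame({
    "MinTemp": [13.4, 7.4, 12.9, 9.2],
    "MaxTemp": [22.9, 25.1, 25.7, 28.0],
    "Evaporation": [np.nan, np.nan, np.nan, np.nan],
    "Rainfall": [0.6, np.nan, 0.0, 0.0],
})

# Count missing values per column, largest count first.
missing_counts = sample.isnull().sum().sort_values(ascending=False)
print(missing_counts)

# Columns with no usable values at all are candidates for dropping.
empty_columns = [col for col in sample.columns if sample[col].isnull().all()]
print(empty_columns)
```

Running the same two statements on `df` would show which columns can be used as features without further cleaning.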

3. Prepare Data

In [78]:
# Run this section to inspect X.
# Keep only the two feature columns (the same result as dropping the other 21).
X = df[['MinTemp', 'MaxTemp']]
X


    MinTemp  MaxTemp
0      13.4     22.9
1       7.4     25.1
2      12.9     25.7
3       9.2     28.0
4      17.5     32.3
..      ...      ...
95      7.6     24.0
96      8.3     27.9
97     11.0     30.2
98     13.8     31.8


In [79]:
# Run this section to inspect y
y = df['Location']
y

0     Albury
1     Albury
2     Albury
3     Albury
4     Albury
       ...  
95    Albury
96    Albury
97    Albury
98    Albury
99    Albury
Name: Location, Length: 100, dtype: object

In [80]:
check_for_nan = X.isnull().values.any()
print (check_for_nan)

False
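
The check above returns False, so X needs no cleaning here. If it returned True, the rows with missing values would have to be dropped or filled before training. The sketch below shows both options on a hypothetical frame; the names `features` and `labels` and the label values are assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical features with gaps, plus matching made-up labels.
features = pd.DataFrame({
    "MinTemp": [13.4, np.nan, 12.9, 9.2],
    "MaxTemp": [22.9, 25.1, np.nan, 28.0],
})
labels = pd.Series(["a", "b", "a", "b"], name="label")

# Option 1: drop incomplete rows, removing the same rows from the labels.
complete_rows = features.notnull().all(axis=1)
features_dropped = features[complete_rows]
labels_dropped = labels[complete_rows]

# Option 2: fill each gap with the median of its column.
features_filled = features.fillna(features.median())

print(features_dropped)
print(features_filled)
```

Dropping rows is simpler but discards data; filling keeps every row at the cost of introducing estimated values.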


4. Calculate accuracy

In [81]:
# Train on 80% of the data set and test on the remaining 20%
X_train, X_test, y_train, y_test = train_test_split(
                                    X, y, test_size=0.2)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Compute model accuracy
score = accuracy_score(y_test, predictions)
score

1.0
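
A score of 1.0 is worth a second look. Every displayed value of y is 'Albury', and a target with only one class lets any model score perfectly. The sketch below shows one way to check for this, using made-up stand-ins for X and y (the names `features` and `labels` are assumptions) and scikit-learn's `DummyClassifier` as a baseline that always predicts the most frequent class.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical stand-ins for X and y with a single-class target.
features = pd.DataFrame({
    "MinTemp": [13.4, 7.4, 12.9, 9.2, 17.5],
    "MaxTemp": [22.9, 25.1, 25.7, 28.0, 32.3],
})
labels = pd.Series(["Albury"] * 5, name="Location")

# How many distinct classes are there, and how often does each occur?
class_counts = labels.value_counts()
print(class_counts)

# Baseline that ignores the features and predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(features, labels)
baseline_score = accuracy_score(labels, baseline.predict(features))
print(baseline_score)
```

If the baseline matches the decision tree's accuracy, the tree has not learned anything from the features, and a target with more than one class (or more rows of data) would be needed for a meaningful evaluation.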

5. Persisting Models

In [82]:
# Save the model to file
joblib.dump(model, 'MODELNAME.joblib')


['MODELNAME.joblib']

5.b. Import the model and make predictions

In [83]:
# Load saved model. Make sure that you have run the previous
# section at least once, and that the file exists.

model = joblib.load('MODELNAME.joblib')
predictions = model.predict(X_test)
predictions

array(['Albury', 'Albury', 'Albury', 'Albury', 'Albury', 'Albury',
       'Albury', 'Albury', 'Albury', 'Albury', 'Albury', 'Albury',
       'Albury', 'Albury', 'Albury', 'Albury', 'Albury', 'Albury',
       'Albury', 'Albury'], dtype=object)
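
A loaded model can also predict for observations it has never seen, provided they are passed with the same feature columns used in training. The following is a minimal sketch on a toy model; the labels "cool" and "warm" and all training values are invented for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy training data with two clearly separated, made-up classes.
train_X = pd.DataFrame({
    "MinTemp": [5.0, 6.0, 18.0, 19.0],
    "MaxTemp": [15.0, 16.0, 30.0, 31.0],
})
train_y = ["cool", "cool", "warm", "warm"]
toy_model = DecisionTreeClassifier(random_state=0).fit(train_X, train_y)

# New observations must use the same column names, in the same order.
new_days = pd.DataFrame({
    "MinTemp": [4.0, 20.0],
    "MaxTemp": [14.0, 32.0],
})
new_predictions = toy_model.predict(new_days)
print(new_predictions)
```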

6. (Optional) Visualize decision trees

In [84]:
tree.export_graphviz(model, out_file = 'MODELNAME.dot',
                    feature_names = X.columns, 
                    class_names = sorted(y.unique()), 
                    label = 'all',
                    rounded = True, 
                    filled = True)

# Download the file MODELNAME.dot and open it in VS Code.
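
If a Graphviz viewer is not available, the tree can also be inspected as plain text. The sketch below uses scikit-learn's `export_text` on a toy model; the training data and the labels "cool" and "warm" are made up for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training data with two made-up classes.
train_X = pd.DataFrame({
    "MinTemp": [5.0, 6.0, 18.0, 19.0],
    "MaxTemp": [15.0, 16.0, 30.0, 31.0],
})
train_y = ["cool", "cool", "warm", "warm"]
toy_model = DecisionTreeClassifier(random_state=0).fit(train_X, train_y)

# Render the learned decision rules as indented text.
rules = export_text(toy_model, feature_names=list(train_X.columns))
print(rules)
```

Each indented line is one split on a feature threshold, and each leaf line names the predicted class.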
