# Classification

### The Dataset - Fisher's Irises

We are going to use the <b>`iris`</b> dataset which comes with `sklearn`.  It's fairly small as we'll see shortly.

### Labels (species names/classes):
<img border="0" alt="iris species" src="https://az712634.vo.msecnd.net/notebooks/python_course/v1/irises.png" width="500" height="500">

### Question
For example: which of the three given classes is the flower pictured below closest to?

<img border="0" alt="iris species" src="https://az712634.vo.msecnd.net/notebooks/python_course/v1/iris-setosa.jpg" width="200">

### Get to know the data - explore
* Features (columns/measurements) are depicted in this diagram:
<img border="0" alt="iris data features" src="https://az712634.vo.msecnd.net/notebooks/python_course/v1/iris_petal_sepal.png" width="200" height="200">

Next, let's explore:
* Shape
* The actual data
* Summaries

In [None]:
# Familiar imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
from sklearn import datasets

# Other datasets in sklearn have similar "load" functions
dataset = datasets.load_iris()

# Put the dataset into a pandas DataFrame
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
# Class labels are integer codes: 0, 1, 2
df['target'] = dataset.target

df.head()

<b>Shape and representation</b>

In [None]:
df.info()

In [None]:
df.describe()
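
`df.describe()` summarizes each column, but it does not show how many samples belong to each class. The sketch below is one possible way to check that. The names `iris`, `species_df` and `counts` are introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
species_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the integer codes (0, 1, 2) to the species names shipped with the dataset
species_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Number of samples per class
counts = species_df['species'].value_counts()
print(species_df.shape)
print(counts)
```

Equal class counts matter later on: accuracy is easier to interpret when no class dominates the data.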

#### Hold out part of the data from the training set - this will be the test set later on

In [None]:
# Separate train and test data (20% is held out for testing)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2)
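
The split above is random, so the exact test set (and the accuracy measured on it) can change from run to run. A minimal sketch of a reproducible, class-balanced split follows. The seed `random_state=0` is an arbitrary choice made here, `stratify` is an optional argument the original cell does not use, and the variable names are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target,
    test_size=0.2,         # hold out 20% of the samples
    random_state=0,        # fixed seed so the split is repeatable
    stratify=iris.target,  # keep class proportions the same in both parts
)

print(X_tr.shape, X_te.shape)
# Samples per class in the held-out part
print(np.bincount(y_te))
```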

#### Train Classification Model

Again, follow the 4-step modelling pattern.

**Step 1.** Import the model you want to use

In [None]:
from sklearn.linear_model import LogisticRegression

**Step 2.** Make an instance of the Model and define parameters (optional)

In [None]:
# Our model - a multiclass logistic regression classifier
model = LogisticRegression()

**Step 3.** Train the model on the training portion of the Iris dataset, storing the information learned from the data

In [None]:
model.fit(X_train, y_train)

**Step 4.** Predict labels for test data

In [None]:
y_predict = model.predict(X_test)

### Evaluating the Model's Performance
While there are other ways of measuring model performance (precision, recall, F1 score, ROC curve, etc.), we are going to keep this simple and use accuracy as our metric.
To do this, we are going to see how the model performs on new data (the test set).

#### Accuracy
Accuracy is the fraction of correct predictions: correct predictions / total number of data points
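
As a small worked example of this definition, the sketch below computes accuracy by hand. The label lists are made up for illustration and do not come from the iris data.

```python
# Hypothetical true and predicted labels for ten samples
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]
y_pred = [0, 0, 1, 2, 1, 2, 2, 1, 2, 0]

# Count positions where the prediction matches the true label
correct = sum(t == p for t, p in zip(y_true, y_pred))

# Accuracy = correct predictions / total number of data points
accuracy = correct / len(y_true)
print(correct, len(y_true), accuracy)
```

Here 8 of the 10 predictions match, so the accuracy is 0.8.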

In [None]:
score = model.score(X_test, y_test)
print(score)

In [None]:
print('Predicted class %s, real class %s' % (
        y_predict[1], y_test[1]))

In [None]:
print('Probabilities of membership in each class: %s' % 
      model.predict_proba(X_test))
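
To make the link between these probabilities and the predicted labels explicit, here is a hedged, self-contained sketch. The names `clf`, `proba` and `labels_from_proba`, the seed, and the raised `max_iter` are assumptions made for this example and are not settings from the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

# max_iter is raised above the default (an assumption) to give the solver room to converge
clf = LogisticRegression(max_iter=1000)
clf.fit(X_tr, y_tr)

# One row per test sample, one column per class
proba = clf.predict_proba(X_te)

# Each row is a probability distribution, so it sums to 1
row_sums = proba.sum(axis=1)

# The predicted label is the class with the highest probability
labels_from_proba = clf.classes_[proba.argmax(axis=1)]
print(proba[:3].round(3))
print(np.array_equal(labels_from_proba, clf.predict(X_te)))
```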

## Visualize
There are many ways to visualize the data and the model's results. Two are shown here.

#### Boxplots
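A boxplot is a simple form of visualization that summarizes a distribution by its median, quartiles, and outliers. Here we compare sepal length across the three classes.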

In [None]:
# A bit of rearrangement for plotting: one column per class
# (each class has 50 samples, so the columns have equal length)
df2 = pd.DataFrame()
for i in range(3):
    df2[i] = df.loc[(df['target'] == i), 'sepal length (cm)'].dropna().values

In [None]:
# Convert back to an array
data = np.array(df2)

# Plot a boxplot
plt.boxplot(data)
plt.title('sepal length (cm)')

#### Confusion Matrix
A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. Each row corresponds to an actual class and each column to a predicted class. In this section, I am just showing two Python packages (Seaborn and Matplotlib) for making confusion matrices more understandable and visually appealing.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics

In [None]:
cm = metrics.confusion_matrix(y_test, y_predict)
print(cm)
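
To show how such a matrix is read, the sketch below builds one by hand. The labels are made up for illustration, not taken from the iris data, and the name `cm_demo` is used so the `cm` computed above is left untouched.

```python
# Hypothetical true and predicted labels for ten samples
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]
y_pred = [0, 0, 1, 2, 1, 2, 2, 1, 2, 0]
n_classes = 3

# cm_demo[i][j] counts samples whose actual class is i and predicted class is j
cm_demo = [[0] * n_classes for _ in range(n_classes)]
for t, p in zip(y_true, y_pred):
    cm_demo[t][p] += 1

for row in cm_demo:
    print(row)

# Correct predictions sit on the diagonal, so accuracy is diagonal sum / total
correct = sum(cm_demo[i][i] for i in range(n_classes))
total = sum(sum(row) for row in cm_demo)
print(correct / total)
```

Off-diagonal entries show which classes are confused with each other. In this made-up example, classes 1 and 2 are each mistaken for the other once.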

In [None]:
plt.figure(figsize=(3,3))
# The matrix holds integer counts, so annotate with an integer format
sns.heatmap(cm, annot=True, fmt="d", linewidths=.5, square=True, cmap='Blues_r');
plt.ylabel('Actual label');
plt.xlabel('Predicted label');
all_sample_title = 'Accuracy Score: {0}'.format(score)
plt.title(all_sample_title, size = 15);