In this notebook we will look at some of the features and tools that scikit-learn has to offer, but first we need to establish some basics.

The five main steps in training a machine learning algorithm are:
    1. Selection of features.
    2. Choosing a performance metric.
    3. Choosing a classifier and optimization algorithm.
    4. Evaluating the performance of the model.
    5. Tuning the algorithm.

We will skip the first two steps and the last step for now and focus only on steps 3 and 4. We will again train a Perceptron on the Iris dataset; both the classifier and the dataset are provided by scikit-learn.

In [2]:
from sklearn import datasets
import numpy as np
iris = datasets.load_iris()
X = iris.data[:, [2,3]]
y = iris.target

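# Split the dataset into training and test sets so that we can
# measure performance on data the model has not seen.
# (sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection.)
from sklearn.model_selection import train_test_split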
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Apply feature scaling (standardization) with scikit-learn's
# StandardScaler: estimate each feature's mean and standard deviation
# on the training set only, then apply that same transform to both sets.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# scikit-learn's Perceptron supports multiclass classification
# (one-versus-rest), so we can train on all three classes at once
from sklearn.linear_model import Perceptron
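# n_iter was removed in scikit-learn 0.21; max_iter=40 with tol=None
# runs a fixed 40 epochs, matching the original setting
ppn = Perceptron(max_iter=40, eta0=0.1, tol=None, random_state=0)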
ppn.fit(X_train_std, y_train)

# Having trained the Perceptron, we use it to predict the class
# labels of the standardized test samples
y_pred = ppn.predict(X_test_std)
print('Misclassified samples: %d' % (y_test != y_pred).sum())

Misclassified samples: 4


From the output above, we can see that the Perceptron misclassified 4 of the 45 test samples, a misclassification error of roughly 8.9 percent (4/45 ≈ 0.089).
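The first cell flags feature scaling as something to revisit. As a rough sketch of what StandardScaler does, the snippet below standardizes each feature by hand as z = (x - mu) / sigma, with mu and sigma estimated on the training split only, and checks the result against StandardScaler. It assumes a scikit-learn version that provides sklearn.model_selection; the names mu, sigma, X_train_manual and X_test_manual are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same data and split as the notebook: petal length and petal width
iris = load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Per-feature mean and population standard deviation (ddof=0),
# estimated on the training split only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# Apply the *training* statistics to both splits
X_train_manual = (X_train - mu) / sigma
X_test_manual = (X_test - mu) / sigma

sc = StandardScaler().fit(X_train)
print(np.allclose(X_train_manual, sc.transform(X_train)))
print(np.allclose(X_test_manual, sc.transform(X_test)))
```

Fitting the scaler on the training set alone keeps information about the test set out of the preprocessing step; the standardized training features end up with mean 0 and standard deviation 1, while the test features only approximately do.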

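The comment about multiclass support can be made concrete. scikit-learn's Perceptron handles the three Iris classes with a one-versus-rest scheme: it learns one weight vector and one bias per class and predicts the class with the highest score. The sketch below reuses the split and scaling from the first cell, assumes a scikit-learn version where max_iter and tol replace the removed n_iter parameter, inspects coef_ and intercept_, and reproduces predict from decision_function.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Assumption: max_iter=40 with tol=None stands in for the old n_iter=40
ppn = Perceptron(max_iter=40, eta0=0.1, tol=None, random_state=0)
ppn.fit(X_train_std, y_train)

# One weight vector (row of coef_) and one bias per class
print(ppn.coef_.shape, ppn.intercept_.shape)

# One score per class for each test sample; the prediction is the
# class whose one-versus-rest score is highest
scores = ppn.decision_function(X_test_std)
manual_pred = ppn.classes_[np.argmax(scores, axis=1)]
print(np.array_equal(manual_pred, ppn.predict(X_test_std)))
```

With two input features and three classes, coef_ has shape (3, 2) and intercept_ has shape (3,), which is why all three classes can be fed to the classifier in a single fit call.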
We can also calculate the classification accuracy using scikit-learn's metrics module:

In [3]:
from sklearn.metrics import accuracy_score
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))

Accuracy: 0.91
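Accuracy is simply the fraction of correctly classified samples, that is, 1 minus the misclassification error, so 1 - 4/45 ≈ 0.91 matches the output above. The toy example below uses made-up labels (not the Iris predictions) to show that accuracy_score agrees with taking the mean of y_true == y_pred directly.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 10 samples, 2 of which are predicted wrongly
y_true = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 2])
y_hat = np.array([0, 0, 1, 2, 2, 2, 0, 1, 1, 2])

acc = accuracy_score(y_true, y_hat)   # fraction of matching labels
manual_acc = np.mean(y_true == y_hat) # same thing, computed by hand
err = np.mean(y_true != y_hat)        # misclassification error
print(acc, manual_acc, err)           # 0.8 0.8 0.2
```

For a fitted classifier, ppn.score(X_test_std, y_test) returns the same mean accuracy without calling predict separately.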


Next, we visualize the decision regions to see how well our Perceptron separates the classes. The plotting function below also highlights the test samples with small circles, purely as a visual aid.

In [None]:
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt

def plot_decision_regions(X, y, classifier, test_idx=None,
                         resolution=0.02):
    # setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    