# Decision Boundaries

Once we have designed and trained a classifier, how do we analyze or debug it?
Of course, we can look over the metrics it produces (like accuracy or F1), but those are summaries/aggregates of its performance.
They can tell you a lot, but sometimes you just need to visually see what your classifier is doing.

One way we can visualize a classifier's performance is by looking at its [decision boundary](https://en.wikipedia.org/wiki/Decision_boundary) (also called a *decision surface*).
The decision boundary is the boundary that a classifier uses in feature space to separate data points into different labels.
This is most easily conceptualized with a linear decision boundary for a binary classifier.
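
As a minimal sketch of what a linear decision boundary means (the toy data and variable names here are made up for illustration), a fitted linear classifier stores a weight per feature and an intercept. The boundary is the set of points where the weighted sum plus the intercept equals zero, and the predicted label depends on which side of that line a point falls.

```python
import numpy
import sklearn.svm

# Hypothetical toy data: two well-separated groups of 2D points.
features = numpy.array([
    [0.0, 0.0], [0.5, 1.0], [1.0, 0.5],
    [4.0, 4.0], [4.5, 5.0], [5.0, 4.5],
])
labels = numpy.array([0, 0, 0, 1, 1, 1])

classifier = sklearn.svm.LinearSVC()
classifier.fit(features, labels)

# For a binary linear classifier, coef_ has shape (1, 2) and intercept_ has shape (1,).
weights = classifier.coef_[0]
intercept = classifier.intercept_[0]

# Points with a positive score fall on the side of the boundary assigned to label 1.
scores = features @ weights + intercept
manual_predictions = (scores > 0).astype(int)

print("Boundary: %.2f * x + %.2f * y + %.2f = 0" % (weights[0], weights[1], intercept))
print("Manual predictions:    ", manual_predictions)
print("Classifier predictions:", classifier.predict(features))
```

The two printed prediction lines should agree, since checking the sign of the score is what the classifier does internally.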

TODO: IMAGE

TODO: DESCRIBE IMAGE

When plotting in feature space, we generally only use two dimensions, but you can try using three.






But we can also look at decision boundaries for classifiers with more than two labels.

<center><img src="ternary-decision-boundary.png" /></center>
<center style='font-size: small'>Image from <a href='https://scikit-learn.org/stable/modules/generated/sklearn.inspection.DecisionBoundaryDisplay.html'>scikit-learn</a>.</center>

This example from scikit-learn (sklearn) uses the iris dataset, which has three labels.




Decision boundaries do not have to be actual lines with a known formula.
They are more of a representation of what our classifier tends to do.
For example, we can visualize decision boundaries for an algorithm like k-nearest neighbors (KNN), where no actual lines/curves are estimated.

<center><img src="clustering-decision-boundary.png" /></center>
<center style='font-size: small'>Image from <a href='https://scikit-learn.org/stable/auto_examples/neighbors/plot_nca_classification.html'>scikit-learn</a>.</center>

(Note that the decision boundary will look a bit strange in some places because it is an approximation.)
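
To see where the approximation comes from, here is a rough sketch of the general idea: ask the classifier for a label at every point on a grid and color the grid accordingly. The data, the number of neighbors, and the grid resolution are all arbitrary choices for illustration.

```python
import numpy
import sklearn.datasets
import sklearn.neighbors

features, labels = sklearn.datasets.make_moons(n_samples = 200, noise = 0.3, random_state = 5)

classifier = sklearn.neighbors.KNeighborsClassifier(n_neighbors = 5)
classifier.fit(features, labels)

# Build a grid of points that covers the data (with a small margin).
GRID_RESOLUTION = 100
x_values = numpy.linspace(features[:, 0].min() - 1, features[:, 0].max() + 1, GRID_RESOLUTION)
y_values = numpy.linspace(features[:, 1].min() - 1, features[:, 1].max() + 1, GRID_RESOLUTION)
grid_x, grid_y = numpy.meshgrid(x_values, y_values)

# Ask the classifier for a label at every grid point.
grid_points = numpy.column_stack([grid_x.ravel(), grid_y.ravel()])
grid_labels = classifier.predict(grid_points).reshape(grid_x.shape)

# The boundary lies wherever horizontally neighboring grid cells disagree.
boundary_crossings = (grid_labels[:, 1:] != grid_labels[:, :-1]).sum()
print("Grid shape:", grid_labels.shape)
print("Horizontal label changes:", boundary_crossings)
```

A coarser grid makes the boundary look blockier, which is one reason the plotted boundary can look strange in places.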



Now with the theory out of the way, let's actually work with some decision boundaries.




TODO

Classification

Binary

Decision Boundary

Multiclass

linear
 - kernel

Untrained

Improve Per Iteration

Improve Per Data

Non-linear
 - Good
 - Bad


https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

## Setup

Let's make a function to help us plot a decision boundary given a classifier, features, and labels.
You may recognize a similar function from HO4.
This function will take in a trained classifier and train/test data.
It will plot the classifier's decision boundary using [sklearn.inspection.DecisionBoundaryDisplay](https://scikit-learn.org/stable/modules/generated/sklearn.inspection.DecisionBoundaryDisplay.html),
the training data with white outlines,
and the test data with black outlines.

In [None]:
import matplotlib.pyplot
import pandas
import sklearn.datasets
import sklearn.inspection
import sklearn.svm

FIGURE_SIZE = 5
FIGURE_RESOLUTION = 500

def visualize_decision_boundary(classifier,
                                train_features, train_labels,
                                test_features, test_labels,
                                title = None):
    """
    Visualize the decision boundary of a trained binary classifier
    using the FIRST TWO columns of the passed in features.

    The train data will be plotted in a lighter shade than the background with a white outline.
    The test data will be plotted in a darker shade than the background with a black outline.
    """

    figure, axis = matplotlib.pyplot.subplots(1, 1, figsize = (FIGURE_SIZE, FIGURE_SIZE))
                                    
    matplotlib.pyplot.suptitle(title)

    # Score the classifier.
    train_accuracy = classifier.score(train_features, train_labels)
    test_accuracy = classifier.score(test_features, test_labels)
    matplotlib.pyplot.title("Train Accuracy: %3.2f, Test Accuracy: %3.2f" % (train_accuracy, test_accuracy))

    all_features = pandas.concat([train_features, test_features])
                                    
    # Draw the decision boundary.
    decision_boundary = sklearn.inspection.DecisionBoundaryDisplay.from_estimator(
        classifier, all_features,
        response_method = "predict", ax = axis,
        xlabel = all_features.columns[0], ylabel = all_features.columns[1],
        grid_resolution = FIGURE_RESOLUTION,
        cmap = 'RdBu', alpha = 0.50,
    )

    # Display the train data points.
    axis.scatter(
        train_features[train_features.columns[0]], train_features[train_features.columns[1]],
        c = train_labels,
        cmap = 'RdBu', alpha = 0.25, edgecolor = 'w',
    )

    # Display the test data points.
    axis.scatter(
        test_features[test_features.columns[0]], test_features[test_features.columns[1]],
        c = test_labels,
        cmap = 'RdBu', alpha = 0.75, edgecolor = 'k',
    )

We will also need to get some data for us to classify.

For this example, we will be generating some fake data using [sklearn.datasets.make_classification](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html).
This function is useful for quickly generating some classification test data.
The data will be pretty simple, but works well as a starting point.

In [None]:
# n_samples = 200 -- Make 200 data points.
# n_features = 2 -- Generate two feature columns (perfect for plotting decision boundaries).
# n_redundant = 0 -- No redundant features (features with the same information as other features).
# n_informative = 2 -- Make our two features useful (and not just random).
# random_state = 4 -- The seed for the random number generator.
#                     The exact number doesn't matter, the same seed will generate the same data.
# n_clusters_per_class = 1 -- Make the data simple.
all_features, all_labels = sklearn.datasets.make_classification(
    n_samples = 200, n_features = 2,
    n_redundant = 0, n_informative = 2,
    random_state = 4, n_clusters_per_class = 1
)

# Turn the features into a frame, the labels can stay as an array.
all_features = pandas.DataFrame(all_features, columns = ['A', 'B'])

# Split the data into train and test data.
# Note that we are not being rigorous and making sure the splits have the same label breakdown.

train_features = all_features[:100]
train_labels = all_labels[:100]

test_features = all_features[100:]
test_labels = all_labels[100:]

print(train_features[0:10])
print('---')
print(train_labels[0:10])
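
If we did want to be rigorous about the label breakdown, one option is a stratified split. The sketch below regenerates the same data and uses [sklearn.model_selection.train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html); the `strat_` variable names are made up here so they do not clash with the split above, and the rest of this example keeps using the simple split.

```python
import pandas
import sklearn.datasets
import sklearn.model_selection

all_features, all_labels = sklearn.datasets.make_classification(
    n_samples = 200, n_features = 2,
    n_redundant = 0, n_informative = 2,
    random_state = 4, n_clusters_per_class = 1
)
all_features = pandas.DataFrame(all_features, columns = ['A', 'B'])

# test_size = 0.5 -- Half the data for testing, matching the 100/100 split above.
# stratify = all_labels -- Keep the label proportions (roughly) the same in both splits.
strat_train_features, strat_test_features, strat_train_labels, strat_test_labels = sklearn.model_selection.train_test_split(
    all_features, all_labels,
    test_size = 0.5, stratify = all_labels, random_state = 4
)

# With 0/1 labels, the mean is the fraction of points with label 1.
print("Train label fraction:", strat_train_labels.mean())
print("Test label fraction: ", strat_test_labels.mean())
```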

For our classifier in this example, we will be using a [Support Vector Machine](https://en.wikipedia.org/wiki/Support_vector_machine) (SVM).
SVMs are a classic and popular family of classifiers.
We will not be getting into the details of SVMs here,
but we will be using one because they tend to make clean decision boundaries that are easy for us to visualize.
(They can also do a cool trick that we will see later in this example.)

In [None]:
# Make a basic SVM classifier (no need to tweak any parameters).
classifier = sklearn.svm.LinearSVC()

# Fit the classifier on the training data.
classifier.fit(train_features, train_labels)

# Visualize the decision boundary.
visualize_decision_boundary(classifier, train_features, train_labels, test_features, test_labels)

## Decision Boundary vs Amount of Data

When we have a fully trained classifier that gets 100% accuracy on the train data and 99% accuracy on the test data,
we can see a very clear decision boundary with all the training points on their respective sides.
We can see a single test point on the wrong side, but even that point is really close to the decision boundary.
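
We can also check "close to the decision boundary" numerically instead of by eye. The following is a sketch that regenerates the same data (as plain arrays rather than a frame) and assumes a linear classifier, so that the signed score from `decision_function()` divided by the norm of the weights gives a distance to the boundary.

```python
import numpy
import sklearn.datasets
import sklearn.svm

features, labels = sklearn.datasets.make_classification(
    n_samples = 200, n_features = 2,
    n_redundant = 0, n_informative = 2,
    random_state = 4, n_clusters_per_class = 1
)
train_features, train_labels = features[:100], labels[:100]
test_features, test_labels = features[100:], labels[100:]

classifier = sklearn.svm.LinearSVC()
classifier.fit(train_features, train_labels)

# The sign of the score picks the label.
# Its size (scaled by the norm of the weights) is the distance to the boundary.
test_scores = classifier.decision_function(test_features)
test_distances = numpy.abs(test_scores) / numpy.linalg.norm(classifier.coef_)

# Find the test points that ended up on the wrong side.
wrong = classifier.predict(test_features) != test_labels

print("Misclassified test points:", wrong.sum())
print("Their distances to the boundary:", test_distances[wrong])
print("Median distance over all test points:", numpy.median(test_distances))
```

A misclassified point whose distance is well below the median is one that only barely landed on the wrong side.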

But what about for a classifier that may not be trained all the way (or one that does not have nice clear data like ours)?
How does the decision boundary change as we increase the amount of training data?

In [None]:
num_points_range = [4, 8, 16, 32, 64, 100]

for num_points in num_points_range:
    classifier = sklearn.svm.LinearSVC()

    train_sub_features = train_features[0:num_points]
    train_sub_labels = train_labels[0:num_points]
    
    classifier.fit(train_sub_features, train_sub_labels)
    visualize_decision_boundary(classifier,
                                train_sub_features, train_sub_labels,
                                test_features, test_labels,
                                title = "%d Points" % (num_points))
    
    print("Accuracy: ", classifier.score(test_features, test_labels))

We can see that generally, more training points lead to a decision boundary that is more general and works better on the test data.

There is an interesting situation at 32 points where the classifier creates a decision boundary that is worse than the one created at 16 points.
With so few points, this behavior is not unexpected.
We just got lucky at 16 points and happened to make a better decision boundary.
As we add more points, the data makes up for the luck and eventually we get an even better decision boundary at 100 points.
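
One way to check how much luck is involved is to repeat the experiment on several random subsets of the training data and look at the spread of test accuracies. This is only a sketch: the number of trials and the seed are arbitrary choices, and the data is regenerated as plain arrays.

```python
import numpy
import sklearn.datasets
import sklearn.svm

features, labels = sklearn.datasets.make_classification(
    n_samples = 200, n_features = 2,
    n_redundant = 0, n_informative = 2,
    random_state = 4, n_clusters_per_class = 1
)
train_features, train_labels = features[:100], labels[:100]
test_features, test_labels = features[100:], labels[100:]

NUM_TRIALS = 20
random_generator = numpy.random.default_rng(4)

results = {}
for num_points in [16, 32, 100]:
    accuracies = []

    for _ in range(NUM_TRIALS):
        # Pick a different random subset of the training data each trial.
        indexes = random_generator.choice(len(train_features), size = num_points, replace = False)
        subset_features, subset_labels = train_features[indexes], train_labels[indexes]

        # A classifier cannot be fit if the subset happens to contain only one label.
        if len(numpy.unique(subset_labels)) < 2:
            continue

        classifier = sklearn.svm.LinearSVC()
        classifier.fit(subset_features, subset_labels)
        accuracies.append(classifier.score(test_features, test_labels))

    results[num_points] = (numpy.mean(accuracies), numpy.std(accuracies))
    print("%3d Points -- Mean Accuracy: %4.3f, Std: %4.3f" % (num_points, *results[num_points]))
```

A larger standard deviation at small sizes would mean that any single run there (like our 16 point run) says more about which points were picked than about the classifier.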

## "Linear" Decision Boundaries

Let's make some new data.
This time, we will use the [sklearn.datasets.make_moons()](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html) function.
This function makes two-dimensional data that looks like two interleaving half circles.

In [None]:
# n_samples = 200 -- Make 200 data points.
# noise = 0.3 -- Control how random the data looks.
# random_state = 5 -- The seed for the random number generator.
#                     The exact number doesn't matter, the same seed will generate the same data.
all_features, all_labels = sklearn.datasets.make_moons(n_samples = 200, noise = 0.3, random_state = 5)

# Turn the features into a frame, the labels can stay as an array.
all_features = pandas.DataFrame(all_features, columns = ['A', 'B'])

# Split the data into train and test data.
# Note that we are not being rigorous and making sure the splits have the same label breakdown.

train_features = all_features[:100]
train_labels = all_labels[:100]

test_features = all_features[100:]
test_labels = all_labels[100:]

print(train_features[0:10])
print('---')
print(train_labels[0:10])

Now that we have new data, let's use the exact same classifier and process we did before.

In [None]:
# Make a basic SVM classifier (no need to tweak any parameters).
classifier = sklearn.svm.LinearSVC()

# Fit the classifier on the training data.
classifier.fit(train_features, train_labels)

# Visualize the decision boundary.
visualize_decision_boundary(classifier, train_features, train_labels, test_features, test_labels)

With our previous data, we were able to get 99% accuracy,
but with this data we only get 83% accuracy (which is pretty bad for a toy dataset).
Here we can see one of the reasons we have so many different classifiers and parameters for each classifier.
How the data looks can dramatically affect the performance of a classifier, and different classifiers can handle different types of data differently.

In [None]:
# Make an SVM classifier using an RBF kernel (the default kernel for SVC).
classifier = sklearn.svm.SVC()

# Fit the classifier on the training data.
classifier.fit(train_features, train_labels)

# Visualize the decision boundary.
visualize_decision_boundary(classifier, train_features, train_labels, test_features, test_labels)

Using a different member of the SVM family (one using a [radial basis function](https://en.wikipedia.org/wiki/Radial_basis_function) (RBF) kernel),
we can now get 91% accuracy without any other tweaks (which is pretty good).

You may be asking why our decision boundary is no longer a line.
Our first SVM classifier made a linear decision boundary,
but this one (which is still an SVM) now makes some sort of blob-like decision boundary.

The trick here is that this SVM changed its definition of "linear" when classifying the data.
To see this, think about Cartesian vs polar coordinate systems.

In Cartesian coordinates, one of the simplest lines would be a horizontal line:
$$
    y = 2
$$

<center><img src="linear-cartesian.png" width=400px /></center>
<center style='font-size: small'>Image generated from <a href='https://www.desmos.com/calculator/mmoevgjbqg'>desmos</a>.</center>

In polar coordinates, an equally simple line would be a circle:
$$
    r = 2
$$

<center><img src="linear-polar.png" width=400px /></center>
<center style='font-size: small'>Image generated from <a href='https://www.desmos.com/calculator/erzlxmnumw'>desmos</a>.</center>

Both of these lines are considered "linear" in their respective spaces (coordinate systems),
but look much more complex in other coordinate systems.
Using these same tactics, our SVM transformed its feature space so that it can create decision boundaries that may look non-linear, but actually are linear.
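
We can try the coordinate analogy ourselves with a hand-picked transformation. This sketch is not what the RBF kernel does internally; it only illustrates the idea. It assumes data from [sklearn.datasets.make_circles()](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html) (two concentric rings), and replaces the two Cartesian features with a single polar feature: the radius.

```python
import numpy
import sklearn.datasets
import sklearn.svm

# Two concentric rings of points: no straight line in (x, y) can separate them.
features, labels = sklearn.datasets.make_circles(n_samples = 200, noise = 0.05, factor = 0.5, random_state = 4)

# Hand-picked transformation: replace (x, y) with the radius (distance from the origin).
radius = numpy.sqrt((features ** 2).sum(axis = 1)).reshape(-1, 1)

# Use the first half for training and the second half for testing.
train, test = slice(0, 100), slice(100, 200)

# The exact same linear classifier, trained once in each feature space.
cartesian_classifier = sklearn.svm.LinearSVC().fit(features[train], labels[train])
polar_classifier = sklearn.svm.LinearSVC().fit(radius[train], labels[train])

cartesian_accuracy = cartesian_classifier.score(features[test], labels[test])
polar_accuracy = polar_classifier.score(radius[test], labels[test])

print("Linear boundary on (x, y): %3.2f" % (cartesian_accuracy))
print("Linear boundary on radius: %3.2f" % (polar_accuracy))
```

The boundary learned on the radius is just a threshold, which is about as linear as it gets, but drawn back in Cartesian coordinates it is a circle.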

## More Classifiers

For a look at decision boundaries from several different classifiers,
see scikit-learn's [classifier comparison](https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html#sphx-glr-auto-examples-classification-plot-classifier-comparison-py).

<center><img src="classifier-comparison.png"/></center>
<center style='font-size: small'>Image from <a href='https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html#sphx-glr-auto-examples-classification-plot-classifier-comparison-py'>scikit-learn</a>.</center>