In [None]:
%reload_ext nb_black

In [None]:
import numpy as np
import pandas as pd

from sklearn.datasets import make_circles

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA, PCA

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
n = 30
np.random.seed(42)

a1 = np.random.normal(-10, 1, n // 3)
a2 = np.random.normal(10, 1, n // 3)
b = np.random.normal(0, 1, n // 3)

x = np.hstack((a1, b, a2))

labels = ["a"] * (n // 3)
labels += ["b"] * (n // 3)
labels += ["a"] * (n // 3)

df = pd.DataFrame({"x": x, "y": 0, "label": labels})

Plot `x` by `y` and color by `label`

<p align='center'>
  <img src='https://i.imgur.com/xcRD0xC.png' width=75%>
</p>

Let's make a homemade 'kernel' to map our data to a higher dimension.

* How are we able to tell how to separate the classes?
* How can we make the numbers reflect what we're seeing?

In [None]:
# One simple choice: square the feature so points far from 0 get large values
df["kernel_y"] = df["x"] ** 2

Replot the data using `kernel_y` instead of `y`

Boom, kerneled

This worked here, but it would be nice to have a preset selection of kernels that work in many cases.

Apply PCA to `X`. Ask for only a single principal component.

In [None]:
X = df[["x", "kernel_y"]]


Replot the data, this time use:
* the 1st principal component as your x axis data
* 0 as the y axis data
* color by label
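
One possible way to do this step, as a minimal sketch. It rebuilds the toy data so it runs on its own, and it assumes the homemade kernel is the squared feature (`kernel_y = x ** 2`); the `pc1` column name and the range summary are illustrative choices, not part of the original exercise.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Rebuild the toy data from the top of the notebook
n = 30
np.random.seed(42)
a1 = np.random.normal(-10, 1, n // 3)
a2 = np.random.normal(10, 1, n // 3)
b = np.random.normal(0, 1, n // 3)

labels = ["a"] * (n // 3) + ["b"] * (n // 3) + ["a"] * (n // 3)
df = pd.DataFrame({"x": np.hstack((a1, b, a2)), "label": labels})

# Assumption: the homemade kernel squares the feature
df["kernel_y"] = df["x"] ** 2

X = df[["x", "kernel_y"]]

# Keep only the first principal component
pca = PCA(n_components=1)
df["pc1"] = pca.fit_transform(X)[:, 0]

# Summarize where each class lands along the first principal component
pc1_ranges = df.groupby("label")["pc1"].agg(["min", "max"])
print(pc1_ranges)
```

If the two ranges do not overlap, a single threshold on `pc1` separates the classes. For the replot, `pc1` can serve as the x axis and a column of zeros as the y axis.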

This is an unrealistic example, meant to show the concept of what `KernelPCA` (and the kernels in SVM) are doing.  The overall process:

* We map our data into a higher dimension using a kernel (aka data is mapped to kernel space)
* We then apply our analysis on this higher dimensional data (in `KernelPCA` we would apply PCA; in Kernel SVM we would apply a linear SVM)
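
The two steps above can be checked numerically. The sketch below assumes a hypothetical explicit feature map `phi(x) = (x, x ** 2)` and compares two routes: mapping the data and running ordinary `PCA`, versus handing `KernelPCA` only the matrix of pairwise inner products in the mapped space (`kernel="precomputed"`). The data is a stand-in with the same shape as the toy example, not the exact same draws.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

# Stand-in 1D data: two outer clusters and one cluster near zero
rng = np.random.RandomState(42)
x = np.hstack(
    (rng.normal(-10, 1, 10), rng.normal(0, 1, 10), rng.normal(10, 1, 10))
)

# Hypothetical explicit feature map: phi(x) = (x, x^2)
phi = np.column_stack((x, x ** 2))

# Route 1: map the data explicitly, then run ordinary PCA
pca_scores = PCA(n_components=1).fit_transform(phi)

# Route 2: compute only the inner products between mapped points
# and let KernelPCA work from that kernel matrix
K = phi @ phi.T
kpca_scores = KernelPCA(n_components=1, kernel="precomputed").fit_transform(K)

# The sign of a principal component is arbitrary, so compare magnitudes
same_up_to_sign = np.allclose(np.abs(pca_scores), np.abs(kpca_scores), atol=1e-6)
print(same_up_to_sign)
```

The point of the kernel route is that the mapped features never need to be built explicitly; only the kernel values between pairs of points are required.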

In a more realistic example, we would likely have more than one feature to start with, and applying the kernel would be less straightforward than squaring the feature.  In practice we'll use one of the predefined kernels that `sklearn` provides.

Let's see how `sklearn.decomposition.KernelPCA` would treat the same problem with different kernels/kernel parameters.

Apply `KernelPCA` to the `X` and replot.  Try different parameters for the kernel. How is it doing?

In [None]:
X = df[["x"]]
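
A sketch of how a few kernel settings could be compared on the single feature. The settings below are arbitrary starting points, and `classes_overlap` is a hypothetical helper (not part of `sklearn`) that checks whether the two classes' ranges along the first component overlap.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import KernelPCA

# Rebuild the toy data from the top of the notebook
n = 30
np.random.seed(42)
a1 = np.random.normal(-10, 1, n // 3)
a2 = np.random.normal(10, 1, n // 3)
b = np.random.normal(0, 1, n // 3)

labels = ["a"] * (n // 3) + ["b"] * (n // 3) + ["a"] * (n // 3)
df = pd.DataFrame({"x": np.hstack((a1, b, a2)), "label": labels})
X = df[["x"]]


def classes_overlap(pc1, labels):
    """Hypothetical helper: True if the class ranges along pc1 overlap."""
    a = pc1[labels == "a"]
    b = pc1[labels == "b"]
    return not (a.max() < b.min() or b.max() < a.min())


# Arbitrary settings to compare; tune these while exploring
settings = {
    "linear": dict(kernel="linear"),
    "poly, degree 2": dict(kernel="poly", degree=2),
    "rbf, gamma=0.01": dict(kernel="rbf", gamma=0.01),
    "rbf, gamma=1": dict(kernel="rbf", gamma=1),
}

results = {}
for name, kwargs in settings.items():
    kpca = KernelPCA(n_components=1, **kwargs)
    pc1 = kpca.fit_transform(X)[:, 0]
    results[name] = classes_overlap(pc1, df["label"].to_numpy())
    print(f"{name}: classes overlap along pc1 = {results[name]}")
```

The linear kernel only recenters `x`, so class `b` stays sandwiched between the two `a` clusters. A degree 2 polynomial kernel implicitly includes a squared term, much like the homemade kernel.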


Another toy data example to show `KernelPCA` succeeding before looking at real data.

In [None]:
X, y = make_circles(n_samples=1000, random_state=123, noise=0.1, factor=0.2)

df = pd.DataFrame(X)
df.columns = ["x", "y"]
df["label"] = y

sns.scatterplot(x="x", y="y", hue="label", data=df)
plt.show()

* Apply `PCA` to the `X` to reduce to 1 dimension
* Plot the resulting first principal component and color by `y`

* Apply `KernelPCA` to the `X` to reduce to 1 dimension
* Plot the resulting first principal component and color by `y`
* Play with parameters. Are we able to make the data linearly separable?
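
One way the two exercises above might look side by side. This is a sketch: the RBF kernel with `gamma=10` is an arbitrary starting point, and the training accuracy of a `LogisticRegression` fit on the single component is used only as a rough proxy for linear separability (plotting the component colored by `y` tells the same story).

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression

X, y = make_circles(n_samples=1000, random_state=123, noise=0.1, factor=0.2)

# Ordinary PCA down to one dimension
pca_pc1 = PCA(n_components=1).fit_transform(X)

# KernelPCA down to one dimension; kernel and gamma are assumptions to tune
kpca = KernelPCA(n_components=1, kernel="rbf", gamma=10, random_state=0)
kpca_pc1 = kpca.fit_transform(X)

# Rough separability check: can a linear model split the classes on one axis?
pca_acc = LogisticRegression().fit(pca_pc1, y).score(pca_pc1, y)
kpca_acc = LogisticRegression().fit(kpca_pc1, y).score(kpca_pc1, y)

print(f"PCA (1 component) accuracy: {pca_acc:.3f}")
print(f"KernelPCA (1 component) accuracy: {kpca_acc:.3f}")
```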

If you want to use `KernelPCA` in a supervised learning pipeline, I suggest using `sklearn.pipeline.Pipeline` and optimizing these parameters with `sklearn.model_selection.GridSearchCV` (or a different search, like `BayesSearchCV`, to speed things up).

Let's apply it to some boring real data to see how a pipeline might look.

In [None]:
iris = sns.load_dataset("iris")

# Restricting to just sepal data
X = iris[["sepal_length", "sepal_width"]]
y = iris["species"]

sns.scatterplot(x="sepal_length", y="sepal_width", hue="species", data=iris)
plt.show()

How this might look manually.

In [None]:
scaler = StandardScaler()
scaled = scaler.fit_transform(X)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
# Fit on the standardized features so both models below see scaled data
pcs = kpca.fit_transform(scaled)

pc_df = pd.DataFrame({"pc1": pcs[:, 0], "pc2": pcs[:, 1]})
pc_df["label"] = iris["species"]

sns.scatterplot(x="pc1", y="pc2", hue="label", data=pc_df)
plt.show()

In [None]:
no_pc_model = LogisticRegression()
no_pc_model.fit(scaled, y)
no_pc_acc = no_pc_model.score(scaled, y)

pc_model = LogisticRegression()
pc_model.fit(pcs, y)
pc_acc = pc_model.score(pcs, y)

print(f"No KernelPCA Accuracy: {no_pc_acc}")
print(f"KernelPCA Accuracy: {pc_acc}")

Well, those parameters didn't work too well.  Let's try to optimize.

Make a `Pipeline` and grid of parameters to optimize `KernelPCA`.  Run a `GridSearchCV` to find the best ones.

In [None]:
# We're gonna get (and ignore) ConvergenceWarnings
pipeline = Pipeline(
    [
#       ('step_name', Step()),
    ]
)

params = {}

pipeline_cv = GridSearchCV(pipeline, params)
pipeline_cv.fit(X, y)
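
One way the empty pipeline and grid could be filled in, as a sketch rather than the intended solution. The step names (`scale`, `kpca`, `model`) and the grid values are assumptions. To stay self-contained it loads iris from `sklearn.datasets.load_iris` (whose first two columns are sepal length and sepal width) instead of `sns.load_dataset`, and it silences the `ConvergenceWarning`s explicitly.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: sklearn's bundled iris; columns 0 and 1 are the sepal features
X_all, y = load_iris(return_X_y=True)
X = X_all[:, :2]

# Step names are arbitrary; they prefix the parameter names in the grid
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("kpca", KernelPCA()),
        ("model", LogisticRegression(max_iter=1000)),
    ]
)

# Illustrative grid; gamma is ignored by the linear kernel
params = {
    "kpca__n_components": [1, 2],
    "kpca__kernel": ["linear", "poly", "rbf"],
    "kpca__gamma": [0.01, 0.1, 1, 10],
}

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    pipeline_cv = GridSearchCV(pipeline, params, cv=5)
    pipeline_cv.fit(X, y)

print(pipeline_cv.best_params_)
print(pipeline_cv.score(X, y))
```

Putting the scaler and `KernelPCA` inside the pipeline means both are refit on each training fold, so the cross-validated scores are not contaminated by the held-out fold.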

In [None]:
pipeline_cv.score(X, y)

That accuracy looks familiar... let's look at the selected parameters.

In [None]:
pipeline_cv.best_params_

We ended up choosing a linear kernel with 2 components.  This means that our KernelPCA step didn't add anything to our pipeline (we just rotated our data and passed it to the classification step).
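
The "just rotated" claim can be checked directly: a rotation (possibly combined with a reflection) leaves every pairwise distance unchanged. The sketch below assumes the same two sepal features, loaded from `sklearn.datasets.load_iris` to stay self-contained, and compares pairwise distances before and after a linear `KernelPCA` with 2 components.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler

# Assumption: sklearn's bundled iris; columns 0 and 1 are the sepal features
X_all, _ = load_iris(return_X_y=True)
scaled = StandardScaler().fit_transform(X_all[:, :2])

# Linear kernel, same number of components as input features
pcs = KernelPCA(n_components=2, kernel="linear").fit_transform(scaled)

# If the transform is only a rotation/reflection, all pairwise
# distances between points are preserved
distances_preserved = np.allclose(pdist(scaled), pdist(pcs))
print(distances_preserved)
```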

Like any method we've seen, this isn't a silver bullet.  Try some things and find out what works, and use a grid search to help you out along the way.