Adding Kernel PCovC Code #254
Merged
Changes from all commits (28 commits)
888a37b Adding KPCovC to docs (rvasav26)
4d537a1 Changing assertTrue to assertEqual for correctness (rvasav26)
de226db Investigating into KPCovC inconsistencies (rvasav26)
6dada5d Trying out some things for KPCovC problems (rvasav26)
315f358 Changing KPCovC's test_precomputed_classification (rvasav26)
e653076 Continuing KPCovC investigation (rvasav26)
844c16e Changing _BasePCov and _BaseKPCov to be abstract base classes (rvasav26)
3616619 Cleaning up print statements (rvasav26)
004499a Merging PCovC update (rvasav26)
c676e10 Removing KPCovC experiment (rvasav26)
7d04666 Trying mixing=1.0 for KPCovC/PCovC match (rvasav26)
115a224 Switching KPCovC back to using SVC (rvasav26)
bed1b4b Minor edits after cleaning up KPCovC branch (rvasav26)
e93b86f Checking scaling and LinearSVC match (rvasav26)
69dd1b6 Working on docstrings (rvasav26)
b701e23 Adding example drafts (rvasav26)
9e4e3d8 Switching from KPCovC w/SVC back to KPCovC w/linear classifiers (rvasav26)
07aba83 Finalizing examples (rvasav26)
fe6b0c7 Modifying tests (rvasav26)
623bc1f Modifying docstrings and minor edits (rvasav26)
7cef97c Updating CHANGELOG (rvasav26)
e0ecb03 Formatting (rvasav26)
26a246e More formatting and cleaning (rvasav26)
15579d6 Minor edits (rvasav26)
993b215 CHANGELOG suggestion (rvasav26)
4962500 Example suggestions (rvasav26)
5e619ae Docstring and other skmatter/decomposition suggestions (rvasav26)
a1a316a Christian's suggestions and decision_function tests (rvasav26)
@@ -0,0 +1,265 @@
#!/usr/bin/env python
# coding: utf-8

"""
Comparing KPCovC with KPCA
==========================
"""
# %%
#

import numpy as np

import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

from sklearn import datasets
from sklearn.decomposition import PCA, KernelPCA
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import (
    LogisticRegressionCV,
    RidgeClassifierCV,
    SGDClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

from skmatter.decomposition import PCovC, KernelPCovC

plt.rcParams["scatter.edgecolors"] = "k"
cm_bright = ListedColormap(["#d7191c", "#fdae61", "#a6d96a", "#3a7cdf"])

random_state = 0
n_components = 2
# %%
#
# For this example, we combine two ``sklearn`` datasets generated with
# :func:`sklearn.datasets.make_moons`, shifting and rotating the second
# pair of moons so that the four classes are distinct.

X1, y1 = datasets.make_moons(n_samples=750, noise=0.10, random_state=random_state)
X2, y2 = datasets.make_moons(n_samples=750, noise=0.10, random_state=random_state)

X2, y2 = X2 + 2, y2 + 2

# rotate the second pair of moons by 90 degrees
R = np.array(
    [
        [np.cos(np.pi / 2), -np.sin(np.pi / 2)],
        [np.sin(np.pi / 2), np.cos(np.pi / 2)],
    ]
)
X2 = X2 @ R.T

X = np.vstack([X1, X2])
y = np.concatenate([y1, y2])
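As a quick standalone sanity check (reusing the same construction as above), the following sketch confirms that the combined dataset contains four balanced classes and that ``R`` is a pure rotation:

```python
import numpy as np
from sklearn import datasets

random_state = 0
X1, y1 = datasets.make_moons(n_samples=750, noise=0.10, random_state=random_state)
X2, y2 = datasets.make_moons(n_samples=750, noise=0.10, random_state=random_state)
X2, y2 = X2 + 2, y2 + 2

# 90-degree rotation matrix, as in the example above
R = np.array(
    [
        [np.cos(np.pi / 2), -np.sin(np.pi / 2)],
        [np.sin(np.pi / 2), np.cos(np.pi / 2)],
    ]
)
X2 = X2 @ R.T

X = np.vstack([X1, X2])
y = np.concatenate([y1, y2])

# 1500 points in 2D, four classes of 375 points each
print(X.shape)         # (1500, 2)
print(np.bincount(y))  # [375 375 375 375]

# R is orthogonal with determinant +1, i.e. a pure rotation
print(np.allclose(R.T @ R, np.eye(2)))    # True
print(np.isclose(np.linalg.det(R), 1.0))  # True
```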
# %%
#
# Original Data
# -------------

fig, ax = plt.subplots(figsize=(5.5, 5))
ax.scatter(X[:, 0], X[:, 1], c=y, cmap=cm_bright)
ax.set_title("Original Data")


# %%
#
# Scale Data
# ----------
#
# We split the data, fit the scaler on the training set only, and apply
# the same transformation to the test set.

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=random_state
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# %%
#
# PCA and PCovC
# -------------
#
# Both PCA and PCovC fail to produce linearly separable latent-space maps,
# so we will need a kernel method to separate the moon classes effectively.

mixing = 0.10
alpha_d = 0.5
alpha_p = 0.4

models = {
    PCA(n_components=n_components): "PCA",
    PCovC(
        n_components=n_components,
        random_state=random_state,
        mixing=mixing,
        classifier=LinearSVC(),
    ): "PCovC",
}

fig, axs = plt.subplots(1, 2, figsize=(10, 4))

for ax, model in zip(axs, models):
    t_train = model.fit_transform(X_train_scaled, y_train)
    t_test = model.transform(X_test_scaled)

    ax.scatter(t_test[:, 0], t_test[:, 1], alpha=alpha_d, cmap=cm_bright, c=y_test)
    ax.scatter(t_train[:, 0], t_train[:, 1], cmap=cm_bright, c=y_train)

    ax.set_title(models[model])
plt.tight_layout()
# %%
#
# Kernel PCA and Kernel PCovC
# ---------------------------
#
# Here we compare the latent spaces produced by KPCA and KPCovC. For KPCA,
# a :class:`LinearSVC` is trained on the latent space to provide a decision
# boundary, while KPCovC uses its own fitted classifier; we then compare
# the respective decision boundaries and test-set accuracy scores.

fig, axs = plt.subplots(1, 2, figsize=(13, 6))

center = True
resolution = 1000

kernel_params = {"kernel": "rbf", "gamma": 2}

models = {
    KernelPCA(n_components=n_components, **kernel_params): {
        "title": "Kernel PCA",
        "eps": 0.1,
    },
    KernelPCovC(
        n_components=n_components,
        random_state=random_state,
        mixing=mixing,
        center=center,
        **kernel_params,
    ): {"title": "Kernel PCovC", "eps": 2},
}

for ax, model in zip(axs, models):
    t_train = model.fit_transform(X_train_scaled, y_train)
    t_test = model.transform(X_test_scaled)

    if isinstance(model, KernelPCA):
        t_classifier = LinearSVC(random_state=random_state).fit(t_train, y_train)
        score = t_classifier.score(t_test, y_test)
    else:
        t_classifier = model.classifier_
        score = model.score(X_test_scaled, y_test)

    DecisionBoundaryDisplay.from_estimator(
        estimator=t_classifier,
        X=t_test,
        ax=ax,
        response_method="predict",
        cmap=cm_bright,
        alpha=alpha_d,
        eps=models[model]["eps"],
        grid_resolution=resolution,
    )
    ax.scatter(t_test[:, 0], t_test[:, 1], alpha=alpha_p, cmap=cm_bright, c=y_test)
    ax.scatter(t_train[:, 0], t_train[:, 1], cmap=cm_bright, c=y_train)
    ax.set_title(models[model]["title"])

    ax.text(
        0.82,
        0.03,
        f"Score: {round(score, 3)}",
        fontsize=mpl.rcParams["axes.titlesize"],
        transform=ax.transAxes,
    )
    ax.set_xticks([])
    ax.set_yticks([])

fig.subplots_adjust(wspace=0.04)
plt.tight_layout()
# %%
#
# Effect of the Underlying Classifier on KPCovC Maps and Decision Boundaries
# --------------------------------------------------------------------------
#
# Kernel PCovC builds its latent space from the evidence :math:`\mathbf{Z}`
# generated by the underlying classifier fit on a computed kernel
# :math:`\mathbf{K}` and the labels :math:`\mathbf{Y}`, so different
# classifiers produce different latent-space maps. Hence, the decision
# boundaries produced by the linear classifier fit between
# :math:`\mathbf{T}` and :math:`\mathbf{Y}` to make predictions will also
# vary.
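This dependence on the classifier can be seen without KPCovC itself: two linear classifiers fit on the same kernel yield different evidence :math:`\mathbf{Z}`. A minimal sketch using only ``sklearn`` (the dataset and ``gamma`` here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
K = rbf_kernel(X, X, gamma=2)  # precomputed RBF kernel

# Same kernel features, two different linear classifiers: the evidence
# Z = decision_function(K) differs, so a latent space built from Z would too.
z_logistic = LogisticRegression(max_iter=1000).fit(K, y).decision_function(K)
z_ridge = RidgeClassifier().fit(K, y).decision_function(K)

print(z_logistic.shape, z_ridge.shape)   # (200,) (200,)
print(np.allclose(z_logistic, z_ridge))  # False
```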
names = ["Logistic Regression", "Ridge Classifier", "Linear SVC", "SGD Classifier"]

models = {
    LogisticRegressionCV(random_state=random_state): {
        "kernel_params": {"kernel": "rbf", "gamma": 12},
        "title": "Logistic Regression",
    },
    RidgeClassifierCV(): {
        "kernel_params": {"kernel": "rbf", "gamma": 1},
        "title": "Ridge Classifier",
        "eps": 0.40,
    },
    LinearSVC(random_state=random_state): {
        "kernel_params": {"kernel": "rbf", "gamma": 15},
        "title": "Support Vector Classification",
    },
    SGDClassifier(random_state=random_state): {
        "kernel_params": {"kernel": "rbf", "gamma": 15},
        "title": "SGD Classifier",
        "eps": 10,
    },
}

fig, axs = plt.subplots(1, len(models), figsize=(4 * len(models), 4))

for ax, name, model in zip(axs.flat, names, models):
    kpcovc = KernelPCovC(
        n_components=n_components,
        random_state=random_state,
        mixing=mixing,
        classifier=model,
        center=center,
        **models[model]["kernel_params"],
    )
    t_kpcovc_train = kpcovc.fit_transform(X_train_scaled, y_train)
    t_kpcovc_test = kpcovc.transform(X_test_scaled)
    kpcovc_score = kpcovc.score(X_test_scaled, y_test)

    DecisionBoundaryDisplay.from_estimator(
        estimator=kpcovc.classifier_,
        X=t_kpcovc_test,
        ax=ax,
        response_method="predict",
        cmap=cm_bright,
        alpha=alpha_d,
        eps=models[model].get("eps", 1),
        grid_resolution=resolution,
    )

    ax.scatter(
        t_kpcovc_test[:, 0],
        t_kpcovc_test[:, 1],
        cmap=cm_bright,
        alpha=alpha_p,
        c=y_test,
    )
    ax.scatter(t_kpcovc_train[:, 0], t_kpcovc_train[:, 1], cmap=cm_bright, c=y_train)
    ax.text(
        0.70,
        0.03,
        f"Score: {round(kpcovc_score, 3)}",
        fontsize=mpl.rcParams["axes.titlesize"],
        transform=ax.transAxes,
    )

    ax.set_title(name)
    ax.set_xticks([])
    ax.set_yticks([])
fig.subplots_adjust(wspace=0.04)

plt.tight_layout()
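For reference, the KPCA-plus-linear-classifier baseline used for comparison above can also be written as a single ``sklearn`` pipeline. A minimal sketch on a fresh two-moons dataset (the hyperparameters here are illustrative, not tuned):

```python
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=400, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# scale, embed with kernel PCA, then fit a linear classifier on the embedding
baseline = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=2, kernel="rbf", gamma=2),
    LinearSVC(random_state=0),
)
score = baseline.fit(X_train, y_train).score(X_test, y_test)
print(round(score, 3))
```

Unlike this two-step baseline, KPCovC mixes label information directly into the embedding via the ``mixing`` parameter, which is what the example above visualizes.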
I LOVE this example.