# SVC for MNIST

This notebook trains and evaluates a random Fourier feature SVM classifier model (RFFSVC) on the MNIST dataset.
This notebook has three sections:

1. Train and evaluate a RFF/ORF/QRF SVC model on the MNIST dataset.
2. For comparison, train and evaluate a kernel SVM under the same conditions as RFFSVC.
3. Tune hyperparameters using the Optuna interface in `rfflearn`.

Repeating the training and evaluation in (1) with different parameters yields the results shown in the figures below.

Notes:
* The accuracy may vary slightly between runs because the features are generated from random numbers. The results
  shown in the figures were obtained on the author's computer with the random seed fixed by `rfflearn.seed(111)`.
* The GPU models in the figures below were trained on CPU, and only inference was performed on GPU
  (hence their accuracy is identical to that of the CPU models).
  For how to convert a CPU model to a GPU model, see `svc_train_cpu_predict_gpu.ipynb`.

<div align="center">
    <img src="./figures/Inference_time_and_acc_on_MNIST_svc.svg" width="600" alt="Inference time and acc on MNIST"/>
    &nbsp;&nbsp;&nbsp;
    <img src="./figures/Inference_time_vs_test_accuracy.svg" width="600" alt="Inference time vs test accuracy">
</div>

In [1]:
import numpy as np
import sklearn.datasets
import sklearn.model_selection
import sklearn.svm

# Import rfflearn.
import rfflearn.cpu as rfflearn

# If you want to enable GPU, please import rfflearn like the following instead.
#import rfflearn.gpu as rfflearn

## Prepare MNIST dataset

### Load MNIST dataset

Load the MNIST dataset using `sklearn.datasets.fetch_openml` and scale the pixel values to the range [0, 1].
The loaded labels are strings, so they are converted to an integer type.

In [2]:
%%time

# Load MNIST.
Xs, ys = sklearn.datasets.fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False, data_home="./scikit_learn_data")

# Split to training and test data.
Xs_train, Xs_test, ys_train, ys_test = sklearn.model_selection.train_test_split(Xs, ys, test_size=10000, shuffle=False)

# Scale the data (convert the input data range from [0, 255] to [0, 1]).
Xs_train = Xs_train.astype(np.float64) / 255.0
Xs_test  = Xs_test.astype(np.float64)  / 255.0

# Convert the string label to integer label.
ys_train = ys_train.astype(np.int32)
ys_test  = ys_test.astype(np.int32)

CPU times: user 4.06 s, sys: 657 ms, total: 4.72 s
Wall time: 19.1 s
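
The two preprocessing steps above can be checked on a tiny array before running them on the full dataset. The following is a minimal sketch; `Xs_demo` and `ys_demo` are hypothetical stand-ins for the arrays returned by `fetch_openml`, not real MNIST data.

```python
import numpy as np

# Hypothetical stand-ins: pixel values in [0, 255] and labels stored as strings.
Xs_demo = np.array([[0, 128, 255], [64, 32, 16]], dtype=np.float64)
ys_demo = np.array(["5", "0"], dtype=object)

# Scale pixel values into [0, 1].
Xs_scaled = Xs_demo / 255.0

# Convert the string labels to integer labels.
ys_int = ys_demo.astype(np.int32)

print("value range:", Xs_scaled.min(), Xs_scaled.max())
print("labels:", ys_int, ys_int.dtype)
```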


### Data dimension reduction

Reduce the data dimension using PCA.
This step is not required, but it improves the test accuracy.

In [4]:
%%time

dim_pca = 128

# Create matrix for principal component analysis.
_, V = np.linalg.eig(Xs_train.T @ Xs_train)
T = np.real(V[:, :dim_pca])

CPU times: user 54.2 s, sys: 44.8 ms, total: 54.2 s
Wall time: 13.2 s
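
The cell above relies on `np.linalg.eig`, which does not guarantee any particular ordering of the eigenvalues, so taking the first `dim_pca` columns of `V` may not always select the leading components. As a hedged alternative, the sketch below uses `np.linalg.eigh` (intended for symmetric matrices, returning real eigenvalues in ascending order) and sorts explicitly. It runs on synthetic data, the names `X_demo`, `dim_demo`, and `T_demo` are hypothetical, and, like the cell above, it does not center the data first.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data whose axes have clearly different scales
# (a hypothetical stand-in for Xs_train).
scales = np.array([5.0, 0.1, 3.0, 0.2, 1.0, 0.05])
X_demo = rng.normal(size=(500, 6)) * scales

dim_demo = 3

# X^T X is symmetric, so eigh applies; it returns real eigenvalues
# in ascending order together with orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(X_demo.T @ X_demo)

# Reorder to descending eigenvalue and keep the leading directions.
order = np.argsort(eigvals)[::-1]
T_demo = eigvecs[:, order[:dim_demo]]

# Project the data onto the leading directions.
X_reduced = X_demo @ T_demo
print("projection matrix:", T_demo.shape, "reduced data:", X_reduced.shape)
```

Swapping this into the notebook may change the reported timings and accuracy slightly, since the figures in this notebook were produced with the original cell.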


## Train and evaluate a SVM model

### Instantiate SVM model

Instantiate one of the following SVM models:

* `RFFSVC`: SVM classifier with random Fourier features. In the results above, it shows slightly better test accuracy and much faster inference than `SVC`.
* `ORFSVC`: Similar to `RFFSVC`, but orthogonal random features are used.
* `QRFSVC`: Similar to `RFFSVC`, but quasi-random numbers are used.
* `SVC`: Kernel SVM classifier (not RFF) from scikit-learn. Its test accuracy is slightly lower than that of the others, and its inference is very slow.
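
To illustrate the idea behind random Fourier features, the sketch below builds the textbook feature map with plain NumPy and compares the resulting inner products against the exact Gaussian kernel. It is an illustration under stated assumptions, not `rfflearn`'s internal code: the helper `rff_features` is hypothetical, and the role given to `std_kernel` here (standard deviation of the random projection matrix) is an assumption about how the parameter is interpreted.

```python
import numpy as np

def rff_features(X, W):
    """Map X to [cos(XW), sin(XW)] / sqrt(D), where D is the number of columns of W."""
    Z = X @ W
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(W.shape[1])

rng = np.random.default_rng(0)

# Illustrative values only; the notebook uses dim_kernel=1024 and std_kernel=0.05.
dim_input, dim_kernel, std_kernel = 8, 4096, 0.5

X_demo = rng.normal(size=(20, dim_input))

# Random projection matrix with entries drawn from N(0, std_kernel^2).
W = std_kernel * rng.normal(size=(dim_input, dim_kernel))

# Inner products of the random features approximate the kernel matrix.
Phi = rff_features(X_demo, W)
K_approx = Phi @ Phi.T

# Exact Gaussian kernel implied by this W: exp(-std_kernel^2 * ||x - y||^2 / 2).
sq_dists = np.sum((X_demo[:, None, :] - X_demo[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-0.5 * std_kernel**2 * sq_dists)

max_err = np.max(np.abs(K_approx - K_exact))
print(f"max abs error = {max_err:.4f}")
```

Because the feature map is explicit, a linear SVM trained on `Phi` only needs a matrix product at inference time, which is consistent with the fast inference reported for the RFF-based models.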

In [3]:
# SVM classifier with random Fourier features.
svc = rfflearn.RFFSVC(dim_kernel=1024, std_kernel=0.05)

# SVM classifier with orthogonal random features.
# svc = rfflearn.ORFSVC(dim_kernel=1024, std_kernel=0.05)

# SVM classifier with quasi-random Fourier features.
# svc = rfflearn.QRFSVC(dim_kernel=1024, std_kernel=0.05)

# Kernel SVM classifier (not RFF). 
# svc = sklearn.svm.SVC(kernel="rbf", gamma="auto")

### Train the model

Train the SVM model.

In [5]:
%%time

rfflearn.seed(111)

# Train SVM.
svc.fit(Xs_train @ T, ys_train)

CPU times: user 18 s, sys: 893 ms, total: 18.9 s
Wall time: 1min 23s


<rfflearn.cpu.rfflearn_cpu_svc.RFFSVC at 0x7a00b0fcfec0>

### Evaluate on the test data

In [6]:
%%time

# Calculate score for test data.
score = 100 * svc.score(Xs_test @ T, ys_test)
print(f"Score = {score:.2f} [%]")

Score = 97.34 [%]
CPU times: user 2.98 s, sys: 394 ms, total: 3.37 s
Wall time: 1.57 s


## Hyperparameter tuning using Optuna

Re-split the training data into training and validation sets, then run the hyperparameter tuning.

In [7]:
Xs_opt_train, Xs_opt_valid, ys_opt_train, ys_opt_valid = sklearn.model_selection.train_test_split(Xs_train, ys_train, test_size=1/6)

print("Xs_opt_train.shape =", Xs_opt_train.shape)
print("Xs_opt_valid.shape =", Xs_opt_valid.shape)

Xs_opt_train.shape = (50000, 784)
Xs_opt_valid.shape = (10000, 784)
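
The split above does not pass `random_state`, so which samples land in the validation set depends on the state of NumPy's global random generator at the time of the call. If a repeatable split is wanted, a seed can be passed explicitly. The sketch below shows this on a small synthetic array; the value `111` is an arbitrary choice mirroring the seed used elsewhere in this notebook, and `X_demo` and `y_demo` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Small synthetic stand-in for (Xs_train, ys_train).
X_demo = np.arange(120, dtype=np.float64).reshape(60, 2)
y_demo = np.arange(60) % 10

# Same 5:1 ratio as above, with a fixed seed for the shuffle.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=1/6, random_state=111)

# Repeating the call with the same seed gives the same split.
X_tr2, X_va2, y_tr2, y_va2 = train_test_split(
    X_demo, y_demo, test_size=1/6, random_state=111)

print("train:", X_tr.shape, "valid:", X_va.shape)
print("same split:", np.array_equal(X_va, X_va2))
```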


In [8]:
study = rfflearn.RFFSVC_tuner(train_set=(Xs_opt_train @ T, ys_opt_train),
                              valid_set=(Xs_opt_valid @ T, ys_opt_valid),
                              verbose=0, n_trials=10, n_jobs=-1)

In [9]:
# Show the result of the hyperparameter tuning.
print("- study.best_params:", study.best_params)
print("- study.best_value:",  study.best_value)
print("- study.best_model:",  study.user_attrs["best_model"])

- study.best_params: {'dim_kernel': 999, 'std_kernel': 0.03761772371488599}
- study.best_value: 0.9708
- study.best_model: <rfflearn.cpu.rfflearn_cpu_svc.RFFSVC object at 0x7a00b0fbc110>
