## Demo 3: HKR classifier on MNIST dataset
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deel-ai/deel-lip/blob/master/docs/notebooks/demo3.ipynb)

This notebook demonstrates learning a binary classification task on MNIST restricted to the digits 0 and 8.

In [1]:
import os
os.environ["KERAS_BACKEND"] = "tensorflow"

In [2]:
# pip install deel-lip -qqq

In [2]:
import keras
import keras.ops as K
from keras.layers import Input, Flatten, Dense
from keras.optimizers import Adam
from keras.metrics import BinaryAccuracy

# from keras.models import Sequential
from deel.lip.model import Sequential

from deel.lip.layers import (
    SpectralDense,
    SpectralConv2D,
    ScaledL2NormPooling2D,
    FrobeniusDense,
)
from deel.lip.activations import GroupSort, GroupSort2
from deel.lip.losses import HKR, KR, HingeMargin, MulticlassHKR, MulticlassKR

import numpy as np



### data preparation

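For this task we select two classes: 0 and 8. Labels are made binary: the code below maps class 0 to 1
and class 8 to 0. The hinge term of the loss is usually written with labels in {-1, 1}, so this relies
on the loss treating 0 as the negative class.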

In [3]:
from keras.datasets import mnist

# first we select the two classes
selected_classes = [0, 8]  # must be two classes as we perform binary classification


def prepare_data(x, y, class_a=0, class_b=8):
    """
    Convert the MNIST data to make it suitable for our binary classification
    setup.
    """
    # keep only the items that belong to class_a or class_b
    mask = (y == class_a) | (y == class_b)
    x = x[mask]
    y = y[mask]
    x = x.astype("float32")
    y = y.astype("float32")
    # convert from range int[0,255] to float32[0,1]
    x /= 255
    x = x.reshape((-1, 28, 28, 1))
    # change labels to binary values: class_a -> 1, class_b -> 0
    y[y == class_a] = 1.0
    y[y == class_b] = 0.0
    return x, y.reshape((-1, 1))


# now we load the dataset
(x_train, y_train_ord), (x_test, y_test_ord) = mnist.load_data()

# prepare the data
x_train, y_train = prepare_data(
    x_train, y_train_ord, selected_classes[0], selected_classes[1]
)
x_test, y_test = prepare_data(
    x_test, y_test_ord, selected_classes[0], selected_classes[1]
)

x_train = np.transpose(x_train,(0,3,1,2))
x_test = np.transpose(x_test,(0,3,1,2))
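
The preprocessing above can be checked without downloading MNIST. The following is a minimal sketch on random toy data; `prepare_toy`, `toy_x` and `toy_y` are hypothetical names introduced only for this illustration, and the function folds the channels-first transpose into the same step.

```python
import numpy as np


def prepare_toy(x, y, class_a=0, class_b=8):
    """Same steps as the notebook cell, applied to any (N, 28, 28) uint8 array."""
    mask = (y == class_a) | (y == class_b)   # keep the two classes only
    x = x[mask].astype("float32") / 255.0    # scale pixels to [0, 1]
    y = y[mask]
    labels = np.where(y == class_a, 1.0, 0.0).astype("float32")
    x = x.reshape((-1, 28, 28, 1))           # channels-last
    x = np.transpose(x, (0, 3, 1, 2))        # channels-first, as in the notebook
    return x, labels.reshape((-1, 1))


rng = np.random.default_rng(0)
toy_x = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
toy_y = np.array([0, 3, 8, 8, 1])            # only three items are 0 or 8

x_out, y_out = prepare_toy(toy_x, toy_y)
print(x_out.shape, y_out.ravel())
```

Only the three samples labelled 0 or 8 survive the mask, and the images end up with shape `(N, 1, 28, 28)`, which matches the `Input((1, 28, 28))` used later.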

### Build Lipschitz Model

Let's first make the parameters of this experiment explicit.

In [4]:
# training parameters
epochs = 100
batch_size = 128

# network parameters
activation = GroupSort  # ReLU, MaxMin, GroupSort2

# loss parameters
min_margin = 1.0
alpha = 10.0
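
To give a feel for what `alpha` and `min_margin` control, here is a small NumPy sketch of a textbook hinge-regularized KR objective. It is an assumption made for illustration, not deel-lip's implementation: the exact weighting of the two terms and the scaling of the margin may differ between library versions. `hkr_sketch` is a hypothetical helper.

```python
import numpy as np


def hkr_sketch(y_true, y_pred, alpha=10.0, min_margin=1.0):
    """Textbook form (assumed): alpha * hinge - KR, with labels in {0, 1} or {-1, 1}."""
    y_true = np.asarray(y_true, dtype="float32").ravel()
    y_pred = np.asarray(y_pred, dtype="float32").ravel()
    sign = np.where(y_true > 0, 1.0, -1.0)   # map labels to {-1, +1}
    # KR term: gap between the mean outputs of the two classes
    kr = y_pred[sign > 0].mean() - y_pred[sign < 0].mean()
    # hinge term: penalize samples that sit inside the margin
    hinge = np.maximum(0.0, min_margin - sign * y_pred).mean()
    return alpha * hinge - kr, kr, hinge


loss, kr, hinge = hkr_sketch([1, 1, 0, 0], [2.0, 0.5, -1.0, -3.0])
print(loss, kr, hinge)
```

In this form a larger `alpha` puts more weight on classifying every sample with a margin, while the KR term pushes the mean outputs of the two classes apart, which is why the loss values reported during training can be negative.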


Now we can build the network.
Here the experiment uses an MLP, but `deel-lip` also provides state-of-the-art 1-Lipschitz convolutions.

In [5]:
keras.utils.clear_session()
# build the 1-Lipschitz MLP
model = Sequential(
    layers=[
        Input((1, 28, 28)),
        Flatten(),
        SpectralDense(32, GroupSort2(), use_bias=True),
        SpectralDense(16, GroupSort2(), use_bias=True),
        SpectralDense(1, activation=None, use_bias=False),
    ],
    name="lipModel",
)
model.summary()
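
The 1-Lipschitz property of such a network can be illustrated in plain NumPy. The sketch below shows the general idea only and is not deel-lip's implementation: a dense kernel is divided by its largest singular value, and a pairwise sort stands in for a GroupSort2-style activation. `spectral_normalize` and `group_sort2` are hypothetical helpers.

```python
import numpy as np


def spectral_normalize(w):
    """Divide a kernel by its largest singular value (spectral norm)."""
    return w / np.linalg.norm(w, ord=2)


def group_sort2(x):
    """Sort each consecutive pair of features (assumed ascending order)."""
    pairs = x.reshape(x.shape[0], -1, 2)
    return np.sort(pairs, axis=-1).reshape(x.shape)


rng = np.random.default_rng(0)
w = spectral_normalize(rng.normal(size=(8, 4)))

# compare output distances with input distances on random pairs of points
a = rng.normal(size=(16, 8))
b = rng.normal(size=(16, 8))
fa = group_sort2(a @ w)
fb = group_sort2(b @ w)
ratio = np.linalg.norm(fa - fb, axis=1) / np.linalg.norm(a - b, axis=1)
print(ratio.max() <= 1.0 + 1e-6)
```

Because both the normalized linear map and the sort are non-expansive in the L2 norm, the output distance never exceeds the input distance, and stacking such layers keeps the whole network 1-Lipschitz.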



In [6]:
model.compile(
    loss=HKR(
        alpha=alpha, min_margin=min_margin
    ),  # HKR stands for the hinge regularized KR loss
    metrics=[
        # KR,  # shows the KR term of the loss
        HingeMargin(min_margin=min_margin),  # shows the hinge term of the loss
    ],
    optimizer=Adam(learning_rate=0.001),
)

### Learn classification on MNIST

Now that the model is built, we can learn the task.

In [7]:
model.fit(
    x=x_train,
    y=y_train,
    validation_data=(x_test, y_test),
    batch_size=batch_size,
    shuffle=True,
    epochs=epochs,
    verbose=1,
)

Epoch 1/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 9s 29ms/step - HingeMargin: 0.1058 - loss: -2.0724 - val_HingeMargin: 0.0240 - val_loss: -5.3713
Epoch 2/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - HingeMargin: 0.0312 - loss: -5.2590 - val_HingeMargin: 0.0243 - val_loss: -5.4120
Epoch 3/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - HingeMargin: 0.0292 - loss: -5.3180 - val_HingeMargin: 0.0253 - val_loss: -5.6822
Epoch 4/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - HingeMargin: 0.0235 - loss: -5.6581 - val_HingeMargin: 0.0210 - val_loss: -5.8057
Epoch 5/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - HingeMargin: 0.0202 - loss: -5.7531 - val_HingeMargin: 0.0180 - val_loss: -5.8916
Epoch 6/100
92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - HingeMargin: 0.0227 - loss: -5.8064 - val_HingeMargin: 0.0195 - val_loss: -5.90

<keras.src.callbacks.history.History at 0x7f85dcda5d50>

In [8]:
vanilla_model = model.vanilla_export()

In [9]:
vanilla_model.compile(
    loss=HKR(
        alpha=alpha, min_margin=min_margin
    ),  # HKR stands for the hinge regularized KR loss
    metrics=[
        # KR,  # shows the KR term of the loss
        HingeMargin(min_margin=min_margin),  # shows the hinge term of the loss
    ],
    optimizer=Adam(learning_rate=0.001),
)

In [11]:
# def convert_reflected_two_logits_model(original_model):
#     """
#     Take a Keras model that outputs a single logit (z) and return a new
#     model that outputs two logits [-z, z].
#     """
#     inputs = original_model.inputs[0]
#     single_logit_output = original_model.outputs[0] # output z, shape (batch, 1)

#     # compute -z using Keras Ops
#     neg_logit = keras.ops.negative(single_logit_output) # shape (batch, 1)

#     # concatenate [-z, z] using a Keras layer or Keras Ops
#     # keras.layers.Concatenate is used here to stay in the style of the Functional API
#     two_logits_output = keras.layers.Concatenate(axis=-1, name="reflected_two_logits")([neg_logit, single_logit_output]) # shape (batch, 2)

#     # create the new model
#     new_model = keras.Model(inputs=inputs, outputs=two_logits_output, name="reflected_two_logits_model")
#     return new_model

In [12]:
# outputs_2_vanilla_model.compile(
#     loss=MulticlassHKR(
#         alpha=alpha, min_margin=min_margin
#     ),  # HKR stands for the hinge regularized KR loss
#     metrics=[
#         # KR,  # shows the KR term of the loss
#         "accuracy",
#         HingeMargin(min_margin=min_margin),  # shows the hinge term of the loss
#     ],
#     optimizer=Adam(learning_rate=0.001),
# )

In [10]:
layer = vanilla_model.layers[-1]
new_dense = Dense(units=3, activation=None, use_bias=True)
vanilla_model_bis = keras.models.Sequential(vanilla_model.layers[:-1] + [new_dense])

In [11]:
new_dense(layer.input) # call the layer once to build it (this creates freshly initialized weights)

<KerasTensor shape=(None, 3), dtype=float32, sparse=False, ragged=False, name=keras_tensor_9>

In [12]:
w_temp = np.zeros((16,3), dtype = 'float32')
b_temp = np.zeros((3,))
b_temp[1:] = -10000

w = layer.get_weights()[0] # kernel of the original output layer, shape (16, 1)
w_temp[:,:1] = w

In [13]:
new_dense.set_weights([w_temp, b_temp])
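
The weight surgery in the cells above can be reproduced with plain matrices. This is a minimal sketch under the assumption that the last layer is a bias-free dense layer with a `(16, 1)` kernel; `features`, `w_pad` and `b_pad` are hypothetical names, and random values stand in for the trained kernel and the penultimate activations.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 16)).astype("float32")  # stand-in for penultimate activations
w = rng.normal(size=(16, 1)).astype("float32")         # stand-in for the trained kernel

# embed the single-logit kernel in the first column of a 3-unit layer
w_pad = np.zeros((16, 3), dtype="float32")
w_pad[:, :1] = w
# the two extra units get zero weights and a large negative bias
b_pad = np.zeros((3,), dtype="float32")
b_pad[1:] = -10000.0

z = features @ w
logits = features @ w_pad + b_pad
print(np.allclose(logits[:, :1], z))
print(logits[:, 1:])
```

The first column reproduces the original logit, while the padded columns are constant at -10000 whatever the input, so they never win an argmax unless the real logit drops below that value.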

In [14]:
vanilla_model_bis.compile(
        # decreasing alpha and increasing min_margin improves robustness (at the cost of accuracy)
        # note also that, for Lipschitz networks, more robustness requires more parameters.
        loss=MulticlassHKR(alpha=100, min_margin=0.25),
        optimizer=Adam(1e-4),
        metrics=["accuracy", MulticlassKR()],)

In [15]:
vanilla_model_bis.summary()

In [16]:
vanilla_model_bis.evaluate(x_test, y_test)

62/62 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - MulticlassKR: 2.0904 - accuracy: 0.5073 - loss: 656962.5625


[668720.375, 0.4984646737575531, 2.210339069366455]
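
One plausible reading of the accuracy near 0.498 is sketched below. It rests on two assumptions that the notebook does not state: that Keras resolves `"accuracy"` to a sparse categorical accuracy when labels have shape `(N, 1)`, and that the MNIST test split holds 980 zeros and 974 eights. The logits are four of the rows printed by `predict` in this notebook.

```python
import numpy as np

# four rows taken from the 3-logit predictions shown in this notebook
logits = np.array([
    [3.8609564, -10000.0, -10000.0],
    [1.0446979, -10000.0, -10000.0],
    [-1.7230902, -10000.0, -10000.0],
    [-3.2299943, -10000.0, -10000.0],
])
labels = np.array([1, 1, 0, 0])      # matching entries of y_test

pred = logits.argmax(axis=1)         # index 0 always wins against -10000
acc = (pred == labels).mean()        # equals the share of samples labelled 0
share_of_eights = 974 / (980 + 974)  # assumed class counts of the test split
print(pred, acc, round(share_of_eights, 4))
```

Under these assumptions the metric only measures how many test samples carry label 0, so it says nothing about the quality of the classifier. The very large loss value is also consistent with the -10000 padding, since `MulticlassHKR` is fed logits and labels in a layout it was presumably not designed for.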

In [17]:
vanilla_model_bis.predict(x_test[:10])

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 174ms/step


array([[ 3.8609564e+00, -1.0000000e+04, -1.0000000e+04],
       [ 3.4461536e+00, -1.0000000e+04, -1.0000000e+04],
       [ 3.2614102e+00, -1.0000000e+04, -1.0000000e+04],
       [ 4.9798212e+00, -1.0000000e+04, -1.0000000e+04],
       [ 2.8508511e+00, -1.0000000e+04, -1.0000000e+04],
       [ 1.0446979e+00, -1.0000000e+04, -1.0000000e+04],
       [-1.7230902e+00, -1.0000000e+04, -1.0000000e+04],
       [ 4.3527122e+00, -1.0000000e+04, -1.0000000e+04],
       [ 5.1488953e+00, -1.0000000e+04, -1.0000000e+04],
       [-3.2299943e+00, -1.0000000e+04, -1.0000000e+04]], dtype=float32)

In [18]:
y_test[:10]

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [0.],
       [1.],
       [1.],
       [0.]], dtype=float32)

In [19]:
def convert_reflected_two_logits_model2(original_model):
    """
    Take a Keras model whose first output column is a single logit (z) and
    return a new model that outputs [-z, z] followed by the remaining
    columns of the original output.
    """
    inputs = original_model.inputs

    single_logit_output = original_model.outputs[0][:,:1] # output z, shape (batch, 1)
    print(single_logit_output.shape, single_logit_output)
    # compute -z using Keras Ops
    neg_logit = keras.ops.negative(single_logit_output) # shape (batch, 1)

    # concatenate [-z, z] and the two padded logits with a Keras layer,
    # to stay in the style of the Functional API
    two_logits_output = keras.layers.Concatenate(axis=-1, name="reflected_two_logits")([neg_logit, single_logit_output, original_model.outputs[0][:,1:3]]) # shape (batch, 4)

    # create the new model
    new_model = keras.Model(inputs=inputs, outputs=[two_logits_output], name="reflected_two_logits_model")
    return new_model

In [20]:
outputs_2_vanilla_model = convert_reflected_two_logits_model2(vanilla_model_bis)

(None, 1) <KerasTensor shape=(None, 1), dtype=float32, sparse=False, ragged=False, name=keras_tensor_15>


In [21]:
y_test[:10]

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [0.],
       [1.],
       [1.],
       [0.]], dtype=float32)

In [22]:
outputs_2_vanilla_model.predict(x_test[:10])

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 183ms/step


array([[-3.8609564e+00,  3.8609564e+00, -1.0000000e+04, -1.0000000e+04],
       [-3.4461536e+00,  3.4461536e+00, -1.0000000e+04, -1.0000000e+04],
       [-3.2614102e+00,  3.2614102e+00, -1.0000000e+04, -1.0000000e+04],
       [-4.9798212e+00,  4.9798212e+00, -1.0000000e+04, -1.0000000e+04],
       [-2.8508511e+00,  2.8508511e+00, -1.0000000e+04, -1.0000000e+04],
       [-1.0446979e+00,  1.0446979e+00, -1.0000000e+04, -1.0000000e+04],
       [ 1.7230902e+00, -1.7230902e+00, -1.0000000e+04, -1.0000000e+04],
       [-4.3527122e+00,  4.3527122e+00, -1.0000000e+04, -1.0000000e+04],
       [-5.1488953e+00,  5.1488953e+00, -1.0000000e+04, -1.0000000e+04],
       [ 3.2299943e+00, -3.2299943e+00, -1.0000000e+04, -1.0000000e+04]],
      dtype=float32)

In [23]:
outputs_2_vanilla_model.summary()

In [24]:
outputs_2_vanilla_model.save("/home/aws_install/robustess_project/lip_models/demo3_FC_vanilla_MNIST08_channelfirst_False_disj_Neurons_single_output_converted_2.keras")
vanilla_model.save("/home/aws_install/robustess_project/lip_models/demo3_FC_vanilla_MNIST08_channelfirst_False_disj_Neurons_single_output.keras")
vanilla_model_bis.save("/home/aws_install/robustess_project/lip_models/demo3_FC_vanilla_MNIST08_channelfirst_False_disj_Neurons_single_output_converted_2_4_logits.keras")

### Test 4 outputs in 1


In [25]:
vanilla_model.summary()

In [26]:
layer = vanilla_model.layers[-1]
new_dense = Dense(units=4, activation=None, use_bias=True)
vanilla_model_bis = keras.models.Sequential(vanilla_model.layers[:-1] + [new_dense])

In [27]:
new_dense(layer.input) # call the layer once to build it (this creates freshly initialized weights)

<KerasTensor shape=(None, 4), dtype=float32, sparse=False, ragged=False, name=keras_tensor_19>

In [31]:
w_temp = np.zeros((16,4), dtype = 'float32')
b_temp = np.zeros((4,))
b_temp[2:] = -10000

w = layer.get_weights()[0] 
w_temp[:,0:1] = -w
w_temp[:,1:2] = w

In [32]:
new_dense.set_weights([w_temp, b_temp])
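
Writing `-w` and `w` into the first two columns of a single dense layer should give the same outputs as the earlier approach of concatenating `[-z, z]` with the padded logits. The sketch below checks this equivalence with NumPy only; the random `features` and `w` are hypothetical stand-ins for the penultimate activations and the trained kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 16)).astype("float32")
w = rng.normal(size=(16, 1)).astype("float32")
z = features @ w                                  # original single logit

# approach 1: one dense layer with kernel columns [-w, w, 0, 0]
w4 = np.zeros((16, 4), dtype="float32")
w4[:, 0:1] = -w
w4[:, 1:2] = w
b4 = np.zeros((4,), dtype="float32")
b4[2:] = -10000.0
dense_out = features @ w4 + b4

# approach 2: negate and concatenate the logit, then append the padding
padding = np.full((6, 2), -10000.0, dtype="float32")
concat_out = np.concatenate([-z, z, padding], axis=1)

print(np.allclose(dense_out, concat_out))
print(np.array_equal(dense_out.argmax(axis=1), (z.ravel() > 0).astype(int)))
```

Folding the reflection into the kernel keeps the exported model a plain stack of dense layers, with no slicing or concatenation nodes in the graph.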

In [33]:
vanilla_model_bis.summary()

In [34]:
vanilla_model_bis.predict(x_test[:10])

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 172ms/step


array([[-3.8609564e+00,  3.8609564e+00, -1.0000000e+04, -1.0000000e+04],
       [-3.4461536e+00,  3.4461536e+00, -1.0000000e+04, -1.0000000e+04],
       [-3.2614102e+00,  3.2614102e+00, -1.0000000e+04, -1.0000000e+04],
       [-4.9798212e+00,  4.9798212e+00, -1.0000000e+04, -1.0000000e+04],
       [-2.8508511e+00,  2.8508511e+00, -1.0000000e+04, -1.0000000e+04],
       [-1.0446979e+00,  1.0446979e+00, -1.0000000e+04, -1.0000000e+04],
       [ 1.7230902e+00, -1.7230902e+00, -1.0000000e+04, -1.0000000e+04],
       [-4.3527122e+00,  4.3527122e+00, -1.0000000e+04, -1.0000000e+04],
       [-5.1488953e+00,  5.1488953e+00, -1.0000000e+04, -1.0000000e+04],
       [ 3.2299943e+00, -3.2299943e+00, -1.0000000e+04, -1.0000000e+04]],
      dtype=float32)

In [36]:
y_test[:10]

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [0.],
       [1.],
       [1.],
       [0.]], dtype=float32)

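On these ten test samples, the argmax of the four logits matches the label every time: index 1 when the label is 1 and index 0 when it is 0. The hinge metric reported during training (about 0.02 on the validation set) points the same way, although no accuracy metric was tracked for the single-logit model.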
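
The agreement between logits and labels can be checked with a few lines of NumPy. The sketch below uses only the ten logits and labels printed in this notebook, so it measures agreement on those samples and not accuracy on the full test set.

```python
import numpy as np

# first logit z for the ten test samples, copied from the predict() output
z = np.array([3.8609564, 3.4461536, 3.2614102, 4.9798212, 2.8508511,
              1.0446979, -1.7230902, 4.3527122, 5.1488953, -3.2299943])
# rebuild the 4-logit layout [-z, z, -1e4, -1e4]
logits = np.stack([-z, z, np.full_like(z, -1e4), np.full_like(z, -1e4)], axis=1)
labels = np.array([1, 1, 1, 1, 1, 1, 0, 1, 1, 0])  # y_test[:10]

pred = logits.argmax(axis=1)
accuracy = (pred == labels).mean()
print(pred, accuracy)
```

Running the same comparison over all of `x_test` and `y_test` would give a proper accuracy figure for the converted model.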