# Example use of the quantization framework

In this notebook, we show how to use the simple quantization framework developed for this Large Language Model use case example.

This framework handles floating-point values alongside their quantized versions.
It does so by propagating the quantization parameters, such as the scales and zero points, through every operator found in the function.
This function is then meant to be compiled using Concrete Python, creating a circuit that can be executed in FHE.
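
To make the idea of propagating quantization parameters concrete, here is a minimal sketch in plain Python. The names `ToyQuantized`, `quantize` and `add` are hypothetical and are not part of `quant_framework`. The sketch assumes symmetric quantization with a zero point of 0, and it rescales with float arithmetic for readability, whereas a compiled circuit would have to express that step on integers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyQuantized:
    """Hypothetical container: integer values plus the scale mapping them to floats."""

    ints: List[int]
    scale: float

    def dequantize(self) -> List[float]:
        # With a zero point of 0, de-quantization is a plain multiplication
        return [i * self.scale for i in self.ints]


def quantize(values: List[float], scale: float) -> ToyQuantized:
    """Symmetric quantization (zero point assumed to be 0)."""
    return ToyQuantized([round(v / scale) for v in values], scale)


def add(a: ToyQuantized, b: ToyQuantized, out_scale: float) -> ToyQuantized:
    """Add two quantized arrays and express the result at `out_scale`."""
    # The real-valued sum is a.scale * a_i + b.scale * b_i. Dividing by the
    # output scale and rounding yields integers that carry the new scale.
    ints = [
        round((x * a.scale + y * b.scale) / out_scale)
        for x, y in zip(a.ints, b.ints)
    ]
    return ToyQuantized(ints, out_scale)


a = quantize([0.5, -0.3], scale=0.01)
b = quantize([1.0, 2.0], scale=0.02)
total = add(a, b, out_scale=0.02)
print(total.ints, total.dequantize())
```

Each operator receives integers together with their scale and returns integers together with a new scale, so the float meaning of the data is never lost even though only integers are manipulated.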

### Imports

In [1]:
from pprint import pprint

import numpy as np
from concrete.fhe.tracing import Tracer
from quant_framework import DualArray, Quantizer

from concrete import fhe

### Function definition 

Let us define the function to compile using Concrete Python. We also generate the input floating-point values.
These values are also used for the calibration step, which computes and stores the quantization parameters while executing the function.


In [2]:
np.random.seed(42)

n_values = 10
x_calib = np.random.randn(n_values)

In [3]:
# Create the quantizer. Higher N_BITS will give better results for the floating point comparison
N_BITS = 8
quantizer = Quantizer(n_bits=N_BITS)


def finalize(x):
    """Finalize the output value.

    If the DualArray's integer array is a Tracer, an object used during compilation, return it
    as is. Else, return the DualArray. This is called at the end of the run_numpy method because
    the compiler can only consider Tracer objects or NumPy arrays as inputs and outputs.
    """
    if isinstance(x.int_array, Tracer):
        return x.int_array
    return x


def f(q_x):
    """Function made of quantized operators to compile."""

    # Convert the inputs to a DualArray instance using the stored calibration data. This is
    # necessary as Concrete Python can only compile functions that take NumPy arrays as inputs,
    # while we still want to be able to propagate the quantization parameters throughout the
    # different operators
    dual_x = DualArray(float_array=x_calib, int_array=q_x, quantizer=quantizer)
    dual_y = dual_x.exp(key="exp")
    dual_x = dual_x.add(dual_y, "add")
    dual_x = dual_x.truediv(3, key="truediv")
    dual_x = dual_x.quantize(key="output")
    return finalize(dual_x)
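
A non-linear step such as `exp` can run on integers because, once the input and output scales are fixed, it becomes a function from a small set of integers to integers, which can be tabulated. The sketch below illustrates this in plain Python. The scales are illustrative values chosen for the example and are not taken from the framework, and `exp_on_integers` is a hypothetical helper.

```python
import math

N_BITS = 8
IN_SCALE = 0.0125   # illustrative input scale (assumption)
OUT_SCALE = 0.0125  # illustrative output scale (assumption)

# Signed integer range representable on N_BITS
q_min, q_max = -(2 ** (N_BITS - 1)), 2 ** (N_BITS - 1) - 1


def exp_on_integers(q: int) -> int:
    """De-quantize, apply exp in floats, then re-quantize to the output scale."""
    return round(math.exp(q * IN_SCALE) / OUT_SCALE)


# One table entry per possible quantized input value
table = {q: exp_on_integers(q) for q in range(q_min, q_max + 1)}

print(len(table), table[0], table[q_min], table[q_max])
```

The table has one entry per representable input, which is one reason why the bit width of the quantized values matters for univariate functions.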

Let us define the expected function using NumPy operators in order to be able to compare the results.

In [4]:
def expected_f(x):
    """Expected function made of float operators."""
    return (np.exp(x) + x) / 3

### Calibrate the function

In [5]:
# Convert the input to a DualArray and quantize it
dual_x = DualArray(x_calib, quantizer=quantizer).quantize(key="input")

# Calibrate the function in order to compute and store the quantization parameters
dual_result = f(dual_x.int_array)

In [6]:
# At this step, the quantizer contains all the needed quantization parameters (scale, zero_point)
pprint(quantizer.scale_dict)

{'add_sub_add_requant': (0.012416286069347535, 0),
 'add_sub_add_self': (0.012416286069347535, 0),
 'exp': (0.012416286069347535, 0),
 'input': (0.012434746578798358, 0),
 'output': (0.016833222749391915, 0),
 'truediv': (0.012405140942975472, 0)}
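
As a sanity check, the stored `'input'` scale is consistent with a symmetric scheme in which the largest absolute calibration value maps to the largest signed integer. The sketch below is an assumption about how such a scale could be derived, not the framework's actual code. The calibration values are the ten draws produced by `np.random.seed(42)` followed by `np.random.randn(10)`, copied here rounded to eight decimals so that the snippet runs without NumPy.

```python
# Calibration values, rounded to 8 decimals (see the lead-in for their origin)
x_calib = [
    0.49671415, -0.1382643, 0.64768854, 1.52302986, -0.23415337,
    -0.23413696, 1.57921282, 0.76743473, -0.46947439, 0.54256004,
]

N_BITS = 8
q_max = 2 ** (N_BITS - 1) - 1  # largest signed value on 8 bits

# Assumed rule: the largest absolute value is mapped to q_max, zero point is 0
scale = max(abs(v) for v in x_calib) / q_max
q_values = [round(v / scale) for v in x_calib]

print(scale)
print(min(q_values), max(q_values))
```

Under this assumption the computed scale agrees with the stored one up to the rounding of the hard-coded inputs, and the quantized values span the same bounds as those reported for `q_x` in the computation graph.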


### Clear result comparison

We can now compare the expected result with the float result computed by the quantized function.

In [7]:
expected_output = expected_f(x_calib)

# De-quantize the output
float_output = dual_result.dequantize("output").float_array

# Compare the float output with the expected output using the MAE score
output_mae = np.mean(np.abs(float_output - expected_output))

print(f"MAE between the expected result and the computed float result: {output_mae:.4f}")

MAE between the expected result and the computed float result: 0.0051
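
The error above depends on the bit width chosen for the quantizer. The following plain-Python sketch, with a hypothetical helper `roundtrip_mae`, measures only the error introduced by quantizing and de-quantizing a set of values, under the same symmetric-scale assumption (largest absolute value mapped to the largest signed integer). It does not reproduce the error of the full function `f`.

```python
import math


def roundtrip_mae(values, n_bits):
    """Mean absolute error after a quantize/de-quantize round trip."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = max(abs(v) for v in values) / q_max
    restored = [round(v / scale) * scale for v in values]
    return sum(abs(r - v) for r, v in zip(restored, values)) / len(values)


# Deterministic test values in [-1.5, 1.5]
values = [1.5 * math.sin(0.37 * i) for i in range(1, 51)]

for n_bits in (4, 8, 12):
    print(n_bits, roundtrip_mae(values, n_bits))
```

Each additional bit roughly halves the quantization step, so the round-trip error shrinks as the bit width grows, at the price of wider integers in the compiled circuit.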


Now that the function is fully calibrated, it can also be run on a new input using integer values only.

In [8]:
# Create a new input and quantize it
new_x = np.random.randn(10)
new_int_x = quantizer.quantize(new_x, key="input")

# Retrieve the integer result
new_int_result = f(new_int_x).int_array

# Dequantize the output
new_float_output = quantizer.dequantize(new_int_result, key="output")

# Compare the float output with the expected output using the MAE score
new_expected_output = expected_f(new_x)
new_output_mae = np.mean(np.abs(new_float_output - new_expected_output))

print(f"MAE between the expected result and the computed float result: {new_output_mae:.4f}")

MAE between the expected result and the computed float result: 0.0028


### Compilation with Concrete Python

Since the function can work with integer values only, it is possible to compile it and build an FHE circuit using Concrete Python. 

In [9]:
# Instantiate the compiler
compiler = fhe.Compiler(f, {"q_x": "encrypted"})

# Build the inputset as a batch of single quantized inputs
inputset = list(quantizer.quantize(x_calib, key="input"))

# Compile the function using the inputset and print the computation graph
circuit = compiler.compile(inputset, show_graph=True)

# Generate the keys
circuit.keygen()


Computation Graph
--------------------------------------------------------------------------------
 %0 = q_x                      # EncryptedScalar<int8>          ∈ [-38, 127]
 %1 = 0                        # ClearScalar<uint1>             ∈ [0, 0]
 %2 = subtract(%0, %1)         # EncryptedScalar<int8>          ∈ [-38, 127]
 %3 = 0                        # ClearScalar<uint1>             ∈ [0, 0]
 %4 = subtract(%0, %3)         # EncryptedScalar<int8>          ∈ [-38, 127]
 %5 = subgraph(%4)             # EncryptedScalar<int8>          ∈ [-38, 127]
 %6 = subgraph(%2)             # EncryptedScalar<uint9>         ∈ [50, 390]
 %7 = 1                        # ClearScalar<uint1>             ∈ [1, 1]
 %8 = multiply(%7, %6)         # EncryptedScalar<uint9>         ∈ [50, 390]
 %9 = add(%5, %8)              # EncryptedScalar<uint10>        ∈ [12, 517]
%10 = 0                        # ClearScalar<uint1>             ∈ [0, 0]
%11 = subtract(%9, %10)        # EncryptedScalar<uint10>        ∈ [12, 5

Now, we can evaluate the function in FHE, with or without simulation, and then compare the results.

In [10]:
input_0 = new_int_x[0]

# Compute the result in the clear directly using the quantized operators
clear_evaluation = f(input_0)

# Compute the result in the clear using FHE simulation
simulated_evaluation = circuit.simulate(input_0)

# Compute the result in FHE
fhe_evaluation = circuit.encrypt_run_decrypt(input_0)

print((clear_evaluation.int_array, simulated_evaluation, fhe_evaluation))
# Check that the three results are equal (all() would only test that they are non-zero)
print(
    "Results are identical:",
    bool(clear_evaluation.int_array == simulated_evaluation == fhe_evaluation),
)

(3, 3, 3)
Results are identical: True
