# Quantizing a Model with QKeras

In this section we will quantize a model using QKeras.

We will start with a simple model that performs MNIST classification.

In [None]:
from tensorflow.keras.datasets import mnist
from qkeras import *
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import *
from tensorflow.keras.utils import to_categorical

In [None]:
def get_model():
    x = x_in = Input((784,))
    x = Dense(20)(x)
    x = Activation("relu")(x)
    x = Dense(10)(x)
    x = Activation("softmax")(x)
    
    model = Model(inputs=x_in, outputs=x)
    
    return model

def get_train_test_set():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    
    x_train = x_train / 256.0
    x_test = x_test / 256.0
    
    x_train = x_train.reshape(x_train.shape[0], -1)
    x_test = x_test.reshape(x_test.shape[0], -1)
        
    y_train = to_categorical(y_train, 10)
    y_test = to_categorical(y_test, 10)
    
    return (x_train, y_train), (x_test, y_test)

In [None]:
(x_train, y_train), (x_test, y_test) = get_train_test_set()

model = get_model()

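# 10-way softmax with one-hot labels: use categorical cross-entropy so that
# "accuracy" is computed as categorical (top-1) accuracy
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])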

In [None]:
history = model.fit(x_train, y_train, epochs=10, validation_split=0.1, verbose=True, batch_size=32)

evaluate = model.evaluate(x_test, y_test)

print("loss = {:.6f}, accuracy = {:.4f}".format(evaluate[0], evaluate[1]))

Now, let's create a quantized model with 2 bits on the inputs and on the first layer's weights and biases ($2,0,1$ means the weights and biases are quantized using 2 bits, with 0 bits to the left of the binary point, and with a symmetric representation for positive and negative numbers), 3 bits on the activations feeding the last layer, and 4 bits on the last layer's weights and biases.
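To see what these bit budgets buy, we can list the values each quantizer in this model can produce. The sketch below assumes the usual fixed-point reading of the parameters: signed quantizers spend one bit on the sign, `integer` bits sit left of the binary point and the rest to the right, and "symmetric" drops the most negative code. `fixed_point_levels` is a hypothetical helper for illustration, not a QKeras function, and QKeras's internals may differ in detail.

```python
def fixed_point_levels(bits, integer, symmetric=False, signed=True):
    """List the values a fixed-point quantizer can output (illustrative sketch).

    Assumptions: one sign bit when `signed` is True; `integer` bits left of the
    binary point; the remaining bits right of it; `symmetric` removes the most
    negative code so the range is mirrored around zero.
    """
    magnitude_bits = bits - 1 if signed else bits
    frac_bits = magnitude_bits - integer
    step = 2.0 ** -frac_bits          # spacing between neighboring levels
    max_code = 2 ** magnitude_bits - 1
    if signed:
        min_code = -max_code if symmetric else -(max_code + 1)
    else:
        min_code = 0
    return [code * step for code in range(min_code, max_code + 1)]


# Quantizers used in get_qmodel(), under the assumptions above
print(fixed_point_levels(2, 0, symmetric=True))   # first-layer weights/biases, (2,0,1)
print(fixed_point_levels(3, 1, signed=False))     # quantized_relu(3,1) activations
print(fixed_point_levels(4, 0, symmetric=True))   # last-layer weights/biases, (4,0,1)
```

Under this reading, the 2-bit symmetric weights are effectively ternary ({-0.5, 0, 0.5}), the 3-bit ReLU outputs step by 0.25 from 0 to 1.75, and the 4-bit weights get 15 levels from -0.875 to 0.875.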

In [None]:
def get_qmodel():
    x = x_in = Input((784,))
    x = QActivation("quantized_relu(2)")(x)
    x = QDense(20,
               kernel_quantizer=quantized_bits(2,0,1,alpha=1),
               bias_quantizer=quantized_bits(2,0,1))(x)
    x = QActivation("quantized_relu(3,1)")(x)
    x = QDense(10,
               kernel_quantizer=quantized_bits(4,0,1,alpha=1),
               bias_quantizer=quantized_bits(4,0,1))(x)
    x = Activation("softmax")(x)
    
    model = Model(inputs=x_in, outputs=x)
    
    print_qstats(model)
    
    return model

In [None]:
qmodel = get_qmodel()

adam = Adam(learning_rate=0.0005)  # `lr` is deprecated in tf.keras

qmodel.compile(optimizer=adam, loss="categorical_crossentropy", metrics=["accuracy"])

The output of print_qstats shows the number and types of operations for each stage of the network.
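As a back-of-the-envelope cross-check, we can count the multiply-accumulates (MACs) by hand: a dense layer with `n_in` inputs and `n_out` units performs `n_in * n_out` of them, and the operand widths follow from the quantizers in get_qmodel(). The layer names below are hypothetical, and print_qstats may group or label operations differently; this is only a sketch of the arithmetic.

```python
def dense_macs(layers):
    """Hand count of multiply-accumulates per dense layer (not print_qstats output)."""
    return {name: n_in * n_out for name, n_in, n_out, _, _ in layers}


# (hypothetical layer name, n_in, n_out, input bits, weight bits), read off get_qmodel()
layers = [
    ("qdense_1", 784, 20, 2, 2),
    ("qdense_2", 20, 10, 3, 4),
]

macs = dense_macs(layers)
for name, _, _, in_bits, w_bits in layers:
    print(f"{name}: {macs[name]} MACs of {in_bits}-bit inputs x {w_bits}-bit weights")
```

Almost all of the work (15,680 of 15,880 MACs) sits in the first layer, so that is where cheaper arithmetic pays off most.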

In QKeras, we need to quantize all activation tensors (using QActivation) as well as the weights and biases. Note that we used a trick to tag the input as a 2-bit input: applying $\tt{quantized\_relu}(2)$ quantizes the input to 2 bits and keeps only non-negative values, so no sign bit is allocated for the number.
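To make the effect of $\tt{quantized\_relu}(2)$ on the input concrete, here is a plain-Python sketch of an unsigned fixed-point quantizer applied to a few pixel values scaled by 1/256, as in get_train_test_set(). It encodes our understanding of the behavior (round to the nearest multiple of $2^{\text{integer}-\text{bits}}$, then clip to the range of non-negative codes); `quantized_relu_sketch` is a hypothetical name, and QKeras's exact rounding may differ.

```python
def quantized_relu_sketch(x, bits, integer=0):
    """Unsigned fixed-point ReLU quantizer (sketch, not QKeras's code).

    Negative inputs map to 0; other inputs are rounded to a multiple of
    2**(integer - bits) and clipped to the largest code. No sign bit is used.
    """
    step = 2.0 ** (integer - bits)
    max_code = 2 ** bits - 1
    code = round(x / step)  # Python's round() sends exact halves to the even code
    return min(max(code, 0), max_code) * step


pixels = [0, 40, 128, 200, 255]  # raw MNIST intensities
print([quantized_relu_sketch(p / 256.0, 2) for p in pixels])
```

With 2 bits and no integer bits, every input pixel collapses to one of four levels: 0, 0.25, 0.5, or 0.75.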

For the last layer's weights, we used $\tt{quantized\_bits(4,0,1,alpha=1)}$. These parameters mean 4 bits per weight, 0 bits to the left of the binary point, and (the final 1) a symmetric representation for positive and negative weights. Finally, $\tt{alpha=1}$ tells the quantizer that this representation will only have mantissa quantization. Without this parameter, QKeras will use a shared exponent representation.
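The rounding and clipping behind a signed, symmetric 4-bit quantizer can be sketched in a few lines. This assumes the fixed-point reading described above (one sign bit, `integer` bits left of the binary point, mantissa-only scaling as with `alpha=1`); `quantized_bits_sketch` is a hypothetical helper, not the QKeras implementation.

```python
def quantized_bits_sketch(x, bits, integer=0, symmetric=True):
    """Signed fixed-point quantizer with no extra scaling (sketch, not QKeras's code).

    One bit is the sign; the remaining bits - 1 bits hold `integer` bits left of
    the binary point and the rest to the right. symmetric=True drops the most
    negative code so the range is [-max, +max].
    """
    frac_bits = bits - 1 - integer
    step = 2.0 ** -frac_bits
    max_code = 2 ** (bits - 1) - 1
    min_code = -max_code if symmetric else -max_code - 1
    code = round(x / step)                               # nearest level
    return min(max(code, min_code), max_code) * step     # clip to the code range


weights = [-1.3, -0.4, 0.06, 0.3, 0.9]
print([quantized_bits_sketch(w, 4) for w in weights])
print(quantized_bits_sketch(-1.3, 4, symmetric=False))
```

Symmetric mode gives up one code so that the largest positive and negative magnitudes match (-0.875 to 0.875 instead of -1.0 to 0.875), which is why -1.3 saturates at -0.875 in the first call and at -1.0 in the second.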

Note that we had to override Adam's learning rate, lowering it from the Keras default of 0.001 to 0.0005. Quantized networks usually need a lower learning rate than their floating-point counterparts.

Let's see how this network behaves now.

In [None]:
history = qmodel.fit(x_train, y_train, epochs=10, validation_split=0.1, verbose=True, batch_size=32)

qmodel.save('section2_model_0.h5')

evaluate = qmodel.evaluate(x_test, y_test)

print("loss = {:.6f}, accuracy = {:.4f}".format(evaluate[0], evaluate[1]))

You should see that the loss has increased (roughly doubled), while the accuracy dropped by roughly 3%.

Now try quantizing the network in a different way. For example, let's use power-of-2 quantization for the first layer's weights so that the first layer becomes multiplier-free: multiplying by a power of 2 is just a shift. Keep in mind that the second parameter of quantized_po2 is the maximum value, which differs from the second parameter of quantized_bits (the number of bits to the left of the binary point).
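Why power-of-2 weights remove multipliers can be shown in plain Python: if every weight is $\pm 2^e$, each product of an integer activation code with a weight becomes a left shift, and the dot product reduces to shifts, adds, and subtracts. The quantizer below is a simplified, hypothetical stand-in (it rounds $\log_2|w|$ to the nearest integer and clips the exponent to `[min_exp, floor(log2(max_value))]`); `max_value` only mirrors the role of quantized_po2's second parameter, `min_exp=-7` is an arbitrary choice, and how quantized_po2 actually derives its exponent range from its bit width is not modeled.

```python
import math


def po2_quantize(w, max_value=1.0, min_exp=-7):
    """Hypothetical power-of-two quantizer: returns (sign, exp) with w ~= sign * 2**exp.

    Rounds log2|w| to the nearest integer and clips the exponent to
    [min_exp, floor(log2(max_value))]. Zero maps to sign 0.
    """
    if w == 0.0:
        return 0, min_exp
    max_exp = math.floor(math.log2(max_value))
    exp = min(max(round(math.log2(abs(w))), min_exp), max_exp)
    return (1 if w > 0 else -1), exp


def shift_dot(codes, po2_weights, act_frac_bits=2, min_exp=-7):
    """Dot product of unsigned activation codes (value = code / 2**act_frac_bits)
    with power-of-two weights, using only shifts, adds, and subtracts."""
    acc = 0  # integer accumulator, in units of 2**(min_exp - act_frac_bits)
    for code, (sign, exp) in zip(codes, po2_weights):
        if sign == 0:
            continue  # zero weight contributes nothing
        term = code << (exp - min_exp)  # code * 2**exp, expressed in accumulator units
        acc = acc + term if sign > 0 else acc - term
    return math.ldexp(acc, min_exp - act_frac_bits)  # reinterpret the fixed-point result


weights = [0.3, -0.6, 0.05, 0.0]
q = [po2_quantize(w) for w in weights]  # [(1, -2), (-1, -1), (1, -4), (0, -7)]
codes = [3, 2, 1, 3]                    # 2-bit inputs 0.75, 0.5, 0.25, 0.75
print(shift_dot(codes, q))              # -0.046875
```

The result matches the ordinary floating-point dot product of the quantized values, $0.75 \cdot 0.25 - 0.5 \cdot 0.5 + 0.25 \cdot 0.0625 = -0.046875$, but no multiplication by a weight was performed.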

In [None]:
def get_qmodel():
    x = x_in = Input((784,))
    x = QActivation("quantized_relu(2)")(x)
    x = QDense(20,
               kernel_quantizer=quantized_po2(4,1),
               bias_quantizer=quantized_bits(2,0,1))(x)
    x = QActivation("quantized_relu(3,1)")(x)
    x = QDense(10,
               kernel_quantizer=quantized_bits(4,0,1,alpha=1),
               bias_quantizer=quantized_bits(4,0,1))(x)
    x = Activation("softmax")(x)
    
    model = Model(inputs=x_in, outputs=x)
    
    print_qstats(model)
    
    return model

In [None]:
qmodel = get_qmodel()

adam = Adam(learning_rate=0.0005)  # `lr` is deprecated in tf.keras

qmodel.compile(optimizer=adam, loss="categorical_crossentropy", metrics=["accuracy"])

In [None]:
history = qmodel.fit(x_train, y_train, epochs=10, validation_split=0.1, verbose=True, batch_size=32)

qmodel.save('section2_model_1.h5')

evaluate = qmodel.evaluate(x_test, y_test)

print("loss = {:.6f}, accuracy = {:.4f}".format(evaluate[0], evaluate[1]))

Great! Without using any multipliers in the first layer, we were able to get pretty much the same accuracy as before.