# This is how a neural network learns to add, multiply and compare handwritten digits WITHOUT knowing their values 

<p align="center"> <img src="https://i.dlpng.com/static/png/6906777_preview.png"> </p>   

In a [previous post](https://blog.jovian.ai/how-to-train-supervised-machine-learning-algorithms-without-labeled-data-6ebddc01a00f), I described how useful autoencoders are for automated labeling. The main property of these networks is their ability to learn features/patterns in the data. This is in fact not specific to autoencoders and can be achieved with other unsupervised techniques, such as **PCA**.  
The ability to detect and learn features in data can be used in other areas.  

In this post, I will present some applications of convolutional autoencoders:  
- First, a convolutional autoencoder will be trained on **MNIST** data.
- After training the encoder and decoder, we will freeze their weights and use them with additional dense layers to "learn" arithmetic operations, namely addition, multiplication and comparison.

The trick is to *never* explicitly associate the handwritten digits in the **MNIST** dataset with their respective labels. We will see that the neural networks are nevertheless able to reach 97+% accuracy on unseen data in all cases.

The first step is described in the following diagram:
<p align="center"> <img src="https://i.imgur.com/chLUEdp.png"> </p>   

In the second step, we will use the encoder in series with dense layers to perform arithmetic operations: addition, multiplication and comparison. We will train only the dense layer weights, and supply the results of the operations as labels. Note that we will not supply the digit values (labels).

<p align="center"> <img src="https://i.imgur.com/s8U8up4.png"> </p> 


# Training an autoencoder on MNIST data

As in the previous article, we will use MNIST data in this experiment. The autoencoder will learn the features of handwritten digits from the 60000 training samples. We import MNIST using the *Keras* library.

In [1]:
#import libraries and setup 
import keras
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import logging
logging.getLogger('tensorflow').disabled = True
from keras.models import Sequential, Model
from keras.layers import Dense, Conv2D, Flatten, MaxPooling2D, UpSampling2D, Reshape, Concatenate, Input
from keras.callbacks import EarlyStopping
from tensorflow.keras.utils import to_categorical
es = EarlyStopping(monitor='val_loss', mode='min', verbose=2, patience=10, restore_best_weights=True)

In [2]:
# import mnist
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape,y_train.shape)

(60000, 28, 28) (60000,)


We scale the data to the range `[0,1]` and reshape it to the *Keras* format for pictures (nbr_samples x height x width x channels).

In [3]:
#normalize data
if x_train.max() >1:
    x_train = x_train / 255
    x_test = x_test / 255

default_shape = x_train.shape
#reshape input data to 1 channel
x_train = x_train.reshape(-1,default_shape[1],default_shape[2],1)
x_test = x_test.reshape(-1,default_shape[1],default_shape[2],1)
image_dim = x_train.shape[1:]

We will implement an autoencoder architecture similar to the one in [[1]](https://blog.jovian.ai/how-to-train-supervised-machine-learning-algorithms-without-labeled-data-6ebddc01a00f). It is based on a series of convolutional layers that gradually encode the 28x28 image (784 pixels) into a 100-element array, and then decode that representation back to the original format. After training, the resulting image will hopefully resemble the original one.

In [None]:
# create an autoencoder / decoder 
encoder = Sequential()
encoder.add(Conv2D(32,kernel_size=(3,3), strides=(1,1),padding='same', activation='selu',input_shape=image_dim))
encoder.add(MaxPooling2D(2,2))
encoder.add(Conv2D(64,kernel_size=(3,3), strides=(1,1),padding='same',activation='selu'))
encoder.add(MaxPooling2D(2,2))
encoder.add(Conv2D(128,kernel_size=(3,3), strides=(1,1),padding='same',activation='selu'))
encoder.add(Flatten())
encoder.add(Dense(100,activation='sigmoid'))
encoder.summary()

In [None]:
encoder_out_dim = encoder.output_shape[1:] # dimension of the encoder output (100,)

In [None]:
decoder = Sequential()
decoder.add(Dense(6272, activation='sigmoid', input_shape=encoder_out_dim))
decoder.add(Reshape(( 7, 7, 128)))
decoder.add(Conv2D(128,kernel_size=(3,3), strides=(1,1),padding='same', activation='selu'))
decoder.add(UpSampling2D((2,2)))
decoder.add(Conv2D(64,kernel_size=(3,3), strides=(1,1),padding='same', activation='selu'))
decoder.add(UpSampling2D((2,2)))
decoder.add(Conv2D(1,kernel_size=(3,3), strides=(1,1),padding='same', activation='sigmoid'))

decoder.summary()
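
The `6272` in the first decoder layer is not arbitrary: it is the size of the flattened feature map at the end of the encoder. Here is a minimal sketch that traces the shapes by hand, assuming stride-1 `'same'` convolutions (spatial size unchanged) and 2x2 max pooling (spatial size halved). The helper functions are hypothetical and only do the shape arithmetic; they are not part of Keras.

```python
# Trace tensor shapes through the encoder without building any model.
# Assumptions: 'same' padding with stride 1, and non-overlapping 2x2 pooling.

def conv_same(shape, filters):
    """A stride-1 'same' convolution keeps height/width and sets the channel count."""
    height, width, _ = shape
    return (height, width, filters)

def max_pool(shape, pool=2):
    """Non-overlapping pooling divides height and width by the pool size."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)

def flatten(shape):
    """Flattening multiplies all dimensions together."""
    height, width, channels = shape
    return height * width * channels

shape = (28, 28, 1)            # one MNIST image
shape = conv_same(shape, 32)   # (28, 28, 32)
shape = max_pool(shape)        # (14, 14, 32)
shape = conv_same(shape, 64)   # (14, 14, 64)
shape = max_pool(shape)        # (7, 7, 64)
shape = conv_same(shape, 128)  # (7, 7, 128)
flat_size = flatten(shape)     # 7 * 7 * 128 = 6272
print(shape, flat_size)
```

The decoder mirrors this path: `Dense(6272)` followed by `Reshape((7, 7, 128))` puts the data back in the shape the encoder had just before `Flatten()`, and the two `UpSampling2D` layers bring 7x7 back to 28x28.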

The autoencoder is created using the encoder and the decoder:

In [None]:
enc_dec = Sequential([encoder,decoder])
enc_dec.summary()

It will be trained as a set of binary classifiers for each pixel.
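
To make "a binary classifier for each pixel" concrete, here is a small NumPy sketch of a per-pixel binary cross-entropy averaged over the image. It is a simplified stand-in for the Keras loss, not its actual implementation, and the function name `pixelwise_bce` and the clipping constant are assumptions made for illustration.

```python
import numpy as np

def pixelwise_bce(target, prediction, eps=1e-7):
    """Mean binary cross-entropy, treating every pixel as its own binary problem."""
    # Clip to avoid log(0); the exact epsilon is an assumption.
    prediction = np.clip(prediction, eps, 1 - eps)
    loss = -(target * np.log(prediction) + (1 - target) * np.log(1 - prediction))
    return loss.mean()

rng = np.random.default_rng(0)
target = rng.random((28, 28))            # synthetic "image" with values in [0, 1]
uninformative = np.full((28, 28), 0.5)   # a reconstruction that knows nothing

loss_exact = pixelwise_bce(target, target)         # perfect reconstruction
loss_flat = pixelwise_bce(target, uninformative)   # always equals ln(2)
print(loss_exact, loss_flat)
```

Note that with gray-level targets (values strictly between 0 and 1) even a perfect reconstruction has a loss above zero, so we should not expect the training loss to reach 0.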

In [None]:
enc_dec.compile(optimizer='nadam', loss = 'binary_crossentropy')
history = enc_dec.fit(x_train,x_train, batch_size=1000,epochs=1000,validation_split=0.2, verbose=2,callbacks=[es])
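
With `epochs=1000`, it is the `es` callback that decides when training really ends. The sketch below reproduces the patience logic in plain Python, under the assumption of `mode='min'` and no minimum improvement threshold. It is a simplified illustration (the function name is hypothetical), not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float('inf')
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # Training stops here; the weights of best_epoch would be restored.
                return epoch, best_epoch
    # Patience was never exhausted: training ran through every epoch.
    return len(val_losses) - 1, best_epoch

val_losses = [0.30, 0.20, 0.15, 0.16, 0.17, 0.18]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

In this toy run the loss stops improving after epoch 2, so training halts three epochs later, and with `restore_best_weights=True` the model would go back to the weights from epoch 2.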

Early stopping helps keep the autoencoder from overfitting the training data. There are two ways to verify the network. First, we can evaluate the loss function on the test data and expect it to be close to the loss value on the training data.

In [None]:
enc_dec.evaluate(x_test,x_test,batch_size=1000)

In [None]:
enc_dec.evaluate(x_train,x_train,batch_size=1000)

The two values are very close, around `0.08` for both data sets. The second method is to check the reconstruction that we obtain for a random sample from the test data.

In [None]:
random_label = np.random.randint(0,len(x_test)) # upper bound is exclusive
img_sample = x_test[random_label,:,:].reshape((1,28,28,1))
plt.imshow(img_sample.reshape(28,28), cmap='gray');
pred_img = enc_dec.predict(img_sample) 
plt.figure();
plt.imshow(pred_img.reshape(28,28), cmap='gray');

*A picture is worth a thousand words!* Just to be on the safe side, I ran this test multiple times and the results were consistent. Let's save the encoder and the decoder.

In [None]:
# save models
encoder.save('encoder')
decoder.save('decoder')

Now that we have a trained encoder and decoder, let's focus on the *encoder*. Each image is associated with a representation that captures most of its interesting features. This representation is sufficient to reconstruct the image using the decoder. Here is the representation of the sample image we used earlier: 

In [None]:
representation_sample = encoder.predict(img_sample)
print(representation_sample) 

Using these 100 numbers, we generate a 28x28 image (784 pixels).

In [None]:
recons_image = decoder.predict(representation_sample)
plt.imshow(recons_image.reshape(28,28), cmap='gray');

And here is where the *fun part* begins! Using the lower-dimensional representation, let's do some math.

# Learning how to add two handwritten digits

The idea is simple. Using the representation of two images, we train a neural network to compute their sum. We will not provide the value of each digit, but we will provide the sum during the training step.  
We will be performing addition between numbers in the range [0-9]. The results will be in the range [0-18]. So the results will be coded using two outputs:  
1- Units, multiclass output [0,1,2,3,4,5,6,7,8,9]  
2- Tens, binary output [0,1]  
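
As a quick illustration of this encoding, here is a NumPy sketch that turns a sum into the two targets and back. The helper names `encode_sum` and `decode_sum` are hypothetical and are not used elsewhere in this post.

```python
import numpy as np

def encode_sum(a, b):
    """Encode a + b as (one-hot units digit, binary tens digit)."""
    total = a + b
    units = np.zeros(10)
    units[total % 10] = 1.0
    tens = total // 10  # 0 or 1, since the largest sum is 9 + 9 = 18
    return units, tens

def decode_sum(units_probs, tens_prob):
    """Recover the sum from the two outputs (probabilities or exact targets)."""
    return int(np.argmax(units_probs)) + 10 * int(tens_prob > 0.5)

units, tens = encode_sum(9, 8)
decoded = decode_sum(units, tens)
print(units, tens, decoded)
```

For 9 + 8, the units target is the one-hot vector for 7 and the tens target is 1, which decodes back to 17.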

<p align="center"> <img src="https://i.imgur.com/Zgnd82F.png"> </p> 


Using the functional API in *Keras*, we define the network architecture. First, we load the encoder *twice* and freeze its weights:

In [4]:
# duplicate encoders and freeze weights
encoder1 = keras.models.load_model('encoder') 
encoder1._name = 'encoder1'
encoder1.trainable = False

encoder2 = keras.models.load_model('encoder')
encoder2._name = 'encoder2'
encoder2.trainable = False

Using the encoders, we build the 'addition' model:

In [None]:
# create model to learn addition
input1 = Input(shape=image_dim)
input2 = Input(shape=image_dim)
enc1_out = encoder1(input1)
enc2_out = encoder2(input2)
model_c = Concatenate()([enc1_out,enc2_out])
model_c = Dense(1000,activation='relu')(model_c)

model_b1 = Dense(200,activation='relu')(model_c)
model_b2 = Dense(200,activation='relu')(model_c)

model_b1 = Dense(100,activation='relu')(model_b1)
model_b2 = Dense(100,activation='relu')(model_b2)

units =  Dense(10,activation='softmax',name ='units')(model_b1)
tens = Dense(1,activation='sigmoid',name ='tens')(model_b2)

model_addition = Model(inputs=[input1,input2],outputs=[units,tens])

model_addition.compile(optimizer='nadam', loss = ['categorical_crossentropy','binary_crossentropy'], metrics=['acc'])

This model has two inputs (the two handwritten digit images) and two outputs (the units and tens of the sum). We use two different losses because the two outputs are of a different nature. Note that there is a common hidden layer of 1000 units, followed by two branches (one for each output).  
We need to create datasets to train and test our model. Inputs will be random combinations of handwritten digits. Outputs will be the sums for each combination. 

In [None]:
# generate a dataset for additions
train_size = 200000
random_labels1 = np.random.randint(0,25000,train_size)
random_labels2 = np.random.randint(0,25000,train_size)

x_train_1 = x_train[random_labels1]
x_train_2 = x_train[random_labels2]

y_train_1 = y_train[random_labels1]
y_train_2 = y_train[random_labels2]

y_add = y_train_1 + y_train_2
y_add_tens = y_add //10 
y_add_units = y_add %10 
y_add_units_cat = to_categorical(y_add_units, num_classes=10)


# the same with x_test
test_size = 5000
random_labels1 = np.random.randint(0,10000,test_size)
random_labels2 = np.random.randint(0,10000,test_size)

x_test_1 = x_test[random_labels1]
x_test_2 = x_test[random_labels2]

y_test_1 = y_test[random_labels1]
y_test_2 = y_test[random_labels2]

y_test_add = y_test_1 + y_test_2
y_test_add_tens = y_test_add //10 
y_test_add_units = y_test_add %10 
y_test_add_units_cat = to_categorical(y_test_add_units, num_classes=10)

Now we are ready to train our model! 

In [None]:
history_addition = model_addition.fit([x_train_1,x_train_2],[y_add_units_cat,y_add_tens], batch_size=100,epochs=1000,validation_split=0.2, verbose=2,callbacks=[es])

At the end of the training, the accuracy on both outputs is pretty good (98% and 99.5%). Let's first see how the model performs on the test data.

In [None]:
test_results = model_addition.evaluate([x_test_1,x_test_2],[y_test_add_units_cat,y_test_add_tens],batch_size=1000)

Results are still in the 90% range. We can show a random sample of the model predictions.

In [None]:
random_label_1 = np.random.randint(0,len(x_test)) # upper bound is exclusive
random_label_2 = np.random.randint(0,len(x_test))

img_sample1 = x_test[random_label_1,:,:].reshape((1,28,28,1))
img_sample2 = x_test[random_label_2,:,:].reshape((1,28,28,1))

plt.subplot(1,2,1)
plt.imshow(img_sample1.reshape(28,28), cmap='gray');

plt.subplot(1,2,2)
plt.imshow(img_sample2.reshape(28,28), cmap='gray');

prediction = model_addition.predict([img_sample1,img_sample2])
unit = prediction[0]
ten = prediction[1]

# both outputs have a batch dimension of 1; convert them to plain integers
sum_images = int(np.argmax(unit)) + 10*int(np.round(ten[0,0]))
print('sum =',sum_images)

Results look promising! We could probably improve the accuracy by training the model on more random samples (a larger `train_size`) or by tweaking the model architecture. One last thing: save the model!

In [None]:
# save the model
model_addition.save('model_addition')
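
One caveat when reading the accuracies above: the units and tens outputs are scored separately, while a sum is only correct when *both* are right. Here is a minimal NumPy sketch of such an "exact sum" accuracy. The function name is hypothetical, and the arrays below are toy values standing in for real model predictions.

```python
import numpy as np

def exact_sum_accuracy(units_probs, tens_probs, true_sums):
    """Fraction of samples where units AND tens are both predicted correctly."""
    pred_units = np.argmax(units_probs, axis=1)
    pred_tens = (np.asarray(tens_probs).reshape(-1) > 0.5).astype(int)
    pred_sums = pred_units + 10 * pred_tens
    return float(np.mean(pred_sums == np.asarray(true_sums)))

# Toy data: 4 samples, one with a wrong tens digit and one with a wrong units digit.
true_sums = np.array([17, 4, 10, 9])
units_probs = np.eye(10)[[7, 4, 0, 8]]                # last units digit is wrong (8 instead of 9)
tens_probs = np.array([[0.9], [0.1], [0.2], [0.3]])   # third tens digit is wrong (0 instead of 1)

accuracy = exact_sum_accuracy(units_probs, tens_probs, true_sums)
print(accuracy)
```

On the real model, this could be called as `exact_sum_accuracy(*model_addition.predict([x_test_1, x_test_2]), y_test_add)`. The resulting number can only be lower than or equal to each of the two per-output accuracies.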

# Learning how to multiply two handwritten digits

Using a similar method, we can train a neural network to compute the product of two handwritten digits. The main difference is that the output will be in the range [0,81]. The network will output two values:  
1- units, multiclass [0,1,2,3,4,5,6,7,8,9]  
2- tens, multiclass [0,1,2,3,4,5,6,7,8] 

We will use the same architecture as previously, with a slight modification in the output layer for the tens (softmax instead of sigmoid, and 9 neurons instead of 1).
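
To double check how many classes each output needs, we can simply enumerate every product of two digits. This is plain Python and independent of the models in this post.

```python
# All possible products of two digits in 0..9
products = [a * b for a in range(10) for b in range(10)]

max_product = max(products)
tens_digits = sorted({p // 10 for p in products})
units_digits = sorted({p % 10 for p in products})

print(max_product)    # largest product is 9 * 9
print(tens_digits)    # distinct tens digits
print(units_digits)   # distinct units digits
```

The tens digit takes the 9 values 0 to 8 and the units digit takes all 10 values, which matches a `Dense(9)` softmax for the tens and a `Dense(10)` softmax for the units.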

In [None]:
# duplicate encoders and freeze weights
encoder3 = keras.models.load_model('encoder') 
encoder3._name = 'encoder1'
encoder3.trainable = False

encoder4 = keras.models.load_model('encoder')
encoder4._name = 'encoder2'
encoder4.trainable = False

In [None]:
# create model to learn multiplication
input1 = Input(shape=image_dim)
input2 = Input(shape=image_dim)
enc1_out = encoder3(input1)
enc2_out = encoder4(input2)
model_c = Concatenate()([enc1_out,enc2_out])
model_c = Dense(1000,activation='relu')(model_c)

model_b1 = Dense(200,activation='relu')(model_c)
model_b2 = Dense(200,activation='relu')(model_c)

model_b1 = Dense(100,activation='relu')(model_b1)
model_b2 = Dense(100,activation='relu')(model_b2)

units =  Dense(10,activation='softmax',name ='units')(model_b1)
tens = Dense(9,activation='softmax',name ='tens')(model_b2)

model_mult = Model(inputs=[input1,input2],outputs=[units,tens])

model_mult.compile(optimizer='nadam', loss = ['categorical_crossentropy','categorical_crossentropy'], metrics=['acc'])

Now we need to create data for training and testing as we did previously. We already generated random images, so all we need now is to create labels by multiplying the values.

In [None]:
# generate a dataset for multiplication

y_mult = y_train_1 * y_train_2
y_mult_tens = y_mult //10 
y_mult_units = y_mult %10 
y_mult_units_cat = to_categorical(y_mult_units, num_classes=10)
y_mult_tens_cat = to_categorical(y_mult_tens, num_classes=9)

# the same with x_test

y_test_mult = y_test_1 * y_test_2
y_test_mult_tens = y_test_mult //10 
y_test_mult_units = y_test_mult %10 
y_test_mult_units_cat = to_categorical(y_test_mult_units, num_classes=10)
y_test_mult_tens_cat = to_categorical(y_test_mult_tens, num_classes=9)

Next step is to train the model, and test it.

In [None]:
history_mult = model_mult.fit([x_train_1,x_train_2],[y_mult_units_cat,y_mult_tens_cat], batch_size=100,epochs=1000,validation_split=0.2, verbose=2,callbacks=[es])

In [None]:
test_results = model_mult.evaluate([x_test_1,x_test_2],[y_test_mult_units_cat,y_test_mult_tens_cat],batch_size=1000)

We achieve similar performance (slightly better actually!) compared to addition. Let's see how the model works on sample data: 

In [None]:
random_label_1 = np.random.randint(0,len(x_test)) # upper bound is exclusive
random_label_2 = np.random.randint(0,len(x_test))

img_sample1 = x_test[random_label_1,:,:].reshape((1,28,28,1))
img_sample2 = x_test[random_label_2,:,:].reshape((1,28,28,1))

plt.subplot(1,2,1)
plt.imshow(img_sample1.reshape(28,28), cmap='gray');

plt.subplot(1,2,2)
plt.imshow(img_sample2.reshape(28,28), cmap='gray');

prediction = model_mult.predict([img_sample1,img_sample2])
unit = prediction[0]
ten = prediction[1]

mult_images = np.argmax(unit)+10*np.argmax(ten)
print('multiplication result =',mult_images)

In [None]:
# save the model
model_mult.save('model_mult')

# Learning how to compare two handwritten digits

The last operation our model will predict is the comparison. The model will have one binary output (1 if image1 > image2 and 0 otherwise). We proceed the same way as previously.
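
Before training, it is worth knowing how balanced this binary label is, since that sets the baseline any model has to beat. The sketch below enumerates all digit pairs, assuming every digit is equally likely (MNIST only satisfies this approximately).

```python
# Enumerate all ordered pairs of digits and count how often the first is greater.
pairs = [(a, b) for a in range(10) for b in range(10)]
greater = sum(a > b for a, b in pairs)

positive_rate = greater / len(pairs)                        # share of label 1
majority_baseline = max(positive_rate, 1 - positive_rate)   # always predicting the majority class

print(greater, positive_rate, majority_baseline)
```

Ties count as 0, so label 1 covers 45 of the 100 pairs, and a model that always answers 0 would already be right about 55% of the time under this assumption.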

In [None]:
# duplicate encoders and freeze weights
encoder5 = keras.models.load_model('encoder') 
encoder5._name = 'encoder1'
encoder5.trainable = False

encoder6 = keras.models.load_model('encoder')
encoder6._name = 'encoder2'
encoder6.trainable = False

In [None]:
# create model to learn comparison
input1 = Input(shape=image_dim)
input2 = Input(shape=image_dim)
enc1_out = encoder5(input1)
enc2_out = encoder6(input2)
model_c = Concatenate()([enc1_out,enc2_out])
model_c = Dense(1000,activation='relu')(model_c)

model_c = Dense(200,activation='relu')(model_c)

model_c = Dense(100,activation='relu')(model_c)

comp =  Dense(1,activation='sigmoid',name ='comp')(model_c)


model_comp = Model(inputs=[input1,input2],outputs=[comp])

model_comp.compile(optimizer='nadam', loss = ['binary_crossentropy'], metrics=['acc'])

In [None]:
# generate a dataset for comparison
y_comp = y_train_1 > y_train_2

# the same with x_test
y_test_comp = y_test_1 > y_test_2

In [None]:
history_comp = model_comp.fit([x_train_1,x_train_2],y_comp, batch_size=100,epochs=1000,validation_split=0.2, verbose=2,callbacks=[es])

In [None]:
test_results = model_comp.evaluate([x_test_1,x_test_2],y_test_comp,batch_size=1000)

In [None]:
random_label_1 = np.random.randint(0,len(x_test)) # upper bound is exclusive
random_label_2 = np.random.randint(0,len(x_test))

img_sample1 = x_test[random_label_1,:,:].reshape((1,28,28,1))
img_sample2 = x_test[random_label_2,:,:].reshape((1,28,28,1))

plt.subplot(1,2,1)
plt.imshow(img_sample1.reshape(28,28), cmap='gray');

plt.subplot(1,2,2)
plt.imshow(img_sample2.reshape(28,28), cmap='gray');

prediction = np.round(model_comp.predict([img_sample1,img_sample2]))


print('comparison result =',prediction,'1 if the number on the left is greater, 0 otherwise')

In [None]:
# save the model
model_comp.save('model_comp')

# Conclusion and future work