## Modulo Experiments

This notebook documents an attempt to get a neural network to learn the modulo function.

The universal approximation theorem states that a neural network with a single hidden layer can approximate any continuous function on a compact domain (Cybenko, 1989) http://cognitivemedium.com/magic_paper/assets/Cybenko.pdf

However, each neuron in a traditional neural network only computes a weighted sum of its inputs (weight * input value, added together) followed by an activation, so multiplication and division between inputs are difficult to approximate.

Modulo involves both addition/subtraction and multiplication/division, which makes it a fundamentally challenging target.

Can a neural network nevertheless learn a concept, such as how to calculate a modulo function? Such concepts are typically calculated using a procedural set of instructions, which is not exactly the domain of traditional neural networks.

The modulo function is basically the remainder. Some examples are:
- 5 % 2 = 1
- 5 % 3 = 2
- 6 % 2 = 0

One example of a set of instructions for modulo of a % b is as follows:
1) Calculate c = a//b (the floor of a/b)
2) Calculate modulo = a - c*b
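
As a minimal sketch, the two steps above translate directly into Python. The function name `modulo_via_floor` is a hypothetical helper, not part of the notebook, and the sketch assumes a non-negative `a` and a positive `b`.

```python
def modulo_via_floor(a, b):
    """Compute a % b with the two-step procedure: floor-divide, then subtract."""
    c = a // b          # step 1: floor of a / b
    return a - c * b    # step 2: what is left after removing c whole multiples of b

for a, b in [(5, 2), (5, 3), (6, 2)]:
    print(f"{a} % {b} = {modulo_via_floor(a, b)}")
```

The multiplication in step 2 and the division in step 1 are exactly the operations that a weighted sum cannot express directly.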

Dependencies:
- tensorflow
- numpy

## Inputs/Outputs

Train Inputs:
250000 random integers from 0 to 2^20 (binarized)

Test Inputs:
10000 random integers from 2^20 to 2^21 (binarized)

Outputs:
The input modulo 7, one-hot encoded over 7 classes

## Base Model
This notebook uses three baseline models:
- MLP with 1 hidden layer of 1000 nodes with ReLU activation, output softmax layer of 7 nodes
- MLP with 2 hidden layers of 1000 nodes with ReLU activation, output softmax layer of 7 nodes
- a ResNet-style network of 1000-node ReLU layers with 3 skip connections (implemented by concatenation), output softmax layer of 7 nodes

## Baseline Results
MLP with 1 or 2 hidden layers of 1000 nodes, and ResNet:
All three models reach a train accuracy of up to 100%, but the test accuracy tends to 0%!
This is strong overfitting: the models fit the training range but are unable to generalize to the unseen test range.
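
One possible reading of the 0% test accuracy, offered as an assumption rather than something the notebook verifies: the 21st bit is always 0 in training and always 1 in testing, so a model that effectively ignores that bit would be off by a fixed amount on every test input. The sketch below checks the arithmetic behind that idea. `residue_ignoring_top_bit` is a hypothetical stand-in for such a model, not the trained network.

```python
divisor = 7
top_bit = 2**20                 # value carried by the 21st bit

offset = top_bit % divisor      # how much the 21st bit shifts the residue

def residue_ignoring_top_bit(x):
    """Residue a model would predict if it treated the 21st bit as absent."""
    return (x - top_bit) % divisor

test_range = range(2**20, 2**20 + 10_000)
wrong = sum(residue_ignoring_top_bit(x) != x % divisor for x in test_range)
print(offset, wrong, len(test_range))
```

Because the offset is non-zero, every prediction in this sketch is wrong, which would explain an accuracy below the roughly 14% expected from random guessing over 7 classes.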

# What can we do?
We try five approaches:
- Add divisor as input
- Use whole numbers instead of binary as input
- Few-shot learning
- One-shot learning
- Zero-shot learning with X-factor

# Binary Representation (Baseline Results)

In [5]:
import tensorflow as tf, numpy as np

# hyperparameters here
divisor = 7

# convert a number into binary
def int2bits(i,fill=21): 
    return list(map(int,bin(i)[2:].zfill(fill)))

def bits2int(b):
    return sum(i*2**n for n,i in enumerate(reversed(b)))

# Data. 
I = np.random.randint(0,2**20,size=(250000,))
X = np.array(list(map(int2bits,I)))
Y = np.array([int2bits(2**i,divisor) for i in I % divisor])

# Test Data. 
It = np.random.randint(2**20,2**21,size=(10000,))
Xt = np.array(list(map(int2bits,It)))
Yt = np.array([int2bits(2**i,divisor) for i in It % divisor])
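
The target encoding above is compact but easy to misread, so here is a small sketch of what it produces. `int2bits(2**r, divisor)` writes a single 1 followed by `r` zeros and left-pads the result to `divisor` positions, which yields a one-hot vector whose hot index is `divisor - 1 - r` (the classes are stored in reverse order). The helper `decode_residue` is a hypothetical name added for illustration.

```python
divisor = 7

def int2bits(i, fill=21):
    return list(map(int, bin(i)[2:].zfill(fill)))

def decode_residue(one_hot):
    """Recover the residue from the reversed one-hot target."""
    return len(one_hot) - 1 - one_hot.index(1)

for r in range(divisor):
    target = int2bits(2**r, divisor)
    print(r, target, decode_residue(target))
```

The reversed order does not matter for training, but it does matter when mapping a predicted class index back to a residue.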

# MLP

In [3]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))

x = Dense(1000, 'relu')(inputs)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


In [6]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))
# x = Dense(1000, 'relu')(inputs)

# # Do a ResNet style skip connection
# layer1 = Concatenate()([x, inputs])
# x = Dense(1000, 'relu')(layer1)

# # Do a double skip connection
# layer2 = Concatenate()([x, layer1])
# x = Dense(1000, 'relu')(layer2)

# # Do a triple skip connection
# layer3 = Concatenate()([x, layer2])
# x = Dense(1000, 'relu')(layer3)

x = Dense(1000, 'relu')(inputs)
x = Dense(1000, 'relu')(x)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


In [7]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))
# x = Dense(1000, 'relu')(inputs)

# # Do a ResNet style skip connection
# layer1 = Concatenate()([x, inputs])
# x = Dense(1000, 'relu')(layer1)

# # Do a double skip connection
# layer2 = Concatenate()([x, layer1])
# x = Dense(1000, 'relu')(layer2)

# # Do a triple skip connection
# layer3 = Concatenate()([x, layer2])
# x = Dense(1000, 'relu')(layer3)

x = Dense(1000, 'relu')(inputs)
x = Dense(1000, 'relu')(x)
x = Dense(1000, 'relu')(x)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


# ResNet

In [8]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))
x = Dense(1000, 'relu')(inputs)

# Do a ResNet style skip connection
layer1 = Concatenate()([x, inputs])
x = Dense(1000, 'relu')(layer1)

# Do a double skip connection
layer2 = Concatenate()([x, layer1])
x = Dense(1000, 'relu')(layer2)

# Do a triple skip connection
layer3 = Concatenate()([x, layer2])
x = Dense(1000, 'relu')(layer3)

# x = Dense(1000, 'relu')(inputs)
# x = Dense(1000, 'relu')(x)
# x = Dense(1000, 'relu')(x)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


# Conclusion: Binary inputs with MLP or ResNet don't work!

# How about binary with divisor as input?

In [208]:
import tensorflow as tf, numpy as np

# hyperparameters here
divisor = 7

# convert a number into binary
def int2bits(i,fill=24): 
    return list(map(int,bin(i)[2:].zfill(fill)))

def bits2int(b):
    return sum(i*2**n for n,i in enumerate(reversed(b)))

# Data. I*8 shifts each number left by 3 bits; adding the divisor (7 = 0b111) fills those low bits.
I = np.random.randint(0,2**20,size=(250000,))
X = np.array(list(map(int2bits,I*8+divisor)))
Y = np.array([int2bits(2**i,divisor) for i in I % divisor])

# Test Data. 
It = np.random.randint(2**20,2**21,size=(10000,))
Xt = np.array(list(map(int2bits,It*8+divisor)))
Yt = np.array([int2bits(2**i,divisor) for i in It % divisor])
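
To make the input encoding above concrete: `I*8` shifts each number left by 3 bits, and adding `divisor` (7, or `0b111`) fills those 3 low bits, so the 24-bit input is the 21-bit number followed by the 3-bit divisor. The sketch below checks this for one arbitrary value, which is an assumption chosen for illustration. Note that this packing only works for divisors below 8.

```python
def int2bits(i, fill=24):
    return list(map(int, bin(i)[2:].zfill(fill)))

divisor = 7
n = 1_234_567                      # arbitrary example value below 2**21
packed = int2bits(n * 8 + divisor)

number_bits, divisor_bits = packed[:21], packed[21:]
print(number_bits)    # the 21 bits of n
print(divisor_bits)   # the 3 bits of the divisor
```

Since the divisor is constant across the whole dataset, these 3 extra bits are identical for every sample.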

In [210]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (24,))

x = Dense(100, 'relu')(inputs)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


In [None]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))
# x = Dense(1000, 'relu')(inputs)

# # Do a ResNet style skip connection
# layer1 = Concatenate()([x, inputs])
# x = Dense(1000, 'relu')(layer1)

# # Do a double skip connection
# layer2 = Concatenate()([x, layer1])
# x = Dense(1000, 'relu')(layer2)

# # Do a triple skip connection
# layer3 = Concatenate()([x, layer2])
# x = Dense(1000, 'relu')(layer3)

x = Dense(1000, 'relu')(inputs)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


In [20]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (24,))

x = Dense(1000, 'relu')(inputs)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


In [21]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (24,))

x = Dense(1000, 'relu')(inputs)
x = Dense(1000, 'relu')(x)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


# ResNet

In [23]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (24,))
x = Dense(1000, 'relu')(inputs)

# Do a ResNet style skip connection
layer1 = Concatenate()([x, inputs])
x = Dense(1000, 'relu')(layer1)

# Do a double skip connection
layer2 = Concatenate()([x, layer1])
x = Dense(1000, 'relu')(layer2)

# Do a triple skip connection
layer3 = Concatenate()([x, layer2])
x = Dense(1000, 'relu')(layer3)

outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


# Conclusion: Binary with divisor as input doesn't work either

# Whole number with divisor as input

In [36]:
import tensorflow as tf, numpy as np

# hyperparameters here
divisor = 7

# Data. 
I = np.random.randint(0,2**20,size=(250000,))
X = np.array([[i, divisor] for i in I])
Y = np.array([[1 if element%divisor == i else 0 for i in range(divisor)] for element in I])

# Test Data. 
It = np.random.randint(0,2**20,size=(250000,))
Xt = np.array([[i, divisor] for i in It])
Yt = np.array([[1 if element%divisor == i else 0 for i in range(divisor)] for element in It])

In [37]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model

### Change the model architecture here
########################################################
inputs = Input(shape = (2,))

x = Dense(1000, 'relu')(inputs)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


In [38]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model

### Change the model architecture here
########################################################
inputs = Input(shape = (2,))

x = Dense(1000, 'relu')(inputs)
x = Dense(1000, 'relu')(x)  # second hidden layer takes the first layer's output, not the raw inputs
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


In [40]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (2,))
x = Dense(1000, 'relu')(inputs)

# Do a ResNet style skip connection
layer1 = Concatenate()([x, inputs])
x = Dense(1000, 'relu')(layer1)

# Do a double skip connection
layer2 = Concatenate()([x, layer1])
x = Dense(1000, 'relu')(layer2)

# Do a triple skip connection
layer3 = Concatenate()([x, layer2])
x = Dense(1000, 'relu')(layer3)

outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train (report the final score at the 20th epoch)
model.fit(X,Y,10_000,20,validation_data=(Xt,Yt))


# Conclusion: Whole number with divisor as input doesn't work, even when the test distribution is the same as the training distribution

# Could it be we need few-shot learning?

From the earlier experiments, the neural network could not figure out how to handle the 21st bit, but had no issues handling bits 1 to 20. Maybe we need to give it some information about the 21st bit for it to generalize.

Hence, in addition to the 250000 random training inputs from 0 to 2^20, we also insert 100 training inputs from 2^20 to 2^21.

In [55]:
import tensorflow as tf, numpy as np

# hyperparameters here
divisor = 7

# convert a number into binary
def int2bits(i,fill=21): 
    return list(map(int,bin(i)[2:].zfill(fill)))

def bits2int(b):
    return sum(i*2**n for n,i in enumerate(reversed(b)))

# Data. 
I = np.append(np.random.randint(0,2**20,size=(250000,)), np.random.randint(2**20, 2**21, size = (100,)))
X = np.array(list(map(int2bits,I)))
Y = np.array([int2bits(2**i,divisor) for i in I % divisor])

# Test Data. 
It = np.random.randint(2**20,2**21,size=(10000,))
Xt = np.array(list(map(int2bits,It)))
Yt = np.array([int2bits(2**i,divisor) for i in It % divisor])

In [57]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))

x = Dense(1000, 'relu')(inputs)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train for 100 epochs and report the final score
model.fit(X,Y,10_000,100,validation_data=(Xt,Yt))


In [58]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))

x = Dense(1000, 'relu')(inputs)
x = Dense(1000, 'relu')(x)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train for 100 epochs and report the final score
model.fit(X,Y,10_000,100,validation_data=(Xt,Yt))


In [60]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))
x = Dense(1000, 'relu')(inputs)

# Do a ResNet style skip connection
layer1 = Concatenate()([x, inputs])
x = Dense(1000, 'relu')(layer1)

# Do a double skip connection
layer2 = Concatenate()([x, layer1])
x = Dense(1000, 'relu')(layer2)

# Do a triple skip connection
layer3 = Concatenate()([x, layer2])
x = Dense(1000, 'relu')(layer3)

outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train for 100 epochs and report the final score
model.fit(X,Y,10_000,100,validation_data=(Xt,Yt))


# Conclusion: Few-shot learning works. It achieves about 55% test accuracy with the 1-layer MLP, 88% with the 2-layer MLP, and 98% with the ResNet. We did it!

# How about one-shot learning?

Here we give the neural network a single training sample from the out-of-sample range (2^20 to 2^21) to let it learn how to handle the 21st bit.

In [61]:
import tensorflow as tf, numpy as np

# hyperparameters here
divisor = 7

# convert a number into binary
def int2bits(i,fill=21): 
    return list(map(int,bin(i)[2:].zfill(fill)))

def bits2int(b):
    return sum(i*2**n for n,i in enumerate(reversed(b)))

# Data. 
I = np.append(np.random.randint(0,2**20,size=(250000,)), np.random.randint(2**20, 2**21, size = (1,)))
X = np.array(list(map(int2bits,I)))
Y = np.array([int2bits(2**i,divisor) for i in I % divisor])

# Test Data. 
It = np.random.randint(2**20,2**21,size=(10000,))
Xt = np.array(list(map(int2bits,It)))
Yt = np.array([int2bits(2**i,divisor) for i in It % divisor])

In [62]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))

x = Dense(1000, 'relu')(inputs)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train for 100 epochs and report the final score
model.fit(X,Y,10_000,100,validation_data=(Xt,Yt))


In [63]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))

x = Dense(1000, 'relu')(inputs)
x = Dense(1000, 'relu')(x)
outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train for 100 epochs and report the final score
model.fit(X,Y,10_000,100,validation_data=(Xt,Yt))


In [64]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))
x = Dense(1000, 'relu')(inputs)

# Do a ResNet style skip connection
layer1 = Concatenate()([x, inputs])
x = Dense(1000, 'relu')(layer1)

# Do a double skip connection
layer2 = Concatenate()([x, layer1])
x = Dense(1000, 'relu')(layer2)

# Do a triple skip connection
layer3 = Concatenate()([x, layer2])
x = Dense(1000, 'relu')(layer3)

outputs = Dense(divisor, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train for 100 epochs and report the final score
model.fit(X,Y,10_000,100,validation_data=(Xt,Yt))


# Conclusion: One-shot learning works, but converges slower than few-shot learning. It achieves 15%/31%/89% accuracy for 1-layer/2-layer/ResNet respectively

# Could zero-shot learning work if we give it the structure to consider?

The previous experiments show that at least one training sample must come from the out-of-sample range for the neural network to generalize. But we have not solved the original task yet. Can we use zero-shot learning to let the network really learn modulo?

I could not do it for generic modulo, but for modulo 7, I realized a small trick.
We can view the input as chunks of 3 bits. Each chunk of 3 bits contributes the same value modulo 7 regardless of its position in the number.

This follows from 8 = 1 (mod 7): for any positive integer a, 8a = 7a + a = a (mod 7). Hence, a bit shift of 3 to the left (the same as multiplying by 8) has no impact on the value modulo 7, and every chunk can be evaluated in the same way.
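
The identity can be checked numerically. The sketch below splits a number into 3-bit chunks, adds the chunk values, and compares the result modulo 7 with the direct computation. `chunk_sum` is a hypothetical helper name, and the random samples are only for illustration.

```python
import random

def chunk_sum(x, n_bits=21):
    """Sum the values of the 3-bit chunks of x (its base-8 digits)."""
    return sum((x >> shift) & 0b111 for shift in range(0, n_bits, 3))

random.seed(0)
samples = [random.randrange(2**21) for _ in range(1000)]
agree = all(chunk_sum(x) % 7 == x % 7 for x in samples)
print(agree)
```

This is the same reasoning as the digit-sum test for divisibility by 9 in base 10, applied to base 8 and the divisor 7.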

Hence, if the network could reuse what it learned on the lower chunks of 3 bits, it could perhaps apply the same mechanism to the chunk that contains the 21st bit.

That is why I use Conv1D with a kernel size and stride of 3 over the binarized input: every chunk of 3 bits is processed with the same weights (the same method of evaluating modulo), so the layer can generalize to any chunk of 3.

The dense layers that follow then combine the outputs from the chunks of 3 bits into the modulo.
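
To illustrate what such a layer can express, the sketch below emulates a single-filter 1-D convolution with kernel size 3 and stride 3 in plain Python. The kernel `[4, 2, 1]` is an assumed, hand-picked set of weights that reads each chunk as its integer value, and `strided_conv1d` is a hypothetical helper. The weights a trained network actually learns may differ.

```python
def int2bits(i, fill=21):
    return list(map(int, bin(i)[2:].zfill(fill)))

def strided_conv1d(bits, kernel, stride):
    """Single-filter 1-D convolution without bias or padding."""
    k = len(kernel)
    return [sum(w * b for w, b in zip(kernel, bits[s:s + k]))
            for s in range(0, len(bits) - k + 1, stride)]

kernel = [4, 2, 1]        # assumed weights: most significant bit of each chunk first
x = 1_500_000             # example value in the test range [2**20, 2**21)
digits = strided_conv1d(int2bits(x), kernel, stride=3)

print(digits)                       # 7 chunk values, one per 3-bit chunk
print(sum(digits) % 7, x % 7)       # the two residues agree
```

With these weights the 21 input bits collapse to 7 chunk values, which matches the `(None, 7, 1)` output shape of the Conv1D layer, and the remaining layers only need to add them up modulo 7.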

And it worked!

In [190]:
import tensorflow as tf, numpy as np

# hyperparameters here
divisor = 7

# convert a number into binary
def int2bits(i,fill=21): 
    return list(map(int,bin(i)[2:].zfill(fill)))

def bits2int(b):
    return sum(i*2**n for n,i in enumerate(reversed(b)))

# Data. 
I = np.random.randint(0,2**20,size=(250000,))
X = np.array(list(map(int2bits, I))) 
Y = np.array([int2bits(2**i,divisor) for i in I % divisor])

# Test Data. 
It = np.random.randint(2**20,2**21,size=(10000,))
Xt = np.array(list(map(int2bits,It)))
Yt = np.array([int2bits(2**i,divisor) for i in It % divisor])

In [205]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate, Reshape, Conv1D, Flatten
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))

x = Reshape((-1, 1))(inputs)
print(x.shape)

x = Conv1D(filters = 1, kernel_size = 3, strides = 3)(x)
print(x.shape)
x = Flatten()(x)
x = Dense(1000, 'relu')(x)
outputs = Dense(7, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

model.compile('adam','categorical_crossentropy',['accuracy'])

# Train for 100 epochs and report the final score
model.fit(X,Y,10000,100,validation_data=(Xt,Yt))

(None, 21, 1)
(None, 7, 1)
Epoch 1/100
...
Epoch 75/100
Epoch 7

<tensorflow.python.keras.callbacks.History at 0x7f85f7a658d0>

In [206]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate, Reshape, Conv1D, Flatten
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (21,))  # shape must be a tuple; (21) is just the int 21

x = Reshape((-1, 1))(inputs)
print(x.shape)

x = Conv1D(filters = 1, kernel_size = 3, strides = 3)(x)
print(x.shape)
x = Flatten()(x)
x = Dense(1000, 'relu')(x)
x = Dense(1000, 'relu')(x)
outputs = Dense(7, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

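# Pass arguments by keyword so each one is unambiguous
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train for 100 epochs with a batch size of 10000, validating on the out-of-range test set
model.fit(X, Y, batch_size=10000, epochs=100, validation_data=(Xt, Yt))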

(None, 21, 1)
(None, 7, 1)
Epoch 1/100
...
Epoch 75/100
Epoch 7

<tensorflow.python.keras.callbacks.History at 0x7f86035d0690>

# Conclusion: 1 dense hidden layer after Conv1D gives about 75% test accuracy, and 2 dense hidden layers after Conv1D give about 99% test accuracy. A tremendous improvement!

# What about the 22nd bit being hidden?

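To evaluate the 22nd bit, we need to use chunks of 6 instead, with the input padded to 24 bits. Since 64 = 1 (mod 7), a bit shift of 6 also has no impact on modulo 7. With the input padded to 24 bits, chunks of 3 would leave the 22nd bit on its own in a chunk that is always zero during training, and the neural network would not have any idea how to evaluate it.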
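
The difference between the two chunk sizes can be illustrated by looking at the leading chunk of the 24-bit padded input used in the cell below. This sketch assumes the same train and test ranges as that cell; the helper `top_chunk_values`, the seed, and the sample size are illustrative only.

```python
import numpy as np

def int2bits(i, fill=24):
    return [int(c) for c in bin(int(i))[2:].zfill(fill)]

def top_chunk_values(numbers, chunk):
    """Integer value of the leading `chunk` bits of each 24-bit input."""
    bits = np.array([int2bits(n) for n in numbers])
    weights = 2 ** np.arange(chunk - 1, -1, -1)
    return bits[:, :chunk] @ weights

rng = np.random.default_rng(0)  # arbitrary seed
train = rng.integers(0, 2**21, size=1000)      # 22nd bit never set
test = rng.integers(2**21, 2**22, size=1000)   # 22nd bit always set

print(64 % 7)                                  # 1, so a 6-bit shift also preserves mod 7
print(np.unique(top_chunk_values(train, 3)))   # [0]: the leading 3-bit chunk is constant in training
print(np.unique(top_chunk_values(test, 3)))    # [1]
print(top_chunk_values(train, 6).max() > 0)    # True: the leading 6-bit chunk varies in training
```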

In [216]:
import tensorflow as tf, numpy as np

# hyperparameters here
divisor = 7

# convert a number into binary
def int2bits(i,fill=24): 
    return list(map(int,bin(i)[2:].zfill(fill)))

def bits2int(b):
    return sum(i*2**n for n,i in enumerate(reversed(b)))

# Data. 
I = np.random.randint(0,2**21,size=(250000,))
X = np.array(list(map(int2bits, I))) 
Y = np.array([int2bits(2**i,divisor) for i in I % divisor])

# Test Data. 
It = np.random.randint(2**21,2**22,size=(10000,))
Xt = np.array(list(map(int2bits,It)))
Yt = np.array([int2bits(2**i,divisor) for i in It % divisor])

In [217]:
# Model.
from tensorflow.keras.layers import Dense, Input, Concatenate, Reshape, Conv1D, Flatten
from tensorflow.keras import Model


### Change the model architecture here
########################################################
inputs = Input(shape = (24,))  # shape must be a tuple; (24) is just the int 24

x = Reshape((-1, 1))(inputs)
print(x.shape)

x = Conv1D(filters = 1, kernel_size = 6, strides = 6)(x)
print(x.shape)
x = Flatten()(x)
x = Dense(1000, 'relu')(x)
x = Dense(1000, 'relu')(x)
outputs = Dense(7, 'softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

########################################################

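# Pass arguments by keyword so each one is unambiguous
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train for 100 epochs with a batch size of 10000, validating on the out-of-range test set
model.fit(X, Y, batch_size=10000, epochs=100, validation_data=(Xt, Yt))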

(None, 24, 1)
(None, 4, 1)
Epoch 1/100
...
Epoch 75/100
Epoch 7

<tensorflow.python.keras.callbacks.History at 0x7f85c42b6750>

# Conclusion

We use chunks of 6 and get a test accuracy of 98%! What an achievement.

Overall, the modulo experiments have been interesting. They show that a neural network cannot easily approximate just any function, especially when the test data is out-of-sample.

In order to evaluate out-of-sample test cases, either one-shot learning or few-shot learning is required, with an expressive enough network structure.

For zero-shot learning, it seems that the structure to evaluate the unknown data needs to be inside the network. Here, for modulo, the Conv1D to evaluate 3 bits at a time is important. For CNNs, zero-shot learning should also be possible if the general concept of the image processing can be done on generic data, and not contingent on just a particular training set.

Moving on, I believe existing neural networks are not sufficient for learning concepts. They are better suited to pattern recognition and generalizing within a known domain. Perhaps we would need to use graphical models to better mimic the structure of the various functional domains within the brain, coupled with memory in order to link these concepts together.

I am keen to see how to improve the neural networks to do this. Current neural networks are a first step, but they are not the answer yet.