# Comparison of CNN and ConvAC

While reading around the subject of deep learning, I started looking for links between the field and AdS/CFT, the subject of my PhD. Among other attempts to make this connection, I found https://arxiv.org/abs/1704.01552, which posits a relation between the tensor networks used in condensed matter physics and "ConvACs" (convolutional arithmetic circuits), a variation on convolutional neural networks with linear activations in which the nonlinearity is introduced by product pooling. The authors had previously introduced ConvACs in https://arxiv.org/abs/1509.05009.

To understand this technique better, mostly so I can judge the plausibility of the connection to physics, I decided to implement a simple ConvAC and compare its performance to an equivalent CNN on a simple problem: the MNIST dataset.

It should be noted that the papers in question are very theoretical, and this notebook is just me messing around to try to understand them rather than an attempt to replicate a specific result.

## Load Libraries and Dataset

In [12]:
import numpy as np
import tensorflow as tf

import matplotlib.pyplot as plt

from keras.models import Model, load_model
from keras.layers import Dense , Dropout, Lambda, Flatten, Conv2D, MaxPool2D, Input, AveragePooling2D
from keras.optimizers import Adam, RMSprop

from keras.utils.np_utils import to_categorical
from keras.preprocessing.image import ImageDataGenerator
import keras.backend as K

from keras.engine.topology import Layer

from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

assert K.image_data_format() == "channels_last"

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

X_train = np.reshape(X_train, (-1, 28, 28, 1))/255
X_test = np.reshape(X_test, (-1, 28, 28, 1))/255

In [2]:
optimizer = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)

datagen = ImageDataGenerator(featurewise_center=False,  # set input mean to 0 over the dataset
                             samplewise_center=False,  # set each sample mean to 0
                             featurewise_std_normalization=False,  # divide inputs by std of the dataset
                             samplewise_std_normalization=False,  # divide each input by its std
                             zca_whitening=False,  # apply ZCA whitening
                             rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
                             zoom_range = 0.1, # Randomly zoom image 
                             width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
                             height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
                             horizontal_flip=False,  # randomly flip images horizontally (disabled)
                             vertical_flip=False)  # randomly flip images vertically (disabled)

datagen.fit(X_train)

## CNN

We'll use a very small model so it trains in a reasonable amount of time.
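To give a rough sense of what "very small" means here, the sketch below counts the trainable parameters of the model defined in the next cell by hand. The helper names (`conv2d_params`, `dense_params`) are made up for this illustration and are not part of Keras; the layer sizes are read off the model definition, assuming "same" padding and 2x2 pooling as written there.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution (hypothetical helper)."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a dense layer (hypothetical helper)."""
    return in_units * out_units + out_units


# Two 2x2 poolings with "same"-padded convolutions: 28 -> 14 -> 7
side = 28 // 2 // 2
flat_units = side * side * 64  # size of the Flatten() output

layer_params = {
    "conv_1": conv2d_params(3, 3, 1, 32),
    "conv_2": conv2d_params(3, 3, 32, 64),
    "dense_1": dense_params(flat_units, 256),
    "dense_2": dense_params(256, 10),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count}")
print(f"total: {total_params}")
```

Almost all of the parameters sit in the first dense layer, so the convolutional part of the model is tiny by comparison. Calling `CNN_model.summary()` after building the model should report the same kind of per-layer breakdown.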

In [3]:
input_img = Input(shape = (28, 28, 1))

X1 = Conv2D(32, (3, 3), activation = "relu", padding = "same")(input_img)
X1 = MaxPool2D(pool_size = (2, 2))(X1)
X1 = Dropout(0.25)(X1)

X1 = Conv2D(64, (3, 3), activation = "relu", padding = "same")(X1)
X1 = MaxPool2D(pool_size = (2, 2))(X1)
X1 = Dropout(0.25)(X1)

X1 = Flatten()(X1)
X1 = Dense(256, activation = "relu")(X1)
X1 = Dropout(0.5)(X1)
CNN_output = Dense(10, activation = "softmax")(X1)

CNN_model = Model(input_img, CNN_output)
CNN_model.compile(optimizer = optimizer,
                  loss = "categorical_crossentropy",
                  metrics = ["accuracy"])

In [4]:
epochs = 3
batch_size = 86

CNN_model.fit_generator(datagen.flow(X_train, y_train, batch_size=batch_size),
                        epochs = epochs,
                        verbose = 1,
                        steps_per_epoch = X_train.shape[0] // batch_size)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x256fff61208>

## ConvAC

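In the ConvAC papers, product pooling multiplies together the activations within each pooling window. Here I use a cruder stand-in: an average pooling layer multiplied by the pool size. Strictly speaking this is sum pooling, so the convolutional stack stays linear. Obviously this gives us a hint as to which model will perform better in the end...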
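To make the pooling operations concrete, here is a minimal NumPy sketch on a single 4x4 feature map. It compares average pooling scaled by the pool size (which is what the `AveragePooling2D` plus `Lambda` pair in the model computes) with a pooling that multiplies the entries of each window together. The function names are hypothetical helpers for this illustration, not Keras layers, and the sketch assumes non-overlapping windows that tile the input exactly.

```python
import numpy as np


def pool_windows(x, size=2):
    """Split an (H, W) array into non-overlapping size x size windows.

    Returns shape (H // size, W // size, size * size).
    Assumes H and W are divisible by `size`.
    """
    h, w = x.shape
    windows = x.reshape(h // size, size, w // size, size)
    windows = windows.transpose(0, 2, 1, 3)  # -> (rows, cols, size, size)
    return windows.reshape(h // size, w // size, size * size)


def scaled_average_pool(x, size=2):
    """Average pooling multiplied by the pool size, i.e. the sum over each window."""
    return size * size * pool_windows(x, size).mean(axis=-1)


def product_pool(x, size=2):
    """Multiply together all entries in each window."""
    return pool_windows(x, size).prod(axis=-1)


x = np.array([[1., 2., 0., 1.],
              [3., 4., 2., 2.],
              [1., 1., 5., 1.],
              [1., 1., 1., 2.]])

sum_pooled = scaled_average_pool(x)
prod_pooled = product_pool(x)

print(sum_pooled)   # [[10.  5.] [ 4.  9.]]
print(prod_pooled)  # [[24.  0.] [ 1. 10.]]
```

The two results differ everywhere in this example, and a single zero in a window wipes out the product but not the sum. Scaled average pooling is linear in its inputs, whereas the product is not.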

In [8]:
X2 = Conv2D(32, (3, 3), padding = "same")(input_img)
X2 = AveragePooling2D(pool_size = (2, 2))(X2)
X2 = Lambda(lambda x: 2*2*x)(X2)
X2 = Dropout(0.25)(X2)

X2 = Conv2D(64, (3, 3), padding = "same")(X2)
X2 = AveragePooling2D(pool_size = (2, 2))(X2)
X2 = Lambda(lambda x: 2*2*x)(X2)
X2 = Dropout(0.25)(X2)

X2 = Flatten()(X2)
X2 = Dense(256, activation = "relu")(X2)
X2 = Dropout(0.5)(X2)
ConvAC_output = Dense(10, activation = "softmax")(X2)

ConvAC_model = Model(input_img, ConvAC_output)
ConvAC_model.compile(optimizer = optimizer,
                     loss = "categorical_crossentropy",
                     metrics = ["accuracy"])

In [9]:
ConvAC_model.fit_generator(datagen.flow(X_train, y_train, batch_size=batch_size),
                        epochs = epochs,
                        verbose = 1,
                        steps_per_epoch = X_train.shape[0] // batch_size)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x25680fde710>

## Evaluation

And the winner is...

In [10]:
CNN_model.evaluate(x = X_test, y = y_test)



[0.030486235922342165, 0.99199999999999999]

In [11]:
ConvAC_model.evaluate(x = X_test, y = y_test)



[0.12419580230116845, 0.96240000000000003]

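The CNN! Well, that was unsurprising: it reaches 99.2% test accuracy against 96.2% for the ConvAC. Max pooling is generally reported to work better than average pooling on tasks like this, and the CNN also has ReLU activations in its convolutional layers. Still, a hands-on implementation was informative.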
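To put the two accuracies on a more comparable footing, the short sketch below converts them to error rates. The numbers are copied from the `evaluate` outputs above; the test-set size of 10,000 is the standard MNIST split, and `error_rate` is just an illustrative helper.

```python
# [loss, accuracy] pairs copied from the evaluate() outputs above
cnn_loss, cnn_acc = 0.030486235922342165, 0.99199999999999999
convac_loss, convac_acc = 0.12419580230116845, 0.96240000000000003

n_test = 10000  # size of the standard MNIST test set


def error_rate(accuracy):
    """Fraction of misclassified examples (hypothetical helper)."""
    return 1.0 - accuracy


cnn_err = error_rate(cnn_acc)
convac_err = error_rate(convac_acc)

# Approximate number of misclassified test images for each model
cnn_wrong = round(cnn_err * n_test)
convac_wrong = round(convac_err * n_test)
ratio = convac_err / cnn_err

print(f"CNN:    {cnn_err:.2%} error ({cnn_wrong} images)")
print(f"ConvAC: {convac_err:.2%} error ({convac_wrong} images)")
print(f"ConvAC makes about {ratio:.1f}x as many errors")
```

Seen this way the gap looks larger than the raw accuracies suggest. Bear in mind that each model was trained for only 3 epochs in a single run, so the exact figures should be treated as indicative.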