# Fully Connected Neural Network

Our first attempt will use Keras to build a fully connected neural network with a single hidden layer. We start with the most basic neural network layout and see how it works for this problem.

In [14]:
# Common python import statements
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import  Dense, Flatten
import numpy as np
import math


# fixed random seed to have consistent results
np.random.seed(123)

# dimensions of our images. Keras will be resizing all images to have consistent data fed to training and testing phases
img_width, img_height = 300, 300 

train_data_dir = 'data/train'
validation_data_dir = 'data/test'
nb_train_samples = 3000
nb_validation_samples = 300
epochs = 5
batch_size = 500  # in later steps we cannot use just any size here, as it is limited by GPU memory size.
# if you get an out-of-memory (OOM) exception at any time, decrease this value

In [15]:
# We need to define the input data format (shape). The TensorFlow default for images is height, then width, then colour channels
input_shape = (img_height, img_width, 3)
# height and width are equal here, but the order matters when they differ, including when doing prediction


## Define Keras model

In [16]:
model = Sequential() # a sequential model is a stack of layers, one after the other; this is the simplest network layout
model.add(Dense(3, input_shape=input_shape)) # input layer; Dense acts on the last axis only (the 3 colour channels): 3*3 weights + 3 biases = 12 params
model.add(Flatten()) # flatten the multi-dimensional input into a single dimension (vector)
model.add(Dense(32, activation = 'relu')) # ReLU is simply max(x, 0), which adds a non-linear activation
model.add(Dense(2, activation = 'softmax')) # softmax turns the 2 outputs into class probabilities; typically used in the output layer only
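To make the two activations concrete, here is a minimal pure-Python sketch of ReLU and softmax. It is only an illustration of the formulas, not how Keras implements them, and the input values are made up.

```python
import math

def relu(x):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(x, 0.0)

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

hidden = [relu(v) for v in [-2.0, 0.5, 3.0]]  # the negative value becomes 0.0
probs = softmax([1.0, 3.0])                   # two made-up class scores
print(hidden)
print(probs, sum(probs))
```

The class with the larger score gets the larger probability, and the two probabilities always add up to 1, which is why softmax suits the final layer of a classifier.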

In [17]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 300, 300, 3)       12        
_________________________________________________________________
flatten_3 (Flatten)          (None, 270000)            0         
_________________________________________________________________
dense_8 (Dense)              (None, 32)                8640032   
_________________________________________________________________
dense_9 (Dense)              (None, 2)                 66        
Total params: 8,640,110
Trainable params: 8,640,110
Non-trainable params: 0
_________________________________________________________________


**Can you see above that a simple network for a 300x300 image has more than 8 million weights to train? Totally crazy!**
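The parameter counts in the summary can be checked by hand. The sketch below assumes the usual Dense layer rule of one weight per input-unit pair plus one bias per unit; `dense_params` is a helper made up for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

img_height, img_width, channels = 300, 300, 3

first = dense_params(channels, 3)   # Dense(3) acts on the channel axis only
flat = img_height * img_width * 3   # size of the vector produced by Flatten
hidden = dense_params(flat, 32)     # the 32-unit hidden layer
output = dense_params(32, 2)        # the 2-unit softmax layer
total = first + hidden + output

print(first, flat, hidden, output, total)
```

Almost all of the weights sit between the flattened image and the hidden layer, because every one of the 270,000 inputs connects to every one of the 32 units.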

Before training, every model must be compiled with a loss function suited to the problem, an optimizer, and an evaluation metric such as accuracy.
A few words on accuracy. If we are doing dogs vs cats, we may only care about the
total number of correct predictions. But what if we are doing cancer detection, where the share of
positives is something like 1%? Clearly we do not want to tell someone they have cancer when they do not, and
we also do not want to miss people with cancer and send them home without informing them. For such cases there are other
metrics like precision and recall. Just google "F1 Score".
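Here is a small sketch of why accuracy alone misleads on rare positives. The patient counts are invented for illustration (1,000 patients, 10 of them with cancer), and `scores` is a hypothetical helper, not part of Keras.

```python
def scores(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# A "model" that tells everyone they are healthy
always_negative = scores(tp=0, fp=0, fn=10, tn=990)
# A made-up screening model that finds 8 of the 10 cases but raises 40 false alarms
screening = scores(tp=8, fp=40, fn=2, tn=950)

print(always_negative)
print(screening)
```

The always-healthy model wins on accuracy while finding no cancer at all, which its recall and F1 of zero expose immediately.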

In [18]:
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])

Next we need to prepare some generators that feed Keras with data for training and testing. They shield us from the hard task of loading images, converting them to multi-dimensional arrays, and so on. Training datasets are also sometimes small, so these generators can apply a process called data augmentation to fake new training data. For example, images can be rotated a bit or zoomed, as long as they still contain the main patterns needed for training.
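As a rough picture of what rescaling and a horizontal flip do to the pixel values, here is a toy sketch on a tiny made-up greyscale image. It uses plain Python lists and is not how the Keras generators work internally.

```python
# A made-up 2x3 greyscale image with pixel values in 0..255
image = [
    [0, 128, 255],
    [64, 32, 16],
]

# Rescaling by 1/255 maps every pixel into the range 0..1
rescaled = [[pixel / 255 for pixel in row] for row in image]

# A horizontal flip mirrors each row left to right
flipped = [row[::-1] for row in rescaled]

print(rescaled)
print(flipped)
```

The flipped image holds exactly the same pixels in mirrored order, so it still shows the same object but counts as a new sample for training.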

In [19]:
# this is the augmentation configuration we will use for training, some scaling and image manipulation as well
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# this is the augmentation configuration we will use for testing, only rescaling
test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),  # Keras expects (height, width)
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),  # Keras expects (height, width)
    batch_size=batch_size,
    class_mode='categorical')

Found 3000 images belonging to 2 classes.
Found 300 images belonging to 2 classes.


The next step is training the network, which adjusts the weights. The output of the final training epoch shows the final accuracy for the few epochs run. An epoch is one training round over all the training data; it is separated into a number of steps, and each step trains on one batch of samples.
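The number of steps follows directly from the sample counts and the batch size defined earlier. A quick sketch of that arithmetic, using the same values as this notebook:

```python
import math

nb_train_samples = 3000
nb_validation_samples = 300
batch_size = 500

# Round up so a final, partially filled batch still counts as a step
steps_per_epoch = math.ceil(nb_train_samples / batch_size)
validation_steps = math.ceil(nb_validation_samples / batch_size)

print(steps_per_epoch, validation_steps)
```

With these numbers, each epoch runs 6 training steps, and validation needs only 1 step because all 300 validation images fit in a single batch.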

In [20]:
model.fit_generator(
    train_generator,
    steps_per_epoch=math.ceil(nb_train_samples / batch_size),
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=math.ceil(nb_validation_samples/batch_size))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f4920011080>

**The above is a very bad result. Do you know why? In a binary classification problem, a dummy solver that picks the same class every time gets 50% accuracy, assuming the dataset is balanced, so a network has to score well above 50% to show it has learned anything.
Let's see in notebook #3 if we can do better by adding another hidden layer or maybe more.**
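The dummy-solver baseline is easy to compute for any label set. Below is a minimal sketch, where `majority_baseline_accuracy` is a made-up helper and the label lists are invented to mirror a balanced 300-image test set and a skewed 1%-positive case.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a solver that always predicts the most common label."""
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)

balanced = ['cat'] * 150 + ['dog'] * 150     # balanced two-class set
skewed = ['healthy'] * 99 + ['cancer'] * 1   # 1% positives

print(majority_baseline_accuracy(balanced))
print(majority_baseline_accuracy(skewed))
```

Comparing a model against this number is a quick sanity check: on the balanced set the bar is 50%, while on the skewed set even 99% accuracy proves nothing.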