# Hands-on Introduction to Python And Machine Learning

Instructor: Tak-Kei Lam

(Readers are assumed to have a little programming background.)


### 3. Neural networks

In our brain, we have around $10^{11}$ basic units called neurons. These neurons
are massively interconnected as neural networks. A neuron, when stimulated,
communicates with its neighbours electro-chemically at very high speed. The
sender starts the communication by releasing chemical substances to excite the
receivers. Upon receiving the signals, the inner electric potential of a
receiver changes. If the inner electric potential reaches a certain
threshold, the receiver in turn releases chemical substances and
triggers communication with its own neighbours. This happens repeatedly and
simultaneously among all neurons.

#### Artificial neural network
![Artificial neural network](ann.png)

In order to understand how our brain processes information, and to create a computer that can learn, scientists have been developing artificial neural networks using computers. Since artificial neural networks have the ability to learn from examples, they can be used to solve problems even if we do not understand the nature of the problems well (and thus cannot give explicit instructions to the computer). They have been found to be very flexible and are potentially useful for solving many problems.

An example of an artificial neural network is shown in the above figure. The circles represent neurons. The arrows represent connections and indicate the direction of information flow. The network has one layer of four input neurons, one hidden layer of five neurons, and one output layer containing only one neuron. Every artificial neural network must have one layer of input neurons and one layer of output neurons. Each layer can have any number of neurons.

An artificial neural network is usually fully connected, meaning every neuron in a layer is connected with all neurons in the neighbouring layers; it can also be sparse, which may be more suitable in some situations. It is also possible to have feedback loops, where the outputs of the output neurons are used as inputs of some neurons in the hidden layers.

Each input connection of a receiver has a weight. In some set-ups the weights of the input connections are constrained to sum to $< 1$ for each receiver; the training example later in this section does not impose that constraint. In the above figure, the weights of the input connections of the output neuron are shown. The output of a neuron is calculated by applying a function to the weighted inputs, and is usually in the range 0 to 1.
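
To make the idea of weighted inputs concrete, here is a minimal sketch of how a value could flow through a network shaped like the one in the figure (four inputs, five hidden neurons, one output). The weights and input values are made up for illustration, and the sigmoid function is assumed for every neuron; the figure's actual weights are not used.

```python
import math

def sigmoid(x):
    # squashes any real number into the range 0 to 1
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights):
    # weights[j][i] is the weight of the connection from input i to neuron j;
    # each neuron outputs the sigmoid of its weighted sum of inputs
    return [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in weights]

# hypothetical values, chosen only for illustration
inputs = [0.2, 0.4, 0.6, 0.8]                               # four input neurons
hidden_weights = [[0.1, 0.2, 0.3, 0.4] for _ in range(5)]   # five hidden neurons
output_weights = [[0.2, 0.2, 0.2, 0.2, 0.2]]                # one output neuron

hidden = layer_forward(inputs, hidden_weights)
output = layer_forward(hidden, output_weights)
print('hidden layer outputs:', hidden)
print('network output:', output[0])
```

Each layer only needs the outputs of the layer before it, which is why the computation can proceed layer by layer from the inputs to the output.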

The number of layers, the number of neurons in each layer, how the neurons should be connected, the functions that neurons implement, and other parameters have to be tailor-made for the problem.

#### How do artificial networks learn?
Let's learn by the following example; the simplest example ever!

![A simple example of neural network](nn-example.png)

Artificial neural networks actually learn by adjusting the weights of each
neuron. Consider the super simple artificial neural network in
the above figure. The output neuron implements the function $f$:
\begin{align*}
    f &= \frac{1}{1+e^{-x}} \\
    \text{where } x &= w_a \times a + w_b \times b + w_c \times c \\
    c & = -0.5
\end{align*}

We want to use it to learn whether $a > b$:

| condition |  target value of $f$, $t$    |
|:---------:|:----------------------------:|
|$a > b$    | 1                            |
|$a \le b$   | 0                            |

Since $f$ is continuous in the range 0 to 1, we assume that:

|result  &nbsp;&nbsp;&nbsp;       | meaning |
|:-------------:|:-------:|
|$f > 0.5 $     | $a > b$   |
|$f \le 0.5$    | $a \le b$  |


Based on the initial configuration as shown in the figure,
\begin{align*}
    \text{if } a &=0.5, b =0.3 \\
    \text{then } f & = \frac{1}{1+e^{-(0.0 \times 0.5 + 0.0 \times 0.3 + 1 \times (-0.5))}} \\
                   & = 0.377541
\end{align*}
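
The arithmetic above can be checked with a few lines of Python. This is only a sketch; the helper name `neuron_output` is made up for illustration, and the initial weights $w_a=0$, $w_b=0$, $w_c=1$ are the ones used in the calculation above.

```python
import math

def neuron_output(a, b, w_a, w_b, w_c, c=-0.5):
    # weighted sum of the inputs, followed by the sigmoid function f
    x = w_a * a + w_b * b + w_c * c
    return 1.0 / (1.0 + math.exp(-x))

f_initial = neuron_output(0.5, 0.3, w_a=0.0, w_b=0.0, w_c=1.0)
print(round(f_initial, 6))  # 0.377541
```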

The target value of $f$ should be $1$ instead; now it is not even $>0.5$!
Obviously, our artificial neural network fails to determine whether $a>b$ at
this moment. We can train the artificial neural network to improve it. But how? We need to use a little bit of maths.

First of all, we have to define how far we are from the target result. Let $E$
be the squared error:
\begin{align*}
    E = \frac{1}{2} (t - f)^2
\end{align*}
where $t$ is the target result. Referring to the previous example, if $a=0.5$ and
$b=0.3$, then $f=0.377541$ and $t=1$.
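
As a quick illustration, the squared error for this example can be computed directly. The function name `squared_error` is just a label chosen for this sketch.

```python
def squared_error(t, f):
    # E = 1/2 * (t - f)^2
    return 0.5 * (t - f) ** 2

E = squared_error(t=1.0, f=0.377541)
print(E)  # roughly 0.1937
```

The error is zero only when $f$ equals the target, and it grows as $f$ moves away from $t$ in either direction.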

Then, we can determine how $E$ responds to a change in each of the weights.
That is, we find the rates of change of $E$ with respect to
$w_a$ ($\frac{\partial{E}}{\partial{w_a}}$),
$w_b$ ($\frac{\partial{E}}{\partial{w_b}}$) and
$w_c$ ($\frac{\partial{E}}{\partial{w_c}}$). If we know these values, we know
whether a weight should be increased or decreased. For example, if
$\frac{\partial{E}}{\partial{w_a}} > 0$, we know that increasing $w_a$ will
increase the error, and hence we should decrease it. We can also decide how
large the adjustment should be.


By the **chain rule**, we know that:
\begin{align*}
    \frac{\partial{E}}{\partial{w_i}} &= 
        \frac{\partial{E}}{\partial{f}} \times
        \frac{\partial{f}}{\partial{x}} \times 
        \frac{\partial{x}}{\partial{w_i}}
        & \text{ where } i \text{ can be } a,
        b, \text{ or } c \\
\end{align*}

We also know all the derivatives in the formula:
\begin{align*}
    \frac{\partial{E}}{\partial{f}} &= -(t-f) \\
    \frac{\partial{f}}{\partial{x}} &= (\frac{1}{1+e^{-x}})(1-\frac{1}{1+e^{-x}}) \\
    \frac{\partial{x}}{\partial{w_a}} &= a \\
    \frac{\partial{x}}{\partial{w_b}} &= b  \\
    \frac{\partial{x}}{\partial{w_c}} &= c = -0.5 \\
\end{align*}
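
One way to gain confidence in these derivatives is to compare them with a numerical estimate. The sketch below is not part of the original example: it multiplies the three factors of the chain rule together and checks the result against a central finite difference of $E$. The function names and the step size `h` are assumptions made for this illustration.

```python
import math

C = -0.5  # the constant input c

def error(w_a, w_b, w_c, a, b, t):
    x = w_a * a + w_b * b + w_c * C
    f = 1.0 / (1.0 + math.exp(-x))
    return 0.5 * (t - f) ** 2

def analytic_gradient(w_a, w_b, w_c, a, b, t):
    x = w_a * a + w_b * b + w_c * C
    f = 1.0 / (1.0 + math.exp(-x))
    # dE/df * df/dx is shared by all three weights
    common = -(t - f) * f * (1.0 - f)
    # dx/dw_a = a, dx/dw_b = b, dx/dw_c = c
    return (common * a, common * b, common * C)

def numeric_gradient(w_a, w_b, w_c, a, b, t, h=1e-6):
    weights = [w_a, w_b, w_c]
    grads = []
    for i in range(3):
        up = list(weights)
        down = list(weights)
        up[i] += h
        down[i] -= h
        # central difference: (E(w + h) - E(w - h)) / (2h)
        grads.append((error(*up, a, b, t) - error(*down, a, b, t)) / (2 * h))
    return tuple(grads)

exact = analytic_gradient(0.0, 0.0, 1.0, a=0.5, b=0.3, t=1.0)
approx = numeric_gradient(0.0, 0.0, 1.0, a=0.5, b=0.3, t=1.0)
print('chain rule:        ', exact)
print('finite differences:', approx)
```

The two results should agree to several decimal places, which suggests the chain-rule formula has been applied correctly.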


Therefore, each of the weights can be updated as follows:
\begin{align*}
    w_i &= w_i - \eta \times \frac{\partial{E}}{\partial{w_i}} & \text{ where } i
    \text{ can be } a, b, \text{ or } c, \text{and } 0<\eta\le 1 \text{ is known as the
    learning rate } \\
\end{align*}

In our example, if $a=0.5, b=0.3$ and $\eta=1$, the corrected weights are:
- $w_a=0.073140$, 
- $w_b=0.043884$,
- $w_c=0.926860$.

The value of $f$ based on these new weights will increase to $0.398027$.
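
The single update step described above can be reproduced with a short sketch. It assumes a learning rate of $\eta=1$ and the initial weights $w_a=0$, $w_b=0$, $w_c=1$; the function names are chosen only for this illustration.

```python
import math

def forward(weights, a, b, c=-0.5):
    w_a, w_b, w_c = weights
    x = w_a * a + w_b * b + w_c * c
    return 1.0 / (1.0 + math.exp(-x))

def update_step(weights, a, b, t, eta=1.0, c=-0.5):
    w_a, w_b, w_c = weights
    f = forward(weights, a, b, c)
    # dE/df * df/dx, shared by all three weights
    delta = -(t - f) * f * (1.0 - f)
    # w_i = w_i - eta * dE/dw_i
    return (w_a - eta * delta * a,
            w_b - eta * delta * b,
            w_c - eta * delta * c)

old_weights = (0.0, 0.0, 1.0)
new_weights = update_step(old_weights, a=0.5, b=0.3, t=1.0)
print('new weights:', [round(w, 6) for w in new_weights])
print('f before:', round(forward(old_weights, 0.5, 0.3), 6))
print('f after: ', round(forward(new_weights, 0.5, 0.3), 6))
```

After one step $f$ has moved towards the target of 1, but only slightly; many more steps are needed before it crosses 0.5.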

Suppose we train our artificial neural network on the following *training data* with
$\eta=1$ for 100 iterations:

|        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$a, b$ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | $t$ |
|:-------------:|:---:|
|        $a=0.5,\\ b=0.3$ | 1 |
|        $a=0.3,\\ b=0.5$ | 0 |
|        $a=1.0,\\ b=0.2$ | 1 |
|        $a=0.2,\\ b=1.0$ | 0 |
|        $a=0.9,\\ b=0.8$ | 1 |
|        $a=0.5,\\ b=0.5$ | 0 |


After training, the final weights will be:
- $w_a=4.887006$,
- $w_b=-2.908520$,
- $w_c=2.280892$.

The result for some *test* data is listed in the following table. Only the entry $a=0.51, b=0.5$ is wrong. Our artificial neural network performs way better than before! It may work even better if we train it with more examples. Generally speaking, the more
examples the artificial neural network has learnt from, the fewer errors it will produce.

|       &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $a, b$&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | $t$ | $f$| correct?|
|:--:|:--:|:--:|:--:|
|        $a=0.5,\\ b=0.3$ | 1 | 0.605993| ✔|
|        $a=0.3,\\ b=0.5$ | 0 | 0.244419| ✔|
|        $a=1.0,\\ b=0.2$ | 1 | 0.959490| ✔|
|        $a=0.2,\\ b=1.0$ | 0 | 0.044296| ✔|
|        $a=0.9,\\ b=0.8$ | 1 | 0.717287| ✔|
|        $a=0.5,\\ b=0.5$ | 0 | 0.462271| ✔|
|        $a=0.51,\\ b=0.5$ | 1 | 0.474439| |
|        $a=0.61,\\ b=0.6$ | 1 | 0.523861| ✔|


We have demonstrated that an artificial neural network with only one neuron can be used to solve problems without explicit if-then-else code. Using an artificial neural network to solve $a > b?$ is of course overkill. How about using an artificial neural network to "predict" stock trends? That may be possible using more complex artificial neural networks.

Have you ever wondered what a neural network program looks like? Let's see the code for the above example.

In [None]:
import numpy as np
import math

A = np.array([0.5, 0.3, 1, 0.2, 0.9, 0.5])
B = np.array([0.3, 0.5, 0.2, 1, 0.8, 0.5])
Expected_f = np.array([1, 0, 1, 0, 1, 0])

# initial weights
w_a = 0
w_b = 0
w_c = 1

epoch = 1
alpha = 1 # the learning rate "eta" in our formula

# training
while epoch <= 100:
    print('---------------- epoch: {}'.format(epoch))
    for i in range(0, len(A)):

        a = A[i]
        b = B[i]
        c = -0.5
        expected_f = Expected_f[i]

        x = w_a * a + w_b * b + w_c *c
        f = 1/(1+math.exp(-x))
        
        E = 0.5 * ((expected_f - f)**2)

        df_dx = f * (1-f)
        df_dw_a = df_dx * a
        df_dw_b = df_dx * b
        df_dw_c = df_dx * c


        dE_df = (-expected_f+f)
        dE_dw_a = dE_df * df_dw_a
        dE_dw_b = dE_df * df_dw_b
        dE_dw_c = dE_df * df_dw_c

        w_a = w_a - alpha*dE_dw_a
        w_b = w_b - alpha*dE_dw_b
        w_c = w_c - alpha*dE_dw_c

        print('x: {}'.format( x))
        print('a: {}, b: {}, f: {}, expected f: {}'.format( a, b, f, expected_f))
        print('df_dw_a: {}'.format( df_dw_a))
        print('df_dw_b: {}'.format( df_dw_b))
        print('df_dw_c: {}'.format( df_dw_c))
        print('dE_df: {}'.format( dE_df))
        print('dE_dw_a: {}'.format( dE_dw_a))
        print('dE_dw_b: {}'.format( dE_dw_b))
        print('dE_dw_c: {}'.format( dE_dw_c))
        print('w_a: {}'.format( w_a))
        print('w_b: {}'.format( w_b))
        print('w_c: {}'.format( w_c))

        x = w_a * a + w_b * b + w_c *c
        f = 1/(1+math.exp(-x))

        print('corrected a: {}, b: {}, f: {}, expected f: {}'.format(a, b, f, expected_f))
        print('')
        

    epoch = epoch +1 


print('Finish training!')



# testing
A = np.array([0.5, 0.3, 1, 0.2, 0.9, 0.5, 0.51, 0.61, 0.2, 0.4])
B = np.array([0.3, 0.5, 0.2, 1, 0.8, 0.5, 0.5, 0.60, 0.1, 0.5])
Expected_f = np.array([1, 0, 1, 0, 1, 0, 1, 1, 1, 0])
c = -0.5  # the constant input, same value as in training
for i in range(0, len(A)):
    a = A[i]
    b = B[i]
    expected_f = Expected_f[i]

    x = w_a * a + w_b * b + w_c *c
    f = 1/(1+math.exp(-x))

    # f > 0.5 is interpreted as "a > b"
    correct = (expected_f == 1 and f > 0.5) or (expected_f == 0 and f<=0.5)
    print('a: {}, b: {}, f: {}, expected f: {}, correct? {}'.format(a, b, f, expected_f, correct))

print('Finish testing!')

Please note that the code is just for our particular single-neuron example. It is highly specialised and is not a general neural network implementation. Now you have realised how difficult it is to write a general neural network library, haven't you?

Efficiency and user-friendliness are the major factors that neural network library developers should pay attention to. The number of parameters to be calculated can easily go up to millions or billions. That is why specialised hardware such as *GPUs* and *FPGAs* is preferred to general-purpose CPUs for training and running neural networks.
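
To get a feeling for how quickly the number of parameters grows, here is a small sketch that counts the weights of a fully connected network from its layer sizes. It assumes one extra bias weight per neuron (similar in spirit to the constant input $c$ in our example), and the second set of layer sizes is hypothetical.

```python
def count_parameters(layer_sizes, with_bias=True):
    # every neuron in a layer has one weight per neuron in the previous layer,
    # plus (optionally) one bias weight
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out
        if with_bias:
            total += n_out
    return total

# the 4-5-1 network from the first figure
print(count_parameters([4, 5, 1]))              # 31

# a hypothetical larger network: 784 inputs, two hidden layers of 1000, 10 outputs
print(count_parameters([784, 1000, 1000, 10]))  # 1796010
```

Even this modest hypothetical network already has well over a million weights, each of which needs its own derivative in every training step.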

For general neural network machine learning frameworks, please refer to:
1. [scikit-learn (neural network)](http://scikit-learn.org/stable/modules/neural_networks_supervised.html)
2. [TensorFlow](https://www.tensorflow.org/) and [Keras](https://keras.io/)
3. [PyTorch](https://pytorch.org/)

A demo of using scikit-learn to classify Pokemon by type:

In [None]:
from sklearn.neural_network import MLPClassifier
import pandas as pd

# Load the dataset
pokemons = pd.read_csv("pokemon.csv")
print(pokemons['Type 1'].unique())

pokemons = pokemons.sample(frac=1) # .sample(frac=1) shuffles the rows: it draws 100% of the data in random order

label_column = ['Type 1']
features_columns = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']

pokemons_features = pokemons[features_columns]
pokemons_label = pokemons[label_column]

# normalise every column in pokemons_features to the range 0 to 1 (min-max scaling)
pokemons_features = pokemons_features.apply(lambda x: (x - x.min())/(x.max() - x.min()))

# Split the data into training/test sets: the last 20% of the shuffled rows are for testing
# .values converts the data structure from a pandas DataFrame into a numpy array
last_index = -int(0.20*len(pokemons_features))
pokemons_features_train = pokemons_features[:last_index].values
pokemons_features_test = pokemons_features[last_index:].values

last_index = -int(0.20*len(pokemons_label))
pokemons_label_train = pokemons_label[:last_index].values.flatten() # the expected labels
pokemons_label_test = pokemons_label[last_index:].values.flatten()  # the expected labels


clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(50, 18), random_state=1)
clf.fit(pokemons_features_train, pokemons_label_train)

# Make predictions using the testing set
pokemons_label_pred = clf.predict(pokemons_features_test) # the predicted labels

correct = 0.0
for i in range(0, len(pokemons_label_test)):
    print('expected {} VS. actual {}'.format(pokemons_label_test[i], pokemons_label_pred[i]))
    if pokemons_label_pred[i] == pokemons_label_test[i]:
        correct = correct+1
print('Accuracy: {}%'.format(correct/len(pokemons_label_test) * 100))
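
The normalisation step in the demo rescales each feature column so that its smallest value becomes 0 and its largest becomes 1. Here is a minimal sketch of the same idea in plain Python; the numbers are made up, and mapping a constant column to 0 is an assumption of this sketch (the pandas version in the demo would divide by zero in that case).

```python
def min_max_normalise(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        # assumption: a constant column carries no information, map it to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

hp = [45, 60, 80, 39, 58]   # made-up values for one feature column
normalised = min_max_normalise(hp)
print(normalised)
```

Without this step, a feature with a large numeric range could dominate the weighted sums simply because of its scale.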


We have just learnt the most fundamental concepts of artificial neural networks. Here are a few pieces of "jargon" regarding neural network basics that we have talked about:
- Feed forward
- Back propagation
- Bias
- Fully connected layers
- Activation function
- (Stochastic) Gradient descent

>> Don't remember the keywords; remember the concepts. --- Tak-Kei Lam (the Great)

People found that there is huge potential in using artificial neural networks to solve problems because of their simplicity$^*$. Therefore we now have a huge family of artificial neural networks in which each member is unique and best at doing something...

##### The family of artificial neural networks:
- Basic feed forward neural network (NN)
- Convolutional neural network (CNN)
- Recurrent neural network (RNN)
- Long short term memory (LSTM)
- Convolutional recurrent neural network (CRNN)
- Region-based convolutional neural network (RCNN)
- (an extremely lengthy and boring list of entries is not presented here...)

#### An example of CNN using TensorFlow and Keras:
In this example, we are building a neural network to report whether a given Pokemon image shows Pichu or Pikachu.

|          |         |
|:----:|:----:|
|![Pichu?](pichu-1.png) | ![Pikachu?](pikachu-1.png)|

In [None]:
import keras
from keras.models import Sequential
# importing from keras.layers works across Keras versions; the old
# keras.layers.core / convolutional / pooling modules no longer exist in recent ones
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from skimage import io, color, exposure, transform
import numpy as np
import os
import glob
import matplotlib.pyplot as plt

NUM_CLASSES = 2
IMG_SIZE = 48
BATCH_SIZE = 1
EPOCHS  = 20

root_dir = './pokemon-images/'
img_paths = []
imgs = []
labels = []
label_names = []

# ---------------------------------------------------------------------------- model definition

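class AccuracyHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        self.acc = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # the metric key is 'accuracy' in recent Keras versions and 'acc' in older ones
        self.acc.append(logs.get('accuracy', logs.get('acc')))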
        
history = AccuracyHistory()

# preprocess the input image: kind of normalise it, and standardise the size
def preprocess_img(img):
    # drop the alpha channel if the PNG has one (RGBA -> RGB)
    if img.shape[-1] == 4:
        img = color.rgba2rgb(img)

    # Histogram normalization in v channel
    hsv = color.rgb2hsv(img)
    hsv[:, :, 2] = exposure.equalize_hist(hsv[:, :, 2])
    img = color.hsv2rgb(hsv)

    # central square crop
    min_side = min(img.shape[:-1])
    centre = img.shape[0] // 2, img.shape[1] // 2
    img = img[centre[0] - min_side // 2:centre[0] + min_side // 2,
              centre[1] - min_side // 2:centre[1] + min_side // 2,
              :]

    # rescale to standard size
    img = transform.resize(img, (IMG_SIZE, IMG_SIZE))
    return img

# construct the neural network model
# hey we're using more than one neuron, connected in some ways!
def constructModel(batchSize):
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(5, 5), strides=(1, 1),
                     activation='relu',
                     input_shape=(IMG_SIZE, IMG_SIZE, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Conv2D(64, (5, 5), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(1000, activation='relu'))
    model.add(Dense(NUM_CLASSES, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
              # recent Keras versions name this argument learning_rate (older ones used lr)
              optimizer=keras.optimizers.SGD(learning_rate=0.01),
              metrics=['accuracy'])
    return model



# ---------------------------------------------------------------------------- model training
# read images
all_img_paths = glob.glob(os.path.join(root_dir, '*.png'))
np.random.shuffle(all_img_paths)
for img_path in all_img_paths:
    # the class is decided by the file name
    if 'pikachu' in img_path:
        label = 1
        label_name = 'Pikachu'
    elif 'pichu' in img_path:
        label = 0
        label_name = 'Pichu'
    else:
        # skip files that are neither Pichu nor Pikachu
        continue
    img = preprocess_img(io.imread(img_path))
    print('Image: {} is of class: {}'.format(img_path, label))
    imgs.append(img)
    labels.append(label)
    label_names.append(label_name)
    img_paths.append(img_path)

X = np.array(imgs)
#print(X.shape)
# Make one hot targets
Y = np.eye(NUM_CLASSES)[labels]


num_training_imgs = int(0.8*len(X))
num_test_imgs = len(X) - num_training_imgs

img_paths_train = img_paths[0:num_training_imgs]
x_train = X[0:num_training_imgs]
y_train = Y[0:num_training_imgs]
labels_train = label_names[0:num_training_imgs]

# the remaining 20% of the images are held out;
# they must be defined before model.fit uses them as validation data
x_test = X[num_training_imgs:]
y_test = Y[num_training_imgs:]

for i,v in enumerate(x_train):
    print('training image: {} is {}'.format(img_paths_train[i], labels_train[i]))

model = constructModel(len(x_train))

model.fit(x_train, y_train,
          batch_size=BATCH_SIZE,
          epochs=EPOCHS,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=[history])


# plot a graph of the change of model accuracy over time
plt.plot(range(1,EPOCHS+1), history.acc)
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.show()


# ---------------------------------------------------------------------------- model testing
img_paths_test = img_paths[-num_test_imgs:]
x_test = X[-num_test_imgs:]
y_test = Y[-num_test_imgs:]
labels_test = label_names[-num_test_imgs:]

for i,v in enumerate(x_test):
    print('test image: {} is {}'.format(img_paths_test[i], labels_test[i]))
    
# predict and print the probability that an image belongs to a class
print(model.predict(x_test))

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
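
To see what the convolution and pooling layers in the model above do to the image size, the following sketch traces the width of a 48x48 input through the four layers. It assumes the Keras defaults used by the model: no padding (`padding='valid'`) and, for the second pooling layer, a stride equal to the pool size. The helper name `output_size` is made up for this illustration.

```python
def output_size(input_size, kernel_size, stride):
    # size after a convolution or pooling window slides over the input without padding
    return (input_size - kernel_size) // stride + 1

size = 48                                            # IMG_SIZE
size = output_size(size, kernel_size=5, stride=1)    # Conv2D(32, 5x5)       -> 44
size = output_size(size, kernel_size=2, stride=2)    # MaxPooling2D(2x2)     -> 22
size = output_size(size, kernel_size=5, stride=1)    # Conv2D(64, 5x5)       -> 18
size = output_size(size, kernel_size=2, stride=2)    # MaxPooling2D(2x2)     -> 9

flattened = size * size * 64                         # Flatten: 9 * 9 * 64 values
dense_parameters = flattened * 1000 + 1000           # Dense(1000): weights + biases

print('final feature map: {}x{}'.format(size, size))
print('flattened length:', flattened)
print('parameters in Dense(1000):', dense_parameters)
```

Most of the parameters of this small model sit in the first fully connected layer, not in the convolutional layers.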



**Exercise**:
- Try to use scikit-learn's neural network to model the following classification problem:

In [1]:
data = [0, 1]
classes  = [0, 1] # classes[i] is the class of the data point data[i]

**Exercise**:
- Try to use scikit-learn's neural network to model the following classification problem:

In [1]:
data = [[0, 0], [0,1], [1,0], [1,1]]
classes  = [0, 0, 1, 1] # classes[i] is the class of the data point data[i]

**Exercise**:
- Try to use scikit-learn's neural network to model the following classification problem:

In [1]:
data = [[0, 0], [0,1], [1,0], [1,1]]
classes  = [0, 1, 2, 3] # classes[i] is the class of the data point data[i]

**Exercise**:
- Try to use scikit-learn's neural network to model the following classification problem:

In [1]:
data = ['a', 'e', 'i', 'o', 'u', 'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']
classes  = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] # classes[i] is the class of the data point data[i]: 1 for vowels, 0 for consonants