# hand-sign recognizer
Here we build a convolutional neural network (CNN) that recognizes hand signs.

> Inspired by: [Convolutional Neural Networks](https://www.coursera.org/learn/convolutional-neural-networks)

In [1]:
import torch
import h5py
import numpy as np
from numpy.random import default_rng
from tqdm import tqdm
from matplotlib import pyplot as plt

if torch.cuda.is_available():
    print("Cuda available.")
    tensor_type = 'torch.cuda.DoubleTensor'
    torch.backends.cuda.matmul.allow_tf32 = True
else:
    print("Cuda not found.")
    tensor_type = 'torch.DoubleTensor'

print(f"Setting {tensor_type} as default dtype...")
torch.set_default_tensor_type(tensor_type)

Cuda available.
Setting torch.cuda.DoubleTensor as default dtype...


In [2]:
train_x, train_y, test_x, test_y, class_labels = None, None, None, None, None
with h5py.File("../data/test-hand-signs.h5", "r") as f:
    test_x = np.array(f["test_set_x"])
    test_y = np.array(f["test_set_y"])

with h5py.File("../data/train-hand-signs.h5", "r") as f:
    train_x = np.array(f["train_set_x"])
    train_y = np.array(f["train_set_y"])
    class_labels = np.array(f["list_classes"])

print(f"# of training-examples: {train_x.shape[0]}")
print(f"# of test-examples: {test_x.shape[0]}")
print(f"image-dimensions: {test_x.shape[1:]}")
print(f"class-labels: {class_labels}")

# of training-examples: 1080
# of test-examples: 120
image-dimensions: (64, 64, 3)
class-labels: [0 1 2 3 4 5]


## # pre-processing
We normalize the inputs to zero mean and unit variance, using per-pixel statistics computed on the training set only, and one-hot encode the labels.

In [3]:
train_mean = np.mean(a=train_x, axis=0)
train_std = np.std(a=train_x, axis=0)

# train-set normalization
train_X = (train_x - train_mean) / train_std
train_Y = np.eye(len(class_labels))[:, train_y].copy()

# test-set normalization
test_X = (test_x - train_mean) / train_std
test_Y = np.eye(len(class_labels))[:, test_y].copy()

print(f"Train-set X: {train_X.shape}, Y: {train_Y.shape}")
print(f"Test-set X: {test_X.shape}, Y: {test_Y.shape}")

cu_train_X = torch.tensor(train_X)
cu_train_Y = torch.tensor(train_Y)
cu_test_X = torch.tensor(test_X)
cu_test_Y = torch.tensor(test_Y)

Train-set X: (1080, 64, 64, 3), Y: (6, 1080)
Test-set X: (120, 64, 64, 3), Y: (6, 120)
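
Because the statistics above are computed per pixel and per channel, a pixel that is constant across the training set would have zero standard deviation, and the division would then produce `inf` or `nan`. The following is a minimal sketch of one common guard, run on a small synthetic array rather than the real dataset. The helper name `normalize` and the constant `eps` are assumptions, not part of the notebook.

```python
import numpy as np

def normalize(x, mean, std, eps=1e-8):
    """Standardize x with precomputed statistics; eps (an assumed constant) avoids division by zero."""
    return (x - mean) / (std + eps)

# synthetic stand-in for the image data: 10 "images" of shape (4, 4, 3)
rng = np.random.default_rng(0)
demo_x = rng.integers(0, 256, size=(10, 4, 4, 3)).astype(np.float64)
demo_x[:, 0, 0, :] = 255.0  # a pixel that is constant across all examples

demo_mean = demo_x.mean(axis=0)
demo_std = demo_x.std(axis=0)
demo_X = normalize(demo_x, demo_mean, demo_std)

print(np.isfinite(demo_X).all())     # checks that no inf/nan was produced
print(np.abs(demo_X[:, 0, 0, :]).max())  # magnitude of the constant pixel after normalization
```

The same statistics would be reused for the test set, as in the cell above, so that both sets go through an identical transformation.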


## # architecture
We will use a 3-layer CNN, as defined below:

<center>

| #-layer | layer-type      | component        | symbol / type                        | value   |
|---------|-----------------|------------------|--------------------------------------|---------|
| 1       | 2d-convolution  | kernel           | $(h^{[1]}_k, w^{[1]}_k)$             | $(4,4)$ |
|         |                 | #-kernels        | $c^{[1]}$                            | 8       |
|         |                 | convolve-padding | $(h^{[1]}_p, w^{[1]}_p)$             |`<same>` |
|         |                 | convolve-stride  | $(h^{[1]}_s, w^{[1]}_s)$             | $(1,1)$ |
|         |                 | pooling          | max-pooling                          |         |
|         |                 | pooling-filter   | $(h^{[1]}_l, w^{[1]}_l)$             | $(8,8)$ |
|         |                 | pooling-padding  | $({}^{l}h^{[1]}_p, {}^{l}w^{[1]}_p)$ |`<same>` |
|         |                 | pooling-stride   | $({}^{l}h^{[1]}_s, {}^{l}w^{[1]}_s)$ | $(8,8)$ |
|         |                 | activation       | ReLU                                 |         |
| 2       | 2d-convolution  | kernel           | $(h^{[2]}_k, w^{[2]}_k)$             | $(2,2)$ |
|         |                 | #-kernels        | $c^{[2]}$                            | 16      |
|         |                 | convolve-padding | $(h^{[2]}_p, w^{[2]}_p)$             |`<same>` |
|         |                 | convolve-stride  | $(h^{[2]}_s, w^{[2]}_s)$             | $(1,1)$ |
|         |                 | pooling          | max-pooling                          |         |
|         |                 | pooling-filter   | $(h^{[2]}_l, w^{[2]}_l)$             | $(4,4)$ |
|         |                 | pooling-padding  | $({}^{l}h^{[2]}_p, {}^{l}w^{[2]}_p)$ |`<same>` |
|         |                 | pooling-stride   | $({}^{l}h^{[2]}_s, {}^{l}w^{[2]}_s)$ | $(4,4)$ |
|         |                 | activation       | ReLU                                 |         |
| 3       | fully-connected | #-neurons        | $n^{[3]}$                            | $6$     |
|         |                 | activation       | softmax                              |         |

</center>

Also, the input has $c^{[0]} = 3$ channels.

* For a given stride and kernel, `<same>` padding is the padding regime in which the output immediately after the convolution has the same size as its input, i.e.
$$
h^{[l]}_z = \left\lfloor\frac{h^{[l]}_a + 2h^{[l]}_p - h^{[l]}_k}{h^{[l]}_s} + 1\right\rfloor = h^{[l]}_a;\qquad w^{[l]}_z = \left\lfloor\frac{w^{[l]}_a + 2w^{[l]}_p - w^{[l]}_k}{w^{[l]}_s} + 1\right\rfloor = w^{[l]}_a
$$
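
To make the formula concrete, the sketch below evaluates it for the layers in the table. It assumes the TensorFlow-style reading of `<same>` for strides larger than one, where the output has $\lceil h_a / h_s \rceil$ rows. It also works with the total padding (the $2h_p$ term) so that odd totals, which need asymmetric padding, are covered. The function names are illustrative and do not come from the notebook.

```python
import math

def same_padding_total(n_in: int, kernel: int, stride: int) -> int:
    """Total padding (both sides together) along one axis, assuming the
    TensorFlow-style 'same' convention: output size = ceil(n_in / stride)."""
    n_out = math.ceil(n_in / stride)
    return max((n_out - 1) * stride + kernel - n_in, 0)

def output_size(n_in: int, kernel: int, stride: int, pad_total: int) -> int:
    """floor((n_in + pad_total - kernel) / stride) + 1, i.e. the formula above
    with 2*h_p replaced by the total padding."""
    return (n_in + pad_total - kernel) // stride + 1

# (name, kernel, stride) along one spatial axis, taken from the architecture table
layers = [("conv-1", 4, 1), ("pool-1", 8, 8), ("conv-2", 2, 1), ("pool-2", 4, 4)]

n = 64  # input height (and width)
sizes = []
for name, k, s in layers:
    p = same_padding_total(n, k, s)
    n = output_size(n, k, s, p)
    sizes.append(n)
    print(f"{name}: total padding {p}, output size {n}")

flattened = n * n * 16  # 16 kernels in layer 2
print(f"features entering the fully-connected layer: {flattened}")
```

Under these assumptions the $(4,4)$ kernel of layer 1 needs a total padding of 3 per axis, which cannot be split evenly between the two sides, so one side receives one more row or column than the other.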

## # forward-propagation
> **Note**: for the forward-propagation equations, read [Section-2, Back-propagation: Conv2D](./back-propagation_Conv2D.pdf).

In [None]:
def convolve(A: torch.Tensor, K: torch.Tensor, model: dict, layer: int):
    pass

def relu(Z: torch.Tensor) -> torch.Tensor:
    # element-wise max(Z, 0) without allocating a zero tensor on every call
    return torch.relu(Z)

sftmx = torch.nn.Softmax(dim=0)
def softmax(Z: torch.Tensor) -> torch.Tensor:
    return sftmx(Z)

def linear(W: torch.Tensor, A: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return torch.matmul(W.T, A) + b

def activation(Zl: torch.Tensor, func_name: str) -> torch.Tensor:
    if func_name == 'relu':
        return relu(Zl)
    elif func_name == 'softmax':
        return softmax(Zl)
    else:
        raise ValueError(f"Unknown activation-function: {func_name}")

def forward_propogate(X: torch.Tensor, model: dict) -> tuple:
    L: int = model["L"]
    
    cache = {'m': X.shape[1], 'c-l0': (X, None, None, None)}

    Al_1 = X
    Al = None
    for l in range(L):
        Wl = model['W-l' + str(l + 1)]
        bl = model['b-l' + str(l + 1)]
        Zl = linear(Wl, Al_1, bl)
        Al = activation(Zl, model['g-l' + str(l + 1)])
        cache['c-l' + str(l + 1)] = (Al.clone(), Wl.clone(), bl.clone(), Zl.clone())
        
        Al_1 = Al
    
    return Al, cache

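def softmax_cost(Al: torch.Tensor, Y: torch.Tensor, **kwargs) -> torch.Tensor:
    """
    Mean cross-entropy cost over the m examples.
    Assumes Y (one-hot labels) and Al (softmax outputs) to be (nl, m) dimensional matrices.
    Returns a 0-dimensional tensor.
    """
    assert Al.shape == Y.shape

    return torch.multiply(-Y, torch.log(Al)).sum() / Al.shape[1]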
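
The `convolve` stub above is still empty. As a reference point for checking a hand-written version, here is a hedged sketch of the two convolutional layers from the table, built on `torch.nn.functional` rather than on a custom convolution. It is not the notebook's implementation. The helper names `pad_same` and `conv_block`, the random weights, and the TensorFlow-style `<same>` padding (output size $\lceil n / s \rceil$) are all assumptions.

```python
import math
import torch
import torch.nn.functional as F

def pad_same(A, kernel, stride):
    """Zero-pad the last two axes (h, w) so a window op yields ceil(n / stride) outputs per axis."""
    def total(n, k, s):
        return max((math.ceil(n / s) - 1) * s + k - n, 0)
    ph = total(A.shape[-2], kernel[0], stride[0])
    pw = total(A.shape[-1], kernel[1], stride[1])
    # F.pad order for the last two axes: (left, right, top, bottom)
    return F.pad(A, (pw // 2, pw - pw // 2, ph // 2, ph - ph // 2))

def conv_block(A, K, b, pool, pool_stride):
    """Convolution (stride 1) -> max-pooling -> ReLU, in the order of the table.
    F.conv2d computes cross-correlation, the usual deep-learning 'convolution'."""
    Z = F.conv2d(pad_same(A, tuple(K.shape[-2:]), (1, 1)), K, b, stride=1)
    P = F.max_pool2d(pad_same(Z, pool, pool_stride), kernel_size=pool, stride=pool_stride)
    return torch.relu(P)

# random stand-ins for a batch of 5 images in (m, h, w, c) layout, as in train_X
X = torch.randn(5, 64, 64, 3, dtype=torch.float64)
A0 = X.permute(0, 3, 1, 2)  # -> (m, c, h, w), the layout F.conv2d expects

K1 = torch.randn(8, 3, 4, 4, dtype=torch.float64)    # (c1, c0, h_k, w_k)
K2 = torch.randn(16, 8, 2, 2, dtype=torch.float64)   # (c2, c1, h_k, w_k)
b1 = torch.zeros(8, dtype=torch.float64)
b2 = torch.zeros(16, dtype=torch.float64)

A1 = conv_block(A0, K1, b1, pool=(8, 8), pool_stride=(8, 8))
A2 = conv_block(A1, K2, b2, pool=(4, 4), pool_stride=(4, 4))
flat = A2.reshape(A2.shape[0], -1).T  # (features, m), matching the (nl, m) convention

print(A1.shape, A2.shape, flat.shape)
```

For the sizes used here the pooling windows tile the input exactly, so the pooling step needs no padding. With other sizes, zero-padding before max-pooling could let a padded zero win over negative activations, and a hand-written version would need to treat that case explicitly.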

In [13]:
s = cu_train_X[0]
print(f"shape: {s.shape} stride: {s.stride()}")

shape: torch.Size([64, 64, 3]) stride: (192, 3, 1)


In [24]:
x = np.arange(start=1,step=1,stop=16).reshape((3,5))
print(f"np.stride: {x.strides} type: {x.dtype}")
x = torch.Tensor(x)
print(f"shape: {x.shape} stride: {x.stride()}")
print(f"x: {x}")

y = np.arange(start=1,step=1,stop=7).reshape((2,3))
y = torch.Tensor(y)
print(f"shape: {y.shape} stride: {y.stride()}")
print(f"y: {y}")

np.stride: (20, 4) type: int32
shape: torch.Size([3, 5]) stride: (5, 1)
x: tensor([[ 1.,  2.,  3.,  4.,  5.],
        [ 6.,  7.,  8.,  9., 10.],
        [11., 12., 13., 14., 15.]])
shape: torch.Size([2, 3]) stride: (3, 1)
y: tensor([[1., 2., 3.],
        [4., 5., 6.]])


In [17]:
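# view the first 9 elements of x's storage as a 3x3 matrix (row stride 3, column stride 1)
x.as_strided(size=(3, 3), stride=(3, 1))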

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
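
The strides inspected above are what make a hand-written convolution possible without copying data: `as_strided` can expose every kernel-sized window of the input as a 4-dimensional view. The sketch below shows one possible way to do this, reusing the values of the 3×5 input and the 2×3 array from the cells above as input and kernel. It is an illustration with no padding and unit stride, not the notebook's `convolve`, and the variable names are assumptions.

```python
import torch

x = torch.arange(1, 16, dtype=torch.float64).reshape(3, 5)  # same values as x above
k = torch.arange(1, 7, dtype=torch.float64).reshape(2, 3)   # same values as y above

kh, kw = k.shape
out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1     # output size without padding, stride 1
sh, sw = x.stride()                                          # element strides of x

# windows[i, j] is the (kh, kw) patch of x whose top-left corner is (i, j); no data is copied
windows = x.as_strided(size=(out_h, out_w, kh, kw), stride=(sh, sw, sh, sw))

# multiply every patch by the kernel and sum over the two patch axes
z = (windows * k).sum(dim=(-2, -1))

print(windows.shape)
print(z)
```

Because `as_strided` does not check that the requested view makes sense for the data, the sizes and strides passed to it have to be derived carefully from the input shape, as done here with `out_h` and `out_w`.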
