# Hand pose estimation with Keras
## Zane Geiger

### Introduction

This project attempts to reproduce the results of Oberweger et al. in [Hands Deep in Deep Learning for Hand Pose Estimation](https://arxiv.org/abs/1502.06807) using [Keras](https://keras.io) with the [TensorFlow](https://www.tensorflow.org/) backend and the [NYU hand pose dataset](http://cims.nyu.edu/~tompson/NYU_Hand_Pose_Dataset.htm).

This initial implementation is based heavily on the work of James Supancic in his [Deep Hand Pose](https://github.com/jsupancic/deep_hand_pose) project, which is implemented in the [Caffe](http://caffe.berkeleyvision.org/) framework, but is likely to diverge substantially as my research continues.

This file contains all the code necessary to build the first stage pose estimation model as described in Figure 1b of [Hands Deep in Deep Learning](https://arxiv.org/pdf/1502.06807v2.pdf), but does not yet implement the hand pose prior constraint.

In [None]:
from keras.models import Sequential, model_from_yaml, load_model
from keras.optimizers import SGD
from keras.callbacks import LearningRateScheduler, TensorBoard
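from keras.layers import *
from keras import initializations  # Keras 1.x module providing get_fans, uniform and normal
import numpy as np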
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
import math
import h5py
from os import path

### Loading dataset

The default dataset location is the [dataset](../dataset) subfolder of the project root.
The [.hdf5](http://www.h5py.org/) file produced by [convert_images.ipynb](convert_images#Output-dataset) is placed in this directory after processing.

In [None]:
DATASET_DIR      = '../dataset'
dataset          = h5py.File(path.join(DATASET_DIR, 'dataset.hdf5'), 'r')  # open read-only

test_images      = dataset['image/test']
test_labels      = dataset['label/test']

train_images     = dataset['image/train']
train_labels     = dataset['label/train']

pca_eigenvectors = dataset['pca/eigenvectors']
pca_mean         = dataset['pca/mean']

### Loss function

The loss function used in the [Deep Hand Pose](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/oberweger-bgn.prototxt#L574) project is [Caffe's Euclidean loss](http://caffe.berkeleyvision.org/tutorial/layers/euclideanloss.html), which is computed as

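$$ \frac 1 {2N} \sum_{i=1}^N \| {x_1}_i - {x_2}_i \|^2 $$

where $ N $ is the number of samples in the batch and $ {x_1}_i $, $ {x_2}_i $ are the predicted and target vectors for sample $ i $.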

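In [None]:
def euclidean(y_true, y_pred):
    # Half the squared error per sample; Keras then averages over the batch,
    # which supplies the 1/N part of the 1/(2N) factor in the formula.
    return 0.5 * tf.reduce_sum(tf.square(y_true - y_pred), axis=-1)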
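
As a sanity check on the formula, here is a minimal pure-Python sketch that evaluates it on toy values. The function name `caffe_euclidean_loss` and the sample numbers are illustrative assumptions, not part of the project or of Caffe.

```python
def caffe_euclidean_loss(predictions, targets):
    """Return 1/(2N) * sum_i ||x1_i - x2_i||^2 for N paired vectors."""
    n = len(predictions)
    total = 0.0
    for x1, x2 in zip(predictions, targets):
        # Squared Euclidean distance between one prediction and its target.
        total += sum((a - b) ** 2 for a, b in zip(x1, x2))
    return total / (2 * n)


# Two samples with two coordinates each (toy values, not from the dataset).
predictions = [[1.0, 2.0], [0.0, 0.0]]
targets = [[0.0, 0.0], [3.0, 4.0]]

loss = caffe_euclidean_loss(predictions, targets)
print(loss)  # (5 + 25) / (2 * 2) = 7.5
```

Comparing a Keras loss against a reference like this on a small batch is one way to confirm that the normalisation matches the Caffe definition.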

### Learning rate decay policy

The learning rate decay policy used in [Deep Hand Pose](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/solver.prototxt#L24) is [Caffe's 'inv' policy](https://github.com/jsupancic/deep_hand_pose/blob/master/src/caffe/proto/caffe.proto#L162), where the current learning rate is defined to be

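$$ r(i) = \frac {r(0)} {(1 + \gamma i) ^ p} $$

where $ r(0) $ is the base learning rate, $ \gamma $ and $ p $ are the solver's `gamma` and `power` parameters, and $ i $ is the iteration number. The scheduler below evaluates the same formula once per epoch, because Keras's `LearningRateScheduler` passes the epoch index to the schedule function.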

In [None]:
def inv_decay(base_lr, gamma, power):
    def decay(epoch):
        return base_lr * (1 + gamma * epoch) ** (-power)
    
    return decay
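
To get a feel for how slowly this policy decays, the following sketch evaluates the schedule at a few steps using the parameter values quoted later in this notebook. It redefines `inv_decay` so that it runs on its own; the chosen step values are arbitrary.

```python
def inv_decay(base_lr, gamma, power):
    """Build a schedule r(i) = base_lr / (1 + gamma * i) ** power."""
    def decay(step):
        return base_lr * (1 + gamma * step) ** (-power)
    return decay


schedule = inv_decay(base_lr=0.000005, gamma=0.0001, power=0.75)

# Print the rate at a few steps to see how slowly it falls off.
for step in (0, 1000, 10000, 40000):
    print(step, schedule(step))
```

At step 40000 the denominator is $ 5^{0.75} $, so the rate has dropped to roughly 0.3 times its initial value.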

### Xavier initializer

The convolutional layers in the [Deep Hand Pose](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/oberweger-pca.prototxt#L53) project use [Caffe's Xavier filler](http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1XavierFiller.html#details), which is computed as

$$ x \sim U(-a, +a) $$

where

$$ a = \sqrt {\frac 3 n} $$

and $ n $, by default, is the number of inputs to the layer (fan in).

Keras has a similar [he_uniform](https://github.com/fchollet/keras/blob/master/keras/initializations.py#L80) initializer, where

$$ a = \sqrt {\frac 6 n} $$

and $ n $ is the number of inputs to the layer.

In [None]:
def xavier(shape, name=None, dim_ordering='th'):
    fan_in, fan_out = initializations.get_fans(shape, dim_ordering=dim_ordering)
    s = np.sqrt(3. / fan_in)
    return initializations.uniform(shape, s, name=name)
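
The difference between the two bounds can be illustrated without Keras. This is a minimal sketch; the helper `uniform_bound` is a hypothetical name, and the fan-in assumes a 5x5 kernel over a single input channel, as in the first convolutional layer.

```python
import math
import random


def uniform_bound(fan_in, numerator):
    """Return a = sqrt(numerator / fan_in) for a U(-a, +a) initializer."""
    return math.sqrt(numerator / fan_in)


# Fan-in of a 5x5 kernel over 1 input channel (assumed first-layer shape).
fan_in = 5 * 5 * 1

xavier_a = uniform_bound(fan_in, 3.0)      # Caffe Xavier filler
he_uniform_a = uniform_bound(fan_in, 6.0)  # Keras he_uniform

# Draw weights for 8 filters and check they stay within the Xavier bound.
rng = random.Random(0)
weights = [rng.uniform(-xavier_a, xavier_a) for _ in range(fan_in * 8)]

print(xavier_a, he_uniform_a)
print(min(weights), max(weights))
```

Since the variance of $ U(-a, +a) $ is $ a^2 / 3 $, the Xavier bound gives a weight variance of $ 1/n $, while `he_uniform` gives $ 2/n $; the two bounds differ by a factor of $ \sqrt 2 $.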

### Gaussian initializer

The fully connected layers in the [Deep Hand Pose](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/oberweger-pca.prototxt#L195) project use [Caffe's Gaussian filler](http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1GaussianFiller.html#details) with various standard deviations.

In [None]:
def gaussian(std_dev):
    def init(shape, name=None):
        return initializations.normal(shape, std_dev, name)
    
    return init

In [None]:
# Let TensorFlow allocate GPU memory on demand instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))

### Network layers

This network is very similar to that implemented in the [Deep Hand Pose](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/oberweger-pca.prototxt) project.

In [None]:
model = Sequential([
        Convolution2D(
            nb_filter   = 8,
            nb_row      = 5,
            nb_col      = 5,
            init        = xavier,
            input_shape = (128, 128, 1)
        ),
        MaxPooling2D(
            pool_size   = (2, 2)
        ),
        LeakyReLU(
            alpha       = 0.05
        ),
        Convolution2D(
            nb_filter   = 8,
            nb_row      = 5,
            nb_col      = 5,
            init        = xavier
        ),
        MaxPooling2D(
            pool_size   = (2, 2)
        ),
        LeakyReLU(
            alpha       = 0.05
        ),
        Convolution2D(
            nb_filter   = 8,
            nb_row      = 5,
            nb_col      = 5,
            init        = xavier
        ),
        LeakyReLU(
            alpha       = 0.05
        ),
        Flatten(),
        Dense(
            output_dim  = 1024,
            init        = gaussian(std_dev=0.01),
            activation  = 'relu'
        ),
        Dense(
            output_dim  = 1024,
            init        = gaussian(std_dev=0.05),
            activation  = 'relu'
        ),
        Dense(
            output_dim  = 22,
            init        = gaussian(std_dev=0.02)
        ),
        Dense(
            output_dim  = 28,
            weights     = (pca_eigenvectors, pca_mean),
            trainable   = False
        )
    ])
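
The spatial size of the feature maps can be traced by hand. The sketch below assumes the Keras 1.x defaults for these layers: unpadded (`'valid'`) convolutions with stride 1, and pooling with a stride equal to the pool size. The helper names are illustrative.

```python
def conv_valid(size, kernel):
    """Output width of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool(size, window):
    """Output width of non-overlapping max pooling (stride == window)."""
    return size // window


size = 128                                  # input is 128 x 128 x 1
size = pool(conv_valid(size, 5), 2)         # conv 5x5 -> 124, pool -> 62
size = pool(conv_valid(size, 5), 2)         # conv 5x5 -> 58,  pool -> 29
size = conv_valid(size, 5)                  # conv 5x5 -> 25

flattened = size * size * 8                 # 8 feature maps of 25 x 25
print(size, flattened)                      # 25 5000
```

Under these assumptions the first fully connected layer receives 5000 inputs, which `model.summary()` should confirm.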

### Optimizer

The optimizer used in the [Deep Hand Pose](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/solver.prototxt#L37) project is [Caffe's stochastic gradient descent](http://caffe.berkeleyvision.org/tutorial/solver.html#sgd), with an [initial learning rate](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/solver.prototxt#L17) of 0.000005 and a [momentum](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/solver.prototxt#L19) of 0.9.

As mentioned [above](#Loss-function), the loss function used is Euclidean loss.

In [None]:
model.compile(
    optimizer = SGD(
        lr       = 0.000005,
        momentum = 0.9
    ),
    loss         = euclidean
)
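
For reference, the momentum update can be sketched in a few lines. This is a toy illustration of the classical momentum rule as described in Caffe's solver documentation, not the Keras or Caffe implementation; the function name, the quadratic objective, and the learning rate of 0.01 are assumptions chosen so the example converges quickly.

```python
def sgd_momentum_step(weight, velocity, gradient, lr, momentum):
    """One update: v <- momentum * v - lr * grad; w <- w + v."""
    velocity = momentum * velocity - lr * gradient
    return weight + velocity, velocity


# Minimise the toy objective f(w) = w ** 2, whose gradient is 2 * w.
# The learning rate here is a hypothetical value for the toy problem;
# it is not the 0.000005 used for the network.
weight, velocity = 1.0, 0.0
for _ in range(200):
    weight, velocity = sgd_momentum_step(
        weight, velocity, gradient=2 * weight, lr=0.01, momentum=0.9
    )

print(weight)
```

The velocity term accumulates past gradients, so with a momentum of 0.9 each step is influenced by roughly the last ten gradients rather than only the current one.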

### Training

As in the Deep Hand Pose project, we train for [40000 epochs](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/solver.prototxt#L31) with batches of [64 images](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/oberweger-pca.prototxt#L14).

As mentioned [above](#Learning-rate-decay-policy), the learning rate decay used is Caffe's 'inv' policy, with an [initial learning rate](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/solver.prototxt#L17) of 0.000005, a [gamma](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/solver.prototxt#L25) of 0.0001, and a [power](https://github.com/jsupancic/deep_hand_pose/blob/master/examples/deep_hand_pose/solver.prototxt#L26) of 0.75.

In [None]:
model.fit(
    train_images,
    train_labels,
    batch_size = 64,
    shuffle    = 'batch',
    nb_epoch   = 40000,
    callbacks  = [
        LearningRateScheduler(
            inv_decay(
                base_lr = 0.000005,
                gamma   = 0.0001,
                power   = 0.75
            )
        )
    ]
)

### Evaluation

We evaluate the model on the test set with a batch size of 64, measuring the Euclidean loss.

In [None]:
model.evaluate(
    test_images,
    test_labels,
    batch_size = 64
)