
Few Shot Learning

The process of learning good features for machine learning applications can be very computationally expensive and may prove difficult in cases where little data is available. A prototypical example of this is the one-shot learning setting, in which we must correctly make predictions given only a single example of each new class.

Here, I explore the power of one-shot learning with a popular model called a "Siamese Neural Network".

Table of Contents

- Setup
  - Clone or Download
  - Install Third-Party Dependency Requirements
- Architecture
  - Model
  - Learning
- Credits
- Contribution
- License

Setup

Clone or Download

Fire up your favorite command line utility (e.g. Terminal, iTerm or Command Prompt), and type the following commands to clone the project.

$ git clone https://github.com/victor-iyiola/few-shot-learning.git
$ cd few-shot-learning && ls
LICENSE        README.md      datasets       images         omniglot       one-shot.ipynb          utils.py

Or simply download this repository, and change your working directory to the downloaded project.

$ cd path/to/few-shot-learning
$ ls
LICENSE        README.md      datasets       images         omniglot       one-shot.ipynb          utils.py

Note: Windows users should use dir in place of ls.

Install Third-Party Dependency Requirements

This project was developed with Python v3.6.5; however, any later version of Python should work fine.

$ pip3 install --upgrade -r requirements.txt
$ jupyter notebook
[I 08:36:52.271 LabApp] The Jupyter Notebook is running at:
[I 08:36:52.271 LabApp] http://localhost:8888/?token=cb246f438ca40a1a319d12c877d2e825c923fc0525c9d136
[I 08:36:52.271 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 08:36:52.273 LabApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=cb246f438ca40a1a319d12c877d2e825c923fc0525c9d136

Architecture

Model

A standard Siamese Convolutional Neural Network with $L$ layers, each with $N_l$ units, is used, where $h_{1,l}$ represents the hidden vector in layer $l$ for the first twin, and $h_{2,l}$ denotes the same for the second twin. Rectified Linear Units (ReLU) are used exclusively in the first $L-2$ layers, and sigmoidal units in the remaining layers.

The model consists of a sequence of convolutional layers, each of which uses a single channel with filters of varying size and a fixed stride of 1. The number of convolutional filters is specified as a multiple of 16 to optimize performance. The network applies a ReLU activation function to the output feature maps, optionally followed by max-pooling with a filter size and stride of 2. Thus the $k^{th}$ filter map in each layer takes the following form:

$$ a^{(k)}_{1,m} = \textrm{max-pool}\big(\max(0,\ W^{(k)}_{l-1} \star h_{1,(l-1)} + b_l),\ 2\big) $$

$$ a^{(k)}_{2,m} = \textrm{max-pool}\big(\max(0,\ W^{(k)}_{l-1} \star h_{2,(l-1)} + b_l),\ 2\big) $$

where $W_{l-1}$ is the 3-dimensional tensor representing the feature maps for layer $l$ and $\star$ is the valid convolution operation, which returns only those output units that result from complete overlap between each convolutional filter and the input feature maps.

*Figure: the best convolutional architecture selected for the verification task. The Siamese twin is not depicted, but joins immediately after the 4096-unit fully-connected layer, where the component-wise L1 distance between the twin vectors is computed.*
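
To make the structure concrete, here is a minimal Keras sketch of such a network. It is not the notebook's actual implementation: the input shape (105×105 Omniglot glyphs) and the specific filter counts and kernel sizes are assumptions borrowed from Koch et al.'s reported architecture. Only the overall pattern follows the description above: stacked valid-convolution/ReLU blocks with 2×2 max-pooling, a 4096-unit sigmoidal embedding, a component-wise L1 distance, and a single sigmoid output.

```python
import tensorflow as tf
from tensorflow.keras import Model, layers

INPUT_SHAPE = (105, 105, 1)  # assumption: Omniglot glyph size from Koch et al.

def build_twin() -> Model:
    """One twin: conv/ReLU blocks (filter counts are multiples of 16, stride 1,
    'valid' padding), each followed by 2x2 max-pooling, then a 4096-unit
    fully-connected layer with sigmoidal units."""
    inputs = layers.Input(shape=INPUT_SHAPE)
    x = layers.Conv2D(64, (10, 10), strides=1, activation="relu")(inputs)
    x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    x = layers.Conv2D(128, (7, 7), strides=1, activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    x = layers.Conv2D(128, (4, 4), strides=1, activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    x = layers.Conv2D(256, (4, 4), strides=1, activation="relu")(x)
    x = layers.Flatten()(x)
    return Model(inputs, layers.Dense(4096, activation="sigmoid")(x))

# The twin is shared: calling the same model on both inputs ties the weights,
# so both images are embedded by identical functions.
twin = build_twin()
x1 = layers.Input(shape=INPUT_SHAPE)
x2 = layers.Input(shape=INPUT_SHAPE)
h1, h2 = twin(x1), twin(x2)

# Component-wise L1 distance between the 4096-d embeddings, followed by a
# single sigmoid unit producing p(x1, x2), the "same class" score.
l1_distance = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([h1, h2])
p = layers.Dense(1, activation="sigmoid")(l1_distance)
siamese = Model(inputs=[x1, x2], outputs=p)
```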

Learning

Loss function. Let $M$ represent the mini-batch size, where $i$ indexes the $i^{th}$ mini-batch. Now let $y(x^{(i)}_1, x^{(i)}_2)$ be a length-$M$ vector which contains the labels for the mini-batch, where $y(x^{(i)}_1, x^{(i)}_2) = 1$ whenever $x_1$ and $x_2$ are from the same character class and $y(x^{(i)}_1, x^{(i)}_2) = 0$ otherwise. A regularized cross-entropy objective is imposed on a binary classifier of the following form:

$$ \ell(x^{(i)}_1, x^{(i)}_2) = y(x^{(i)}_1, x^{(i)}_2)\log{p(x^{(i)}_1, x^{(i)}_2)} + (1 - y(x^{(i)}_1, x^{(i)}_2)) \log{(1 - p(x^{(i)}_1, x^{(i)}_2))} + \lambda^T\big|w\big|^2 $$
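
As a rough illustration, the objective above can be written in a few lines of TensorFlow. Note that Keras's `binary_crossentropy` computes the *negated* log terms (a loss to be minimized), and the scalar `lam` below is a hypothetical stand-in for the vector of regularization weights $\lambda$ in the formula:

```python
import tensorflow as tf

def regularized_cross_entropy(y_true, p_pred, weights, lam=1e-4):
    """Sketch of the regularized cross-entropy objective.

    y_true:  length-M vector of labels y(x1, x2) in {0, 1}.
    p_pred:  length-M vector of predicted probabilities p(x1, x2).
    weights: list of weight tensors w to regularize.
    lam:     hypothetical scalar standing in for the lambda weights above.
    """
    # -[y log p + (1 - y) log(1 - p)], averaged over the mini-batch.
    bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, p_pred))
    # lambda * |w|^2: an L2 penalty summed over all weight tensors.
    l2 = tf.add_n([tf.reduce_sum(tf.square(w)) for w in weights])
    return bce + lam * l2
```

In practice, `weights` would be the kernel tensors of the model being trained, e.g. those of the Siamese network sketched above.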

Credits

The model and learning procedure described above follow Gregory Koch, Richard Zemel & Ruslan Salakhutdinov, "Siamese Neural Networks for One-shot Image Recognition", ICML Deep Learning Workshop, 2015.

Contribution

You are very welcome to modify the code and resources here and use them in your own projects.

Please keep a link to the original repository. If you have made a fork with substantial modifications that you feel may be useful, then please open a new issue on GitHub with a link and short description.

License (MIT)

This project is released under the MIT License, which allows very broad use for both academic and commercial purposes.

A few of the images used for demonstration purposes may be under copyright; they are included here under "fair use".
