A small C++ implementation of LSTM networks, focused on OCR.


clstm


CLSTM is an implementation of the LSTM recurrent neural network model in C++, using the Eigen library for numerical computations.

Status and scope

CLSTM is mainly in maintenance mode now. It was created at a time when there weren't a lot of good LSTM implementations around, but several good options have become available over the last year. Nevertheless, if you need a small library for text line recognition with few dependencies, CLSTM is still a good option.

Installation using Docker

You can train and run clstm without installing it on the local machine by using the Docker image, which is based on Ubuntu 16.04. This is the best option for running clstm on a Windows host.

You can either run the latest version of the clstm image from Docker Hub or build the Docker image yourself from the repo (see ./docker/Dockerfile).

The command line syntax differs from a native installation:

docker run --rm -it -e [VARIABLES...] kbai/clstm BINARY [ARGS...]

is equivalent to

[VARIABLES...] BINARY [ARGS...]

For example:

docker run --rm -it -e ntrain=1000 kbai/clstm clstmocrtrain traininglist.txt

is equivalent to

ntrain=1000 clstmocrtrain traininglist.txt

Installation from source

Prerequisites

  • scons, swig, Eigen
  • protocol buffer library and compiler
  • libpng
  • Optional: HDF5, ZMQ, Python
# Ubuntu 15.04, 16.04 / Debian 8, 9
sudo apt-get install scons libprotobuf-dev protobuf-compiler libpng-dev libeigen3-dev swig

# Ubuntu 14.04:
sudo apt-get install scons libprotobuf-dev protobuf-compiler libpng-dev swig

The Debian repositories jessie-backports and stretch include sufficiently new libeigen3-dev packages.

It is also possible to download Eigen with Tensor support (> v3.3-beta1) and copy the header files to an include path:

# with wget
wget 'https://github.com/RLovelett/eigen/archive/3.3-rc1.tar.gz'
tar xf 3.3-rc1.tar.gz
sudo rm -rf /usr/local/include/eigen3
sudo mv eigen-3.3-rc1 /usr/local/include/eigen3
# or with git:
sudo git clone --depth 1 --single-branch --branch 3.3-rc1 \
  "https://github.com/RLovelett/eigen" /usr/local/include/eigen3

To use the visual debugging methods, additionally:

# Ubuntu 15.04:
sudo apt-get install libzmq3-dev libzmq3 libzmqpp-dev libzmqpp3 libpng12-dev

For HDF5, additionally:

# Ubuntu 15.04:
sudo apt-get install hdf5-helpers libhdf5-8 libhdf5-cpp-8 libhdf5-dev python-h5py

# Ubuntu 14.04:
sudo apt-get install hdf5-helpers libhdf5-7 libhdf5-dev python-h5py

Building

To build a standalone C library, run

scons
sudo scons install

There are a bunch of options:

  • debug=1 build with debugging options, no optimization
  • display=1 build with display support for debugging (requires ZMQ, Python)
  • prefix=... install under a different prefix (untested)
  • eigen=... where to look for Eigen include files (should contain Eigen/Eigen)
  • openmp=... build with multi-processing support. Set the OMP_NUM_THREADS environment variable to the number of threads for Eigen to use.
  • hdf5lib=hdf5 what HDF5 library to use; enables HDF5 command line programs (may need hdf5_serial in some environments)

Running the tests

After building the executables, you can run two simple test runs as follows:

  • run-cmu will train an English-to-IPA LSTM
  • run-uw3-500 will download a small OCR training/test set and train an OCR LSTM

There is a full set of tests in the current version of clstm; just run them with:

./run-tests

This will check:

  • gradient checks for layers and compute steps
  • training a simple model through the C++ API
  • training a simple model through the Python API
  • the command line training tools, including loading and saving
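
In the spirit of those gradient checks, the underlying technique is to compare an analytic derivative against a central finite difference. The sketch below is a self-contained illustration of the idea using a trivial function; it is not the actual test code:

```cpp
#include <cassert>
#include <cmath>

// Finite-difference gradient check for f(x) = x^2 at a point.
// Compares the analytic derivative 2x against a central difference.
inline bool grad_check(double x, double eps = 1e-5, double tol = 1e-6) {
  double analytic = 2.0 * x;  // d/dx of x^2
  double numeric =
      ((x + eps) * (x + eps) - (x - eps) * (x - eps)) / (2.0 * eps);
  return std::fabs(analytic - numeric) < tol;
}
```

The real checkers apply the same idea to each layer's parameters and compute steps, perturbing one value at a time.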

Python bindings

To build the Python extension, run

python setup.py build
sudo python setup.py install

(this is currently broken)

Documentation / Examples

You can find some documentation and examples in the form of iPython notebooks in the misc directory (these are version 3 notebooks and won't open in older versions).

You can view these notebooks online here: http://nbviewer.ipython.org/github/tmbdev/clstm/tree/master/misc/

C++ API

The clstm library operates on the Sequence type as its fundamental data type, representing variable-length sequences of fixed-length vectors. The underlying Sequence type is a rank-4 tensor with accessors for individual rank-2 tensors at different time steps.
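
Conceptually, a sequence of T time steps, each a d×b matrix (feature dimension × batch size), can be pictured as follows. This is a hypothetical standalone sketch, not clstm's actual Sequence class:

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a sequence type: T time steps, each a rank-2
// tensor of shape (features x batch), stored here as nested vectors
// purely for illustration.
struct ToySequence {
  int nfeat, nbatch;
  // Indexed as steps[t][feature][batch].
  std::vector<std::vector<std::vector<float>>> steps;
  ToySequence(int T, int d, int b)
      : nfeat(d), nbatch(b),
        steps(T, std::vector<std::vector<float>>(
                     d, std::vector<float>(b, 0.0f))) {}
  int size() const { return (int)steps.size(); }  // number of time steps
  std::vector<std::vector<float>>& operator[](int t) { return steps[t]; }
};
```

The real Sequence stores everything in one rank-4 tensor for efficiency; the nested-vector layout above only conveys the indexing structure.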

Networks are built from objects implementing the INetwork interface. The INetwork interface contains:

struct INetwork {
    Sequence inputs, d_inputs;      // input sequence, input deltas
    Sequence outputs, d_outputs;    // output sequence, output deltas
    void forward();                 // propagate inputs to outputs
    void backward();                // propagate d_outputs to d_inputs
    void update();                  // update weights from the last backward() step
    void setLearningRate(Float,Float); // set learning rates
    ...
};

Network structures can be hierarchical, and there are some network implementations whose purpose is to combine other networks into more complex structures.

struct INetwork {
    ...
    vector<shared_ptr<INetwork>> sub;
    void add(shared_ptr<INetwork> net);
    ...
};
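
The composition idea can be sketched with a self-contained toy. None of this is clstm code; the names only loosely mirror the interface above:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in illustrating how a container network forwards
// through its sub-networks in order (hypothetical, not clstm code).
struct ToyNet {
  std::vector<std::shared_ptr<ToyNet>> sub;
  void add(std::shared_ptr<ToyNet> net) { sub.push_back(net); }
  // The base class acts as a "Stacked" container: it chains its
  // sub-networks, feeding each one's output into the next.
  virtual double forward(double x) {
    for (auto& s : sub) x = s->forward(x);
    return x;
  }
  virtual ~ToyNet() {}
};

struct AddOne : ToyNet {
  double forward(double x) override { return x + 1.0; }
};
struct Twice : ToyNet {
  double forward(double x) override { return 2.0 * x; }
};
```

In clstm the same pattern applies to whole sequences rather than scalars, and backward() chains the sub-networks in reverse order.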

At the lowest level, layers are created in four steps:

  • create an instance of the layer with make_layer
  • set any parameters (including ninput and noutput) as attributes
  • add any sublayers to the sub vector
  • call initialize()

There are several functions for constructing layers and networks:

  • make_layer(kind) looks up the constructor and gives you an uninitialized layer
  • layer(kind,ninput,noutput,args,sub) performs all initialization steps in sequence
  • make_net(kind,args) initializes a whole collection of layers at once
  • make_net_init(kind,params) is like make_net, but parameters are given in string form

The layer(kind,ninput,noutput,args,sub) function simply performs the manual steps listed above in order.

Layers and networks are usually passed around as shared_ptr&lt;INetwork&gt;; there is a typedef for this type called Network.

This can be used to construct network architectures in C++ pretty easily. For example, the following creates a network that stacks a softmax output layer on top of a standard LSTM layer:

Network net = layer("Stacked", ninput, noutput, {}, {
    layer("LSTM", ninput, nhidden, {}, {}),
    layer("SoftmaxLayer", nhidden, noutput, {}, {})
});

Note that you need to make sure that the number of output units of each layer matches the number of input units of the next.
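
That consistency requirement can be checked mechanically. The helper below is an illustration, not part of clstm; it takes each layer's (ninput, noutput) pair and verifies that the chain lines up:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Check that each layer's output size matches the next layer's
// input size (illustrative helper, not part of clstm).
inline bool sizes_consistent(
    const std::vector<std::pair<int, int>>& layers) {  // (ninput, noutput)
  for (std::size_t i = 1; i < layers.size(); i++)
    if (layers[i - 1].second != layers[i].first) return false;
  return true;
}
```

For the Stacked example above, the pairs would be (ninput, nhidden) for the LSTM and (nhidden, noutput) for the softmax, which line up by construction.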

In addition to these basic functions, there is also a small implementation of CTC alignment.

The C++ code roughly follows the lstm.py implementation from the Python version of OCRopus. Gradients have been verified for the core LSTM implementation, although there may still be bugs in other parts of the code.

There is also a small multidimensional array class in multidim.h; that isn't used in the core LSTM implementation, but it is used in debugging and testing code, for plotting, and for HDF5 input/output. Unlike Eigen, it uses standard C/C++ row major element order, as libraries like HDF5 expect. (NB: This will be replaced with Eigen::Tensor.)
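
The element-order difference matters whenever buffers are exchanged between libraries. For an r×c matrix, the two layouts map the same (i, j) element to different flat offsets; these index helpers are illustrative, not from the library:

```cpp
#include <cassert>
#include <cstddef>

// Flat index of element (i, j) in a rows x cols matrix.
// Row-major (C/C++, HDF5): rows are contiguous in memory.
inline std::size_t row_major(std::size_t i, std::size_t j,
                             std::size_t rows, std::size_t cols) {
  (void)rows;
  return i * cols + j;
}
// Column-major (Eigen's default): columns are contiguous in memory.
inline std::size_t col_major(std::size_t i, std::size_t j,
                             std::size_t rows, std::size_t cols) {
  (void)cols;
  return j * rows + i;
}
```

Passing a row-major buffer to code expecting column-major order (or vice versa) silently transposes the data, which is why the array class matches the HDF5 convention.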

LSTM models are stored in protocol buffer format (clstm.proto), although adding new formats is easy. There is an older HDF5-based storage format.

Python API

The clstm.i file implements a simple Python interface to clstm, plus a wrapper that makes an INetwork mostly a replacement for the lstm.py implementation from ocropy.

Command Line Drivers

There are several command line drivers:

  • clstmfiltertrain training-data test-data learns text filters;
    • input files consist of lines pairing an input string with its desired output
  • clstmfilter applies learned text filters
  • clstmocrtrain training-images test-images learns OCR (or image-to-text) transformations;
    • input files are lists of text line images; for each image, the UTF-8 ground truth is expected in a corresponding .gt.txt file
  • clstmocr applies learned OCR models

In addition, you get the following HDF5-based commands:

  • clstmseq learns sequence-to-sequence mappings
  • clstmctc learns sequence-to-string mappings using CTC alignment
  • clstmtext learns string-to-string transformations

Note that most parameters are passed through the environment:

lrate=3e-5 clstmctc uw3-dew.h5

See the notebooks in the misc/ subdirectory for documentation on the parameters and examples of usage.

(You can find all parameters via grep 'get.env' *.cc.)
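
The pattern behind such environment parameters looks roughly like the helper below. This is a hypothetical sketch of the technique, not clstm's actual getenv wrapper:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Read a numeric parameter from the environment, falling back to a
// default when the variable is unset (hypothetical sketch).
inline double env_param(const char* name, double dflt) {
  const char* s = std::getenv(name);
  return s ? std::atof(s) : dflt;
}
```

With a helper like this, `lrate=3e-5 clstmctc uw3-dew.h5` works because the program reads `lrate` from its environment at startup instead of parsing command line flags.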