Find file History

README.md

StreetView Tensorflow Recurrent End-to-End Transcription (STREET) Model.

A TensorFlow implementation of the STREET model described in the paper:

"End-to-End Interpretation of the French Street Name Signs Dataset"

Raymond Smith, Chunhui Gu, Dar-Shyang Lee, Huiyi Hu, Ranjith Unnikrishnan, Julian Ibarz, Sacha Arnoud, Sophia Lin.

International Workshop on Robust Reading, Amsterdam, 9 October 2016.

Available at: http://link.springer.com/chapter/10.1007%2F978-3-319-46604-0_30

Contact

Author: Ray Smith (rays@google.com).

Pull requests and issues: @theraysmith.

Contents

Introduction

The STREET model is a deep recurrent neural network that learns how to identify the name of a street (in France) from an image containing upto four different views of the street name sign. The model merges information from the different views and normalizes the text to the correct format. For example:

Example image

Avenue des Sapins

Installing and setting up the STREET model

Install Tensorflow

Install numpy:

sudo pip install numpy

Build the LSTM op:

cd cc
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
g++ -std=c++11 -shared rnn_ops.cc -o rnn_ops.so -fPIC -I $TF_INC -O3 -mavx

(Note: if running on Mac, add -undefined dynamic_lookup to your g++ command. If you are running a newer version of gcc, you may also need to add -D_GLIBCXX_USE_CXX11_ABI=0.)

Run the unittests:

cd ../python
python decoder_test.py
python errorcounter_test.py
python shapes_test.py
python vgslspecs_test.py
python vgsl_model_test.py

Downloading the datasets

The French Street Name Signs (FSNS) dataset is split into subsets, each of which is composed of multiple files. Note that these datasets are very large. The approximate sizes are:

  • Train: 512 files of 300MB each.
  • Validation: 64 files of 40MB each.
  • Test: 64 files of 50MB each.
  • Testdata: some smaller data files of a few MB for testing.
  • Total: ~158 Gb.

Here is a list of the download paths:

https://download.tensorflow.org/data/fsns-20160927/charset_size=134.txt
https://download.tensorflow.org/data/fsns-20160927/test/test-00000-of-00064
...
https://download.tensorflow.org/data/fsns-20160927/test/test-00063-of-00064
https://download.tensorflow.org/data/fsns-20160927/testdata/arial-32-00000-of-00001
https://download.tensorflow.org/data/fsns-20160927/testdata/fsns-00000-of-00001
https://download.tensorflow.org/data/fsns-20160927/testdata/mnist-sample-00000-of-00001
https://download.tensorflow.org/data/fsns-20160927/testdata/numbers-16-00000-of-00001
https://download.tensorflow.org/data/fsns-20160927/train/train-00000-of-00512
...
https://download.tensorflow.org/data/fsns-20160927/train/train-00511-of-00512
https://download.tensorflow.org/data/fsns-20160927/validation/validation-00000-of-00064
...
https://download.tensorflow.org/data/fsns-20160927/validation/validation-00063-of-00064

All URLs are stored in the text file python/fsns_urls.txt, to download them in parallel:

aria2c -c -j 20 -i fsns_urls.txt

If you ctrl+c and re-execute the command it will continue the aborted download.

Confidence Tests

The datasets download includes a directory testdata that contains some small datasets that are big enough to test that models can actually learn something. Assuming that you have put the downloads in directory data alongside python then you can run the following tests:

Mnist for zero-dimensional data

cd python
train_dir=/tmp/mnist
rm -rf $train_dir
python vgsl_train.py --model_str='16,0,0,1[Ct5,5,16 Mp3,3 Lfys32 Lfxs64]O0s12' \
  --max_steps=1024 --train_data=../data/testdata/mnist-sample-00000-of-00001 \
  --initial_learning_rate=0.001 --final_learning_rate=0.001 \
  --num_preprocess_threads=1 --train_dir=$train_dir
python vgsl_eval.py --model_str='16,0,0,1[Ct5,5,16 Mp3,3 Lfys32 Lfxs64]O0s12' \
  --num_steps=256 --eval_data=../data/testdata/mnist-sample-00000-of-00001 \
  --num_preprocess_threads=1 --decoder=../testdata/numbers.charset_size=12.txt \
  --eval_interval_secs=0 --train_dir=$train_dir --eval_dir=$train_dir/eval

Depending on your machine, this should run in about 1 minute, and should obtain error rates below 50%. Actual error rates will vary according to random initialization.

Fixed-length targets for number recognition

cd python
train_dir=/tmp/fixed
rm -rf $train_dir
python vgsl_train.py --model_str='8,16,0,1[S1(1x16)1,3 Lfx32 Lrx32 Lfx32]O1s12' \
  --max_steps=3072 --train_data=../data/testdata/numbers-16-00000-of-00001 \
  --initial_learning_rate=0.001 --final_learning_rate=0.001 \
  --num_preprocess_threads=1 --train_dir=$train_dir
python vgsl_eval.py --model_str='8,16,0,1[S1(1x16)1,3 Lfx32 Lrx32 Lfx32]O1s12' \
  --num_steps=256 --eval_data=../data/testdata/numbers-16-00000-of-00001 \
  --num_preprocess_threads=1 --decoder=../testdata/numbers.charset_size=12.txt \
  --eval_interval_secs=0 --train_dir=$train_dir --eval_dir=$train_dir/eval

Depending on your machine, this should run in about 1-2 minutes, and should obtain a label error rate between 50 and 80%, with word error rates probably not coming below 100%. Actual error rates will vary according to random initialization.

OCR-style data with CTC

cd python
train_dir=/tmp/ctc
rm -rf $train_dir
python vgsl_train.py --model_str='1,32,0,1[S1(1x32)1,3 Lbx100]O1c105' \
  --max_steps=4096 --train_data=../data/testdata/arial-32-00000-of-00001 \
  --initial_learning_rate=0.001 --final_learning_rate=0.001 \
  --num_preprocess_threads=1 --train_dir=$train_dir &
python vgsl_eval.py --model_str='1,32,0,1[S1(1x32)1,3 Lbx100]O1c105' \
  --num_steps=256 --eval_data=../data/testdata/arial-32-00000-of-00001 \
  --num_preprocess_threads=1 --decoder=../testdata/arial.charset_size=105.txt \
  --eval_interval_secs=15 --train_dir=$train_dir --eval_dir=$train_dir/eval &
tensorboard --logdir=$train_dir

Depending on your machine, the background training should run for about 3-4 minutes, and should obtain a label error rate between 10 and 50%, with correspondingly higher word error rates and even higher sequence error rate. Actual error rates will vary according to random initialization. The background eval will run for ever, and will have to be terminated by hand. The tensorboard command will run a visualizer that can be viewed with a browser. Go to the link that it prints to view tensorboard and see the training progress. See the Tensorboard introduction for more information.

Mini FSNS dataset

You can test the actual STREET model on a small FSNS data set. The model will overfit to this small dataset, but will give some confidence that everything is working correctly. Note that this test runs the training and evaluation in parallel, which is something that you should do when training any substantial system, so you can monitor progress.

cd python
train_dir=/tmp/fsns
rm -rf $train_dir
python vgsl_train.py --max_steps=10000 --num_preprocess_threads=1 \
  --train_data=../data/testdata/fsns-00000-of-00001 \
  --initial_learning_rate=0.0001 --final_learning_rate=0.0001 \
  --train_dir=$train_dir &
python vgsl_eval.py --num_steps=256 --num_preprocess_threads=1 \
   --eval_data=../data/testdata/fsns-00000-of-00001 \
   --decoder=../testdata/charset_size=134.txt \
   --eval_interval_secs=300 --train_dir=$train_dir --eval_dir=$train_dir/eval &
tensorboard --logdir=$train_dir

Depending on your machine, the training should finish in about 1-2 hours. As with the CTC testset above, the eval and tensorboard will have to be terminated manually.

Training a full FSNS model

After running the tests above, you are ready to train the real thing! Note that you might want to use a train_dir somewhere other than /tmp as you can stop the training, reboot if needed and continue if you keep the data intact, but /tmp gets deleted on a reboot.

cd python
train_dir=/tmp/fsns
rm -rf $train_dir
python vgsl_train.py --max_steps=100000000 --train_data=../data/train/train* \
  --train_dir=$train_dir &
python vgsl_eval.py --num_steps=1000 \
  --eval_data=../data/validation/validation* \
  --decoder=../testdata/charset_size=134.txt \
  --eval_interval_secs=300 --train_dir=$train_dir --eval_dir=$train_dir/eval &
tensorboard --logdir=$train_dir

Training will take a very long time (probably many weeks) to reach minimum error rate on a single machine, although it will probably take substatially fewer iterations than with parallel training. Faster training can be obtained with parallel training on a cluster. Since the setup is likely to be very site-specific, please see the TensorFlow documentation on Distributed TensorFlow for more information. Some code changes may be needed in the Train function in vgsl_model.py.

With 40 parallel training workers, nearly optimal error rates (about 25% sequence error on the validation set) are obtained in about 30 million steps, although the error continues to fall slightly over the next 30 million, to perhaps as low as 23%.

With a single machine the number of steps could be substantially lower. Although untested on this problem, on other problems the ratio is typically 5 to 1 so low error rates could be obtained as soon as 6 million iterations, which could be reached in about 4 weeks.

The Variable Graph Specification Language

The STREET model makes use of a graph specification language (VGSL) that enables rapid experimentation with different model architectures. The language defines a Tensor Flow graph that can be used to process images of variable sizes to output a 1-dimensional sequence, like a transcription/OCR problem, or a 0-dimensional label, as for image identification problems. For more information see vgslspecs