https://docs.google.com/document/d/1L11EjK0uObqjaBhHNcVPxeyIripGHSUaoEWGypuuVtk/pub

# Description

Objective: Build a live camera app that can interpret number strings in real-world images.

In this project, you will train a model that can decode sequences of digits from natural images, and create an app that prints the numbers it sees in real time. You may choose to implement your project as a simple Python script, a web app/service or an Android app.

## Step 1: Design and test a model architecture that can identify sequences of digits in an image.

Design and implement a deep learning model that learns to recognize sequences of digits. Train it using synthetic data first (recommended) or directly use real-world data (see step 2).

There are various aspects to consider when thinking about this problem:
- Your model can be derived from a deep neural net or a convolutional network.
- You could experiment with sharing (or not sharing) the weights between the softmax classifiers.
- You can also use a recurrent network in your deep neural net to replace the classification layers and directly emit the sequence of digits one at a time.
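
The difference between shared and separate softmax classifiers can be sketched in NumPy. This is a minimal illustration, not the model used in this notebook: it assumes the network body yields one feature vector per digit position (the shape `(batch, positions, feature_dim)` and the function names are hypothetical), because sharing a classifier only makes sense when each position sees different features.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def separate_heads(features, weights, biases):
    """One classifier per digit position.

    features: (batch, positions, feature_dim)
    weights:  (positions, feature_dim, classes)
    biases:   (positions, classes)
    """
    logits = np.einsum("bpd,pdc->bpc", features, weights) + biases
    return softmax(logits)


def shared_head(features, weight, bias):
    """A single classifier reused at every digit position.

    weight: (feature_dim, classes), bias: (classes,)
    """
    return softmax(features @ weight + bias)


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5, 16))  # 4 images, 5 positions, 16 features
w = rng.normal(size=(16, 11))           # 10 digits + 1 blank class
b = np.zeros(11)

probs_shared = shared_head(features, w, b)
# Separate heads initialised with identical weights behave like the shared head.
probs_separate = separate_heads(features, np.stack([w] * 5), np.tile(b, (5, 1)))
```

With five positions, the separate variant holds five times as many classifier parameters as the shared one, which is the trade-off this bullet invites you to explore.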

To help you develop your model, the simplest path is likely to generate a synthetic dataset by concatenating character images from [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) or [MNIST](http://yann.lecun.com/exdb/mnist/). This can provide you with a quick way to run experiments. (Or you can go directly to the real-world dataset of Step 2.)
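
Concatenating single-digit images into a sequence image might look like the following sketch. It assumes the digits are already loaded as an array of shape `(N, 28, 28)` with integer labels; `make_sequences` is a hypothetical helper, not the `MultiDigitMNISTData` implementation used later in this notebook.

```python
import numpy as np


def make_sequences(images, labels, num_digits, num_samples, seed=0):
    """Build side-by-side digit sequences from single-digit images.

    images: (N, height, width) array of single digits.
    labels: (N,) array of digit labels.
    Returns (num_samples, height, width * num_digits) images and
    (num_samples, num_digits) labels.
    """
    rng = np.random.default_rng(seed)
    _, height, width = images.shape
    # Pick num_digits random source images for every sequence.
    idx = rng.integers(0, len(images), size=(num_samples, num_digits))
    picked = images[idx]  # (num_samples, num_digits, height, width)
    # Put the digit axis next to the width axis, then merge the two so the
    # digits sit next to each other horizontally.
    seq_images = picked.transpose(0, 2, 1, 3).reshape(
        num_samples, height, num_digits * width)
    return seq_images, labels[idx]
```

For 28x28 inputs and `num_digits=2`, each generated image is 28x56, which matches the `(28, 56, 1)` shapes printed in the two-digit run of this notebook.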

To produce synthetic digit sequences for testing, you can, for example, limit yourself to sequences of up to five digits and use five classifiers on top of your deep network. You would have to incorporate an additional ‘blank’ character to account for shorter number sequences.
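
One way to encode such targets is sketched here, assuming class index 10 stands for ‘blank’ (the names `BLANK`, `encode_number` and `decode_number` are hypothetical). Note that the label shapes logged in this notebook suggest the runs here use a separate length output with 10-class digit outputs instead; the sketch follows the ‘blank’ description above.

```python
import numpy as np

BLANK = 10       # assumed index of the extra 'blank' class
MAX_DIGITS = 5   # one classifier per position


def encode_number(number, max_digits=MAX_DIGITS):
    """Return a (max_digits, 11) one-hot target, padded with blanks."""
    digits = [int(ch) for ch in str(number)]
    if len(digits) > max_digits:
        raise ValueError("number has more than %d digits" % max_digits)
    padded = digits + [BLANK] * (max_digits - len(digits))
    one_hot = np.zeros((max_digits, 11), dtype=np.float32)
    one_hot[np.arange(max_digits), padded] = 1.0
    return one_hot


def decode_number(class_scores):
    """Read digits left to right and stop at the first blank."""
    digits = []
    for cls in class_scores.argmax(axis=-1):
        if cls == BLANK:
            break
        digits.append(str(int(cls)))
    return "".join(digits)
```

For instance, `decode_number(encode_number(42))` gives back the string `"42"`, with positions 2 to 4 of the target marked as blank.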

For example, here is [a published baseline model](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42241.pdf) for this problem ([video](https://www.youtube.com/watch?v=vGPI_JvLoN0)).

- _What approach did you take in coming up with a solution to this problem?_
- _What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.)_
- _How did you train your model? Did you generate a synthetic dataset (if so, explain how)?_

In [1]:
# Build the baseline model from http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42241.pdf
# Multi-digit Number Recognition
# Deep Convolutional Neural Networks

# Notes quoted from the paper:
# In this paper we propose a unified approach that integrates these three steps
# (localization, segmentation, and recognition) via the use of a deep convolutional
# neural network that operates directly on the image pixels.
# We employ the DistBelief (Dean et al., 2012) implementation of deep neural networks.
# We find that the performance of this approach increases with the depth of the convolutional network,
# with the best performance occurring in the deepest architecture we trained, with eleven hidden layers.
# We evaluate this approach on the publicly available SVHN dataset and achieve over 96% accuracy in recognizing complete street numbers.
# We show that on a per-digit recognition task, we improve upon the state-of-the-art and achieve 97.84% accuracy.
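
The baseline paper models a sequence as a length output plus one digit output per position. A minimal decoding sketch under that assumption follows: it picks the length `n` that maximises `log P(L=n)` plus the log-probabilities of the best digit at each of the first `n` positions. The function name and input layout are hypothetical and are not taken from `multi_digit_model`.

```python
import numpy as np


def decode_sequence(length_log_probs, digit_log_probs):
    """Jointly pick the most likely length and digits.

    length_log_probs: (max_len + 1,) log P(L = n) for n = 0..max_len.
    digit_log_probs:  (max_len, 10) log P(S_i = d) for each position i.
    Returns (digits, log_probability).
    """
    best_digits = digit_log_probs.argmax(axis=1)
    best_scores = digit_log_probs.max(axis=1)
    # prefix[n] is the summed log-probability of the best first n digits.
    prefix = np.concatenate([[0.0], np.cumsum(best_scores)])
    total = length_log_probs + prefix
    n = int(total.argmax())
    return [int(d) for d in best_digits[:n]], float(total[n])


# Toy example: the length head favours 2 digits, the digit heads favour 3 and 7.
length_lp = np.log(np.array([0.05, 0.05, 0.90]))
digit_probs = np.full((2, 10), 0.01)
digit_probs[0, 3] = 0.91
digit_probs[1, 7] = 0.91
digits, score = decode_sequence(length_lp, np.log(digit_probs))
```

A sequence counts as correct only when the length and every digit match, which is why the overall accuracy in the logs below lags behind the per-digit accuracies.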

In [1]:
from multi_digit_mnist_data import MultiDigitMNISTData
from multi_digit_svhn_data import MultiDigitSVHNData
from multi_digit_model import MultiDigitModel

# digit 1

In [3]:
multi_digit_dataset = MultiDigitMNISTData(1)
multi_digit_dataset.load_data()
MultiDigitModel(multi_digit_dataset=multi_digit_dataset,
                        digit_count=multi_digit_dataset.digit_count,
                        num_steps=2001, 
                        batch_size=64, 
                        num_convs=[8,16,32,32,32,32,32,32], 
                        num_fc_1=64, 
                        num_fc_2=128,
                        drop_out_rate=0.5,
                        learning_rate_start=0.01,
                        learning_rate_decay_steps=10000,
                        learning_rate_decay_rate=0.96,
                       ).run()

('multi_digit_mnist_data', (70000, 784))
('multi_digit_mnist_target_length', (70000, 1))
('multi_digit_mnist_target_digits', (70000, 1))
('train_data', (63000, 28, 28, 1))
('train_label_length', (63000, 1, 2))
('train_label_digits', (63000, 1, 10))
('validation_data', (3500, 28, 28, 1))
('validation_label_length', (3500, 1, 2))
('validation_label_digits', (3500, 1, 10))
('test_data', (3500, 28, 28, 1))
('test_label_length', (3500, 1, 2))
('test_label_digits', (3500, 1, 10))
tf_train_dataset : Tensor("Placeholder:0", shape=(64, 28, 28, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(64, 2), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(64, 1, 10), dtype=float32)
Initialized
Minibatch loss at step 0: 6.523658
Minibatch accuracy_length: 42.2%
Minibatch accuracy_digit_0: 9.4%
Minibatch accuracy: 4.7%
finish : 2016-11-16T08:36:47+0900
Minibatch loss at step 100: 0.964253
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 62.
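
The `learning_rate_start`, `learning_rate_decay_steps` and `learning_rate_decay_rate` arguments in the cell above look like the parameters of a standard exponential decay schedule. Assuming that is how `MultiDigitModel` uses them (its source is not shown here), the schedule can be sketched as:

```python
def decayed_learning_rate(step, start=0.01, decay_steps=10000,
                          decay_rate=0.96, staircase=False):
    """Exponential decay: start * decay_rate ** (step / decay_steps)."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return start * decay_rate ** exponent


for step in (0, 1000, 2000, 10000):
    print(step, decayed_learning_rate(step))
```

Under this assumption, the rate falls by less than 1% over the 2001 steps of the run above, so the decay only starts to matter in much longer runs or with a smaller `learning_rate_decay_steps`.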

# digit 2

In [2]:
multi_digit_dataset = MultiDigitMNISTData(2)
multi_digit_dataset.load_data()
MultiDigitModel(multi_digit_dataset=multi_digit_dataset,
                        digit_count=multi_digit_dataset.digit_count,
                        num_steps=2001, 
                        batch_size=32, 
                        num_convs=[8,16,32,64,0,0,0,0], 
                        num_fc_1=32, 
                        num_fc_2=32,
                       ).run()

(70000, 1568)
(70000, 1)
(70000, 2)
(63000, 28, 56, 1)
(63000, 1, 3)
(63000, 2, 10)
(3500, 28, 56, 1)
(3500, 1, 3)
(3500, 2, 10)
(3500, 28, 56, 1)
(3500, 1, 3)
(3500, 2, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 56, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 3), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 2, 10), dtype=float32)
Initialized
Minibatch loss at step 0: 5.926517
Minibatch accuracy_length: 3.1%
Minibatch accuracy_digit_0: 9.4%
Minibatch accuracy_digit_1: 9.4%
Minibatch accuracy: 0.0%
finish : 2016-11-11T12:25:10+0900
Minibatch loss at step 100: 2.658033
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 46.9%
Minibatch accuracy_digit_1: 59.4%
Minibatch accuracy: 25.0%
finish : 2016-11-11T12:25:20+0900
Minibatch loss at step 200: 1.261888
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 59.4%
Minibatch accuracy_digit_1: 84.4%
Minibatch accuracy: 46.9%
finish : 20

# digit 3

In [3]:
multi_digit_dataset = MultiDigitMNISTData(3)
multi_digit_dataset.load_data()
MultiDigitModel(multi_digit_dataset=multi_digit_dataset,
                        digit_count=multi_digit_dataset.digit_count,
                        num_steps=2001, 
                        batch_size=32, 
                        num_convs=[8,16,32,64,0,0,0,0], 
                        num_fc_1=32, 
                        num_fc_2=32,
                       ).run()

(70000, 2352)
(70000, 1)
(70000, 3)
(63000, 28, 84, 1)
(63000, 1, 4)
(63000, 3, 10)
(3500, 28, 84, 1)
(3500, 1, 4)
(3500, 3, 10)
(3500, 28, 84, 1)
(3500, 1, 4)
(3500, 3, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 84, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 4), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 3, 10), dtype=float32)
Initialized
Minibatch loss at step 0: 8.927469
Minibatch accuracy_length: 12.5%
Minibatch accuracy_digit_0: 12.5%
Minibatch accuracy_digit_1: 3.1%
Minibatch accuracy_digit_2: 0.0%
Minibatch accuracy: 3.1%
finish : 2016-11-11T13:17:25+0900
Minibatch loss at step 100: 4.602657
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 15.6%
Minibatch accuracy_digit_1: 40.6%
Minibatch accuracy_digit_2: 54.5%
Minibatch accuracy: 9.4%
finish : 2016-11-11T13:17:41+0900
Minibatch loss at step 200: 3.190919
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 28.1%
Min

In [3]:
multi_digit_dataset = MultiDigitMNISTData(3)
multi_digit_dataset.load_data()
MultiDigitModel(multi_digit_dataset=multi_digit_dataset,
                        digit_count=multi_digit_dataset.digit_count,
                        num_steps=10001, 
                        batch_size=32, 
                        num_convs=[24,32,32,64,0,0,0,0], 
                        num_fc_1=1600, 
                        num_fc_2=512,
                       ).run()

(70000, 2352)
(70000, 1)
(70000, 3)
(63000, 28, 84, 1)
(63000, 1, 4)
(63000, 3, 10)
(3500, 28, 84, 1)
(3500, 1, 4)
(3500, 3, 10)
(3500, 28, 84, 1)
(3500, 1, 4)
(3500, 3, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 84, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 4), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 3, 10), dtype=float32)
Initialized
Minibatch loss at step 0: 162.974777
Minibatch accuracy_length: 34.4%
Minibatch accuracy_digit_0: 9.4%
Minibatch accuracy_digit_1: 3.1%
Minibatch accuracy_digit_2: 12.0%
Minibatch accuracy: 3.1%
finish : 2016-11-11T19:59:09+0900
Minibatch loss at step 100: 4.130840
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 46.9%
Minibatch accuracy_digit_1: 59.4%
Minibatch accuracy_digit_2: 30.0%
Minibatch accuracy: 28.1%
finish : 2016-11-11T19:59:50+0900
Minibatch loss at step 200: 2.381089
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 71.9%


# digit 4

In [4]:
multi_digit_dataset = MultiDigitMNISTData(4)
multi_digit_dataset.load_data()
MultiDigitModel(multi_digit_dataset=multi_digit_dataset,
                        digit_count=multi_digit_dataset.digit_count,
                        num_steps=4001, 
                        batch_size=32, 
                        num_convs=[8,16,32,64,0,0,0,0], 
                        num_fc_1=48, 
                        num_fc_2=48,
                       ).run()

(70000, 3136)
(70000, 1)
(70000, 4)
(63000, 28, 112, 1)
(63000, 1, 5)
(63000, 4, 10)
(3500, 28, 112, 1)
(3500, 1, 5)
(3500, 4, 10)
(3500, 28, 112, 1)
(3500, 1, 5)
(3500, 4, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 112, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 5), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 4, 10), dtype=float32)
Initialized
Minibatch loss at step 0: 11.847860
Minibatch accuracy_length: 28.1%
Minibatch accuracy_digit_0: 12.5%
Minibatch accuracy_digit_1: 12.5%
Minibatch accuracy_digit_2: 7.7%
Minibatch accuracy_digit_3: 0.0%
Minibatch accuracy: 0.0%
finish : 2016-11-11T13:27:21+0900
Minibatch loss at step 100: 5.910389
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 18.8%
Minibatch accuracy_digit_1: 25.0%
Minibatch accuracy_digit_2: 51.9%
Minibatch accuracy_digit_3: 53.3%
Minibatch accuracy: 6.2%
finish : 2016-11-11T13:27:42+0900
Minibatch loss at step 200: 4.45311

In [2]:
multi_digit_dataset = MultiDigitMNISTData(4)
multi_digit_dataset.load_data()
MultiDigitModel(multi_digit_dataset=multi_digit_dataset,
                        digit_count=multi_digit_dataset.digit_count,
                        num_steps=10001, 
                        batch_size=32, 
                        num_convs=[24,32,32,64,0,0,0,0], 
                        num_fc_1=1600, 
                        num_fc_2=512,
                       ).run()

(70000, 3136)
(70000, 1)
(70000, 4)
(63000, 28, 112, 1)
(63000, 1, 5)
(63000, 4, 10)
(3500, 28, 112, 1)
(3500, 1, 5)
(3500, 4, 10)
(3500, 28, 112, 1)
(3500, 1, 5)
(3500, 4, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 112, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 5), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 4, 10), dtype=float32)
Initialized
Minibatch loss at step 0: 152.640701
Minibatch accuracy_length: 31.2%
Minibatch accuracy_digit_0: 9.4%
Minibatch accuracy_digit_1: 6.2%
Minibatch accuracy_digit_2: 12.0%
Minibatch accuracy_digit_3: 14.3%
Minibatch accuracy: 0.0%
finish : 2016-11-11T18:29:30+0900
Minibatch loss at step 100: 4.616248
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 34.4%
Minibatch accuracy_digit_1: 59.4%
Minibatch accuracy_digit_2: 50.0%
Minibatch accuracy_digit_3: 57.1%
Minibatch accuracy: 12.5%
finish : 2016-11-11T18:30:28+0900
Minibatch loss at step 200: 3.040

# digit 5

In [None]:
multi_digit_dataset = MultiDigitMNISTData(5)
multi_digit_dataset.load_data()
MultiDigitModel(multi_digit_dataset=multi_digit_dataset,
                        digit_count=multi_digit_dataset.digit_count,
                        num_steps=10001, 
                        batch_size=32, 
                        num_convs=[24,32,32,64,0,0,0,0], 
                        num_fc_1=1600, 
                        num_fc_2=512,
                       ).run()

(70000, 3920)
(70000, 1)
(70000, 5)
(63000, 28, 140, 1)
(63000, 1, 6)
(63000, 5, 10)
(3500, 28, 140, 1)
(3500, 1, 6)
(3500, 5, 10)
(3500, 28, 140, 1)
(3500, 1, 6)
(3500, 5, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 140, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 6), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 5, 10), dtype=float32)
Initialized
Minibatch loss at step 0: 345.649292
Minibatch accuracy_length: 9.4%
Minibatch accuracy_digit_0: 9.4%
Minibatch accuracy_digit_1: 12.5%
Minibatch accuracy_digit_2: 8.7%
Minibatch accuracy_digit_3: 0.0%
Minibatch accuracy_digit_4: 12.5%
Minibatch accuracy: 0.0%
finish : 2016-11-11T16:27:21+0900
Minibatch loss at step 100: 7.185212
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 15.6%
Minibatch accuracy_digit_1: 40.6%
Minibatch accuracy_digit_2: 28.0%
Minibatch accuracy_digit_3: 57.9%
Minibatch accuracy_digit_4: 50.0%
Minibatch accuracy: 3.1%
fi

In [4]:
multi_digit_dataset = MultiDigitMNISTData(5)
multi_digit_dataset.load_data()
MultiDigitModel(multi_digit_dataset=multi_digit_dataset,
                        digit_count=multi_digit_dataset.digit_count,
                        num_steps=10001, 
                        batch_size=32, 
                        num_convs=[24,32,32,32,32,32,32,64], 
                        num_fc_1=1024, 
                        num_fc_2=2048,
                       ).run()

(70000, 3920)
(70000, 1)
(70000, 5)
(63000, 28, 140, 1)
(63000, 1, 6)
(63000, 5, 10)
(3500, 28, 140, 1)
(3500, 1, 6)
(3500, 5, 10)
(3500, 28, 140, 1)
(3500, 1, 6)
(3500, 5, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 140, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 6), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 5, 10), dtype=float32)
Initialized
Minibatch loss at step 0: 3883.244629
Minibatch accuracy_length: 18.8%
Minibatch accuracy_digit_0: 12.5%
Minibatch accuracy_digit_1: 12.5%
Minibatch accuracy_digit_2: 4.0%
Minibatch accuracy_digit_3: 0.0%
Minibatch accuracy_digit_4: 0.0%
Minibatch accuracy: 0.0%
finish : 2016-11-11T21:08:58+0900
Minibatch loss at step 100: 7.758302
Minibatch accuracy_length: 81.2%
Minibatch accuracy_digit_0: 25.0%
Minibatch accuracy_digit_1: 31.2%
Minibatch accuracy_digit_2: 36.0%
Minibatch accuracy_digit_3: 47.4%
Minibatch accuracy_digit_4: 45.5%
Minibatch accuracy: 9.4%
f

## Step 2: Train a model on a realistic dataset.

Once you have settled on a good architecture, you can train your model on real data. In particular, [the SVHN dataset](http://ufldl.stanford.edu/housenumbers/) is a good large-scale dataset collected from house numbers in Google Street View. Training on this more challenging dataset, where the digits are not neatly lined up and have various skews, fonts and colors, likely means you have to do some hyperparameter exploration to do well.

- _How does your model perform on a realistic dataset?_
- _What changes did you have to make, if any?_
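
The runs below feed 54x54x3 images to the model. How `MultiDigitSVHNData` produces them is not shown, so the following is only a sketch of one common preparation for this kind of data: take a random 54x54 crop from a slightly larger image and subtract the per-image mean. The 64x64 source size and the helper names are assumptions.

```python
import numpy as np


def random_crop(image, size=54, rng=None):
    """Cut a random size x size window out of an (H, W, C) image."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    top = int(rng.integers(0, height - size + 1))
    left = int(rng.integers(0, width - size + 1))
    return image[top:top + size, left:left + size]


def subtract_mean(image):
    """Center the pixel values of one image around zero."""
    image = image.astype(np.float32)
    return image - image.mean()


rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
patch = subtract_mean(random_crop(fake_image, size=54, rng=rng))
```

Random crops act as a cheap form of augmentation for digits that are not neatly aligned, which is one of the difficulties this step mentions.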

In [24]:
svhn_data = MultiDigitSVHNData.maybe_pickle(digit_count=1, total_data_count=100000)
MultiDigitModel(multi_digit_dataset=svhn_data,
                        digit_count=1,
                        num_channels=3,
                        num_steps=301, 
                        batch_size=32, 
                        num_convs=[24,32,32,64,0,0,0,0], 
                        num_fc_1=1600, 
                        num_fc_2=512,
                       ).run()

tf_train_dataset : Tensor("Placeholder:0", shape=(32, 54, 54, 3), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 2), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 1, 10), dtype=float32)
Initialized
Minibatch loss at step 0: 50.727543
Minibatch accuracy_length: 96.9%
Minibatch accuracy_digit_0: 9.4%
Minibatch accuracy: 9.4%
finish : 2016-11-16T10:28:10+0900
Minibatch loss at step 100: 2.278591
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 18.8%
Minibatch accuracy: 18.8%
finish : 2016-11-16T10:29:15+0900
Minibatch loss at step 200: 2.291404
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 9.4%
Minibatch accuracy: 9.4%
finish : 2016-11-16T10:30:17+0900
Minibatch loss at step 300: 2.246778
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 9.4%
Minibatch accuracy: 9.4%
finish : 2016-11-16T10:31:19+0900
Minibatch loss at step 400: 2.181519
Minibatch accuracy_length: 100.0%
Minibatch accuracy

KeyboardInterrupt: 

In [29]:
svhn_data = MultiDigitSVHNData.maybe_pickle(digit_count=2, total_data_count=50000)
MultiDigitModel(multi_digit_dataset=svhn_data,
                        digit_count=2,
                        num_channels=3,
                        num_steps=1001, 
                        batch_size=32, 
                        num_convs=[24,32,32,64,0,0,0,0], 
                        num_fc_1=256, 
                        num_fc_2=512,
                       ).run()

Pickling svhn/svhn_2_54_50000.pickle.
('total_data', (35521, 54, 54, 3))
('total_label_length', (35521, 1, 3))
('total_label_digits', (35521, 2, 10))
start : 2016-11-16T11:58:33+0900
end : 2016-11-16T12:08:48+0900
('train_data', (31968, 54, 54, 3))
('train_label_length', (31968, 1, 3))
('train_label_digits', (31968, 2, 10))
('validation_data', (1776, 54, 54, 3))
('validation_label_length', (1776, 1, 3))
('validation_label_digits', (1776, 2, 10))
('test_data', (1777, 54, 54, 3))
('test_label_length', (1777, 1, 3))
('test_label_digits', (1777, 2, 10))
('Unable to save data to', 'svhn/svhn_2_54_50000.pickle', ':', SystemError('error return without exception set',))
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 54, 54, 3), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 3), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 2, 10), dtype=float32)
Initialized
Minibatch loss at step 0: 154.659988
Minibatch accuracy_length: 6.2%
M

## Step 3 (optional): Put the model into an Android app.

Do this step only if you have access to an Android device. If you don’t, you may either:
- i) take pictures of numbers that you find around you, and run them through your classifier on your computer to produce example results, or,
- ii) use OpenCV / SimpleCV / Pygame to capture live images from a webcam.
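
For option ii), a webcam loop with OpenCV could look like the sketch below. The `predict` callback stands in for your trained model and is hypothetical, as is the 54x54 input size (chosen to match the SVHN runs above). The frame helpers use NumPy only, and `cv2` is imported inside `run_webcam` so they can be tried without a camera.

```python
import numpy as np


def center_square(frame):
    """Crop the largest centered square from an (H, W, C) frame."""
    height, width = frame.shape[:2]
    side = min(height, width)
    top, left = (height - side) // 2, (width - side) // 2
    return frame[top:top + side, left:left + side]


def resize_nearest(image, size):
    """Nearest-neighbour resize to size x size (no OpenCV needed)."""
    height, width = image.shape[:2]
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    return image[rows[:, None], cols]


def run_webcam(predict, size=54, camera_index=0):
    """Print predict(patch) for every frame until 'q' is pressed."""
    import cv2  # imported lazily; requires the opencv-python package
    capture = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            patch = resize_nearest(center_square(frame), size)
            # OpenCV delivers BGR; reverse the channels for an RGB-trained model.
            print(predict(patch[..., ::-1]))
            cv2.imshow("camera", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

Calling `run_webcam(my_predict_fn)` with a function that maps a `(54, 54, 3)` array to a string would print one prediction per captured frame.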

Loading a TensorFlow model into a camera app on Android is demonstrated in [the TensorFlow Android demo app](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android), which you can simply modify.

- _Is your model able to perform equally well on captured pictures or a live camera stream?_
- _Document how you built the interface to your model._

## Step 4: Explore!

There are many things you can do once you have the basic classifier in place. One example would be to also localize where the numbers are in the image. The SVHN dataset provides bounding boxes that you can use to train a localizer. Simply training a regression loss to the coordinates of the bounding box is one way to get decent localization.
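
A regression loss on bounding-box coordinates can be as simple as a squared error between predicted and annotated boxes. The sketch below assumes each box is given as `(left, top, width, height)` and rescales the coordinates by the image size so that all four are on a comparable scale; the function names are hypothetical.

```python
import numpy as np


def normalize_boxes(boxes, image_width, image_height):
    """Scale (left, top, width, height) boxes to the range [0, 1]."""
    scale = np.array([image_width, image_height, image_width, image_height],
                     dtype=np.float32)
    return np.asarray(boxes, dtype=np.float32) / scale


def bbox_l2_loss(predicted, target):
    """Mean over the batch of the summed squared coordinate errors."""
    diff = np.asarray(predicted) - np.asarray(target)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


target = normalize_boxes([[10, 20, 30, 40]], image_width=100, image_height=100)
perfect = bbox_l2_loss(target, target)
off_by_tenth = bbox_l2_loss(target + 0.1, target)
```

In a real model, the predicted boxes would come from an extra four-unit output layer trained alongside the digit classifiers.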

Once you have the data localized, you can, for example, try to turn it into an augmented reality app by overlaying your answer on the image, as the Word Lens app does.

Those are just examples of extensions you can look into. Use your imagination!

- _Make sure to report what extension(s) you have implemented and how they worked._