https://docs.google.com/document/d/1L11EjK0uObqjaBhHNcVPxeyIripGHSUaoEWGypuuVtk/pub

# Description

Objective: Build a live camera app that can interpret number strings in real-world images.

In this project, you will train a model that can decode sequences of digits from natural images, and create an app that prints the numbers it sees in real time. You may choose to implement your project as a simple Python script, a web app/service or an Android app.

## Step 1: Design and test a model architecture that can identify sequences of digits in an image.

Design and implement a deep learning model that learns to recognize sequences of digits. Train it using synthetic data first (recommended) or directly use real-world data (see step 2).

There are various aspects to consider when thinking about this problem:
- Your model can be derived from a deep neural net or a convolutional network.
- You could experiment with sharing, or not sharing, the weights between the softmax classifiers.
- You can also use a recurrent network in your deep neural net to replace the classification layers and directly emit the sequence of digits one at a time.

To help you develop your model, the simplest path is likely to generate a synthetic dataset by concatenating character images from [notMNIST](https://www.google.com/url?q=http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html&sa=D&ust=1476236462755000&usg=AFQjCNHAxfxTiYixUmTSqZNPuGhfXBg6Cg) or [MNIST](https://www.google.com/url?q=http://yann.lecun.com/exdb/mnist/&sa=D&ust=1476236462756000&usg=AFQjCNFiIq8Kn5_Lm-p8qu_L67mAttbsfg). This can provide you with a quick way to run experiments. (Or you can go directly to the real-world dataset of Step 2.)

To produce a synthetic sequence of digits for testing, you can, for example, limit yourself to sequences of up to five digits and use five classifiers on top of your deep network. You would have to incorporate an additional ‘blank’ character to account for shorter number sequences.
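
As a rough illustration of the idea above, the sketch below pastes between one and five digit images side by side onto a fixed-width canvas and pads the label vector with a blank class. The names `BLANK`, `MAX_DIGITS` and `make_sequence` are hypothetical, and random arrays stand in for MNIST images so the example runs without a download.

```python
import numpy as np

BLANK = 10        # hypothetical label for the extra 'blank' class
MAX_DIGITS = 5    # sequences of up to five digits, as suggested above


def make_sequence(images, labels, rng, max_digits=MAX_DIGITS):
    """Concatenate 1..max_digits random digit images side by side.

    images: array of shape (N, H, W); labels: array of shape (N,).
    Returns (canvas, length, padded_labels).
    """
    n, h, w = images.shape
    length = int(rng.integers(1, max_digits + 1))
    picks = rng.integers(0, n, size=length)

    # Unused slots stay black in the image and BLANK in the label vector.
    canvas = np.zeros((h, w * max_digits), dtype=images.dtype)
    padded = np.full(max_digits, BLANK, dtype=np.int64)
    for slot, idx in enumerate(picks):
        canvas[:, slot * w:(slot + 1) * w] = images[idx]
        padded[slot] = labels[idx]
    return canvas, length, padded


# Stand-in data (assumption): random "images" instead of real MNIST digits.
rng = np.random.default_rng(0)
fake_images = rng.random((100, 28, 28)).astype(np.float32)
fake_labels = rng.integers(0, 10, size=100)

canvas, length, padded = make_sequence(fake_images, fake_labels, rng)
print(canvas.shape, length, padded)
```

With real data, `fake_images` and `fake_labels` would be replaced by the MNIST or notMNIST arrays; each of the five classifiers would then have eleven output classes (ten digits plus blank).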

For example, here is [a published baseline model](https://www.google.com/url?q=http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42241.pdf&sa=D&ust=1476236462758000&usg=AFQjCNFYKwUmWGu1HE0PYqPHYt5l4N66pw) for this problem ([video](https://www.google.com/url?q=https://www.youtube.com/watch?v%3DvGPI_JvLoN0&sa=D&ust=1476236462759000&usg=AFQjCNFVU9fIkilx9J_FKlBBuYRZ9CGWoQ)).
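
The linked baseline treats the output as a length variable plus one softmax per digit position, so the probability of a sequence is the length probability times the probabilities of its digits. The sketch below shows one way to decode under that factorization: take the best digit at each position, accumulate the log-probabilities, and pick the length with the highest total score. The function name `decode_sequence` and the example probabilities are made up for illustration.

```python
import numpy as np


def decode_sequence(length_probs, digit_probs, eps=1e-12):
    """Return the most likely digit sequence as a list of ints.

    length_probs: shape (max_len + 1,), P(length = 0..max_len).
    digit_probs:  shape (max_len, 10), one softmax output per position.
    """
    best_digits = digit_probs.argmax(axis=1)
    best_logp = np.log(np.clip(digit_probs.max(axis=1), eps, None))

    # prefix[n] is the summed log-probability of the best first n digits.
    prefix = np.concatenate([[0.0], np.cumsum(best_logp)])
    scores = np.log(np.clip(length_probs, eps, None)) + prefix

    n = int(scores.argmax())
    return best_digits[:n].tolist()


def peaked(index, p):
    """Hypothetical helper: a distribution with mass p on one digit."""
    row = np.full(10, (1.0 - p) / 9.0)
    row[index] = p
    return row


length_probs = np.array([0.05, 0.15, 0.70, 0.10])   # lengths 0..3
digit_probs = np.stack([peaked(4, 0.9), peaked(2, 0.8), np.full(10, 0.1)])

print(decode_sequence(length_probs, digit_probs))
```

In this example the third position is uniform and the length head favours two digits, so the decoded sequence stops after the second digit.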

- _What approach did you take in coming up with a solution to this problem?_
- _What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.)_
- _How did you train your model? Did you generate a synthetic dataset (if so, explain how)?_

In [1]:
# build baseline model http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42241.pdf
# Multi-digit Number Recognition
# Deep Convolutional Neural Networks

# In this paper we propose a unified approach that integrates these three steps via the use of a deep convolutional neural network that operates directly on the image pixels.
# We employ the DistBelief (Dean et al., 2012) implementation of deep neural networks
# We find that the performance of this approach increases with the depth of the convolutional network,
# with the best performance occurring in the deepest architecture we trained, with eleven hidden layers.
# We evaluate this approach on the publicly available SVHN dataset and achieve over 96% accuracy in recognizing complete street numbers
# We show that on a per-digit recognition task, we improve upon the state-of-the-art and achieve 97.84% accuracy.

In [2]:
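# The cells below use the pre-1.0 TensorFlow graph API (tf.placeholder, tf.Session,
# tf.initialize_all_variables, positional arguments to softmax_cross_entropy_with_logits).
import numpy as np
import tensorflow as tf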

In [36]:
class MultiDigitMNISTData(object):
    def __init__(self, digit_count=1):
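        # NOTE: fetch_mldata was removed in scikit-learn 0.22. On newer versions,
        # fetch_openml('mnist_784', version=1, as_frame=False) is the usual replacement
        # (its targets are strings, so cast them with .astype(int)).
        from sklearn.datasets import fetch_mldata
        mnist = fetch_mldata('MNIST original')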
        
        self.mnist = mnist
        self.digit_count = digit_count
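        # NOTE: the names are swapped relative to the array layout. Image arrays are
        # shaped (rows, columns) = (image_width, image_height) = (28, 28 * digit_count).
        self.image_width = 28
        self.image_height = 28 * self.digit_count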
        self.target_digit_num_labels = 10
        self.num_channels = 1
        
        self.total_data_count = len(mnist.data)
        self.train_data_count = int(0.7 * self.total_data_count)
        self.validation_data_count = int(0.2 * self.total_data_count)
        self.test_data_count = self.total_data_count - self.train_data_count - self.validation_data_count
        
        self.multi_digit_mnist_data = np.zeros([self.total_data_count, self.image_width * self.image_height],
                                               dtype=np.float32)
        self.multi_digit_mnist_target_length = np.zeros([self.total_data_count, 1],
                                                        dtype=np.float32)
        self.multi_digit_mnist_target_digits = np.zeros([self.total_data_count, self.digit_count],
                                                        dtype=np.float32)
        
        self.train_data = None
        self.train_label_length = None
        self.train_label_digits = None
        self.validation_data = None
        self.validation_label_length = None
        self.validation_label_digits = None
        self.test_data = None
        self.test_label_length = None
        self.test_label_digits = None
        
        
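    def reformat_dataset(self, dataset):
        # Each flat row stores digit_count consecutive 28x28 images. Reshaping it
        # straight to (28, 28 * digit_count) would interleave their pixel rows, so
        # split the row per digit first and then place the digits side by side.
        side = 28
        per_digit = dataset.reshape((-1, self.digit_count, side, side))
        side_by_side = per_digit.transpose(0, 2, 1, 3)
        return side_by_side.reshape((-1, self.image_width, self.image_height, self.num_channels)).astype(np.float32)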
    
    def reformat_target_length(self, target_length):
        return (np.arange(self.digit_count + 1) == target_length[:,None]).astype(np.float32)
    
    def reformat_target_digits(self, target_digits):
        return (np.arange(self.target_digit_num_labels) == target_digits[:,:,None]).astype(np.float32)
    
    def load_data(self):
        import random
        for i in range(0, self.total_data_count):
            random_length = random.randrange(1,self.digit_count+1)
            self.multi_digit_mnist_target_length[i, 0] = random_length
            for j in range(0, random_length):
                random_index = random.randrange(0, self.total_data_count)
#                 print("j={}, j*784={}".format(j, j*784))
#                 print("self.multi_digit_mnist_data[i, j * 784:(j+1) * 784].shape : {}".format(self.multi_digit_mnist_data[i, j * 784:(j+1) * 784].shape))
#                 print("self.mnist.data[random_index].shape : {}".format(self.mnist.data[random_index].shape))
                self.multi_digit_mnist_data[i, j * 784:(j+1) * 784] = self.mnist.data[random_index] / 255.0
                self.multi_digit_mnist_target_digits[i, j] = self.mnist.target[random_index]
        print(self.multi_digit_mnist_data.shape) # (70000, 784 * digit_count), e.g. (70000, 3920) for digit_count=5
        print(self.multi_digit_mnist_target_length.shape) # (70000, 1)
        print(self.multi_digit_mnist_target_digits.shape) # (70000, digit_count), e.g. (70000, 5)
        
        self.train_data = self.reformat_dataset(self.multi_digit_mnist_data[:self.train_data_count])
        self.train_label_length = self.reformat_target_length(self.multi_digit_mnist_target_length[:self.train_data_count])
        self.train_label_digits = self.reformat_target_digits(self.multi_digit_mnist_target_digits[:self.train_data_count])
        print(self.train_data.shape) # (49000, 28, 140, 1) for digit_count=5
        print(self.train_label_length.shape) # (49000, 1, 6)
        print(self.train_label_digits.shape) # (49000, 5, 10)
        
        self.validation_data = self.reformat_dataset(self.multi_digit_mnist_data[self.train_data_count:self.train_data_count + self.validation_data_count])
        self.validation_label_length = self.reformat_target_length(self.multi_digit_mnist_target_length[self.train_data_count:self.train_data_count + self.validation_data_count])
        self.validation_label_digits = self.reformat_target_digits(self.multi_digit_mnist_target_digits[self.train_data_count:self.train_data_count + self.validation_data_count])
        print(self.validation_data.shape) # (14000, 28, 140, 1) for digit_count=5
        print(self.validation_label_length.shape) # (14000, 1, 6)
        print(self.validation_label_digits.shape) # (14000, 5, 10)
        
        self.test_data = self.reformat_dataset(self.multi_digit_mnist_data[self.train_data_count + self.validation_data_count:])
        self.test_label_length = self.reformat_target_length(self.multi_digit_mnist_target_length[self.train_data_count + self.validation_data_count:])
        self.test_label_digits = self.reformat_target_digits(self.multi_digit_mnist_target_digits[self.train_data_count + self.validation_data_count:])
        print(self.test_data.shape) # (7000, 28, 140, 1) for digit_count=5
        print(self.test_label_length.shape) # (7000, 1, 6)
        print(self.test_label_digits.shape) # (7000, 5, 10)

In [139]:
# (100, 6) vs (100, 6)
# logits[1] : Tensor("add_4:0", shape=(16, 10), dtype=float32)
# tf_train_digits_labels[:,0] : Tensor("strided_slice_5:0", shape=(16, 10), dtype=float32)
# train_length_prediction : Tensor("Softmax:0", shape=(16, 6), dtype=float32)
# train_digits_0_prediction : Tensor("Softmax_1:0", shape=(16, 10), dtype=float32)
# p_length, p_0, p_1, p_2, p_3, p_4, 
# train_length_prediction, train_digits_0_prediction, train_digits_1_prediction, train_digits_2_prediction, train_digits_3_prediction, train_digits_4_prediction
def accuracy_digit(p_length, p_digits, 
             batch_length_labels, batch_digits_labels, digit):
  eq_count = 0.0
  total_count = 0.0
  for i in range(0, len(p_digits[digit])):
#     print("length:{}".format(np.argmax(batch_length_labels[i])))
    if np.argmax(batch_length_labels[i]) > digit:
        total_count += 1.0
        if np.argmax(p_digits[digit][i]) == np.argmax(batch_digits_labels[i][digit]):
          eq_count += 1.0
#           print("Correct Predict. predict:{}, real:{}".format(np.argmax(p_digits[digit][i]), np.argmax(batch_digits_labels[i][digit])))
#         else:
#           print("False Predict. predict:{}, real:{}".format(np.argmax(p_digits[digit][i]), np.argmax(batch_digits_labels[i][digit])))
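  # Guard against batches in which no sequence is long enough to contain this digit.
  return eq_count / total_count * 100 if total_count > 0 else float('nan')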

def accuracy_length(p_length, p_0, p_1, p_2, p_3, p_4, 
             batch_length_labels, batch_digits_labels):
  eq_count = 0.0
  for i in range(0, len(p_0)):
    if np.argmax(p_length[i]) == np.argmax(batch_length_labels[i]):
      eq_count += 1.0
  return eq_count / len(p_0) * 100
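
The per-digit accuracy above only counts examples whose true length reaches the given position. As an alternative sketch (not the code used in the runs below), the same masking can be written with NumPy boolean indexing. The function name `masked_digit_accuracy` and the tiny hand-made batch are assumptions for illustration.

```python
import numpy as np


def masked_digit_accuracy(digit_probs, length_labels, digit_labels, position):
    """Accuracy (in percent) of one digit head, ignoring too-short sequences.

    digit_probs:   (batch, 10) softmax output of the head for `position`.
    length_labels: (batch, max_len + 1) one-hot true lengths.
    digit_labels:  (batch, max_len, 10) one-hot true digits.
    """
    true_length = length_labels.argmax(axis=1)
    mask = true_length > position          # examples that contain this digit
    if not mask.any():
        return float("nan")                # nothing to score in this batch
    predicted = digit_probs.argmax(axis=1)
    actual = digit_labels[:, position].argmax(axis=1)
    return 100.0 * float((predicted[mask] == actual[mask]).mean())


eye = np.eye(10)
length_labels = np.eye(3)[[1, 2, 2]]                       # true lengths 1, 2, 2
digit_labels = np.stack([eye[[3, 0]], eye[[1, 7]], eye[[5, 9]]])

pos0_probs = eye[[3, 1, 4]]   # first digit: right, right, wrong
pos1_probs = eye[[0, 7, 2]]   # second digit: ignored, right, wrong

acc_pos0 = masked_digit_accuracy(pos0_probs, length_labels, digit_labels, 0)
acc_pos1 = masked_digit_accuracy(pos1_probs, length_labels, digit_labels, 1)
print(acc_pos0, acc_pos1)
```

Returning NaN for an empty mask is a design choice of this sketch; it keeps "no examples" distinguishable from "0% correct".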

- Input: single-digit image
- Prediction: one digit + length
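
The next cell trains both heads at once by summing two softmax cross-entropy terms, one for the length and one for the first digit, and averaging over the batch. The NumPy sketch below mirrors that computation outside TensorFlow; `softmax_cross_entropy` and `joint_loss` are illustrative names, not functions from the notebook.

```python
import numpy as np


def softmax_cross_entropy(logits, one_hot):
    """Per-example cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot * log_probs).sum(axis=1)


def joint_loss(length_logits, length_labels, digit_logits, digit_labels):
    """Mean over the batch of (length loss + digit loss)."""
    per_example = (softmax_cross_entropy(length_logits, length_labels)
                   + softmax_cross_entropy(digit_logits, digit_labels))
    return float(per_example.mean())


batch = 4
length_labels = np.eye(2)[[1, 1, 1, 1]]        # every image holds one digit
digit_labels = np.eye(10)[[3, 1, 4, 1]]

# Untrained heads with all-zero logits predict uniform distributions.
uniform_loss = joint_loss(np.zeros((batch, 2)), length_labels,
                          np.zeros((batch, 10)), digit_labels)

# A confident, correct length head leaves only the digit term.
digit_only_loss = joint_loss(20.0 * length_labels, length_labels,
                             np.zeros((batch, 10)), digit_labels)
print(uniform_loss, digit_only_loss)
```

A digit head that still predicts a uniform distribution contributes ln 10 ≈ 2.30 to the loss, which can be a useful reference point when reading early training logs.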

In [39]:
num_steps = 1001
batch_size = 16
patch_size = 5
depth = 16
depth_1 = depth
depth_2 = depth
depth_3 = depth
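num_hidden = 64
# Each of the two stride-2 convolutions below halves the spatial size;
# `flatted` uses this value to compute the flattened feature size.
pooling_stride = 2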

multi_digit_mnist = MultiDigitMNISTData(1)
multi_digit_mnist.load_data()
graph = tf.Graph()

with graph.as_default():
  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.image_width, multi_digit_mnist.image_height, multi_digit_mnist.num_channels))
  print("tf_train_dataset : {}".format(tf_train_dataset))
  tf_train_length_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count + 1))
  print("tf_train_length_labels : {}".format(tf_train_length_labels))
  tf_train_digits_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count, 10))
  print("tf_train_digits_labels : {}".format(tf_train_digits_labels))
  tf_valid_dataset = tf.constant(multi_digit_mnist.validation_data)
  tf_test_dataset = tf.constant(multi_digit_mnist.test_data)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, multi_digit_mnist.num_channels, depth_1], stddev=0.1))
  layer1_biases =  tf.Variable(tf.constant(1.0, shape=[depth_1]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_1, depth_2], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth_2]))

  layer3_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_2, depth_3], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[depth_3]))

  flatted=int(round(float(multi_digit_mnist.image_width) / (pooling_stride ** 2)) * round(float(multi_digit_mnist.image_height) / (pooling_stride ** 2)) * depth_2)
  fc_layer1_weights = tf.Variable(tf.truncated_normal([flatted, flatted], stddev=0.1))
  fc_layer1_biases = tf.Variable(tf.constant(1.0, shape=[flatted]))

  fc_layer2_weights = tf.Variable(tf.truncated_normal([flatted, num_hidden], stddev=0.1))
  fc_layer2_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  digit_length_weights = tf.Variable(tf.truncated_normal([num_hidden, multi_digit_mnist.digit_count + 1], stddev=0.1))
  digit_length_biases = tf.Variable(tf.constant(1.0, shape=[multi_digit_mnist.digit_count + 1]))

  digits_0_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_0_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_1_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_1_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_2_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_2_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_3_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_3_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_4_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_4_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  # Model.
  def model(data, is_training=True):
    stride = 2
    conv = tf.nn.conv2d(data, layer1_weights, [1, stride, stride, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(conv, layer2_weights, [1, stride, stride, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer2_biases)
    shape = conv.get_shape().as_list()
    reshape = tf.reshape(conv, [shape[0], shape[1] * shape[2] * shape[3]])
    conv = tf.nn.relu(tf.matmul(reshape, fc_layer2_weights) + fc_layer2_biases)

    digit_length_logit = tf.matmul(conv, digit_length_weights) + digit_length_biases
    digits_0_logit = tf.matmul(conv, digits_0_weights) + digits_0_biases

    return digit_length_logit, digits_0_logit


  # Training computation.
  logits = model(tf_train_dataset, True)
  loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits[0], tf_train_length_labels)
        + tf.nn.softmax_cross_entropy_with_logits(logits[1], tf_train_digits_labels[:,0])
    )

  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_length_prediction = tf.nn.softmax(logits[0])
  train_digits_0_prediction = tf.nn.softmax(logits[1])

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (multi_digit_mnist.train_data.shape[0] - batch_size)
    batch_data = multi_digit_mnist.train_data[offset:(offset + batch_size), :, :, :]
    batch_length_labels = multi_digit_mnist.train_label_length[offset:(offset + batch_size), 0, :]
    batch_digits_labels = multi_digit_mnist.train_label_digits[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_length_labels : batch_length_labels, tf_train_digits_labels : batch_digits_labels}
    _, l, p_length, p_0 = session.run(
      [optimizer, loss, train_length_prediction, train_digits_0_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      # Only the first digit head exists in this model, so the unused p_1..p_4 slots are passed as None.
      accuracy_result = accuracy_length(p_length, p_0, None, None, None, None, batch_length_labels, batch_digits_labels)
      print('Minibatch accuracy_length: %.1f%%' % accuracy_result)
      for k in range(0,1):
          accuracy_result = accuracy_digit(p_length, [p_0], batch_length_labels, batch_digits_labels, k)
          print("Minibatch accuracy_digit_{}".format(k) + ": %.1f%%" % accuracy_result)


(70000, 784)
(70000, 1)
(70000, 1)
(49000, 28, 28, 1)
(49000, 1, 2)
(49000, 1, 10)
(14000, 28, 28, 1)
(14000, 1, 2)
(14000, 1, 10)
(7000, 28, 28, 1)
(7000, 1, 2)
(7000, 1, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(16, 28, 28, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(16, 2), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(16, 1, 10), dtype=float32)
Initialized
Minibatch loss at step 0: 7.147527
Minibatch accuracy_length: 0.0%
Minibatch accuracy_digit_0: 6.2%
Minibatch loss at step 50: 2.337904
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 6.2%
Minibatch loss at step 100: 2.258426
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 18.8%
Minibatch loss at step 150: 2.255040
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 25.0%
Minibatch loss at step 200: 1.599000
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 50.0%
Minibatch loss at step 250: 0.797102
Minibatch 

- Input: two-digit image
- Prediction: one digit
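
For the two-digit input used next, the size of the flattened feature vector follows from TensorFlow's `'SAME'` padding rule, under which a convolution or pooling layer with stride s maps a spatial size n to ceil(n / s). The sketch below applies that rule stage by stage; the helper names are hypothetical, and it assumes Python 3 division.

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a 'SAME'-padded layer with the given stride."""
    return math.ceil(size / stride)


def flattened_size(height, width, channels, strides):
    """Feature count after applying each stride in turn, then flattening."""
    for stride in strides:
        height = same_padding_output(height, stride)
        width = same_padding_output(width, stride)
    return height * width * channels


# A 28 x 56 two-digit image, two stride-2 stages, 16 output channels.
two_digit_features = flattened_size(28, 56, 16, strides=[2, 2])
print(two_digit_features)  # 7 * 14 * 16 = 1568
```

For 28 and 56 this agrees with dividing by `pooling_stride ** 2` and rounding, as the notebook does; for sizes that are not multiples of 4, applying the ceiling per stage may give a different (and, under `'SAME'` padding, the actual) result.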

In [61]:
num_steps = 1001
batch_size = 16
patch_size = 5
depth = 16
depth_1 = depth
depth_2 = depth
depth_3 = depth
depth_4 = depth
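num_hidden = 64
# Each of the two stride-2 convolutions below halves the spatial size;
# `flatted` uses this value to compute the flattened feature size.
pooling_stride = 2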

multi_digit_mnist = MultiDigitMNISTData(2)
multi_digit_mnist.load_data()
graph = tf.Graph()

with graph.as_default():
  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.image_width, multi_digit_mnist.image_height, multi_digit_mnist.num_channels))
  print("tf_train_dataset : {}".format(tf_train_dataset))
  tf_train_length_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count + 1))
  print("tf_train_length_labels : {}".format(tf_train_length_labels))
  tf_train_digits_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count, 10))
  print("tf_train_digits_labels : {}".format(tf_train_digits_labels))
  tf_valid_dataset = tf.constant(multi_digit_mnist.validation_data)
  tf_test_dataset = tf.constant(multi_digit_mnist.test_data)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, multi_digit_mnist.num_channels, depth_1], stddev=0.1))
  layer1_biases =  tf.Variable(tf.constant(1.0, shape=[depth_1]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_1, depth_2], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth_2]))

  layer3_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_2, depth_3], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[depth_3]))
    
  layer4_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_3, depth_4], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[depth_4]))

  flatted=int(round(float(multi_digit_mnist.image_width) / (pooling_stride ** 2)) * round(float(multi_digit_mnist.image_height) / (pooling_stride ** 2)) * depth_4)
  fc_layer1_weights = tf.Variable(tf.truncated_normal([flatted, flatted], stddev=0.1))
  fc_layer1_biases = tf.Variable(tf.constant(1.0, shape=[flatted]))

  fc_layer2_weights = tf.Variable(tf.truncated_normal([flatted, num_hidden], stddev=0.1))
  fc_layer2_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  digit_length_weights = tf.Variable(tf.truncated_normal([num_hidden, multi_digit_mnist.digit_count + 1], stddev=0.1))
  digit_length_biases = tf.Variable(tf.constant(1.0, shape=[multi_digit_mnist.digit_count + 1]))

  digits_0_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_0_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_1_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_1_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_2_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_2_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_3_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_3_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_4_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_4_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  # Model.
  def model(data, is_training=True):
    stride = 2
    conv = tf.nn.conv2d(data, layer1_weights, [1, stride, stride, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(conv, layer2_weights, [1, stride, stride, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer2_biases)
    print("layer2 {} ".format(conv))
#     conv = tf.nn.conv2d(conv, layer3_weights, [1, 1, 1, 1], padding='SAME')
#     conv = tf.nn.relu(conv + layer3_biases)
#     print("layer3 {} ".format(conv))
#     conv = tf.nn.conv2d(conv, layer4_weights, [1, 1, 1, 1], padding='SAME')
#     conv = tf.nn.relu(conv + layer4_biases)
    shape = conv.get_shape().as_list()
    reshape = tf.reshape(conv, [shape[0], shape[1] * shape[2] * shape[3]])
    conv = tf.nn.relu(tf.matmul(reshape, fc_layer2_weights) + fc_layer2_biases)

    digit_length_logit = tf.matmul(conv, digit_length_weights) + digit_length_biases
    digits_0_logit = tf.matmul(conv, digits_0_weights) + digits_0_biases

    return digit_length_logit, digits_0_logit


  # Training computation.
  logits = model(tf_train_dataset, True)
  loss = tf.reduce_mean(
#         tf.nn.softmax_cross_entropy_with_logits(logits[0], tf_train_length_labels)
        tf.nn.softmax_cross_entropy_with_logits(logits[1], tf_train_digits_labels[:,0])
    )

  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_length_prediction = tf.nn.softmax(logits[0])
  train_digits_0_prediction = tf.nn.softmax(logits[1])

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (multi_digit_mnist.train_data.shape[0] - batch_size)
    batch_data = multi_digit_mnist.train_data[offset:(offset + batch_size), :, :, :]
    batch_length_labels = multi_digit_mnist.train_label_length[offset:(offset + batch_size), 0, :]
    batch_digits_labels = multi_digit_mnist.train_label_digits[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_length_labels : batch_length_labels, tf_train_digits_labels : batch_digits_labels}
    _, l, p_length, p_0 = session.run(
      [optimizer, loss, train_length_prediction, train_digits_0_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      # Only the first digit head exists in this model, so the unused p_1..p_4 slots are passed as None.
      accuracy_result = accuracy_length(p_length, p_0, None, None, None, None, batch_length_labels, batch_digits_labels)
      print('Minibatch accuracy_length: %.1f%%' % accuracy_result)
      for k in range(0,1):
          accuracy_result = accuracy_digit(p_length, [p_0], batch_length_labels, batch_digits_labels, k)
          print("Minibatch accuracy_digit_{}".format(k) + ": %.1f%%" % accuracy_result)


(70000, 1568)
(70000, 1)
(70000, 2)
(49000, 28, 56, 1)
(49000, 1, 3)
(49000, 2, 10)
(14000, 28, 56, 1)
(14000, 1, 3)
(14000, 2, 10)
(7000, 28, 56, 1)
(7000, 1, 3)
(7000, 2, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(16, 28, 56, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(16, 3), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(16, 2, 10), dtype=float32)
layer2 Tensor("Relu_1:0", shape=(16, 7, 14, 16), dtype=float32) 
Initialized
Minibatch loss at step 0: 4.093553
Minibatch accuracy_length: 0.0%
Minibatch accuracy_digit_0: 12.5%
Minibatch loss at step 50: 2.368903
Minibatch accuracy_length: 0.0%
Minibatch accuracy_digit_0: 25.0%
Minibatch loss at step 100: 2.254499
Minibatch accuracy_length: 0.0%
Minibatch accuracy_digit_0: 25.0%
Minibatch loss at step 150: 2.006486
Minibatch accuracy_length: 37.5%
Minibatch accuracy_digit_0: 25.0%
Minibatch loss at step 200: 1.631744
Minibatch accuracy_length: 62.5%
Minibatch accuracy_

- Input: two digits
- Prediction: one digit + length

In [85]:
num_steps = 1001
batch_size = 16
patch_size = 5
depth = 8
depth_1 = depth
depth_2 = depth_1 * 2
depth_3 = depth_2 * 2
depth_4 = depth_3 * 2
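num_hidden = 32
# Stride of the two max-pooling layers below; `flatted` uses this value
# to compute the flattened feature size.
pooling_stride = 2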

multi_digit_mnist = MultiDigitMNISTData(2)
multi_digit_mnist.load_data()
graph = tf.Graph()

with graph.as_default():
  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.image_width, multi_digit_mnist.image_height, multi_digit_mnist.num_channels))
  print("tf_train_dataset : {}".format(tf_train_dataset))
  tf_train_length_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count + 1))
  print("tf_train_length_labels : {}".format(tf_train_length_labels))
  tf_train_digits_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count, 10))
  print("tf_train_digits_labels : {}".format(tf_train_digits_labels))
  tf_valid_dataset = tf.constant(multi_digit_mnist.validation_data)
  tf_test_dataset = tf.constant(multi_digit_mnist.test_data)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, multi_digit_mnist.num_channels, depth_1], stddev=0.1))
  layer1_biases =  tf.Variable(tf.constant(1.0, shape=[depth_1]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_1, depth_2], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth_2]))

  layer3_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_2, depth_3], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[depth_3]))
    
  layer4_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_3, depth_4], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[depth_4]))

  flatted=int(round(float(multi_digit_mnist.image_width) / (pooling_stride ** 2)) * round(float(multi_digit_mnist.image_height) / (pooling_stride ** 2)) * depth_2)
  fc_layer1_weights = tf.Variable(tf.truncated_normal([flatted, flatted], stddev=0.1))
  fc_layer1_biases = tf.Variable(tf.constant(1.0, shape=[flatted]))

  fc_layer2_weights = tf.Variable(tf.truncated_normal([flatted, num_hidden], stddev=0.1))
  fc_layer2_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  digit_length_weights = tf.Variable(tf.truncated_normal([num_hidden, multi_digit_mnist.digit_count + 1], stddev=0.1))
  digit_length_biases = tf.Variable(tf.constant(1.0, shape=[multi_digit_mnist.digit_count + 1]))

  digits_0_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_0_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_1_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_1_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_2_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_2_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_3_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_3_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_4_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_4_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  # Model.
  def model(data, is_training=True):
    stride = 2
    conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
    conv = tf.nn.conv2d(conv, layer2_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer2_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
#     conv = tf.nn.conv2d(conv, layer3_weights, [1, 1, 1, 1], padding='SAME')
#     conv = tf.nn.relu(conv + layer3_biases)
    
#     conv = tf.nn.conv2d(conv, layer4_weights, [1, 1, 1, 1], padding='SAME')
#     conv = tf.nn.relu(conv + layer4_biases)

    shape = conv.get_shape().as_list()
    reshape = tf.reshape(conv, [shape[0], shape[1] * shape[2] * shape[3]])
#     reshape = tf.nn.relu(tf.matmul(reshape, fc_layer1_weights) + fc_layer1_biases)
    reshape = tf.nn.relu(tf.matmul(reshape, fc_layer2_weights) + fc_layer2_biases)

    digit_length_logit = tf.matmul(reshape, digit_length_weights) + digit_length_biases
    digits_0_logit = tf.matmul(reshape, digits_0_weights) + digits_0_biases

    return digit_length_logit, digits_0_logit


  # Training computation.
  logits = model(tf_train_dataset, True)
  loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits[0], tf_train_length_labels)
        + tf.nn.softmax_cross_entropy_with_logits(logits[1], tf_train_digits_labels[:,0])
    )

  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_length_prediction = tf.nn.softmax(logits[0])
  train_digits_0_prediction = tf.nn.softmax(logits[1])

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (multi_digit_mnist.train_data.shape[0] - batch_size)
    batch_data = multi_digit_mnist.train_data[offset:(offset + batch_size), :, :, :]
    batch_length_labels = multi_digit_mnist.train_label_length[offset:(offset + batch_size), 0, :]
    batch_digits_labels = multi_digit_mnist.train_label_digits[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_length_labels : batch_length_labels, tf_train_digits_labels : batch_digits_labels}
    _, l, p_length, p_0 = session.run(
      [optimizer, loss, train_length_prediction, train_digits_0_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      # Only the first digit head exists in this model, so the unused p_1..p_4 slots are passed as None.
      accuracy_result = accuracy_length(p_length, p_0, None, None, None, None, batch_length_labels, batch_digits_labels)
      print('Minibatch accuracy_length: %.1f%%' % accuracy_result)
      for k in range(0,1):
          accuracy_result = accuracy_digit(p_length, [p_0], batch_length_labels, batch_digits_labels, k)
          print("Minibatch accuracy_digit_{}".format(k) + ": %.1f%%" % accuracy_result)


(70000, 1568)
(70000, 1)
(70000, 2)
(49000, 28, 56, 1)
(49000, 1, 3)
(49000, 2, 10)
(14000, 28, 56, 1)
(14000, 1, 3)
(14000, 2, 10)
(7000, 28, 56, 1)
(7000, 1, 3)
(7000, 2, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(16, 28, 56, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(16, 3), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(16, 2, 10), dtype=float32)
Tensor("dropout/mul:0", shape=(16, 14, 28, 8), dtype=float32)
Tensor("dropout_1/mul:0", shape=(16, 7, 14, 16), dtype=float32)
Initialized
Minibatch loss at step 0: 10.225079
Minibatch accuracy_length: 12.5%
Minibatch accuracy_digit_0: 6.2%
Minibatch loss at step 50: 2.892925
Minibatch accuracy_length: 50.0%
Minibatch accuracy_digit_0: 12.5%
Minibatch loss at step 100: 3.013327
Minibatch accuracy_length: 62.5%
Minibatch accuracy_digit_0: 0.0%
Minibatch loss at step 150: 2.834291
Minibatch accuracy_length: 75.0%
Minibatch accuracy_digit_0: 0.0%
Minibatch loss at step 200:

- Input: two digits
- Output: two digits + length

In [149]:
num_steps = 2001
batch_size = 32
patch_size = 5
depth = 8
depth_1 = depth
depth_2 = depth_1 * 2
depth_3 = depth_2 * 2
depth_4 = depth_3 * 2
num_hidden = 32

pooling_stride = 2
pooling_kernel_size = 2
beta = 0.001

multi_digit_mnist = MultiDigitMNISTData(2)
multi_digit_mnist.load_data()
graph = tf.Graph()

with graph.as_default():
  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.image_width, multi_digit_mnist.image_height, multi_digit_mnist.num_channels))
  print("tf_train_dataset : {}".format(tf_train_dataset))
  tf_train_length_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count + 1))
  print("tf_train_length_labels : {}".format(tf_train_length_labels))
  tf_train_digits_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count, 10))
  print("tf_train_digits_labels : {}".format(tf_train_digits_labels))
  tf_valid_dataset = tf.constant(multi_digit_mnist.validation_data)
  tf_test_dataset = tf.constant(multi_digit_mnist.test_data)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, multi_digit_mnist.num_channels, depth_1], stddev=0.1))
  layer1_biases =  tf.Variable(tf.constant(1.0, shape=[depth_1]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_1, depth_2], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth_2]))

  layer3_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_2, depth_3], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[depth_3]))
    
  layer4_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_3, depth_4], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[depth_4]))

  flatted=int(round(float(multi_digit_mnist.image_width) / (pooling_stride ** 2)) * round(float(multi_digit_mnist.image_height) / (pooling_stride ** 2)) * depth_2)
  fc_layer1_weights = tf.Variable(tf.truncated_normal([flatted, flatted], stddev=0.1))
  fc_layer1_biases = tf.Variable(tf.constant(1.0, shape=[flatted]))

  fc_layer2_weights = tf.Variable(tf.truncated_normal([flatted, num_hidden], stddev=0.1))
  fc_layer2_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  digit_length_weights = tf.Variable(tf.truncated_normal([num_hidden, multi_digit_mnist.digit_count + 1], stddev=0.1))
  digit_length_biases = tf.Variable(tf.constant(1.0, shape=[multi_digit_mnist.digit_count + 1]))

  digits_0_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_0_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_1_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_1_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_2_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_2_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_3_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_3_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_4_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_4_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  # Model.
  def model(data, is_training=True):
    stride = 2
    conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
    conv = tf.nn.conv2d(conv, layer2_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer2_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
#     conv = tf.nn.conv2d(conv, layer3_weights, [1, 1, 1, 1], padding='SAME')
#     if is_training:
#         conv = tf.nn.dropout(conv, 0.8)
#     print(conv)
#     conv = tf.nn.relu(conv + layer3_biases)
    
#     conv = tf.nn.conv2d(conv, layer4_weights, [1, 1, 1, 1], padding='SAME')
#     conv = tf.nn.relu(conv + layer4_biases)

    shape = conv.get_shape().as_list()
    reshape = tf.reshape(conv, [shape[0], shape[1] * shape[2] * shape[3]])
#     reshape = tf.nn.relu(tf.matmul(reshape, fc_layer1_weights) + fc_layer1_biases)
    reshape = tf.nn.relu(tf.matmul(reshape, fc_layer2_weights) + fc_layer2_biases)

    digit_length_logit = tf.matmul(reshape, digit_length_weights) + digit_length_biases    
    digit_length_logit_inner = tf.to_float(tf.argmax(tf.nn.softmax(digit_length_logit), dimension=1))
    
    digits_0_logit = tf.matmul(reshape, digits_0_weights) + digits_0_biases
    digits_1_logit = tf.matmul(reshape, digits_1_weights) + digits_1_biases

    return digit_length_logit, digits_0_logit, digits_1_logit


  # Training computation.
  logits = model(tf_train_dataset, True)
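  # Mask: 1.0 where the labelled length is greater than 1, so the second-digit loss
  # only counts when a second digit exists. argmax of the one-hot label is enough;
  # applying softmax first does not change it.
  digits_1_mult = tf.to_float(tf.argmax(tf_train_length_labels, dimension=1) > 1)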
  loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits[0], tf_train_length_labels)
        + tf.nn.softmax_cross_entropy_with_logits(logits[1], tf_train_digits_labels[:,0])
        + digits_1_mult* tf.nn.softmax_cross_entropy_with_logits(logits[2], tf_train_digits_labels[:,1])
        + beta * tf.nn.l2_loss(layer1_weights) 
        + beta * tf.nn.l2_loss(layer2_weights) 
        + beta * tf.nn.l2_loss(fc_layer2_weights) 
        + beta * tf.nn.l2_loss(digit_length_weights) 
        + beta * tf.nn.l2_loss(digits_0_weights) 
        + beta * tf.nn.l2_loss(digits_1_weights) 
    )

  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_length_prediction = tf.nn.softmax(logits[0])
  train_digits_0_prediction = tf.nn.softmax(logits[1])
  train_digits_1_prediction = tf.nn.softmax(logits[2])

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (multi_digit_mnist.train_data.shape[0] - batch_size)
    batch_data = multi_digit_mnist.train_data[offset:(offset + batch_size), :, :, :]
    batch_length_labels = multi_digit_mnist.train_label_length[offset:(offset + batch_size), 0, :]
    batch_digits_labels = multi_digit_mnist.train_label_digits[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_length_labels : batch_length_labels, tf_train_digits_labels : batch_digits_labels}
    _, l, p_length, p_0, p_1 = session.run(
      [optimizer, loss, train_length_prediction, train_digits_0_prediction, train_digits_1_prediction], feed_dict=feed_dict)
    if (step % 100 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
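      # NOTE: p_2, p_3 and p_4 are not fetched by session.run above; the accuracy calls
      # below only work if those names are already defined elsewhere in the notebook session.
      accuracy_result = accuracy_length(p_length, p_0, p_1, p_2, p_3, p_4, batch_length_labels, batch_digits_labels)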
      print('Minibatch accuracy_length: %.1f%%' % accuracy_result)
      for k in range(0,2):
          accuracy_result = accuracy_digit(p_length, [p_0, p_1, p_2, p_3, p_4], batch_length_labels, batch_digits_labels, k)
          print("Minibatch accuracy_digit_{}".format(k) + ": %.1f%%" % accuracy_result)


(70000, 1568)
(70000, 1)
(70000, 2)
(49000, 28, 56, 1)
(49000, 1, 3)
(49000, 2, 10)
(14000, 28, 56, 1)
(14000, 1, 3)
(14000, 2, 10)
(7000, 28, 56, 1)
(7000, 1, 3)
(7000, 2, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 56, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 3), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 2, 10), dtype=float32)
Tensor("dropout/mul:0", shape=(32, 14, 28, 8), dtype=float32)
Tensor("dropout_1/mul:0", shape=(32, 7, 14, 16), dtype=float32)
Initialized
Minibatch loss at step 0: 9.903417
Minibatch accuracy_length: 40.6%
Minibatch accuracy_digit_0: 21.9%
Minibatch accuracy_digit_1: 7.7%
Minibatch loss at step 100: 3.579196
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 9.4%
Minibatch accuracy_digit_1: 31.2%
Minibatch loss at step 200: 3.169344
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 31.2%
Minibatch accuracy_digit_1: 47.1%
Minibatch loss at step 300
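
The training loop walks through the training set with an offset that wraps around before a batch could run past the end of the data. A small sketch of that indexing, using the 49000 training examples shown in the printed shapes (`batch_offsets` is a hypothetical helper name, not part of the notebook):

```python
def batch_offsets(num_examples, batch_size, num_steps):
    """Yield the start index of each minibatch, wrapping like the training loop."""
    for step in range(num_steps):
        yield (step * batch_size) % (num_examples - batch_size)


offsets = list(batch_offsets(num_examples=49000, batch_size=32, num_steps=2001))
print(offsets[:3])                 # [0, 32, 64]
print(max(offsets) + 32 <= 49000)  # True
```

Because the modulus is `num_examples - batch_size`, every slice `[offset:offset + batch_size]` stays inside the array, at the cost of the batches drifting out of alignment after the first wrap.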

- Input: 3 digits
- Output: length + 2 digits
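
In this run the images hold three digits while the model has only two digit heads, so the length mask decides which per-digit losses count. A minimal pure-Python sketch of that masked sum for a single example follows. `masked_example_loss` is a hypothetical helper, the L2 terms are left out, and the length label is assumed to be one-hot with index equal to the digit count.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return -sum(t * (x - log_norm) for x, t in zip(logits, one_hot_label))


def masked_example_loss(length_logits, digit_logits, length_label, digit_labels):
    """Length loss plus the digit losses, masked by the labelled length.

    Mirrors the cell's rule: digit 0 always counts, digit k (k >= 1)
    counts only when the labelled length is greater than k.
    """
    true_length = length_label.index(max(length_label))
    loss = softmax_cross_entropy(length_logits, length_label)
    for k, (logits, label) in enumerate(zip(digit_logits, digit_labels)):
        mask = 1.0 if (k == 0 or true_length > k) else 0.0
        loss += mask * softmax_cross_entropy(logits, label)
    return loss
```

With all-zero logits each active head contributes `log(num_classes)`, which makes the effect of the mask easy to check by hand.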

In [148]:
num_steps = 2001
batch_size = 32
patch_size = 5
depth = 8
depth_1 = depth
depth_2 = depth_1 * 2
depth_3 = depth_2 * 2
depth_4 = depth_3 * 2
num_hidden = 32

pooling_stride = 2
pooling_kernel_size = 2
beta = 0.001

multi_digit_mnist = MultiDigitMNISTData(3)
multi_digit_mnist.load_data()
graph = tf.Graph()

with graph.as_default():
  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.image_width, multi_digit_mnist.image_height, multi_digit_mnist.num_channels))
  print("tf_train_dataset : {}".format(tf_train_dataset))
  tf_train_length_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count + 1))
  print("tf_train_length_labels : {}".format(tf_train_length_labels))
  tf_train_digits_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count, 10))
  print("tf_train_digits_labels : {}".format(tf_train_digits_labels))
  tf_valid_dataset = tf.constant(multi_digit_mnist.validation_data)
  tf_test_dataset = tf.constant(multi_digit_mnist.test_data)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, multi_digit_mnist.num_channels, depth_1], stddev=0.1))
  layer1_biases =  tf.Variable(tf.constant(1.0, shape=[depth_1]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_1, depth_2], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth_2]))

  layer3_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_2, depth_3], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[depth_3]))
    
  layer4_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_3, depth_4], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[depth_4]))

  flatted=int(round(float(multi_digit_mnist.image_width) / (pooling_stride ** 2)) * round(float(multi_digit_mnist.image_height) / (pooling_stride ** 2)) * depth_2)
  fc_layer1_weights = tf.Variable(tf.truncated_normal([flatted, flatted], stddev=0.1))
  fc_layer1_biases = tf.Variable(tf.constant(1.0, shape=[flatted]))

  fc_layer2_weights = tf.Variable(tf.truncated_normal([flatted, num_hidden], stddev=0.1))
  fc_layer2_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  digit_length_weights = tf.Variable(tf.truncated_normal([num_hidden, multi_digit_mnist.digit_count + 1], stddev=0.1))
  digit_length_biases = tf.Variable(tf.constant(1.0, shape=[multi_digit_mnist.digit_count + 1]))

  digits_0_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_0_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_1_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_1_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_2_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_2_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_3_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_3_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_4_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_4_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  # Model.
  def model(data, is_training=True):
    stride = 2
    conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
    conv = tf.nn.conv2d(conv, layer2_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer2_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
#     conv = tf.nn.conv2d(conv, layer3_weights, [1, 1, 1, 1], padding='SAME')
#     if is_training:
#         conv = tf.nn.dropout(conv, 0.8)
#     print(conv)
#     conv = tf.nn.relu(conv + layer3_biases)
    
#     conv = tf.nn.conv2d(conv, layer4_weights, [1, 1, 1, 1], padding='SAME')
#     conv = tf.nn.relu(conv + layer4_biases)

    shape = conv.get_shape().as_list()
    reshape = tf.reshape(conv, [shape[0], shape[1] * shape[2] * shape[3]])
#     reshape = tf.nn.relu(tf.matmul(reshape, fc_layer1_weights) + fc_layer1_biases)
    reshape = tf.nn.relu(tf.matmul(reshape, fc_layer2_weights) + fc_layer2_biases)

    digit_length_logit = tf.matmul(reshape, digit_length_weights) + digit_length_biases    
    digit_length_logit_inner = tf.to_float(tf.argmax(tf.nn.softmax(digit_length_logit), dimension=1))
    
    digits_0_logit = tf.matmul(reshape, digits_0_weights) + digits_0_biases
    digits_1_logit = tf.matmul(reshape, digits_1_weights) + digits_1_biases

    return digit_length_logit, digits_0_logit, digits_1_logit


  # Training computation.
  logits = model(tf_train_dataset, True)
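  # Mask: 1.0 where the labelled length is greater than 1, so the second-digit loss
  # only counts when a second digit exists. argmax of the one-hot label is enough;
  # applying softmax first does not change it.
  digits_1_mult = tf.to_float(tf.argmax(tf_train_length_labels, dimension=1) > 1)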
  loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits[0], tf_train_length_labels)
        + tf.nn.softmax_cross_entropy_with_logits(logits[1], tf_train_digits_labels[:,0])
        + digits_1_mult* tf.nn.softmax_cross_entropy_with_logits(logits[2], tf_train_digits_labels[:,1])
        + beta * tf.nn.l2_loss(layer1_weights) 
        + beta * tf.nn.l2_loss(layer2_weights) 
        + beta * tf.nn.l2_loss(fc_layer2_weights) 
        + beta * tf.nn.l2_loss(digit_length_weights) 
        + beta * tf.nn.l2_loss(digits_0_weights) 
        + beta * tf.nn.l2_loss(digits_1_weights) 
    )

  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_length_prediction = tf.nn.softmax(logits[0])
  train_digits_0_prediction = tf.nn.softmax(logits[1])
  train_digits_1_prediction = tf.nn.softmax(logits[2])

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (multi_digit_mnist.train_data.shape[0] - batch_size)
    batch_data = multi_digit_mnist.train_data[offset:(offset + batch_size), :, :, :]
    batch_length_labels = multi_digit_mnist.train_label_length[offset:(offset + batch_size), 0, :]
    batch_digits_labels = multi_digit_mnist.train_label_digits[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_length_labels : batch_length_labels, tf_train_digits_labels : batch_digits_labels}
    _, l, p_length, p_0, p_1 = session.run(
      [optimizer, loss, train_length_prediction, train_digits_0_prediction, train_digits_1_prediction], feed_dict=feed_dict)
    if (step % 100 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
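      # NOTE: p_2, p_3 and p_4 are not fetched by session.run above; the accuracy calls
      # below only work if those names are already defined elsewhere in the notebook session.
      accuracy_result = accuracy_length(p_length, p_0, p_1, p_2, p_3, p_4, batch_length_labels, batch_digits_labels)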
      print('Minibatch accuracy_length: %.1f%%' % accuracy_result)
      for k in range(0,2):
          accuracy_result = accuracy_digit(p_length, [p_0, p_1, p_2, p_3, p_4], batch_length_labels, batch_digits_labels, k)
          print("Minibatch accuracy_digit_{}".format(k) + ": %.1f%%" % accuracy_result)


(70000, 2352)
(70000, 1)
(70000, 3)
(49000, 28, 84, 1)
(49000, 1, 4)
(49000, 3, 10)
(14000, 28, 84, 1)
(14000, 1, 4)
(14000, 3, 10)
(7000, 28, 84, 1)
(7000, 1, 4)
(7000, 3, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 84, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 4), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 3, 10), dtype=float32)
Tensor("dropout/mul:0", shape=(32, 14, 42, 8), dtype=float32)
Tensor("dropout_1/mul:0", shape=(32, 7, 21, 16), dtype=float32)
Initialized
Minibatch loss at step 0: 16.063213
Minibatch accuracy_length: 28.1%
Minibatch accuracy_digit_0: 0.0%
Minibatch accuracy_digit_1: 10.5%
Minibatch loss at step 100: 4.341612
Minibatch accuracy_length: 100.0%
Minibatch accuracy_digit_0: 12.5%
Minibatch accuracy_digit_1: 19.0%
Minibatch loss at step 200: 3.618831
Minibatch accuracy_length: 96.9%
Minibatch accuracy_digit_0: 37.5%
Minibatch accuracy_digit_1: 21.7%
Minibatch loss at step 30

- Input: 3 digits
- Output: 3 digits + length
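
The `flatted` size used for the fully connected weights follows from the two 2x2 max-pool layers with `SAME` padding, each of which maps a spatial size n to ceil(n / 2). A short sketch of that arithmetic, with `flattened_size` as a hypothetical helper name, gives the `(7, 21, 16)` feature map printed for the second pooling layer in the three-digit setup:

```python
import math


def flattened_size(height, width, channels, num_pools=2, stride=2):
    """Feature count after `num_pools` SAME-padded poolings of the given stride."""
    for _ in range(num_pools):
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
    return height, width, height * width * channels


print(flattened_size(28, 84, 16))  # (7, 21, 2352)
print(flattened_size(28, 56, 16))  # (7, 14, 1568)
```

The cell computes the same number with `round(n / 4)` per dimension. That agrees with the ceil-based rule whenever the size is divisible by 4, but the two can differ otherwise (29 gives 8 with repeated ceil-halving and 7 with `round(29 / 4)`).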

In [150]:
num_steps = 2001
batch_size = 32
patch_size = 5
depth = 8
depth_1 = depth
depth_2 = depth_1 * 2
depth_3 = depth_2 * 2
depth_4 = depth_3 * 2
num_hidden = 32

pooling_stride = 2
pooling_kernel_size = 2
beta = 0.001

multi_digit_mnist = MultiDigitMNISTData(3)
multi_digit_mnist.load_data()
graph = tf.Graph()

with graph.as_default():
  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.image_width, multi_digit_mnist.image_height, multi_digit_mnist.num_channels))
  print("tf_train_dataset : {}".format(tf_train_dataset))
  tf_train_length_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count + 1))
  print("tf_train_length_labels : {}".format(tf_train_length_labels))
  tf_train_digits_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count, 10))
  print("tf_train_digits_labels : {}".format(tf_train_digits_labels))
  tf_valid_dataset = tf.constant(multi_digit_mnist.validation_data)
  tf_test_dataset = tf.constant(multi_digit_mnist.test_data)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, multi_digit_mnist.num_channels, depth_1], stddev=0.1))
  layer1_biases =  tf.Variable(tf.constant(1.0, shape=[depth_1]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_1, depth_2], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth_2]))

  layer3_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_2, depth_3], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[depth_3]))
    
  layer4_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_3, depth_4], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[depth_4]))

  flatted=int(round(float(multi_digit_mnist.image_width) / (pooling_stride ** 2)) * round(float(multi_digit_mnist.image_height) / (pooling_stride ** 2)) * depth_2)
  fc_layer1_weights = tf.Variable(tf.truncated_normal([flatted, flatted], stddev=0.1))
  fc_layer1_biases = tf.Variable(tf.constant(1.0, shape=[flatted]))

  fc_layer2_weights = tf.Variable(tf.truncated_normal([flatted, num_hidden], stddev=0.1))
  fc_layer2_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  digit_length_weights = tf.Variable(tf.truncated_normal([num_hidden, multi_digit_mnist.digit_count + 1], stddev=0.1))
  digit_length_biases = tf.Variable(tf.constant(1.0, shape=[multi_digit_mnist.digit_count + 1]))

  digits_0_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_0_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_1_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_1_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_2_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_2_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_3_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_3_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_4_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_4_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  # Model.
  def model(data, is_training=True):
    stride = 2
    conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
    conv = tf.nn.conv2d(conv, layer2_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer2_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
#     conv = tf.nn.conv2d(conv, layer3_weights, [1, 1, 1, 1], padding='SAME')
#     if is_training:
#         conv = tf.nn.dropout(conv, 0.8)
#     print(conv)
#     conv = tf.nn.relu(conv + layer3_biases)
    
#     conv = tf.nn.conv2d(conv, layer4_weights, [1, 1, 1, 1], padding='SAME')
#     conv = tf.nn.relu(conv + layer4_biases)

    shape = conv.get_shape().as_list()
    reshape = tf.reshape(conv, [shape[0], shape[1] * shape[2] * shape[3]])
#     reshape = tf.nn.relu(tf.matmul(reshape, fc_layer1_weights) + fc_layer1_biases)
    reshape = tf.nn.relu(tf.matmul(reshape, fc_layer2_weights) + fc_layer2_biases)

    digit_length_logit = tf.matmul(reshape, digit_length_weights) + digit_length_biases    
    digit_length_logit_inner = tf.to_float(tf.argmax(tf.nn.softmax(digit_length_logit), dimension=1))
    
    digits_0_logit = tf.matmul(reshape, digits_0_weights) + digits_0_biases
    digits_1_logit = tf.matmul(reshape, digits_1_weights) + digits_1_biases
    digits_2_logit = tf.matmul(reshape, digits_2_weights) + digits_2_biases

    return digit_length_logit, digits_0_logit, digits_1_logit, digits_2_logit


  # Training computation.
  logits = model(tf_train_dataset, True)
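  # Masks: 1.0 where the labelled length is greater than 1 (or 2), so the loss for the
  # second (or third) digit only counts when that digit exists. argmax of the one-hot
  # label is enough; applying softmax first does not change it.
  digits_1_mult = tf.to_float(tf.argmax(tf_train_length_labels, dimension=1) > 1)
  digits_2_mult = tf.to_float(tf.argmax(tf_train_length_labels, dimension=1) > 2)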

  loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits[0], tf_train_length_labels)
        + tf.nn.softmax_cross_entropy_with_logits(logits[1], tf_train_digits_labels[:,0])
        + digits_1_mult* tf.nn.softmax_cross_entropy_with_logits(logits[2], tf_train_digits_labels[:,1])
        + digits_2_mult* tf.nn.softmax_cross_entropy_with_logits(logits[3], tf_train_digits_labels[:,2])
        + beta * tf.nn.l2_loss(layer1_weights) 
        + beta * tf.nn.l2_loss(layer2_weights) 
        + beta * tf.nn.l2_loss(fc_layer2_weights) 
        + beta * tf.nn.l2_loss(digit_length_weights) 
        + beta * tf.nn.l2_loss(digits_0_weights) 
        + beta * tf.nn.l2_loss(digits_1_weights) 
        + beta * tf.nn.l2_loss(digits_2_weights) 
    )

  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_length_prediction = tf.nn.softmax(logits[0])
  train_digits_0_prediction = tf.nn.softmax(logits[1])
  train_digits_1_prediction = tf.nn.softmax(logits[2])
  train_digits_2_prediction = tf.nn.softmax(logits[3])

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (multi_digit_mnist.train_data.shape[0] - batch_size)
    batch_data = multi_digit_mnist.train_data[offset:(offset + batch_size), :, :, :]
    batch_length_labels = multi_digit_mnist.train_label_length[offset:(offset + batch_size), 0, :]
    batch_digits_labels = multi_digit_mnist.train_label_digits[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_length_labels : batch_length_labels, tf_train_digits_labels : batch_digits_labels}
    _, l, p_length, p_0, p_1, p_2 = session.run(
      [optimizer, loss, train_length_prediction, train_digits_0_prediction, train_digits_1_prediction, train_digits_2_prediction], feed_dict=feed_dict)
    if (step % 100 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
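      # NOTE: p_3 and p_4 are not fetched by session.run above; the accuracy calls
      # below only work if those names are already defined elsewhere in the notebook session.
      accuracy_result = accuracy_length(p_length, p_0, p_1, p_2, p_3, p_4, batch_length_labels, batch_digits_labels)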
      print('Minibatch accuracy_length: %.1f%%' % accuracy_result)
      for k in range(0,3):
          accuracy_result = accuracy_digit(p_length, [p_0, p_1, p_2, p_3, p_4], batch_length_labels, batch_digits_labels, k)
          print("Minibatch accuracy_digit_{}".format(k) + ": %.1f%%" % accuracy_result)


(70000, 2352)
(70000, 1)
(70000, 3)
(49000, 28, 84, 1)
(49000, 1, 4)
(49000, 3, 10)
(14000, 28, 84, 1)
(14000, 1, 4)
(14000, 3, 10)
(7000, 28, 84, 1)
(7000, 1, 4)
(7000, 3, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 84, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 4), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 3, 10), dtype=float32)
Tensor("dropout/mul:0", shape=(32, 14, 42, 8), dtype=float32)
Tensor("dropout_1/mul:0", shape=(32, 7, 21, 16), dtype=float32)
Initialized
Minibatch loss at step 0: 11.962111
Minibatch accuracy_length: 37.5%
Minibatch accuracy_digit_0: 12.5%
Minibatch accuracy_digit_1: 13.0%
Minibatch accuracy_digit_2: 0.0%
Minibatch loss at step 100: 5.999780
Minibatch accuracy_length: 43.8%
Minibatch accuracy_digit_0: 12.5%
Minibatch accuracy_digit_1: 11.1%
Minibatch accuracy_digit_2: 7.7%
Minibatch loss at step 200: 4.949679
Minibatch accuracy_length: 96.9%
Minibatch accuracy_digit_0:

- Input: 4 digits
- Output: 3 digits + length
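
The per-position accuracies in these logs appear consistent with counting only the examples whose labelled sequence is long enough to contain that position (7.7%, for instance, is 1 of 13 rather than a fraction of the full batch of 32). The notebook's `accuracy_digit` is defined outside this section, so the following is only a guess at that behaviour. `digit_accuracy` is a hypothetical stand-in that assumes a one-hot length label with index equal to the digit count.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def digit_accuracy(pred_rows, label_rows, length_rows, k):
    """Percent of correct position-k predictions among sequences longer than k.

    pred_rows / label_rows: per example, a list of per-position score rows.
    length_rows: per example, a one-hot length label (index = digit count).
    Returns None when no example in the batch has a k-th digit.
    """
    hits = total = 0
    for preds, labels, length in zip(pred_rows, label_rows, length_rows):
        if argmax(length) <= k:
            continue  # this example has no digit at position k
        total += 1
        hits += int(argmax(preds[k]) == argmax(labels[k]))
    return 100.0 * hits / total if total else None
```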

In [154]:
num_steps = 4001
batch_size = 32
patch_size = 5
depth = 8
depth_1 = depth
depth_2 = depth_1 * 2
depth_3 = depth_2 * 2
depth_4 = depth_3 * 2
num_hidden = 48

pooling_stride = 2
pooling_kernel_size = 2
beta = 0.001

multi_digit_mnist = MultiDigitMNISTData(4)
multi_digit_mnist.load_data()
graph = tf.Graph()

with graph.as_default():
  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.image_width, multi_digit_mnist.image_height, multi_digit_mnist.num_channels))
  print("tf_train_dataset : {}".format(tf_train_dataset))
  tf_train_length_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count + 1))
  print("tf_train_length_labels : {}".format(tf_train_length_labels))
  tf_train_digits_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count, 10))
  print("tf_train_digits_labels : {}".format(tf_train_digits_labels))
  tf_valid_dataset = tf.constant(multi_digit_mnist.validation_data)
  tf_test_dataset = tf.constant(multi_digit_mnist.test_data)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, multi_digit_mnist.num_channels, depth_1], stddev=0.1))
  layer1_biases =  tf.Variable(tf.constant(1.0, shape=[depth_1]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_1, depth_2], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth_2]))

  layer3_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_2, depth_3], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[depth_3]))
    
  layer4_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_3, depth_4], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[depth_4]))

  flatted=int(round(float(multi_digit_mnist.image_width) / (pooling_stride ** 2)) * round(float(multi_digit_mnist.image_height) / (pooling_stride ** 2)) * depth_2)
  fc_layer1_weights = tf.Variable(tf.truncated_normal([flatted, flatted], stddev=0.1))
  fc_layer1_biases = tf.Variable(tf.constant(1.0, shape=[flatted]))

  fc_layer2_weights = tf.Variable(tf.truncated_normal([flatted, num_hidden], stddev=0.1))
  fc_layer2_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  digit_length_weights = tf.Variable(tf.truncated_normal([num_hidden, multi_digit_mnist.digit_count + 1], stddev=0.1))
  digit_length_biases = tf.Variable(tf.constant(1.0, shape=[multi_digit_mnist.digit_count + 1]))

  digits_0_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_0_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_1_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_1_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_2_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_2_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_3_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_3_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_4_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_4_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  # Model.
  def model(data, is_training=True):
    stride = 2
    conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
    conv = tf.nn.conv2d(conv, layer2_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer2_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
#     conv = tf.nn.conv2d(conv, layer3_weights, [1, 1, 1, 1], padding='SAME')
#     if is_training:
#         conv = tf.nn.dropout(conv, 0.8)
#     print(conv)
#     conv = tf.nn.relu(conv + layer3_biases)
    
#     conv = tf.nn.conv2d(conv, layer4_weights, [1, 1, 1, 1], padding='SAME')
#     conv = tf.nn.relu(conv + layer4_biases)

    shape = conv.get_shape().as_list()
    reshape = tf.reshape(conv, [shape[0], shape[1] * shape[2] * shape[3]])
#     reshape = tf.nn.relu(tf.matmul(reshape, fc_layer1_weights) + fc_layer1_biases)
    reshape = tf.nn.relu(tf.matmul(reshape, fc_layer2_weights) + fc_layer2_biases)

    digit_length_logit = tf.matmul(reshape, digit_length_weights) + digit_length_biases    
    digit_length_logit_inner = tf.to_float(tf.argmax(tf.nn.softmax(digit_length_logit), dimension=1))
    
    digits_0_logit = tf.matmul(reshape, digits_0_weights) + digits_0_biases
    digits_1_logit = tf.matmul(reshape, digits_1_weights) + digits_1_biases
    digits_2_logit = tf.matmul(reshape, digits_2_weights) + digits_2_biases

    return digit_length_logit, digits_0_logit, digits_1_logit, digits_2_logit


  # Training computation.
  logits = model(tf_train_dataset, True)
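  # Masks: 1.0 where the labelled length is greater than 1 (or 2), so the loss for the
  # second (or third) digit only counts when that digit exists. argmax of the one-hot
  # label is enough; applying softmax first does not change it.
  digits_1_mult = tf.to_float(tf.argmax(tf_train_length_labels, dimension=1) > 1)
  digits_2_mult = tf.to_float(tf.argmax(tf_train_length_labels, dimension=1) > 2)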

  loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits[0], tf_train_length_labels)
        + tf.nn.softmax_cross_entropy_with_logits(logits[1], tf_train_digits_labels[:,0])
        + digits_1_mult* tf.nn.softmax_cross_entropy_with_logits(logits[2], tf_train_digits_labels[:,1])
        + digits_2_mult* tf.nn.softmax_cross_entropy_with_logits(logits[3], tf_train_digits_labels[:,2])
        + beta * tf.nn.l2_loss(layer1_weights) 
        + beta * tf.nn.l2_loss(layer2_weights) 
        + beta * tf.nn.l2_loss(fc_layer2_weights) 
        + beta * tf.nn.l2_loss(digit_length_weights) 
        + beta * tf.nn.l2_loss(digits_0_weights) 
        + beta * tf.nn.l2_loss(digits_1_weights) 
        + beta * tf.nn.l2_loss(digits_2_weights) 
    )

  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_length_prediction = tf.nn.softmax(logits[0])
  train_digits_0_prediction = tf.nn.softmax(logits[1])
  train_digits_1_prediction = tf.nn.softmax(logits[2])
  train_digits_2_prediction = tf.nn.softmax(logits[3])

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (multi_digit_mnist.train_data.shape[0] - batch_size)
    batch_data = multi_digit_mnist.train_data[offset:(offset + batch_size), :, :, :]
    batch_length_labels = multi_digit_mnist.train_label_length[offset:(offset + batch_size), 0, :]
    batch_digits_labels = multi_digit_mnist.train_label_digits[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_length_labels : batch_length_labels, tf_train_digits_labels : batch_digits_labels}
    _, l, p_length, p_0, p_1, p_2 = session.run(
      [optimizer, loss, train_length_prediction, train_digits_0_prediction, train_digits_1_prediction, train_digits_2_prediction], feed_dict=feed_dict)
    if (step % 100 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      accuracy_result = accuracy_length(p_length, p_0, p_1, p_2, p_3, p_4, batch_length_labels, batch_digits_labels)
      print('Minibatch accuracy_length: %.1f%%' % accuracy_result)
      for k in range(0,3):
          accuracy_result = accuracy_digit(p_length, [p_0, p_1, p_2, p_3, p_4], batch_length_labels, batch_digits_labels, k)
          print("Minibatch accuracy_digit_{}".format(k) + ": %.1f%%" % accuracy_result)


(70000, 3136)
(70000, 1)
(70000, 4)
(49000, 28, 112, 1)
(49000, 1, 5)
(49000, 4, 10)
(14000, 28, 112, 1)
(14000, 1, 5)
(14000, 4, 10)
(7000, 28, 112, 1)
(7000, 1, 5)
(7000, 4, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 112, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 5), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 4, 10), dtype=float32)
Tensor("dropout/mul:0", shape=(32, 14, 56, 8), dtype=float32)
Tensor("dropout_1/mul:0", shape=(32, 7, 28, 16), dtype=float32)
Initialized
Minibatch loss at step 0: 25.488922
Minibatch accuracy_length: 34.4%
Minibatch accuracy_digit_0: 15.6%
Minibatch accuracy_digit_1: 13.6%
Minibatch accuracy_digit_2: 6.2%
Minibatch loss at step 100: 7.210330
Minibatch accuracy_length: 15.6%
Minibatch accuracy_digit_0: 3.1%
Minibatch accuracy_digit_1: 17.4%
Minibatch accuracy_digit_2: 0.0%
Minibatch loss at step 200: 7.183245
Minibatch accuracy_length: 18.8%
Minibatch accuracy_digit

- Input: 4 digits
- Output: 4 digits + length
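
The label layout implied by the shapes printed in this notebook (`(N, 5)` for the length and `(N, 4, 10)` for the digits) can be sketched as below. This is only an illustration: `encode_label` is a hypothetical helper, not part of `MultiDigitMNISTData`, and leaving unused digit positions as all-zero rows is an assumption.

```python
import numpy as np

def encode_label(digits, max_digits=4):
    """Encode a digit sequence as (length one-hot, per-position digit one-hots)."""
    if len(digits) > max_digits:
        raise ValueError("sequence longer than max_digits")
    # The length is one of 0..max_digits, hence max_digits + 1 classes.
    length = np.zeros(max_digits + 1, dtype=np.float32)
    length[len(digits)] = 1.0
    # One row per digit position; unused positions stay all-zero (assumption).
    one_hot = np.zeros((max_digits, 10), dtype=np.float32)
    for position, digit in enumerate(digits):
        one_hot[position, digit] = 1.0
    return length, one_hot

length, one_hot = encode_label([3, 0, 7])
print(length.shape, one_hot.shape)
```

With this layout, the per-digit loss terms can be masked by comparing the encoded length against each position index, which is what the `digits_k_mult` factors in the cell below do.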

In [161]:
num_steps = 4001
batch_size = 32
patch_size = 5
depth = 8
depth_1 = depth
depth_2 = depth_1 * 2
depth_3 = depth_2 * 2
depth_4 = depth_3 * 2
num_hidden = 64

pooling_stride = 2
pooling_kernel_size = 2
beta = 0.001

multi_digit_mnist = MultiDigitMNISTData(4)
multi_digit_mnist.load_data()
graph = tf.Graph()

with graph.as_default():
  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.image_width, multi_digit_mnist.image_height, multi_digit_mnist.num_channels))
  print("tf_train_dataset : {}".format(tf_train_dataset))
  tf_train_length_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count + 1))
  print("tf_train_length_labels : {}".format(tf_train_length_labels))
  tf_train_digits_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count, 10))
  print("tf_train_digits_labels : {}".format(tf_train_digits_labels))
  tf_valid_dataset = tf.constant(multi_digit_mnist.validation_data)
  tf_test_dataset = tf.constant(multi_digit_mnist.test_data)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, multi_digit_mnist.num_channels, depth_1], stddev=0.1))
  layer1_biases =  tf.Variable(tf.constant(1.0, shape=[depth_1]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_1, depth_2], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth_2]))

  layer3_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_2, depth_3], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[depth_3]))
    
  layer4_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_3, depth_4], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[depth_4]))

  flatted=int(round(float(multi_digit_mnist.image_width) / (pooling_stride ** 2)) * round(float(multi_digit_mnist.image_height) / (pooling_stride ** 2)) * depth_2)
  fc_layer1_weights = tf.Variable(tf.truncated_normal([flatted, flatted], stddev=0.1))
  fc_layer1_biases = tf.Variable(tf.constant(1.0, shape=[flatted]))

  fc_layer2_weights = tf.Variable(tf.truncated_normal([flatted, num_hidden], stddev=0.1))
  fc_layer2_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  digit_length_weights = tf.Variable(tf.truncated_normal([num_hidden, multi_digit_mnist.digit_count + 1], stddev=0.1))
  digit_length_biases = tf.Variable(tf.constant(1.0, shape=[multi_digit_mnist.digit_count + 1]))

  digits_0_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_0_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_1_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_1_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_2_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_2_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_3_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_3_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_4_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_4_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  # Model.
  def model(data, is_training=True):
    stride = 2
    conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
    conv = tf.nn.conv2d(conv, layer2_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer2_biases)
    conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    if is_training:
        conv = tf.nn.dropout(conv, 0.8)
    print(conv)
    
#     conv = tf.nn.conv2d(conv, layer3_weights, [1, 1, 1, 1], padding='SAME')
#     if is_training:
#         conv = tf.nn.dropout(conv, 0.8)
#     print(conv)
#     conv = tf.nn.relu(conv + layer3_biases)
    
#     conv = tf.nn.conv2d(conv, layer4_weights, [1, 1, 1, 1], padding='SAME')
#     conv = tf.nn.relu(conv + layer4_biases)

    shape = conv.get_shape().as_list()
    reshape = tf.reshape(conv, [shape[0], shape[1] * shape[2] * shape[3]])
#     reshape = tf.nn.relu(tf.matmul(reshape, fc_layer1_weights) + fc_layer1_biases)
    reshape = tf.nn.relu(tf.matmul(reshape, fc_layer2_weights) + fc_layer2_biases)

    digit_length_logit = tf.matmul(reshape, digit_length_weights) + digit_length_biases    
    digit_length_logit_inner = tf.to_float(tf.argmax(tf.nn.softmax(digit_length_logit), dimension=1))
    
    digits_0_logit = tf.matmul(reshape, digits_0_weights) + digits_0_biases
    digits_1_logit = tf.matmul(reshape, digits_1_weights) + digits_1_biases
    digits_2_logit = tf.matmul(reshape, digits_2_weights) + digits_2_biases
    digits_3_logit = tf.matmul(reshape, digits_3_weights) + digits_3_biases

    return digit_length_logit, digits_0_logit, digits_1_logit, digits_2_logit, digits_3_logit


  # Training computation.
  logits = model(tf_train_dataset, True)
  # The length labels are one-hot, so argmax alone recovers the length (softmax does not change it).
  # digits_k_mult is 1.0 when digit position k exists in the sequence and 0.0 otherwise.
  digits_1_mult = tf.to_float(tf.argmax(tf_train_length_labels, dimension=1) > 1)
  digits_2_mult = tf.to_float(tf.argmax(tf_train_length_labels, dimension=1) > 2)
  digits_3_mult = tf.to_float(tf.argmax(tf_train_length_labels, dimension=1) > 3)

  loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits[0], tf_train_length_labels)
        + tf.nn.softmax_cross_entropy_with_logits(logits[1], tf_train_digits_labels[:,0])
        + digits_1_mult* tf.nn.softmax_cross_entropy_with_logits(logits[2], tf_train_digits_labels[:,1])
        + digits_2_mult* tf.nn.softmax_cross_entropy_with_logits(logits[3], tf_train_digits_labels[:,2])
        + digits_3_mult* tf.nn.softmax_cross_entropy_with_logits(logits[4], tf_train_digits_labels[:,3])
        + beta * tf.nn.l2_loss(layer1_weights) 
        + beta * tf.nn.l2_loss(layer2_weights) 
        + beta * tf.nn.l2_loss(fc_layer2_weights) 
        + beta * tf.nn.l2_loss(digit_length_weights) 
        + beta * tf.nn.l2_loss(digits_0_weights) 
        + beta * tf.nn.l2_loss(digits_1_weights) 
        + beta * tf.nn.l2_loss(digits_2_weights)
        + beta * tf.nn.l2_loss(digits_3_weights)
    )

  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
#   global_step = tf.Variable(0, trainable=False)
#   starter_learning_rate = 0.1
#   learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,100, 0.96, staircase=True)
#   optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_length_prediction = tf.nn.softmax(logits[0])
  train_digits_0_prediction = tf.nn.softmax(logits[1])
  train_digits_1_prediction = tf.nn.softmax(logits[2])
  train_digits_2_prediction = tf.nn.softmax(logits[3])
  train_digits_3_prediction = tf.nn.softmax(logits[4])

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (multi_digit_mnist.train_data.shape[0] - batch_size)
    batch_data = multi_digit_mnist.train_data[offset:(offset + batch_size), :, :, :]
    batch_length_labels = multi_digit_mnist.train_label_length[offset:(offset + batch_size), 0, :]
    batch_digits_labels = multi_digit_mnist.train_label_digits[offset:(offset + batch_size), :]
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_length_labels : batch_length_labels, tf_train_digits_labels : batch_digits_labels}
    _, l, p_length, p_0, p_1, p_2, p_3 = session.run(
      [optimizer, loss, train_length_prediction, train_digits_0_prediction, train_digits_1_prediction, train_digits_2_prediction, train_digits_3_prediction], feed_dict=feed_dict)
    if (step % 100 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      accuracy_result = accuracy_length(p_length, p_0, p_1, p_2, p_3, p_4, batch_length_labels, batch_digits_labels)
      print('Minibatch accuracy_length: %.1f%%' % accuracy_result)
      for k in range(0,4):
          accuracy_result = accuracy_digit(p_length, [p_0, p_1, p_2, p_3, p_4], batch_length_labels, batch_digits_labels, k)
          print("Minibatch accuracy_digit_{}".format(k) + ": %.1f%%" % accuracy_result)


(70000, 3136)
(70000, 1)
(70000, 4)
(49000, 28, 112, 1)
(49000, 1, 5)
(49000, 4, 10)
(14000, 28, 112, 1)
(14000, 1, 5)
(14000, 4, 10)
(7000, 28, 112, 1)
(7000, 1, 5)
(7000, 4, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(32, 28, 112, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(32, 5), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(32, 4, 10), dtype=float32)
Tensor("dropout/mul:0", shape=(32, 14, 56, 8), dtype=float32)
Tensor("dropout_1/mul:0", shape=(32, 7, 28, 16), dtype=float32)
Initialized
Minibatch loss at step 0: 37.588093
Minibatch accuracy_length: 18.8%
Minibatch accuracy_digit_0: 9.4%
Minibatch accuracy_digit_1: 7.7%
Minibatch accuracy_digit_2: 13.3%
Minibatch accuracy_digit_3: 0.0%
Minibatch loss at step 100: 7.331507
Minibatch accuracy_length: 81.2%
Minibatch accuracy_digit_0: 6.2%
Minibatch accuracy_digit_1: 19.2%
Minibatch accuracy_digit_2: 0.0%
Minibatch accuracy_digit_3: 14.3%
Minibatch loss at step 200: 

In [16]:
num_steps = 301
batch_size = 16
patch_size = 5
num_channels = 1
depth = 16
depth_1 = 16
depth_2 = 16
depth_3 = 16
num_hidden = 64

pooling_stride = 2
pooling_kernel_size = 2
beta = 0.0000001

multi_digit_mnist = MultiDigitMNISTData(1)
multi_digit_mnist.load_data()
graph = tf.Graph()

with graph.as_default():
  # Input data.
  tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.image_width, multi_digit_mnist.image_height, multi_digit_mnist.num_channels))
  print("tf_train_dataset : {}".format(tf_train_dataset))
  tf_train_length_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count + 1))
  print("tf_train_length_labels : {}".format(tf_train_length_labels))
  tf_train_digits_labels = tf.placeholder(tf.float32, shape=(batch_size, multi_digit_mnist.digit_count, 10))
  print("tf_train_digits_labels : {}".format(tf_train_digits_labels))
  tf_valid_dataset = tf.constant(multi_digit_mnist.validation_data)
  tf_test_dataset = tf.constant(multi_digit_mnist.test_data)

  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth_1], stddev=0.1))
  layer1_biases =  tf.Variable(tf.constant(1.0, shape=[depth_1]))

  layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_1, depth_2], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth_2]))

  layer3_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth_2, depth_3], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[depth_3]))

  flatted=int(round(float(multi_digit_mnist.image_width) / (pooling_stride ** 2)) * round(float(multi_digit_mnist.image_height) / (pooling_stride ** 2)) * depth_2)
  fc_layer1_weights = tf.Variable(tf.truncated_normal([flatted, flatted], stddev=0.1))
  fc_layer1_biases = tf.Variable(tf.constant(1.0, shape=[flatted]))

  fc_layer2_weights = tf.Variable(tf.truncated_normal([flatted, num_hidden], stddev=0.1))
  fc_layer2_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

  digit_length_weights = tf.Variable(tf.truncated_normal([num_hidden, multi_digit_mnist.digit_count + 1], stddev=0.1))
  digit_length_biases = tf.Variable(tf.constant(1.0, shape=[multi_digit_mnist.digit_count + 1]))

  digits_0_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_0_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_1_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_1_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_2_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_2_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_3_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_3_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  digits_4_weights = tf.Variable(tf.truncated_normal([num_hidden, 10], stddev=0.1))
  digits_4_biases = tf.Variable(tf.constant(1.0, shape=[10]))

  # Model.
  def model(data, is_training=True):
    stride = 2
    conv = tf.nn.conv2d(data, layer1_weights, [1, stride, stride, 1], padding='SAME')
#     conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer1_biases)
#     conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    print("layer1 : {}".format(conv)) # (16, 14, 14, 16)
#     if is_training:
#         conv = tf.nn.dropout(conv, 0.8)

    conv = tf.nn.conv2d(conv, layer2_weights, [1, stride, stride, 1], padding='SAME')
#     conv = tf.nn.conv2d(conv, layer2_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv + layer2_biases)
#     conv = tf.nn.max_pool(conv, [1, stride, stride, 1], [1, stride, stride, 1], padding='SAME')
    print("layer2 : {}".format(conv)) # (16, 7, 7, 16)
#     if is_training:
#         conv = tf.nn.dropout(conv, 0.8)

#     conv = tf.nn.conv2d(conv, layer3_weights, [1, stride, stride, 1], padding='SAME')
#     conv = tf.nn.conv2d(conv, layer3_weights, [1, 1, 1, 1], padding='SAME')
#     conv = tf.nn.relu(conv + layer3_biases)
#     conv = tf.nn.max_pool(conv, [1, stride, 1, 1], [1, stride, 1, 1], padding='SAME')
#     print("layer3 : {}".format(conv))
#     if is_training:
#         conv = tf.nn.dropout(conv, 0.8)

    shape = conv.get_shape().as_list()
    reshape = tf.reshape(conv, [shape[0], shape[1] * shape[2] * shape[3]])
    # (16, 784) = 7 * 7 * 16
    print("reshape : {}".format(reshape))
    # (784, 784)
    print("fc_layer1_weights : {}".format(fc_layer1_weights.get_shape()))
#     conv = tf.nn.relu(tf.matmul(reshape, fc_layer1_weights) + fc_layer1_biases)
    # fc_layer1 is commented out, so this still prints the layer2 tensor: (16, 7, 7, 16)
    print("fc_layer1 : {}".format(conv))
    conv = tf.nn.relu(tf.matmul(reshape, fc_layer2_weights) + fc_layer2_biases)
    print("fc_layer2 : {}".format(conv)) # (16, 64)

    digit_length_logit = tf.matmul(conv, digit_length_weights) + digit_length_biases
    print("digit length layer : {}".format(digit_length_logit)) # (16, 2)
    
    digit_length_logit_inner = tf.to_float(tf.argmax(tf.nn.softmax(digit_length_logit), dimension=1))
    print("digit_length_logit_inner : {}".format(digit_length_logit_inner)) # (16, )

    digits_0_logit = tf.matmul(conv, digits_0_weights) + digits_0_biases
    print("digit[0] layer : {}".format(digits_0_logit)) # (16, 10)
    
    digits_1_mult = tf.maximum(digit_length_logit_inner - 1, 0)
    digits_1_mult_packed = tf.transpose(tf.pack([digits_1_mult] * 10))
    print("digits_1_mult_packed : {}".format(digits_1_mult_packed)) # (16, 10)
    digits_1_logit = (tf.matmul(conv, digits_1_weights) + digits_1_biases) * digits_1_mult_packed
    print("digit[1] layer : {}".format(digits_1_logit)) # (16, 10)
    
    digits_2_mult = tf.maximum(digit_length_logit_inner - 2, 0)
    digits_2_mult_packed = tf.transpose(tf.pack([digits_2_mult] * 10))
    print("digits_2_mult_packed : {}".format(digits_2_mult_packed)) # (16, 10)
    digits_2_logit = (tf.matmul(conv, digits_2_weights) + digits_2_biases) * digits_2_mult_packed
    print("digit[2] layer : {}".format(digits_2_logit)) # (16, 10)
    
    digits_3_mult = tf.maximum(digit_length_logit_inner - 3, 0)
    digits_3_mult_packed = tf.transpose(tf.pack([digits_3_mult] * 10))
    print("digits_3_mult_packed : {}".format(digits_3_mult_packed)) # (16, 10)
    digits_3_logit = (tf.matmul(conv, digits_3_weights) + digits_3_biases) * digits_3_mult_packed
    print("digit[3] layer : {}".format(digits_3_logit)) # (16, 10)
    
    digits_4_mult = tf.maximum(digit_length_logit_inner - 4, 0)
    digits_4_mult_packed = tf.transpose(tf.pack([digits_4_mult] * 10))
    print("digits_4_mult_packed : {}".format(digits_4_mult_packed)) # (16, 10)
    digits_4_logit = (tf.matmul(conv, digits_4_weights) + digits_4_biases) * digits_4_mult_packed
    print("digit[4] layer : {}".format(digits_4_logit)) # (16, 10)

    return digit_length_logit, digits_0_logit, digits_1_logit, digits_2_logit, digits_3_logit, digits_4_logit


  # Training computation.
  logits = model(tf_train_dataset, True)
  loss = tf.reduce_mean(
#                         tf.nn.softmax_cross_entropy_with_logits(logits[0], tf_train_length_labels)
                        tf.nn.softmax_cross_entropy_with_logits(logits[1], tf_train_digits_labels[:,0])
#                         + tf.nn.softmax_cross_entropy_with_logits(logits[2], tf_train_digits_labels[:,1])
#                         + tf.nn.softmax_cross_entropy_with_logits(logits[3], tf_train_digits_labels[:,2])
#                         + tf.nn.softmax_cross_entropy_with_logits(logits[4], tf_train_digits_labels[:,3])
#                         + tf.nn.softmax_cross_entropy_with_logits(logits[5], tf_train_digits_labels[:,4])
#                         + beta * tf.nn.l2_loss(layer1_weights) 
#                         + beta * tf.nn.l2_loss(layer2_weights) 
#                         + beta * tf.nn.l2_loss(layer3_weights) 
#                         + beta * tf.nn.l2_loss(fc_layer1_weights) 
#                         + beta * tf.nn.l2_loss(fc_layer2_weights) 
#                         + beta * tf.nn.l2_loss(digits_0_weights) 
#                         + beta * tf.nn.l2_loss(digits_1_weights) 
#                         + beta * tf.nn.l2_loss(digits_2_weights) 
#                         + beta * tf.nn.l2_loss(digits_3_weights) 
#                         + beta * tf.nn.l2_loss(digits_4_weights) 
#                         + beta * tf.nn.l2_loss(digit_length_weights) 
                       )


  print("logits[0] : {}".format(logits[0]))
  print("tf_train_length_labels : {}".format(tf_train_length_labels))
  print("logits[1] : {}".format(logits[1]))
  print("tf_train_digits_labels[:,0] : {}".format(tf_train_digits_labels[:,0]))

  # Optimizer.
  global_step = tf.Variable(0, trainable=False)
  starter_learning_rate = 0.1
  learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,100, 0.96, staircase=True)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_length_prediction = tf.nn.softmax(logits[0])
  train_digits_0_prediction = tf.nn.softmax(logits[1])
  train_digits_1_prediction = tf.nn.softmax(logits[2])
  train_digits_2_prediction = tf.nn.softmax(logits[3])
  train_digits_3_prediction = tf.nn.softmax(logits[4])
  train_digits_4_prediction = tf.nn.softmax(logits[5])

  print("train_length_prediction : {}".format(train_length_prediction))
  print("train_digits_0_prediction : {}".format(train_digits_0_prediction))

  #valid_prediction = tf.argmax(model(multi_digit_mnist.validation_data, False), 1)
  #test_prediction = tf.argmax(model(multi_digit_mnist.test_data, False), 1)

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (multi_digit_mnist.train_data.shape[0] - batch_size)
    batch_data = multi_digit_mnist.train_data[offset:(offset + batch_size), :, :, :]
    batch_length_labels = multi_digit_mnist.train_label_length[offset:(offset + batch_size), 0, :]
    batch_digits_labels = multi_digit_mnist.train_label_digits[offset:(offset + batch_size), :]
    
#     print(batch_data.shape)
#     print(batch_length_labels.shape)
#     print(batch_digits_labels.shape)
    
    feed_dict = {tf_train_dataset : batch_data, tf_train_length_labels : batch_length_labels, tf_train_digits_labels : batch_digits_labels}
    _, l, p_length, p_0, p_1, p_2, p_3, p_4 = session.run(
      [optimizer, loss, train_length_prediction, train_digits_0_prediction, train_digits_1_prediction, train_digits_2_prediction, train_digits_3_prediction, train_digits_4_prediction], feed_dict=feed_dict)
#     print('Minibatch loss at step %d: %f' % (step, l))
#     print(batch_digits_labels[:,0])
#     print(p_0)
#     print("digits_0_weights.l2_loss : {}".format(tf.nn.l2_loss(digits_0_weights).eval()))
#     print("digits_0_biases.l2_loss : {}".format(tf.nn.l2_loss(digits_0_biases).eval()))
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      #accuracy_result = accuracy(p_length, p_0, p_1, p_2, p_3, p_4, batch_length_labels, batch_digits_labels)
      accuracy_result = accuracy_length(p_length, p_0, p_1, p_2, p_3, p_4, batch_length_labels, batch_digits_labels)
      print('Minibatch accuracy_length: %.1f%%' % accuracy_result)
      for k in range(0,1):
          accuracy_result = accuracy_digit(p_length, [p_0, p_1, p_2, p_3, p_4], batch_length_labels, batch_digits_labels, k)
          print("Minibatch accuracy_digit_{}".format(k) + ": %.1f%%" % accuracy_result)
      #print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
  #print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

(70000, 784)
(70000, 1)
(70000, 1)
(49000, 28, 28, 1)
(49000, 1, 2)
(49000, 1, 10)
(14000, 28, 28, 1)
(14000, 1, 2)
(14000, 1, 10)
(7000, 28, 28, 1)
(7000, 1, 2)
(7000, 1, 10)
tf_train_dataset : Tensor("Placeholder:0", shape=(16, 28, 28, 1), dtype=float32)
tf_train_length_labels : Tensor("Placeholder_1:0", shape=(16, 2), dtype=float32)
tf_train_digits_labels : Tensor("Placeholder_2:0", shape=(16, 1, 10), dtype=float32)
layer1 : Tensor("Relu:0", shape=(16, 14, 14, 16), dtype=float32)
layer2 : Tensor("Relu_1:0", shape=(16, 7, 7, 16), dtype=float32)
reshape : Tensor("Reshape:0", shape=(16, 784), dtype=float32)
fc_layer1_weights : (784, 784)
fc_layer1 : Tensor("Relu_1:0", shape=(16, 7, 7, 16), dtype=float32)
fc_layer2 : Tensor("Relu_2:0", shape=(16, 64), dtype=float32)
digit length layer : Tensor("add_3:0", shape=(16, 2), dtype=float32)
digit_length_logit_inner : Tensor("ToFloat:0", shape=(16,), dtype=float32)
digit[0] layer : Tensor("add_4:0", shape=(16, 10), dtype=float32)
digits_1_mult_

## Step 2: Train a model on a realistic dataset.

Once you have settled on a good architecture, you can train your model on real data. In particular, [the SVHN dataset](https://www.google.com/url?q=http://ufldl.stanford.edu/housenumbers/&sa=D&ust=1476236462762000&usg=AFQjCNGwcYwoDgoe0HcsyDz68YlW8fMgbA) is a good large-scale dataset collected from house numbers in Google Street View. On this more challenging dataset, the digits are not neatly lined up and come in various skews, fonts and colors, so you will likely need some hyperparameter exploration to do well.

- _How does your model perform on a realistic dataset?_
- _What changes did you have to make, if any?_
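
As a starting point for moving from MNIST to SVHN, here is a minimal preprocessing sketch. It assumes the cropped-digit `.mat` files (for example `train_32x32.mat`), which hold an image array `X` of shape `(32, 32, 3, N)` and a label array `y` of shape `(N, 1)` where class 10 stands for the digit 0. The function name `prepare_svhn`, the grayscale conversion, and the scaling are illustrative choices, not requirements of the project. The full-number images with bounding boxes use a different file layout that is not covered here.

```python
import numpy as np

def prepare_svhn(X, y):
    """Convert SVHN cropped-digit arrays to (N, 32, 32, 1) floats and 0-9 labels."""
    # The .mat files store images as (height, width, channels, N).
    images = np.transpose(X, (3, 0, 1, 2)).astype(np.float32)
    # Luminance-weighted grayscale, keeping a trailing channel axis.
    gray = np.dot(images, [0.299, 0.587, 0.114])[..., np.newaxis]
    # Scale pixel values to the range [-0.5, 0.5].
    gray = gray / 255.0 - 0.5
    # SVHN stores the digit 0 as class 10; map it back to 0.
    labels = y.reshape(-1).astype(np.int64) % 10
    return gray.astype(np.float32), labels

# Typical use, assuming SciPy is installed and the file has been downloaded:
#   from scipy.io import loadmat
#   data = loadmat("train_32x32.mat")
#   images, labels = prepare_svhn(data["X"], data["y"])
```

Converting to a single channel keeps the input compatible with a network designed for grayscale MNIST images; training on the color images directly is an equally valid choice.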

## Step 3 (optional): Put the model into an Android app.

Do this step only if you have access to an Android device. If you don’t, you may either:
- i) take pictures of numbers that you find around you, and run them through your classifier on your computer to produce example results, or,
- ii) use OpenCV / SimpleCV / Pygame to capture live images from a webcam.
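
For option ii), a webcam loop with OpenCV might look like the sketch below. `predict` is a hypothetical callable standing in for your trained model, and the 32-pixel input size and the center crop are assumptions to adapt to your own network.

```python
import numpy as np

def preprocess_frame(frame, size=32):
    """Center-crop a BGR frame to a square, convert to grayscale, and downsample."""
    height, width = frame.shape[:2]
    side = min(height, width)
    top, left = (height - side) // 2, (width - side) // 2
    crop = frame[top:top + side, left:left + side].astype(np.float32)
    # OpenCV delivers channels in BGR order.
    gray = crop[..., 0] * 0.114 + crop[..., 1] * 0.587 + crop[..., 2] * 0.299
    # Nearest-neighbour downsampling keeps this helper dependency-free.
    idx = np.arange(size) * side // size
    small = gray[np.ix_(idx, idx)]
    return (small / 255.0 - 0.5).reshape(1, size, size, 1)

def run_webcam(predict, camera_index=0):
    """Print predict(batch) for each captured frame until 'q' is pressed."""
    import cv2  # imported lazily so preprocess_frame works without OpenCV
    capture = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            print(predict(preprocess_frame(frame)))
            cv2.imshow("camera", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

Calling `run_webcam(my_model_fn)` with your own prediction function starts the loop. The same `preprocess_frame` helper can be applied to still photos for option i).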

Loading a TensorFlow model into a camera app on Android is demonstrated in [the TensorFlow Android demo app](https://www.google.com/url?q=https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android&sa=D&ust=1476236462766000&usg=AFQjCNHmG7Hq8LBapwPWMAshjdU7AwABwA), which you can simply modify.

- _Is your model able to perform equally well on captured pictures or a live camera stream?_
- _Document how you built the interface to your model._

## Step 4: Explore!

There are many things you can do once you have the basic classifier in place. One example would be to also localize where the numbers are in the image. The SVHN dataset provides bounding boxes that you can use to train a localizer. Simply training a regression loss on the coordinates of the bounding box is one way to get decent localization.
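
A regression loss on box coordinates, together with a simple way to score the result, could be sketched as follows. The `(left, top, width, height)` box layout and both function names are assumptions for illustration. Squared error is one possible choice of loss, and intersection-over-union is added here only as a common evaluation measure.

```python
import numpy as np

def bbox_l2_loss(predicted, target):
    """Mean over the batch of the squared error summed over the 4 box coordinates."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return np.mean(np.sum((predicted - target) ** 2, axis=1))

def iou(box_a, box_b):
    """Intersection over union of two (left, top, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(bbox_l2_loss([[0, 0, 2, 2]], [[1, 1, 2, 2]]))
print(iou((0, 0, 2, 2), (1, 1, 2, 2)))
```

In a TensorFlow model the same loss would be computed on an extra four-unit output head and added to the classification loss. Normalizing the coordinates by the image size usually keeps the two terms on a comparable scale.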

Once you have the data localized, you can, for example, try to turn it into an augmented reality app by overlaying your answer on the image like the Word Lens app does.

Those are just examples of extensions you can look into. Use your imagination!

- _Make sure to report what extension(s) you have implemented and how they worked._