
Why does the accuracy of a CNN with BatchNormLayer change slightly after restoring? #57

Closed
wagamamaz opened this issue Jan 7, 2017 · 1 comment


@wagamamaz
Collaborator

Hi everyone, I found an interesting thing, but I don't know the reason.
When I restore a CNN with BatchNormLayer from an npz file, the accuracy is slightly different; my code is attached. I hope someone can help me. Thanks in advance.

  • Here are my results:
  • Result of training with is_test_only = False (note: I set n_epoch=1 and use only a small part of the training data for fast debugging):
Epoch 1 of 1 took 28.871696s
   train loss: 0.823357
   train acc: 0.757412
   val loss: 0.804865
   val acc: 0.768029
Evaluation
   test loss: 0.829894
   test acc: 0.760116
  • Result of restoring the network from the npz file with is_test_only = True:
Evaluation
   test loss: 0.832265
   test acc: 0.760917
  • @sczhengyabin I saw you set variables = tf.GraphKeys.GLOBAL_VARIABLES in BatchNormLayer (layers.py line 1825), but I found it collects 8 parameters ... are you sure that is correct? I tried the following settings, but the accuracy is still slightly different ...
if variables = tf.GraphKeys.GLOBAL_VARIABLES, it has 8 variables <- @sczhengyabin is using this.
if variables = tf.GraphKeys.TRAINABLE_VARIABLES, it has 2 variables (gamma, beta)
or should variables = [beta, gamma, moving_mean, moving_variance]?
  • @boscotsang as I discussed with you in pull/42, the training and testing costs both drop normally, but I really don't understand why the accuracies differ after restoring, or which variables should be included in the BatchNormLayer (one way to check is sketched after the code below).

  • My code (environment: TensorFlow 0.12 and TensorLayer 1.3.0):

#! /usr/bin/python
# -*- coding: utf8 -*-

import numpy as np
import tensorflow as tf
import tensorlayer as tl
from tensorlayer.layers import set_keep
import time

is_test_only = False # if True, restore and test without training

X_train, y_train, X_val, y_val, X_test, y_test = \
                tl.files.load_mnist_dataset(shape=(-1, 28, 28, 1))

X_train = np.asarray(X_train, dtype=np.float32)[0:10000]#<-- small training set for fast debugging
y_train = np.asarray(y_train, dtype=np.int64)[0:10000]
X_val = np.asarray(X_val, dtype=np.float32)
y_val = np.asarray(y_val, dtype=np.int64)
X_test = np.asarray(X_test, dtype=np.float32)
y_test = np.asarray(y_test, dtype=np.int64)

print('X_train.shape', X_train.shape)
print('y_train.shape', y_train.shape)
print('X_val.shape', X_val.shape)
print('y_val.shape', y_val.shape)
print('X_test.shape', X_test.shape)
print('y_test.shape', y_test.shape)
print('X %s   y %s' % (X_test.dtype, y_test.dtype))

sess = tf.InteractiveSession()

batch_size = 128
x = tf.placeholder(tf.float32, shape=[batch_size, 28, 28, 1])
y_ = tf.placeholder(tf.int64, shape=[batch_size,])

def inference(x, is_train, reuse=None):
    gamma_init = tf.random_normal_initializer(1., 0.02)
    with tf.variable_scope("CNN", reuse=reuse):
        tl.layers.set_name_reuse(reuse)
        network = tl.layers.InputLayer(x, name='input_layer')
        network = tl.layers.Conv2d(network, n_filter=32, filter_size=(5, 5), strides=(1, 1),
                act=None, b_init=None, padding='SAME', name='cnn_layer1')
        network = tl.layers.BatchNormLayer(network, act=tf.nn.relu,
                gamma_init=gamma_init, is_train=is_train, name='batch1')

        network = tl.layers.MaxPool2d(network, filter_size=(2, 2), strides=(2, 2),
                padding='SAME', name='pool_layer1')
        network = tl.layers.Conv2d(network, n_filter=64, filter_size=(5, 5), strides=(1, 1),
                act=None, b_init=None, padding='SAME', name='cnn_layer2')
        network = tl.layers.BatchNormLayer(network, act=tf.nn.relu,
                gamma_init=gamma_init, is_train=is_train, name='batch2')

        network = tl.layers.MaxPool2d(network, filter_size=(2, 2), strides=(2, 2),
                padding='SAME', name='pool_layer2')
        ## end of conv
        network = tl.layers.FlattenLayer(network, name='flatten_layer')
        if is_train:
            network = tl.layers.DropoutLayer(network, keep=0.5, is_fix=True, name='drop1')
        network = tl.layers.DenseLayer(network, n_units=256,
                                        act = tf.nn.relu, name='relu1')
        if is_train:
            network = tl.layers.DropoutLayer(network, keep=0.5, is_fix=True, name='drop2')
        network = tl.layers.DenseLayer(network, n_units=10,
                                        act = tf.identity, name='output_layer')   
    return network


# train phase
network = inference(x, is_train=True, reuse=False)
y = network.outputs
cost = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(y, y_))
correct_prediction = tf.equal(tf.argmax(y, 1), y_)
acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# test phase
network_test = inference(x, is_train=False, reuse=True)
y_t = network_test.outputs
cost_t = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(y_t, y_))
correct_prediction = tf.equal(tf.argmax(y_t, 1), y_)
acc_t = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# train
if is_test_only:
    n_epoch = 1
else:
    n_epoch = 1
learning_rate = 0.0001
print_freq = 1

train_params = network.all_params
train_op = tf.train.AdamOptimizer(learning_rate, beta1=0.9, beta2=0.999,
    epsilon=1e-08, use_locking=False).minimize(cost, var_list=train_params)

tl.layers.initialize_global_variables(sess)

if is_test_only:
    load_params = tl.files.load_npz(name='_model_test.npz')
    tl.files.assign_params(sess, load_params, network)

network.print_params(True)
network.print_layers()

print('   learning_rate: %f' % learning_rate)
print('   batch_size: %d' % batch_size)

if not is_test_only:
    for epoch in range(n_epoch):
        start_time = time.time()
        for X_train_a, y_train_a in tl.iterate.minibatches(
                                    X_train, y_train, batch_size, shuffle=True):
            sess.run(train_op, feed_dict={x: X_train_a, y_: y_train_a})

        if epoch + 1 == 1 or (epoch + 1) % print_freq == 0:
            print("Epoch %d of %d took %fs" % (epoch + 1, n_epoch, time.time() - start_time))
            train_loss, train_acc, n_batch = 0, 0, 0
            for X_train_a, y_train_a in tl.iterate.minibatches(
                                    X_train, y_train, batch_size, shuffle=True):
                err, ac = sess.run([cost_t, acc_t], feed_dict={x: X_train_a, y_: y_train_a})
                train_loss += err; train_acc += ac; n_batch += 1
            print("   train loss: %f" % (train_loss/ n_batch))
            print("   train acc: %f" % (train_acc/ n_batch))
            val_loss, val_acc, n_batch = 0, 0, 0
            for X_val_a, y_val_a in tl.iterate.minibatches(
                                        X_val, y_val, batch_size, shuffle=True):
                err, ac = sess.run([cost_t, acc_t], feed_dict={x: X_val_a, y_: y_val_a})
                val_loss += err; val_acc += ac; n_batch += 1
            print("   val loss: %f" % (val_loss/ n_batch))
            print("   val acc: %f" % (val_acc/ n_batch))

print('Evaluation')
test_loss, test_acc, n_batch = 0, 0, 0
for X_test_a, y_test_a in tl.iterate.minibatches(
                            X_test, y_test, batch_size, shuffle=False):
    err, ac = sess.run([cost_t, acc_t], feed_dict={x: X_test_a, y_: y_test_a})
    test_loss += err; test_acc += ac; n_batch += 1
print("   test loss: %f" % (test_loss/n_batch))
print("   test acc: %f" % (test_acc/n_batch))

network.print_params(True)

tl.files.save_npz(network.all_params, name='_model_test.npz', sess=sess)
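For what it's worth, one way to check is to compare what actually ends up in the npz file with the batch-norm variables that live in the graph. The sketch below only uses calls that already appear in the script plus tf.global_variables() (available in TF 0.12); if the moving_mean / moving_variance variables are not part of network.all_params, they are never written by save_npz and keep their freshly initialized values after restoring, which could account for a small accuracy difference. The tf.train.Saver part at the end is just an assumed alternative, and 'model.ckpt' is a hypothetical file name.

# Sketch: compare the arrays stored in the npz with the BN variables in the graph.
saved = tl.files.load_npz(name='_model_test.npz')          # list of saved arrays
print('arrays in npz          :', len(saved))
print('len(network.all_params):', len(network.all_params))

# Moving statistics created by BatchNormLayer live in the graph but are not
# necessarily part of network.all_params, i.e. not necessarily saved above.
for v in tf.global_variables():
    if 'moving_mean' in v.name or 'moving_variance' in v.name:
        print(v.name, sess.run(v).flatten()[:3])

# Alternative that sidesteps the question: checkpoint every graph variable
# (including the moving statistics) with tf.train.Saver.
saver = tf.train.Saver(tf.global_variables())
saver.save(sess, 'model.ckpt')        # after training
# saver.restore(sess, 'model.ckpt')   # in the test-only run, after building the same graph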

@zsdonghao
Member

zsdonghao commented Jan 7, 2017

please follow the latest implementations of BatchNormLayer and BatchNormLayer5 in https://github.com/zsdonghao/tensorlayer/blob/master/tensorlayer/layers.py

--- previous answer ---
Hi, I just made a commit for TF12. Please have a try and let me know if there are any other issues.

class BatchNormLayer5(Layer):   #
    """
    The :class:`BatchNormLayer` class is a normalization layer, see ``tf.nn.batch_normalization`` and ``tf.nn.moments``.

    Batch normalization on fully-connected or convolutional maps.

    Parameters
    -----------
    layer : a :class:`Layer` instance
        The `Layer` class feeding into this layer.
    decay : float
        A decay factor for ExponentialMovingAverage.
    epsilon : float
        A small float number to avoid dividing by 0.
    act : activation function.
    is_train : boolean
        Whether train or inference.
    beta_init : beta initializer
        The initializer for initializing beta
    gamma_init : gamma initializer
        The initializer for initializing gamma
    name : a string or None
        An optional name to attach to this layer.

    References
    ----------
    - `Source <https://github.com/ry/tensorflow-resnet/blob/master/resnet.py>`_
    - `stackoverflow <http://stackoverflow.com/questions/38312668/how-does-one-do-inference-with-batch-normalization-with-tensor-flow>`_
    """
    def __init__(
        self,
        layer = None,
        decay = 0.999,
        epsilon = 0.00001,
        act = tf.identity,
        is_train = False,
        beta_init = tf.zeros_initializer,
        # gamma_init = tf.ones_initializer,
        gamma_init = tf.random_normal_initializer(mean=1.0, stddev=0.002),
        name ='batchnorm_layer',
    ):
        Layer.__init__(self, name=name)
        self.inputs = layer.outputs
        print("  tensorlayer:Instantiate BatchNormLayer %s: decay: %f, epsilon: %f, act: %s, is_train: %s" %
                            (self.name, decay, epsilon, act.__name__, is_train))
        x_shape = self.inputs.get_shape()
        params_shape = x_shape[-1:]

        from tensorflow.python.training import moving_averages
        from tensorflow.python.ops import control_flow_ops

        with tf.variable_scope(name) as vs:
            axis = list(range(len(x_shape) - 1))

            ## 1. beta, gamma
            beta = tf.get_variable('beta', shape=params_shape,
                               initializer=beta_init,
                               trainable=is_train)#, restore=restore)

            gamma = tf.get_variable('gamma', shape=params_shape,
                                initializer=gamma_init, trainable=is_train,
                                )#restore=restore)

            ## 2. moving variables during training (not update by gradient!)
            moving_mean = tf.get_variable('moving_mean',
                                      params_shape,
                                      initializer=tf.zeros_initializer,
                                      trainable=False,)#   restore=restore)
            moving_variance = tf.get_variable('moving_variance',
                                          params_shape,
                                          initializer=tf.constant_initializer(1.),
                                          trainable=False,)#   restore=restore)

            batch_mean, batch_var = tf.nn.moments(self.inputs, axis)
            ## 3.
            # These ops will only be performed when training.
            def mean_var_with_update():
                try:    # TF12
                    update_moving_mean = moving_averages.assign_moving_average(
                                    moving_mean, batch_mean, decay, zero_debias=False)     # if zero_debias=True, has bias
                    update_moving_variance = moving_averages.assign_moving_average(
                                    moving_variance, batch_var, decay, zero_debias=False) # if zero_debias=True, has bias
                    # print("TF12 moving")
                except Exception as e:  # TF11
                    update_moving_mean = moving_averages.assign_moving_average(
                                    moving_mean, batch_mean, decay)
                    update_moving_variance = moving_averages.assign_moving_average(
                                    moving_variance, batch_var, decay)
                    # print("TF11 moving")

                with tf.control_dependencies([update_moving_mean, update_moving_variance]):
                    # return tf.identity(update_moving_mean), tf.identity(update_moving_variance)
                    return tf.identity(batch_mean), tf.identity(batch_var)

            if is_train:
                mean, var = mean_var_with_update()
            else:
                mean, var = (batch_mean, batch_var) # hao: inference here uses the current batch statistics, not the stored moving averages

            normed = tf.nn.batch_normalization(
              x=self.inputs,
              mean=mean,
              variance=var,
              offset=beta,
              scale=gamma,
              variance_epsilon=epsilon,
              name="tf_bn"
            )
            self.outputs = act( normed )
            variables = [beta, gamma]   # note: moving_mean / moving_variance are not included in the layer's params

        self.all_layers = list(layer.all_layers)
        self.all_params = list(layer.all_params)
        self.all_drop = dict(layer.all_drop)
        self.all_layers.extend( [self.outputs] )
        self.all_params.extend( variables )
