Different results for FusedBatchNorm between constants and variables #8624

@sraimund

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection

2. Describe the bug

When evaluating a retrained Mask R-CNN model, I noticed small differences (on the order of 1e-06) in a few of the predicted masks and in the detection scores between the training model and the frozen inference model. Digging deeper, I found that the differences are caused by the FusedBatchNorm operation, which behaves differently for tf.Variable and tf.constant inputs. The training model uses variables for the mean and variance parameters of FusedBatchNorm, whereas the inference model uses constants.

3. Steps to reproduce

This is a simplified example demonstrating the different behaviors:

import tensorflow.compat.v1 as tf
from tensorflow.python.ops import gen_nn_ops

mean_value = 0.000001
variance_value = 0.999999
epsilon = 1.001e-05

inputs = [[[[0.999999]]]]
gamma = [0.999999]
beta = [0.999999]

# The same statistics, once as constants (as in the frozen inference model)
# and once as variables (as in the training model).
mean_constant = tf.constant([mean_value])
variance_constant = tf.constant([variance_value])
mean_variable = tf.Variable([mean_value])
variance_variable = tf.Variable([variance_value])

# fused_batch_norm_v3 returns several outputs; index 0 is the normalized tensor.
y_from_constants = gen_nn_ops.fused_batch_norm_v3(
    inputs, gamma, beta, mean_constant, variance_constant,
    epsilon=epsilon, is_training=False)[0]
y_from_variables = gen_nn_ops.fused_batch_norm_v3(
    inputs, gamma, beta, mean_variable, variance_variable,
    epsilon=epsilon, is_training=False)[0]

with tf.Session() as sess:
    # Variables must be initialized before they can be read.
    sess.run(tf.global_variables_initializer())
    y_constant = sess.run(y_from_constants)
    y_variable = sess.run(y_from_variables)

print(y_constant)
# 1.9999915
print(y_variable)
# 1.9999914
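
To judge which of the two printed values is closer to the mathematically expected one, the textbook inference-mode batch normalization formula, gamma * (x - mean) / sqrt(variance + epsilon) + beta, can be evaluated in float64. The following is a minimal NumPy sketch of that check, not the FusedBatchNorm kernel itself. The helper names `batch_norm_inference` and `as_float32_in_float64` are made up for this illustration, and rounding every input to float32 first is an assumption about how TensorFlow stores the values.

```python
import numpy as np


def batch_norm_inference(x, gamma, beta, mean, variance, epsilon):
    """Textbook inference-mode batch norm (hypothetical helper, not a TF API)."""
    return gamma * (x - mean) / np.sqrt(variance + epsilon) + beta


def as_float32_in_float64(value):
    """Round to float32 first (assumed storage type), then widen to float64."""
    return np.float64(np.float32(value))


x = as_float32_in_float64(0.999999)
gamma = as_float32_in_float64(0.999999)
beta = as_float32_in_float64(0.999999)
mean = as_float32_in_float64(0.000001)
variance = as_float32_in_float64(0.999999)
epsilon = as_float32_in_float64(1.001e-05)

# Float64 reference value for the inputs used in the reproduction.
reference = batch_norm_inference(x, gamma, beta, mean, variance, epsilon)
print("float64 reference:", reference)

# Distance of each value reported in this issue from the reference.
for label, reported in [("constants", 1.9999915), ("variables", 1.9999914)]:
    error = abs(np.float64(np.float32(reported)) - reference)
    print(label, "abs error:", error)

# Spacing between neighbouring float32 values near the result.
print("float32 spacing:", np.spacing(np.float32(1.9999915)))
```

If both errors are no larger than about one float32 spacing, the discrepancy is within ordinary single-precision rounding rather than a gross numerical error.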

4. Expected behavior

FusedBatchNorm produces identical results whether its mean and variance parameters are constants or variables.

5. Additional context

The difference caused by the FusedBatchNorm operation can be one decimal place larger (about ten times) with real data. There is no difference when eager execution is enabled, and therefore none in TensorFlow 2. In TensorFlow 1.15.0, however, on which the Object Detection API relies, it affects all models using the operation (e.g. FirstStageFeatureExtractor/resnet_v1_101/conv1/BatchNorm/FusedBatchNormV3), as well as earlier versions that use FusedBatchNorm or FusedBatchNormV2. In my tests with Mask R-CNN, the effect was barely noticeable in the COCO mask metrics (segm) but larger in the COCO detection metrics (bbox).
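
When comparing outputs of the training model and the frozen model on real data, it can help to express the discrepancy in float32 units in the last place (ULPs) in addition to the absolute difference. Below is a rough NumPy sketch of such a measurement. `ulp_distance` is a hypothetical helper written for this illustration and assumes finite float32 inputs; the two arrays simply hold the values printed in the reproduction.

```python
import numpy as np


def ulp_distance(a, b):
    """Number of representable float32 values between a and b, elementwise.

    Hypothetical helper for illustration; assumes finite inputs (no NaN/inf).
    """
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)

    def ordered(values):
        # Reinterpret the float bits as integers, then map negative floats
        # to negative integers so that integer order matches float order.
        bits = values.view(np.int32).astype(np.int64)
        return np.where(bits < 0, -(bits & 0x7FFFFFFF), bits)

    return np.abs(ordered(a) - ordered(b))


# Values reported in this issue for the constant and the variable case.
y_constant = np.array([[[[1.9999915]]]], dtype=np.float32)
y_variable = np.array([[[[1.9999914]]]], dtype=np.float32)

print("max abs diff:", np.max(np.abs(y_constant - y_variable)))
print("max ULP distance:", ulp_distance(y_constant, y_variable).max())
```

The same function can be applied to whole activation tensors fetched from both graphs to see how far apart they drift through the network.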

This question might be related:
https://stackoverflow.com/questions/52843778/tensorflow-tf-nn-conv2d-giving-different-results-for-variable-vs-constant

6. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • Mobile device name if the issue happens on a mobile device: -
  • TensorFlow installed from (source or binary): binary via pip
  • TensorFlow version (use command below): 1.15.0
  • Python version: 3.7.3
  • Bazel version (if compiling from source): -
  • GCC/Compiler version (if compiling from source): -
  • CUDA/cuDNN version: 10.0/7.6.0
  • GPU model and memory: GTX 1080
