Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am using the latest TensorFlow Model Garden release and TensorFlow 2.
- I am reporting the issue to the correct repository. (Model Garden official or research directory)
- I checked to make sure that this issue has not already been filed.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/tree/master/research/object_detection
2. Describe the bug
When evaluating a retrained Mask R-CNN model, I noticed small differences (on the order of 1e-06) in a few of the predicted masks and in the detection scores between the training model and the frozen inference model. I dug a bit deeper and found that the differences are caused by the FusedBatchNorm operation, which behaves differently depending on whether its inputs are tf.Variable or tf.constant. The training model uses variables for the mean and variance parameters of FusedBatchNorm, whereas the inference model uses constants for those parameters.
3. Steps to reproduce
This is a simplified example demonstrating the different behaviors:
import tensorflow.compat.v1 as tf
from tensorflow.python.ops import gen_nn_ops

# Scalar inputs for a minimal 1x1x1x1 example.
mean_value = 0.000001
variance_value = 0.999999
inputs = [[[[0.999999]]]]
gamma = [0.999999]
beta = [0.999999]

# The same mean/variance values, once as constants and once as variables.
mean_constant = tf.constant([mean_value])
variance_constant = tf.constant([variance_value])
mean_variable = tf.Variable([mean_value])
variance_variable = tf.Variable([variance_value])

# Case 1: mean and variance are constants (as in the frozen inference model).
y = gen_nn_ops.fused_batch_norm_v3(inputs, gamma, beta, mean_constant,
                                   variance_constant, epsilon=1.001e-05,
                                   is_training=False)
sess = tf.Session()
y_constant = sess.run(y[0])

# Case 2: mean and variance are variables (as in the training model).
y = gen_nn_ops.fused_batch_norm_v3(inputs, gamma, beta, mean_variable,
                                   variance_variable, epsilon=1.001e-05,
                                   is_training=False)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
y_variable = sess.run(y[0])

print(y_constant)
# 1.9999915
print(y_variable)
# 1.9999914
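To put the two printed values in context, here is a minimal sketch in plain Python (no TensorFlow required) that evaluates the textbook inference-mode batch norm formula, gamma * (x - mean) / sqrt(variance + epsilon) + beta, in float64, and measures how far apart the two reported results are in float32 steps (ULPs). The helper names to_float32, float32_bits and batch_norm_inference are hypothetical and exist only for this illustration. The formula is an assumption about what the kernel computes mathematically; it says nothing about the order of operations TensorFlow actually uses.

```python
import math
import struct


def to_float32(value):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def float32_bits(value):
    """Return the raw 32-bit pattern of value after rounding to float32."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def batch_norm_inference(x, gamma, beta, mean, variance, epsilon):
    """Textbook inference-mode batch norm, evaluated in float64 (assumed formula)."""
    return gamma * (x - mean) / math.sqrt(variance + epsilon) + beta


# Same scalar inputs as in the reproduction above.
reference = batch_norm_inference(
    x=0.999999, gamma=0.999999, beta=0.999999,
    mean=0.000001, variance=0.999999, epsilon=1.001e-05)

# The two values printed by the TensorFlow 1.15 graphs above.
y_constant = 1.9999915
y_variable = 1.9999914

# For positive floats the bit patterns are ordered like the values,
# so their difference is the distance in float32 steps (ULPs).
ulp_distance = float32_bits(y_constant) - float32_bits(y_variable)

print("float64 reference:", reference)
print("reference rounded to float32:", to_float32(reference))
print("ULP distance between reported values:", ulp_distance)
```

With these inputs, the two reported values are adjacent float32 numbers (one ULP apart), and the float64 reference rounds to the same float32 value as the constant case. This only quantifies the size of the discrepancy; it does not explain why the graph takes a different numerical path for variables than for constants.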
4. Expected behavior
FusedBatchNorm should produce identical results regardless of whether its parameters are constants or variables.
5. Additional context
With real data, the difference caused by the FusedBatchNorm operation may be up to one decimal place larger. Note that there is no difference when eager execution is enabled, and thus none in TensorFlow 2 either. In TensorFlow 1.15.0, however, on which object detection relies, it affects all models using the operation (e.g. FirstStageFeatureExtractor/resnet_v1_101/conv1/BatchNorm/FusedBatchNormV3), as well as previous versions using FusedBatchNorm or FusedBatchNormV2. In my tests with Mask R-CNN, the effect was barely noticeable in the COCO mask metrics (segm), but larger in the COCO detection metrics (bbox).
This question might be related:
https://stackoverflow.com/questions/52843778/tensorflow-tf-nn-conv2d-giving-different-results-for-variable-vs-constant
6. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- Mobile device name if the issue happens on a mobile device: -
- TensorFlow installed from (source or binary): binary via pip
- TensorFlow version (use command below): 1.15.0
- Python version: 3.7.3
- Bazel version (if compiling from source): -
- GCC/Compiler version (if compiling from source): -
- CUDA/cuDNN version: 10.0/7.6.0
- GPU model and memory: GTX 1080