Inference time jumps for varying batch size #53725
Labels: 2.6.0, comp:gpu, stat:awaiting tensorflower, type:performance
System information
Current behavior
When running classifier inference on GPU, if an input batch size is seen for the first time, the inference time is higher than expected. In subsequent runs with the same batch size, the inference time drops. When the inference time jump occurs, the load shifts to the CPU (GPU usage drops in nvidia-smi), while on subsequent inferences the load stays on the GPU.
Example 1:
For a random batch size, the inference time in run 2 is lower because that batch size was already seen in run 1. In run 3, all batch sizes were seen in run 2, so their inference times are lower.

Example 2:
For 10 random batch sizes, the inference time in run 2 is lower because all of these batch sizes were seen in run 1. In run 3, all batch sizes were seen in run 2, so their inference times are lower.

Why is this relevant?
Suppose we have a video sequence where the number of objects varies every few frames (i.e., the batch size equals the number of objects). Every time a frame contains a total number of objects that has not been seen before, there is a jump in inference time. For example, if there are 10 objects in the first 30 frames and then 8 objects in the next frame, an inference time jump is observed for that frame. In a product with real-time expectations, this can have system-level implications for other modules.
Expected behavior
Inference time jumps should not be observed with varying batch sizes, as they can have system-level implications. This behavior is not observed with PyTorch.
An easy solution is to run classifier inference on dummy images for all batch sizes during initialization. For example, we can run inference for batch sizes 1 to 64 if the maximum expected number of objects is 64 (see the sketch below). However, that's more of a hack. I am interested in understanding the reason behind this issue. It looks like it has something to do with memory allocation, but why is it dependent on batch size? Is there a better TensorFlow configuration, inference function, or memory allocation setting that can resolve it?
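A rough sketch of that warm-up hack, assuming a Keras classifier `model` and a 224x224x3 input shape (both placeholders for whatever the actual model expects):

```python
import tensorflow as tf

MAX_BATCH_SIZE = 64  # assumed upper bound on objects per frame

def warm_up(model, max_batch_size=MAX_BATCH_SIZE, input_shape=(224, 224, 3)):
    """Run one dummy inference per batch size so later calls with that
    batch size hit already-warmed code paths and allocations."""
    for batch_size in range(1, max_batch_size + 1):
        dummy = tf.zeros((batch_size, *input_shape), dtype=tf.float32)
        _ = model(dummy, training=False)
```

This works in practice but scales linearly with the number of distinct batch sizes, which is why I'd prefer a proper configuration-level fix.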
Standalone code to reproduce the issue
Jupyter notebook in Colab:
https://colab.research.google.com/drive/1fHy3HcrYBskMLy-nNn12bbwBI-QIxg1c?usp=sharing
This notebook uses model() for inference. Using model.predict() gives similar results.
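For reference (since the details are in the Colab link), a minimal sketch of the kind of timing loop involved; the choice of MobileNetV2, the 1-64 batch size range, and the three runs are assumptions for illustration, not the exact notebook contents:

```python
import time
import numpy as np
import tensorflow as tf

# Any Keras classifier reproduces the effect; MobileNetV2 is just an example.
model = tf.keras.applications.MobileNetV2(weights=None)
input_shape = model.input_shape[1:]  # (224, 224, 3)

rng = np.random.default_rng(0)
batch_sizes = rng.integers(1, 65, size=10)  # 10 random batch sizes

for run in range(1, 4):
    print(f"run {run}")
    for bs in batch_sizes:
        x = tf.random.uniform((int(bs), *input_shape))
        start = time.perf_counter()
        _ = model(x, training=False)
        print(f"  batch size {bs}: {time.perf_counter() - start:.4f}s")
```

In run 1, each batch size is new and slow; in runs 2 and 3, the same batch sizes are fast.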
To enable the GPU in Colab: Runtime -> Change Runtime Type -> Hardware Accelerator -> GPU.