Inference time jumps for varying batch size #53725

Open
parneetk opened this issue Jan 11, 2022 · 2 comments
Labels
2.6.0 comp:gpu GPU related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower type:performance Performance Issue

Comments

parneetk commented Jan 11, 2022

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes [Link below]
  • OS Platform and Distribution: Ubuntu 18.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version: Tested on 2.6, 2.4
  • Python version: 3.6
  • CUDA/cuDNN version: Tested on 1) CUDA 10.1+cudnn 7, 2) CUDA 11.1 + cudnn 8
  • GPU model and memory: Tested on 1) Tesla P100, 16GB , 2) Quadro P4000, 8GB

Current behavior

When running classifier inference on a GPU, if an input batch size is seen for the first time, the inference time is higher than expected. On subsequent runs with the same batch size, the inference time drops. When the inference time jump is observed, the load shifts to the CPU (GPU usage drops in nvidia-smi), while on subsequent inferences the load stays on the GPU.
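For illustration, a minimal timing sketch of the behavior described above (not the linked notebook; the ResNet50 model, input size, and batch sizes are placeholder assumptions):

```python
import time

import numpy as np
import tensorflow as tf

# Any Keras classifier shows the pattern; an untrained ResNet50 is just an example.
model = tf.keras.applications.ResNet50(weights=None)

def timed_inference(batch_size):
    """Time a single forward pass for the given batch size."""
    images = np.random.rand(batch_size, 224, 224, 3).astype("float32")
    start = time.perf_counter()
    _ = model(images, training=False)
    return time.perf_counter() - start

print("batch 8, 1st call:", timed_inference(8))  # slow: batch size unseen so far
print("batch 8, 2nd call:", timed_inference(8))  # fast: batch size already seen
print("batch 6, 1st call:", timed_inference(6))  # slow again: new batch size
```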

Example 1:

[Figure: figure1_tf_expt_1 — inference time per batch size over runs 1–3]

For a random batch size, the inference time in run 2 is lower because the same batch size was already seen in run 1. In run 3, all the batch sizes have been seen in run 2, so their inference times are lower as well.

Example 2:

[Figure: figure2_tf_expt_2 — inference time for 10 random batch sizes over runs 1–3]

For 10 random batch sizes, the inference time in run 2 is lower because all of these batch sizes were seen in run 1. In run 3, all the batch sizes have been seen in run 2, so their inference times are lower as well.

Why is this relevant?

Suppose we have a video sequence in which the number of objects varies every few frames (i.e., the batch size = number of objects changes every few frames). Every time a frame contains a total number of objects that has not been seen before, there is a jump in inference time. For example, if there are 10 objects in the first 30 frames and then 8 objects in the next frame, an inference time jump is observed for that frame. In a product with real-time expectations, this can have system-level implications for other modules.

Expected behavior

Inference time jumps should not be observed with varying batch sizes, as they can have system-level implications. This behavior is not observed with PyTorch.
An easy workaround is to run classifier inference on dummy images for all batch sizes during initialization. For example, we can run inference for batch sizes 1 to 64 if the maximum expected number of objects is 64. However, that is more of a hack. I am interested in understanding the reason behind this issue. It looks like it has something to do with memory allocation, but why does it depend on batch size? Is there a better TensorFlow configuration, inference function, or memory-allocation setting that can help resolve it?
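A sketch of that warm-up workaround, assuming a Keras classifier with 224x224 RGB inputs and at most 64 objects per frame (both values are assumptions, not from the notebook):

```python
import numpy as np
import tensorflow as tf

MAX_OBJECTS = 64  # assumed upper bound on objects (i.e. batch size) per frame

def warm_up(model, input_shape=(224, 224, 3)):
    """Run one dummy inference per possible batch size during initialization,
    so no batch size is 'new' once real frames arrive."""
    for batch_size in range(1, MAX_OBJECTS + 1):
        dummy = np.zeros((batch_size, *input_shape), dtype="float32")
        model(dummy, training=False)

# Example usage with an arbitrary classifier:
# warm_up(tf.keras.applications.ResNet50(weights=None))
```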

Standalone code to reproduce the issue

Jupyter notebook in Colab:
https://colab.research.google.com/drive/1fHy3HcrYBskMLy-nNn12bbwBI-QIxg1c?usp=sharing

This notebook uses model() for inference; using model.predict() gives similar results.

To enable the GPU in Colab:

Runtime -> Change Runtime Type -> Hardware Accelerator -> Select GPU from the drop-down
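After switching the runtime, a quick sanity check (standard TensorFlow API, not part of the notebook) confirms the GPU is visible:

```python
import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))  # should list at least one GPU device
```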

@parneetk parneetk added the type:performance Performance Issue label Jan 11, 2022
@sushreebarsa sushreebarsa added 2.6.0 comp:gpu GPU related issues labels Jan 12, 2022
sushreebarsa (Contributor) commented

@parneetk I tried to replicate this issue on Colab using TF v2.7.0 and tf-nightly (2.9.0-dev20220111) and faced a different error in tf-nightly; please find the gist here for reference. Please confirm the same. Thanks!

@sushreebarsa sushreebarsa added the stat:awaiting response Status - Awaiting response from author label Jan 12, 2022
parneetk (Author) commented Jan 13, 2022

@sushreebarsa Thank you for looking into this!
I did not install tf-nightly and used the default TF of the Colab environment (v2.7.0). I am not sure why Colab shows an error for you and not for me. I verified again and it seems to run without any errors.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Jan 17, 2022
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 31, 2022