Add batch inference #8721

Closed · 1 of 5 tasks
Burhan-Q opened this issue Mar 6, 2024 · 11 comments · Fixed by #8817
Labels: enhancement (New feature or request), TODO (Items that need completing)

Comments

@Burhan-Q
Member

Burhan-Q commented Mar 6, 2024

Search before asking

  • I have searched the YOLOv8 issues and found no similar feature requests.

Description

Related to PR #8058

Related comment from @glenn-jocher

...yes some predictions sources will run in batches, but I think a main one that's missing is the glob or directory inference, though txt list of sources may also be missing. Yes please open an issue and tag us in it along with @adrianboguszewski from Intel. Thanks!

Use case

Note

Batch inference is currently supported for certain types of input sources; this issue tracks adding support for the additional sources listed below.

Example of existing batch inference support

import cv2 as cv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
im1 = "ultralytics/assets/bus.jpg"
im2 = "ultralytics/assets/zidane.jpg"

img1 = cv.imread(im1)
img2 = cv.imread(im2)

# Multi-image
results = model.predict(source=[img1, img2, img1, img1, img2, img1, img2, img2],)

>>> 0: 640x640 4 persons, 1 bus, 
1: 640x640 2 persons, 1 tie, 
2: 640x640 4 persons, 1 bus, 
3: 640x640 4 persons, 1 bus, 
4: 640x640 2 persons, 1 tie, 
5: 640x640 4 persons, 1 bus, 
6: 640x640 2 persons, 1 tie, 
7: 640x640 2 persons, 1 tie, 
155.0ms

[r.speed for r in results]

>>> [
{'preprocess': 2.5001466274261475, 'inference': 19.374817609786987, 'postprocess': 10.336160659790039}, 
{'preprocess': 2.5001466274261475, 'inference': 19.374817609786987, 'postprocess': 10.336160659790039}, 
{'preprocess': 2.5001466274261475, 'inference': 19.374817609786987, 'postprocess': 10.336160659790039}, 
{'preprocess': 2.5001466274261475, 'inference': 19.374817609786987, 'postprocess': 10.336160659790039}, 
{'preprocess': 2.5001466274261475, 'inference': 19.374817609786987, 'postprocess': 10.336160659790039}, 
{'preprocess': 2.5001466274261475, 'inference': 19.374817609786987, 'postprocess': 10.336160659790039}, 
{'preprocess': 2.5001466274261475, 'inference': 19.374817609786987, 'postprocess': 10.336160659790039}, 
{'preprocess': 2.5001466274261475, 'inference': 19.374817609786987, 'postprocess': 10.336160659790039}
]

# Single image
results2 = model.predict(source=[img1,])

>>> 0: 640x480 4 persons, 1 bus, 1 stop sign, 192.1ms

results2[0].speed
>>> {'preprocess': 2.998828887939453, 'inference': 192.11935997009277, 'postprocess': 8.776426315307617}

Add batch inference for (at least) the following sources:

  • glob or directories
  • list of text sources
  • ...
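
For reference, these source types are already accepted by predict; the request is for them to be processed in batches rather than one image at a time. A hypothetical invocation (the paths below are placeholders):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Sources this issue asks to run as batches instead of one image at a time
results_dir = model.predict(source="path/to/images/")        # directory
results_glob = model.predict(source="path/to/images/*.jpg")  # glob pattern
results_txt = model.predict(source="path/to/sources.txt")    # txt list of sources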

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@Burhan-Q Burhan-Q added the TODO (Items that need completing) and enhancement (New feature or request) labels Mar 6, 2024
@tienhoang1994

[screenshot: inference output from my test]
Batch predict has no effect in my test (see screenshot). I am testing with the same video input, so the input frames are already the same shape, but the documentation says the batch size is set automatically when a list of images is passed as input. @glenn-jocher

@tienhoang1994

I also tried the list.streams option YOLOv8 provides; it showed no improvement in either speed or hardware utilization.

@Burhan-Q
Member Author

Burhan-Q commented Mar 7, 2024

@tienhoang1994 you have 9 frames that complete processing in ~90 ms. Now you should time inference on a single frame to compare, as in the example from the issue description.

This issue was opened to track work toward supporting additional batch inference sources. You can follow it to check progress, but currently not all sources support batch inference.
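
For illustration, a minimal way to make that single-frame vs batch comparison (a sketch: the frame source and timing harness here are assumptions, and absolute timings depend on hardware):

import time

import cv2 as cv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
frame = cv.imread("ultralytics/assets/bus.jpg")  # stand-in for a single video frame

# Warm-up call so model load / CUDA init is not counted in the timings
model.predict(frame, verbose=False)

# Single frame
t0 = time.perf_counter()
model.predict(frame, verbose=False)
single_ms = (time.perf_counter() - t0) * 1000

# Batch of 9 identical frames
t0 = time.perf_counter()
model.predict([frame] * 9, verbose=False)
batch_ms = (time.perf_counter() - t0) * 1000

print(f"single frame: {single_ms:.1f} ms | 9-frame batch: {batch_ms:.1f} ms "
      f"({batch_ms / 9:.1f} ms per frame in the batch)")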

@tienhoang1994

tienhoang1994 commented Mar 7, 2024 via email

@glenn-jocher
Member

Thanks for the update, @tienhoang1994! 🚀 If you're seeing ~10ms for a single frame and ~90ms for 9 frames, it seems like the batch processing is indeed working as expected, offering a more efficient throughput compared to single-frame processing. The slight overhead might be due to initial setup or IO delays which can be amortized over larger batch sizes. If you have specific performance targets or further questions, feel free to share!

@Burhan-Q
Member Author

Burhan-Q commented Mar 7, 2024

@tienhoang1994 the screenshot you shared shows inference for all 9 frames input as a batch.
[screenshot: batch inference timings]

The ~9-10 ms inference time is per image in the batch. What I was saying, and showed in my initial comment, is that the total batch inference time was 155.0 ms, whereas when passing only a single image the inference time was 192.1 ms. See my initial comment and expand the section ► Example of existing batch inference support for the full details of the results I'm referring to.

@Laughing-q Laughing-q self-assigned this Mar 8, 2024
@tienhoang1994

tienhoang1994 commented Mar 8, 2024

@Burhan-Q the ► Example of existing batch inference support you mentioned above is exactly what I expect: the total inference time for a batch of 8 images is roughly the same as for 1 image (155 ms vs 192 ms).
[screenshot: batch inference timings]
I have tested again so you can see my result. It seems like the batch does not run all frames in parallel and behaves the same as looping single-image predict 9 times (~90 ms for a batch of 9 images vs ~10 ms for a single image). Thank you for your support; please correct me if I have misunderstood.
[screenshot: looped single-image inference timings]

@Burhan-Q
Member Author

Burhan-Q commented Mar 8, 2024

@tienhoang1994 I follow your logic and results. I think there is some subjectivity in what counts as an appreciable difference. Comparing the initial batch results (your first comment) with a total of ~95 ms inference against the looping result (your most recent comment) at ~115 ms, that is a difference of 20 ms, or ~17% faster. It's not a lot, but to me it is measurably faster.

On my system the difference is more noticeable: inference on one (1) image was 192 ms, yet the 8-image batch inference was 155 ms, i.e. 8x the images in ~20% less time than a single image. Rechecking, here are my results using timeit.repeat:

Code

import timeit
import cv2 as cv
from ultralytics import YOLO
from functools import partial

model = YOLO("yolov8n.pt")

im1 = "ultralytics/assets/bus.jpg"
im2 = "ultralytics/assets/zidane.jpg"

img1 = cv.imread(im1)
img2 = cv.imread(im2)

p1 = partial(model.predict, (im1,))  # 1-image batch
p2 = partial(model.predict, [img1, img2, img1, img1, img2, img1, img2, img2]) # 8-image batch

timeit.repeat(p1, repeat=3, number=3)
timeit.repeat(p2, repeat=3, number=3)

1 image batch inference

[screenshot: 1-image batch timeit output]

  • first iteration inference time is 141ms
  • average of remaining 7 repeats is ~8ms

8 image batch inference

[screenshot: 8-image batch timeit output]

  • first iteration inference time is 152ms
  • average of remaining 7 repeats is ~15ms

Ignoring the initial "slow" (warmup) result, the 8-image batch takes just under 2x longer than the 1-image batch in my recent test, but it is also processing 8x the input data. To calculate the per-image inference speed in the 8-image batch, take the final 8-image batch inference time (14.6 ms) and divide by the number of images (8), which gives ~1.8 ms per image. Compared with the last 1-image batch inference time (8.6 ms), the per-image inference time in the batch of 8 is much faster.
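
As a quick sanity check on that arithmetic (numbers taken from the timings above):

# Per-image inference time, using the last measurements quoted above
batch_total_ms = 14.6  # final 8-image batch inference time
batch_size = 8
single_ms = 8.6        # last 1-image batch inference time

per_image_batch_ms = batch_total_ms / batch_size  # ~1.8 ms per image
speedup = single_ms / per_image_batch_ms          # ~4.7x faster per image

print(f"{per_image_batch_ms:.1f} ms/image in the batch vs {single_ms:.1f} ms for a single image "
      f"(~{speedup:.1f}x faster per image)")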

@tienhoang1994

tienhoang1994 commented Mar 12, 2024

Sorry for taking your time. I think I found the problem: it is the GPU.
[screenshot: side-by-side timings on a GTX 1060 and an RTX 3090]
On the left, the weak GTX 1060 I have been using in this thread gets 7 ms per image vs 5.6 ms per image in a batch of 8 (so I don't feel the power of batch inference).
On the right, a strong RTX 3090 gets 5.1 ms per image vs 1.1 ms per image in a batch of 8 (much better).

@glenn-jocher
Member

@tienhoang1994 hey there! No worries at all, you're not wasting our time. We're here to help! 😊 It looks like you've made an interesting observation regarding the impact of GPU capabilities on batch inference performance. Indeed, the difference in processing power between a GTX 1060 and an RTX 3090 can significantly affect the efficiency gains from batch processing.

The RTX 3090, with its higher compute capability and memory bandwidth, can better leverage parallel processing, making the advantages of batch inference more pronounced. On the other hand, the GTX 1060, while still a capable GPU, might not exhibit as dramatic improvements due to its hardware limitations.

Here's a quick example of passing a list of sources for batch inference; how many you pass at once can be adjusted based on your GPU's capabilities:

from ultralytics import YOLO

# Load your model
model = YOLO('yolov8n.pt')

# Define your source
source = ['path/to/image1.jpg', 'path/to/image2.jpg']  # and so on...

# Predict with batch inference
results = model.predict(source=source)

Remember, finding the optimal batch size for your specific hardware setup can maximize your inference efficiency. Keep experimenting, and thanks for sharing your findings! 👍
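
For a larger set of images, one way to control the effective batch size per call is to chunk the source list yourself. A minimal sketch, assuming a hypothetical image directory and a chunk size tuned to your GPU memory:

from pathlib import Path

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Hypothetical image directory and chunk size; tune chunk_size for your GPU
image_paths = sorted(Path("path/to/images").glob("*.jpg"))
chunk_size = 8

all_results = []
for i in range(0, len(image_paths), chunk_size):
    chunk = [str(p) for p in image_paths[i : i + chunk_size]]
    all_results.extend(model.predict(source=chunk, verbose=False))

print(f"Processed {len(all_results)} images in chunks of {chunk_size}")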

@Burhan-Q Burhan-Q linked a pull request Mar 12, 2024 that will close this issue
@Burhan-Q
Member Author

Additional batch inference sources are included with #8817.
