
Inference time 16 bit vs 32 bit #22

Closed
nikhilchh opened this issue Dec 8, 2021 · 12 comments

Comments

@nikhilchh
Copy link

In the video.py file I hardcoded half = False to avoid any conversion to half precision, but the inference time was still the same.

Is it because the weights are half precision by default, and therefore I cannot measure how long your model would take at 32-bit?

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

Full precision is the default setting. To run with half precision, pass the argument --half.

@nikhilchh
Copy link
Author

With and without --half, the model takes the same time for a forward pass.

Isn't the forward pass supposed to be faster with --half?

GPU: GeForce RTX 3080

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

I thought so too, but I got the same result on my Titan Xp.

@nikhilchh
Copy link
Author

So, is it because the weights are half precision by default, and therefore I cannot measure how long the model would take at 32-bit?

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

I wouldn't think so, but maybe? You could test it by instantiating a new model with half and full precision in the main of yolo.py and timing the forward pass there.

@nikhilchh
Copy link
Author

nikhilchh commented Dec 8, 2021

I tried that now:

    import time
    import torch

    device = torch.device("cuda")
    model = Model(opt.cfg, 3, 18, None, 34).to(device)
    model.half()  # comment out (and drop .half() below) to test full precision
    img = torch.randn(1, 3, 1280, 1280, device=device)

    torch.cuda.synchronize()  # CUDA ops run asynchronously; sync before and after timing
    t0 = time.time()
    y = model(img.half())
    torch.cuda.synchronize()
    t1 = time.time()
    print("Forward pass took:", t1 - t0)

Observations:

1. Time is the same in both cases (half and full precision).
2. Time is very high (0.23 s). Expected was 0.0015 s. Correction: the expectation is 0.015 s, i.e., 15 ms.

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

The first pass is always slow. Ignore the first pass and then average over many passes (e.g., 100) using a for loop. Why are you expecting 1.5 ms?
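The warmup-then-average procedure described above can be sketched as follows. This is a minimal, self-contained example: the small nn.Sequential stand-in model and the bench helper are hypothetical placeholders (the real test would use KAPAO's Model from yolo.py), and the code falls back to CPU when no GPU is available.

```python
import time
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model, just to make the sketch runnable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
).to(device).eval()
img = torch.randn(1, 3, 256, 256, device=device)

def bench(model, img, n=100, warmup=10):
    """Average forward-pass time over n iterations, discarding warmup passes."""
    with torch.no_grad():
        for _ in range(warmup):  # warmup: CUDA context init, cuDNN autotuning
            model(img)
        if device.type == "cuda":
            torch.cuda.synchronize()  # drain queued kernels before starting the clock
        t0 = time.time()
        for _ in range(n):
            model(img)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for the last kernel before stopping the clock
        return (time.time() - t0) / n

fp32_ms = bench(model, img) * 1e3
print(f"fp32: {fp32_ms:.2f} ms")
if device.type == "cuda":  # fp16 is only meaningful on GPU
    fp16_ms = bench(model.half(), img.half()) * 1e3
    print(f"fp16: {fp16_ms:.2f} ms")
```

Without the synchronize calls, time.time() only measures kernel launch overhead, which can make fp16 and fp32 look identical.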

@nikhilchh
Copy link
Author

Sorry, I was expecting 15 ms.
Let me loop it.

@nikhilchh
Copy link
Author

So it's 11 ms for half and 20 ms for 32-bit.

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

Interesting. On my Titan Xp I got 20 ms for half and 24 ms for full, so your hypothesis might be correct. Note that the numbers I reported here are slower than those in Table 1 because Table 1 was computed using rectangular images (which are only 1280 on one side; the other side is smaller).

@nikhilchh
Copy link
Author

What I reported is for KAPAO-S.

Another question about Table 1: are all models using the same dtype? Or is KAPAO 16-bit while DEKR and HigherHRNet are 32-bit (to maintain their accuracy numbers)?

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

All models in Table 1 were evaluated using float32. When I use the --half argument with val.py I actually get a slower forward pass time on my Titan Xp... not sure why.
