
Inference time 16 bit vs 32 bit #22

Closed
nikhilchh opened this issue Dec 8, 2021 · 12 comments

Comments

@nikhilchh
Copy link

In the video.py file I hardcoded half = False to avoid any conversion to half precision, but the inference time was still the same.

Is it because the weights are half precision by default, and therefore I cannot measure how long your model would take at 32-bit?

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

Full precision is the default setting. To run with half precision, pass the argument --half.

@nikhilchh
Copy link
Author

With and without --half, the model takes the same time for a forward pass.

Isn't the forward pass supposed to be faster with --half?

GPU: GeForce RTX 3080

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

I thought so too, but I got the same result on my Titan Xp.

@nikhilchh
Copy link
Author

So, is it because the weights are half precision by default, and therefore I cannot measure how long the model would take at 32-bit?

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

I wouldn't think so, but maybe? You could test it by instantiating a new model with half and full precision in the main of yolo.py and timing the forward pass there.

@nikhilchh
Copy link
Author

nikhilchh commented Dec 8, 2021

I tried that now:

    import time
    import torch

    device = torch.device("cuda")
    model = Model(opt.cfg, 3, 18, None, 34).to(device)
    model.half()  # comment out (and drop .half() below) to test full precision
    img = torch.randn(1, 3, 1280, 1280, device=device)

    torch.cuda.synchronize()  # CUDA ops run asynchronously; sync before and after timing
    t0 = time.time()
    y = model(img.half())
    torch.cuda.synchronize()
    t1 = time.time()
    print("Forward pass took:", t1 - t0)

Observations:

1. Time is the same in both cases (half and full precision).
2. Time is very high (0.23 s). Expected was 0.0015 s. Correction: the expectation is 0.015 s, i.e., 15 ms.

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

The first pass is always slow. Ignore the first pass and then average over many passes (e.g., 100) using a for loop. Why are you expecting 1.5 ms?
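The warmup-then-average procedure described above can be sketched as follows. This is a minimal, self-contained example: the small nn.Sequential stand-in model and the bench helper are hypothetical placeholders (the real test would use KAPAO's Model from yolo.py), and the code falls back to CPU when no GPU is available.

```python
import time
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model, just to make the sketch runnable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
).to(device).eval()
img = torch.randn(1, 3, 256, 256, device=device)

def bench(model, img, n=100, warmup=10):
    """Average forward-pass time over n iterations, discarding warmup passes."""
    with torch.no_grad():
        for _ in range(warmup):  # warmup: CUDA context init, cuDNN autotuning
            model(img)
        if device.type == "cuda":
            torch.cuda.synchronize()  # drain queued kernels before starting the clock
        t0 = time.time()
        for _ in range(n):
            model(img)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for the last kernel before stopping the clock
        return (time.time() - t0) / n

fp32_ms = bench(model, img) * 1e3
print(f"fp32: {fp32_ms:.2f} ms")
if device.type == "cuda":  # fp16 is only meaningful on GPU
    fp16_ms = bench(model.half(), img.half()) * 1e3
    print(f"fp16: {fp16_ms:.2f} ms")
```

Without the synchronize calls, time.time() only measures kernel launch overhead, which can make fp16 and fp32 look identical.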

@nikhilchh
Copy link
Author

Sorry, I was expecting 15 ms.
Let me loop it.

@nikhilchh
Copy link
Author

So it's 11 ms for half and 20 ms for 32-bit.

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

Interesting. On my Titan Xp I got 20 ms for half and 24 ms for full, so your hypothesis might be correct. Note that the numbers I reported here are slower than those in Table 1 because Table 1 was computed using rectangular images (which are only 1280 on one side; the other side is smaller).

@nikhilchh
Copy link
Author

What I reported is for KAPAO-S.

Another question about Table 1: are all models using the same dtype? Or is KAPAO 16-bit while DEKR and HigherHRNet are 32-bit (to maintain their accuracy numbers)?

@wmcnally
Copy link
Owner

wmcnally commented Dec 8, 2021

All models in Table 1 were evaluated using float32. When I use the --half argument with val.py I actually get a slower forward pass time on my Titan Xp... not sure why.
