
Automatic mixed precision (AMP) training is now natively supported and a stable feature. #557

Closed · Lornatang opened this issue Jul 30, 2020 · 17 comments
Labels: enhancement, Stale

Comments

@Lornatang (Contributor)

🚀 Feature

AMP allows users to easily enable automatic mixed precision training, offering higher performance and memory savings of up to 50% on Tensor Core GPUs. Using the natively supported torch.cuda.amp API, AMP provides convenience methods for mixed precision, where some operations use the torch.float32 (float) datatype and other operations use torch.float16 (half). Some ops, like linear layers and convolutions, are much faster in float16. Other ops, like reductions, often require the dynamic range of float32. Mixed precision tries to match each op to its appropriate datatype.
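For reference, a minimal sketch of the torch.cuda.amp pattern described above (the tiny model, optimizer, and random data below are placeholders for illustration, not this repository's training code):

```python
import torch
import torch.nn as nn

# Minimal torch.cuda.amp training-step sketch (PyTorch >= 1.6).
device = torch.device('cuda')
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Flatten(),
                      nn.Linear(16 * 64 * 64, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):                                # stand-in for a real data loader
    imgs = torch.randn(8, 3, 64, 64, device=device)
    targets = torch.randint(0, 10, (8,), device=device)

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                # ops run in FP16/FP32 as appropriate
        loss = criterion(model(imgs), targets)
    scaler.scale(loss).backward()                  # scale loss to avoid FP16 gradient underflow
    scaler.step(optimizer)                         # unscale gradients, then optimizer.step()
    scaler.update()                                # adjust the scale factor for the next step
```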

Motivation

In PyTorch 1.6, mixed precision computation is integrated natively, so there is no need to install the NVIDIA/apex library.

Pitch

Update the training code and remove apex.

Alternatives

Leave the existing apex-based implementation unchanged.

Additional context

Refer to my recently updated PR.

Lornatang added the enhancement label on Jul 30, 2020
@glenn-jocher (Member)

@Lornatang yes thank you, this is a worthy addition. I see your PR!

@rafale77 (Contributor)

Just curious about this: is there any gain or difference between inferring/evaluating with AMP vs. FP16, since the models appear to be trained with AMP? Would the scores be better? Why are all the evaluation results run in FP16?

@glenn-jocher (Member)

@rafale77 the models are saved as FP16, so any checkpoint that is saved and loaded won't have any FP32 values.

@rafale77 (Contributor)

I see. There is therefore no point in inferring with AMP. Thanks for the quick answer.

@glenn-jocher (Member) commented Aug 27, 2020

@rafale77 sort of.

test.py serves a dual purpose: standalone mAP (loading a checkpoint from the hard drive), and mAP during training (accepting a model as an argument when called by train.py).

detect.py only ever loads models from the hard drive as FP16.

An added complication is CPU inference on both, which requires FP32, currently and for the foreseeable future.

If there is a simpler solution to handle these various cases, I'm open to ideas. We did run mAP comparison tests before adopting FP16 as the native checkpointing standard, though, and we observed no mAP difference in either our own evaluation or the pycocotools results.

EDIT: In the CPU case, models are converted to .float() before inference.
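As a rough illustration of that load-then-cast pattern (the file name 'weights.pt' and the {'model': ...} checkpoint layout below are assumptions for the sketch, not the repository's exact code):

```python
import torch

# Load an FP16 checkpoint and cast it for the target device.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
ckpt = torch.load('weights.pt', map_location=device)  # hypothetical checkpoint file
model = ckpt['model']                                  # assumed checkpoint layout

if device.type == 'cpu':
    model = model.float()   # CPU inference requires FP32
else:
    model = model.half()    # GPU inference runs in FP16
model.eval()
```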

@rafale77 (Contributor)

Yes, I understand how test.py is used. I was just looking to see if there is any benefit to running the model with AMP, since I saw that detect.py was running with FP16. If the models are saved in FP16, then any precision loss, if any, has already been incurred, and upcasting to FP32 would just waste memory for no benefit, I suppose. Training could likely still benefit somewhat before the pre-trained model is saved...
My setup is not well suited for testing, so I couldn't verify this; that's why I was asking. I could only see that AMP increases memory usage for inference.

@lucasjinreal

@rafale77 How does the model infer in FP16 mode on GPUs that don't have FP16 support?

@rafale77 (Contributor)

@rafale77 How does the model infer in FP16 mode on GPUs that don't have FP16 support?

I don't know of any GPU that doesn't support FP16. If you mean GPUs without tensor cores, then you can indeed expect very poor inference speed compared to FP32, so you are better off turning on AMP or artificially upcasting the model and inputs to FP32.

@glenn-jocher (Member)

@jinfagang @rafale77 I think all GPUs will see memory savings with FP16 inference. GPUs without tensor cores will not see any speedup however.

I'm not aware of any scenario where FP16 would hurt speed or memory, for any GPU.
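For anyone who wants to check this on their own card, a rough timing comparison along the following lines should show it (the ResNet-50 stand-in model and 640x640 input are placeholders for the sketch, not YOLOv5 code):

```python
import time
import torch
import torchvision

def bench(model, x, iters=50):
    """Return average forward-pass time in milliseconds."""
    with torch.no_grad():
        for _ in range(5):               # warm-up iterations
            model(x)
        torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()
    return 1000 * (time.time() - t0) / iters

model = torchvision.models.resnet50().cuda().eval()
x = torch.randn(1, 3, 640, 640, device='cuda')
print('FP32: %.1f ms/img' % bench(model, x))
print('FP16: %.1f ms/img' % bench(model.half(), x.half()))
```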

@rafale77 (Contributor)

There are, actually. For example, the 1080 Ti (https://www.techpowerup.com/gpu-specs/geforce-gtx-1080-ti.c2877) has an FP16 compute rate that is 1/64th of its FP32 rate. This was intentionally designed so these cards would not compete with Titan/Tesla units.

@glenn-jocher (Member)

@rafale77 oh, so you're saying a 1080 Ti card would show slower PyTorch inference at model.half() than at model.float()? Have you observed this in practice yourself (or if anyone else has seen this, please let us know)?

@rafale77 (Contributor)

@glenn-jocher (Member)

@rafale77 oh, thanks for the link, I did not know that. Well, that's unfortunate. The slowdown doesn't seem to be too bad on the GTX cards though, maybe 10%.

@rafale77 (Contributor) commented Aug 29, 2020

It depends on what you run. You can see, for example, the shocking fact that a dual-batch-size 1080 Ti in FP16 is slower than a single 1080 Ti in FP32.

@gowna-m commented Sep 21, 2020

Just curious about this: is there any gain or difference between inferring/evaluating with AMP vs. FP16, since the models appear to be trained with AMP? Would the scores be better? Why are all the evaluation results run in FP16?

The models are saved as FP16, so any checkpoint that is saved and loaded won't have any FP32 values.

Could you give me a better overview of this?
@glenn-jocher @rafale77 Does training with AMP reduce only the training time and memory usage, or does it also impact latency? When not using AMP, will the models be saved as FP32? I have trained my model with AMP, saved the model, and am inferring without AMP (does this mean I'm inferring in FP16?). Will there be any reduction in inference time if pruning is added on top of it when using test.py for inference?

@glenn-jocher (Member) commented Sep 21, 2020

@gowna-m AMP is enabled by default for all model training on GPU. All YOLOv5 checkpoints are saved in FP16. All GPU inference is performed in FP16. See d4c6674

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the Stale label on Oct 22, 2020