All losses become NaN after about 1 epoch of training #8

jwookyoo · 2021-08-26T09:25:42Z

Hi,

Thank you for sharing this great work!

When I run the training code, all losses become NaN after about one epoch of training.
The problem reproduces on every run. (I have tested it three times.)

I set up the same environment with Anaconda and used the same hyperparameters.
(The only difference is that our PyTorch version is 1.7.1 and yours is 1.7.0; all other packages are the same as yours.)

If you have any idea what might cause this, please share it. Thanks!
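
When reporting a problem like this, it can help to know which loss term goes non-finite first and at which step. The following is a minimal sketch, not code from the FIERY repository: it assumes the per-step losses have been collected as a list of dictionaries, and the loss names in the example are purely illustrative.

```python
import math


def first_non_finite(loss_history):
    """Return (step, name) of the first NaN/inf loss, or None if all are finite."""
    for step, losses in enumerate(loss_history):
        for name, value in losses.items():
            if not math.isfinite(value):
                return step, name
    return None


# Hypothetical per-step loss logs; the keys are illustrative, not FIERY's actual names.
history = [
    {"segmentation": 0.91, "flow": 0.40},
    {"segmentation": 0.75, "flow": float("nan")},
    {"segmentation": float("nan"), "flow": float("nan")},
]

print(first_non_finite(history))  # (1, 'flow')
```

If one term turns NaN a step before the others, that term is a reasonable place to start looking.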

anthonyhu · 2021-08-27T08:55:11Z

Hey!

Interesting. Do you run into the same issue if you first load the weights of the encoder (from FIERY Static, the single-timeframe bird's-eye view model)?

To do so, add the following lines to baseline.yml:

PRETRAINED:
   LOAD_WEIGHTS: True
   PATH: './static_lift_splat_setting.ckpt'
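
For readers unfamiliar with partial weight loading, the idea can be sketched as follows. This is only an illustration of a common pattern, not the loader FIERY actually uses: it assumes the checkpoint and the model are both name-to-parameter mappings, that encoder parameters share a hypothetical `encoder.` prefix, and it uses plain lists in place of tensors so the example runs without PyTorch.

```python
def select_matching_weights(checkpoint, model_state, prefix="encoder."):
    """Keep checkpoint entries under `prefix` that the model also has, with the same length."""
    selected = {}
    for name, value in checkpoint.items():
        if not name.startswith(prefix):
            continue  # skip parts of the checkpoint outside the encoder
        if name in model_state and len(model_state[name]) == len(value):
            selected[name] = value
    return selected


# Hypothetical parameter names; lists stand in for tensors.
checkpoint = {
    "encoder.conv.weight": [0.1, 0.2],
    "encoder.old.bias": [0.0],      # not present in the new model, so ignored
    "decoder.head.weight": [0.3],   # outside the encoder, so ignored
}
model_state = {
    "encoder.conv.weight": [0.0, 0.0],
    "temporal.gru.weight": [0.0],   # keeps its fresh initialisation
}

model_state.update(select_matching_weights(checkpoint, model_state))
print(model_state["encoder.conv.weight"])  # [0.1, 0.2]
```

Only the encoder starts from trained values here; everything else is still trained from its initial state.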

jwookyoo · 2021-08-31T02:47:42Z

I loaded the weights first following your suggestion, and the training works now (without NaN)! Thanks a lot!!

jwookyoo · 2021-08-31T03:07:36Z

Can I ask one more question? :-) How can I train the FIERY Static weights from scratch?

anthonyhu · 2021-08-31T08:20:32Z

Of course. To train FIERY Static from scratch, point the training script to the following config: https://github.com/wayveai/fiery/blob/master/fiery/configs/literature/static_lss_setting.yml

jwookyoo · 2021-09-01T05:37:29Z

I see. Thanks a lot!

anthonyhu · 2021-09-01T07:52:57Z

You're welcome!

jwookyoo closed this as completed Aug 31, 2021

jwookyoo reopened this Aug 31, 2021

anthonyhu closed this as completed Sep 1, 2021

anthonyhu mentioned this issue Apr 1, 2022

NAN loss after one epoch #20

Open

kaanakan mentioned this issue May 15, 2022

Reproducing Results on Nuscenes #26

Open
