RuntimeError: Function 'PowBackward0' returned nan values in its 0th output. #38
Comments
Maybe this can be fixed by adding eps here?
Have the same issue here, any solutions?
eps means epsilon (ε), i.e. a very small value such as 0.0000001.
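For context, PowBackward0 returns nan when the gradient of a fractional power is evaluated at zero (for example the ** 0.5 used to compute a vector norm), and adding a small eps under the power keeps the gradient finite. Below is a minimal stand-alone PyTorch sketch of the failure and the fix; it is a generic illustration, not the actual line in this repository that the comment above points at:

import torch

x = torch.zeros(3, requires_grad=True)
# d/ds of s ** 0.5 is 0.5 * s ** -0.5, which is inf at s = 0; chained with ds/dx = 2x = 0 it gives nan
bad = (x ** 2).sum() ** 0.5
bad.backward()
print(x.grad)    # tensor([nan, nan, nan])

x.grad = None
eps = 1e-7
good = ((x ** 2).sum() + eps) ** 0.5    # eps keeps PowBackward0 finite
good.backward()
print(x.grad)    # tensor([0., 0., 0.])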
Do you have code to reproduce the error?
I saw m1kit was testing his own dataset, and my error popped up when I was training my own dataset. I used Colmap to get the camera position info, and training can run for about 20k iterations, but it stops randomly at some point with RuntimeError: Function 'PowBackward0' returned nan values in its 0th output. I tried the fern sample set and got no error at all (sometimes it hits GPU OOM, but no error once I reduced the settings). I did not change much in the code except adding DataParallel to use all four GPUs at the same time. I'm just wondering what m1kit did and am waiting for his response.
Unfortunately, for personal reasons, I cannot provide the dataset that caused this error. To be honest, it was 4 months ago, so it's hard to remember how to reproduce it in detail. I apologize for not being able to help you.
Hello, I encountered the same problem when using SCNeRF, which borrows heavily from this repository, to train on custom data.

Data
The data can be accessed through this Google Drive link: https://drive.google.com/drive/folders/1SUzKMn6oD4inzN-m7RmHVl7gGEnq-Iv4?usp=sharing

Logs

Launch script
cd NeRF
python run_nerf.py \
--config configs/llff_data/lamp.txt \
--expname lamp \
--chunk 8192 \
--N_rand 1024 \
--camera_model pinhole_rot_noise_10k_rayo_rayd \
--ray_loss_type proj_ray_dist \
--multiplicative_noise True \
--i_ray_dist_loss 10 \
--grid_size 10 \
--ray_dist_loss_weight 0.0001 \
--N_iters 800001 \
--use_custom_optim True \
--ray_o_noise_scale 1e-3 \
--ray_d_noise_scale 1e-3 \
--non_linear_weight_decay 0.1 \
--add_ie 200000 \
--add_od 400000 \
--add_prd 600000

Config
Note: make sure to change the configs/llff_data/lamp.txt
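For reference, the configs/llff_data/*.txt files in NeRF-style repositories are plain key = value files; the exact contents of lamp.txt are not shown above, and the fields that usually have to be edited for a custom scene are the data path and the downsample factor. All values below are illustrative placeholders, not the real config:

expname = lamp
basedir = ./logs
datadir = ./data/lamp
dataset_type = llff
factor = 4
llffhold = 8
N_rand = 1024
N_samples = 64
N_importance = 64
use_viewdirs = True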
I can confirm this problem is happening to me on https://github.com/apchenstu/mvsnerf, trying it out with either the lego synthetic dataset or the orchid llff dataset. I'll try to see how to make this reproducible.
Hi @AugustasMacijauskas, did you have any success training with your custom dataset?
@davodogster No, I lost my patience and moved on to other things. I was also having a hard time figuring out how to debug this efficiently, since training for a few hours before it crashes, then changing one line of code and seeing if that helps, is not going to work.
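For anyone hitting the same wall: the "Function 'PowBackward0' returned nan" message itself comes from PyTorch's autograd anomaly mode, so one cheaper way to debug is to keep anomaly detection on and dump the offending batch the moment anything goes non-finite, rather than restarting multi-hour runs. A self-contained sketch with a stand-in model (none of this is code from the repository; the linear layer and random batch are placeholders):

import torch
import torch.nn as nn

# anomaly mode produces the 'PowBackward0 returned nan' stack trace; it is slow,
# so enable it only while hunting the bug
torch.autograd.set_detect_anomaly(True)

model = nn.Linear(3, 1)                               # stand-in for the NeRF MLP
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

for step in range(1000):
    batch = torch.randn(1024, 3)                      # stand-in for a batch of rays
    loss = model(batch).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()

    # stop and save state as soon as anything goes non-finite,
    # instead of noticing hours later when the run crashes
    bad_grad = any(p.grad is not None and not torch.isfinite(p.grad).all()
                   for p in model.parameters())
    if not torch.isfinite(loss) or bad_grad:
        torch.save({"step": step, "batch": batch, "model": model.state_dict()}, "nan_dump.pt")
        raise RuntimeError(f"non-finite loss/grad at step {step}")

    optimizer.step()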
If it is an error in the 0th output, that means your weights are still not fully updated, so some values in some batch's predictions during your first epoch are nans. So it's not your inputs, but your model predictions, that are nans. It could be an overflow or underflow error. This will make any loss function give you a nan as well. One way to guard against it:

criterion = SomeLossFunc()
eps = 1e-6
loss = criterion(preds, targets)
if torch.isnan(loss):
    # replace a nan loss with a tiny constant instead of backpropagating nans
    loss = torch.tensor(eps, device=loss.device, requires_grad=True)
loss = loss + L1_loss + ...
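A related workaround is to skip the optimizer step entirely whenever the loss or any gradient is non-finite, so a single bad batch does not corrupt the weights. This is a generic PyTorch sketch, not code from this repository; the linear model, criterion, and random data are placeholders:

import torch
import torch.nn as nn

model = nn.Linear(3, 1)                               # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.MSELoss()

preds, targets = model(torch.randn(8, 3)), torch.randn(8, 1)
loss = criterion(preds, targets)
optimizer.zero_grad()
loss.backward()

# only update when the loss and every gradient are finite
grads_ok = all(p.grad is None or torch.isfinite(p.grad).all() for p in model.parameters())
if torch.isfinite(loss) and grads_ok:
    optimizer.step()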
During training, the following error occurs and training is interrupted.
Here's my configuration.