fwi throws out-of-memory error on a 32 GB GPU; paper mentions 24 GB GPU #29

kkothari93 opened this issue Nov 24, 2020 · 3 comments

kkothari93 commented Nov 24, 2020

From the experiment_scripts/ folder, I try to run the train_inverse_helmholtz.py experiment as follows:
python3 train_inverse_helmholtz.py --experiment_name fwi --batch_size 1

Section 5.3 of the supplementary material states that a single 24 GB GPU was used to run this experiment, whereas I am using a 32 GB V100, which should be sufficient. However, even with a batch size of 1, I get the following error:

RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 31.75 GiB total capacity; 28.32 GiB already allocated; 11.75 MiB free; 30.49 GiB reserved in total by PyTorch)

Here is the full trace:

Traceback (most recent call last):
  File "train_inverse_helmholtz.py", line 78, in <module>
    training.train(model=model, train_dataloader=dataloader, epochs=opt.num_epochs, lr=opt.lr,
  File "../siren/training.py", line 73, in train
    losses = loss_fn(model_output, gt)
  File "../siren/loss_functions.py", line 188, in helmholtz_pml
    b, _ = diff_operators.jacobian(modules.compl_mul(B, dudx2), x)
  File "../siren/diff_operators.py", line 53, in jacobian
    jac[:, :, i, :] = grad(y_flat, x, torch.ones_like(y_flat), create_graph=True)[0]
  File ".../anaconda3/envs/tf-gpu2/lib/python3.8/site-packages/torch/autograd/__init__.py", line 202, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 31.75 GiB total capacity; 28.32 GiB already allocated; 11.75 MiB free; 30.49 GiB reserved in total by PyTorch)

Can you please help?
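For reference, a quick way to see where the memory actually goes is to print PyTorch's memory summary just before the grad() call that fails (this is the standard torch.cuda API; the exact placement inside diff_operators.py is only a suggestion):

import torch

# Prints a per-device breakdown of allocated, reserved and cached CUDA memory.
# Placing it immediately before the grad() call in diff_operators.jacobian
# shows how much of the card is already held by the autograd graph.
print(torch.cuda.memory_summary(device=0, abbreviated=True))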


gabewb commented Mar 4, 2021

Yeah, I'm likewise not seeing the memory requirements scale down with lower batch sizes on some experiments. I run out of memory with batch size 1 on train_img.py (I have a 6 GB GPU).

Batch-size memory scaling does work for train_sdf.py (point cloud): I'm able to get that under 6 GB with a batch size of 100000.
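For reference, that run looked roughly like the following (flag names other than --batch_size are assumptions based on the train_inverse_helmholtz.py call above; train_sdf.py may need additional arguments, e.g. a path to the point cloud):

python3 train_sdf.py --experiment_name sdf_test --batch_size 100000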


pielbia commented Jan 17, 2022

Same here. I have also tried reducing the batch size in train_inverse_helmholtz.py, to no avail. I am also running a 32 GB GPU and getting a CUDA out-of-memory error.


xefonon commented Jan 17, 2022

Hi, I'm getting exactly the same problem. I tried using the Python garbage collector (i.e. gc.collect()) and torch.cuda.empty_cache(), but it still crashes with OOM. @vsitzmann, any suggestions?
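For completeness, this is roughly what I tried at the end of each training step (standard Python/PyTorch calls); it frees cached blocks but, as said, does not prevent the crash:

import gc
import torch

gc.collect()               # drop unreachable Python objects that may still hold tensors
torch.cuda.empty_cache()   # return cached, unused CUDA blocks to the driver

Neither call releases memory that is still referenced by the live autograd graph, which appears to be what fills the card here.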
