From the `experiment_scripts/` folder, I try to run the `train_inverse_helmholtz.py` experiment as follows:

```
python3 train_inverse_helmholtz.py --experiment_name fwi --batch_size 1
```
Section 5.3 of the paper's supplementary material states that a single 24GB GPU was used for this experiment, whereas I am using a 32GB V100, which should be sufficient. However, even with a batch size of 1, I get the following error:
```
RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 31.75 GiB total capacity; 28.32 GiB already allocated; 11.75 MiB free; 30.49 GiB reserved in total by PyTorch)
```
Here is the full trace:
```
Traceback (most recent call last):
  File "train_inverse_helmholtz.py", line 78, in <module>
    training.train(model=model, train_dataloader=dataloader, epochs=opt.num_epochs, lr=opt.lr,
  File "../siren/training.py", line 73, in train
    losses = loss_fn(model_output, gt)
  File "../siren/loss_functions.py", line 188, in helmholtz_pml
    b, _ = diff_operators.jacobian(modules.compl_mul(B, dudx2), x)
  File "../siren/diff_operators.py", line 53, in jacobian
    jac[:, :, i, :] = grad(y_flat, x, torch.ones_like(y_flat), create_graph=True)[0]
  File ".../anaconda3/envs/tf-gpu2/lib/python3.8/site-packages/torch/autograd/__init__.py", line 202, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 31.75 GiB total capacity; 28.32 GiB already allocated; 11.75 MiB free; 30.49 GiB reserved in total by PyTorch)
```
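For context, the failing call is the Jacobian loop in `siren/diff_operators.py`. A minimal sketch of that pattern, with shapes assumed from the trace rather than copied from the repo, looks like this:

```python
import torch
from torch.autograd import grad

def jacobian(y, x):
    # y: network output of shape [batch, num_points, out_channels];
    # x: input coordinates of shape [batch, num_points, in_dims]
    # (shapes assumed for illustration).
    jac = torch.zeros(*y.shape[:2], y.shape[-1], x.shape[-1], device=y.device)
    for i in range(y.shape[-1]):
        y_flat = y[..., i].reshape(-1, 1)
        # create_graph=True keeps all intermediate activations alive so the
        # loss can differentiate through the Jacobian again; peak memory is
        # therefore dominated by the dense coordinate grid, not the batch size.
        jac[:, :, i, :] = grad(y_flat, x, torch.ones_like(y_flat),
                               create_graph=True)[0]
    return jac
```

If that's what is happening here, lowering `--batch_size` would barely change the allocation, which would match what I'm seeing.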
Can you please help?
Yeah, I'm likewise not seeing memory requirements scale down with lower batch sizes on some experiments. I run out of memory with a batch size of 1 on `train_img.py` (I have a 6GB GPU).
Batch-size memory scaling does work for `train_sdf.py` (the point-cloud experiment): I'm able to get it under 6GB with a batch size of 100,000.
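For what it's worth, here's a minimal sketch I'd use to check how peak memory scales with the number of sample points (the small MLP is a stand-in, not the repo's SIREN model):

```python
import torch

# Stand-in network; swap in the actual model to reproduce the issue.
model = torch.nn.Sequential(
    torch.nn.Linear(2, 256), torch.nn.ReLU(), torch.nn.Linear(256, 1)
).cuda()

for n_points in (10_000, 100_000, 500_000):
    torch.cuda.reset_peak_memory_stats()
    coords = torch.rand(n_points, 2, device="cuda", requires_grad=True)
    out = model(coords)
    # First-order gradient with create_graph=True, mimicking a PDE-style
    # loss term, then backprop through it.
    g = torch.autograd.grad(out.sum(), coords, create_graph=True)[0]
    (g ** 2).sum().backward()
    print(n_points, torch.cuda.max_memory_allocated() / 2**20, "MiB peak")
    model.zero_grad()
```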
Same here: I have also tried reducing the batch size in `train_inverse_helmholtz.py`, to no avail. I'm also running a 32GB GPU and also getting a CUDA out-of-memory error.
Hi, I'm getting exactly the same problem. I tried using the Python garbage collector (i.e. `gc.collect()`) and `torch.cuda.empty_cache()`, but it still crashes with OOM. @vsitzmann any suggestions?
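Concretely, the cleanup I tried between training steps looks like this (where to call it is my own choice, not something the repo prescribes):

```python
import gc
import torch

# Best-effort cleanup between training steps. Note: empty_cache() only
# returns cached blocks that are no longer referenced by any tensor; it
# cannot free the autograd graph that create_graph=True keeps alive, which
# is presumably why the OOM persists.
gc.collect()
torch.cuda.empty_cache()

# Check where the memory actually goes:
print(f"{torch.cuda.memory_allocated() / 2**30:.2f} GiB allocated")
print(f"{torch.cuda.memory_reserved() / 2**30:.2f} GiB reserved")
```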