Run Zero123 under 13GB RAM #187

Open
generatorman opened this issue Jun 27, 2023 · 7 comments

Comments

@generatorman

Trying to run zero123 on the Colab free tier fails because loading the model uses up all 12.7 GB of RAM and crashes. Techniques that avoid loading the full model into RAM on the way to the GPU would unlock broader use of this exciting model.

@DSaurus
Collaborator

DSaurus commented Jun 27, 2023

Hi, @generatorman. We are actively addressing this issue, and you can refer to this pull request for more details. You can also consider reducing the num_samples_per_ray to 256 and downsampling the resolution of images by adjusting the width and height parameters.

@generatorman
Author

Thank you for the response. The PR you linked to seems related to VRAM usage - the issue I'm facing is with RAM. For example, running the following command quickly uses up 13GB of RAM and crashes, without using any VRAM at all.

!python launch.py --config configs/zero123.yaml --train --gpu 0 system.renderer.num_samples_per_ray=256 data.width=64 data.height=64

So currently it's bottlenecked by RAM usage rather than VRAM usage. Is there any quick fix I could apply?

@DSaurus
Collaborator

DSaurus commented Jun 27, 2023

I think loading the zero123 guidance model requires a lot of RAM. To address this, you could consider changing the torch.load(..., map_location='cpu') call to torch.load(..., map_location='cuda:0'), which could reduce the CPU memory consumption. Another option is to load an fp16 model instead of an fp32 model.
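
For reference, here is a minimal sketch of both ideas. It is not the exact threestudio loading code, and the checkpoint path is only a placeholder:

    import torch

    ckpt_path = "path/to/zero123.ckpt"  # placeholder; point this at your actual checkpoint

    # map_location="cuda:0" places the tensors on the GPU as they are deserialized,
    # instead of materializing the full fp32 state dict in CPU RAM first
    ckpt = torch.load(ckpt_path, map_location="cuda:0")
    state_dict = ckpt.get("state_dict", ckpt)

    # casting floating-point weights to fp16 roughly halves the memory footprint
    state_dict = {k: (v.half() if torch.is_tensor(v) and v.is_floating_point() else v)
                  for k, v in state_dict.items()}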

@claforte
Collaborator

@generatorman Honestly, you're going to need plenty of RAM and VRAM to run this kind of model. It's inevitable at this stage. Over time the efficiency of the code will probably improve, but for now, you need a good GPU and a powerful system.

I recommend we close this issue for now.

@y22ma

y22ma commented Jul 20, 2023

Any idea what the minimum requirements would be?

@davideuler

davideuler commented Oct 27, 2023

24 GB is not enough. I ran it on an NVIDIA A10 and it failed with an OOM error:

    return self._call_impl(*args, **kwargs)
  File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU 0 has a total capacty of 22.02 GiB of which 85.19 MiB is free. Process 22828 has 21.93 GiB memory in use. Of the allocated memory 19.16 GiB is allocated by PyTorch, and 299.45 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
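
For what it's worth, the error message itself suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of that mitigation is below (128 is only an illustrative value, not a tested recommendation); note that with only ~300 MiB reserved-but-unallocated here, fragmentation is probably not the real problem, and the run may simply need more than 22 GiB:

    import os

    # must be set before the CUDA caching allocator is initialized,
    # i.e. before the first CUDA allocation is made
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # imported after setting the environment variable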

@davideuler

davideuler commented Oct 28, 2023

It seems 40 GB of VRAM is enough. I ran it on an A100 40G successfully, and nvidia-smi shows 32-39 GB of VRAM in use.

 | Name       | Type                          | Params
-------------------------------------------------------------
0 | geometry   | ImplicitVolume                | 12.6 M
1 | material   | DiffuseWithPointLightMaterial | 0
2 | background | SolidColorBackground          | 0
3 | renderer   | NeRFVolumeRenderer            | 0
-------------------------------------------------------------
12.6 M    Trainable params
0         Non-trainable params
12.6 M    Total params
50.450    Total estimated model params size (MB)
[INFO] Validation results will be saved to outputs/zero123/[64, 128, 256]_1_clipdrop-background-removal.png_prog0@20231028-091058/save
[INFO] Loading Zero123 ...
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.53 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
100%|███████████████████████████████████████| 890M/890M [00:56<00:00, 16.6MiB/s]
[INFO] Loaded Zero123!
/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
Epoch 0: 174/? [01:52<00:00, 1.55it/s, train/loss=12.50]
Epoch 0: 175/? [01:52<00:00, 1.55it/s, train/loss=11.20]
Epoch 0: 200/? [02:10<00:00, 1.53it/s, train/loss=11.20]
