Run Zero123 under 13GB RAM #187

Open
generatorman opened this issue Jun 27, 2023 · 7 comments

Comments

@generatorman

Trying to run zero123 on the Colab free tier fails because loading the model uses up all 12.7 GB of RAM and crashes. Techniques that avoid loading the full model into RAM on the way to the GPU would unlock broader use of this exciting model.

@DSaurus
Collaborator

DSaurus commented Jun 27, 2023

Hi, @generatorman. We are actively addressing this issue, and you can refer to this pull request for more details. You can also consider reducing the num_samples_per_ray to 256 and downsampling the resolution of images by adjusting the width and height parameters.

@generatorman
Author

Thank you for the response. The PR you linked to seems related to VRAM usage - the issue I'm facing is with RAM. For example, running the following command quickly uses up 13GB of RAM and crashes, without using any VRAM at all.

!python launch.py --config configs/zero123.yaml --train --gpu 0 system.renderer.num_samples_per_ray=256 data.width=64 data.height=64

So currently it's bottlenecked by RAM usage rather than VRAM usage. Is there any quick fix I could apply?

@DSaurus
Collaborator

DSaurus commented Jun 27, 2023

I think loading the zero123 guidance model requires a lot of RAM. To address this, you could consider changing the torch.load(..., map_location='cpu') call to torch.load(..., map_location='cuda:0'), which could reduce the CPU memory consumption. Another option is to load an fp16 model instead of an fp32 model.
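
For reference, here is a minimal sketch of both ideas. It is not the exact threestudio loading code, and the checkpoint path is only a placeholder:

    import torch

    ckpt_path = "path/to/zero123.ckpt"  # placeholder; point this at your actual checkpoint

    # map_location="cuda:0" places the tensors on the GPU as they are deserialized,
    # instead of materializing the full fp32 state dict in CPU RAM first
    ckpt = torch.load(ckpt_path, map_location="cuda:0")
    state_dict = ckpt.get("state_dict", ckpt)

    # casting floating-point weights to fp16 roughly halves the memory footprint
    state_dict = {k: (v.half() if torch.is_tensor(v) and v.is_floating_point() else v)
                  for k, v in state_dict.items()}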

@claforte
Collaborator

@generatorman Honestly, you're going to need plenty of RAM and VRAM to run this kind of model. It's inevitable at this stage. Over time the efficiency of the code will probably improve, but for now, you need a good GPU and a powerful system.

I recommend we close this issue for now.

@y22ma

y22ma commented Jul 20, 2023

Any idea what the minimum requirements would be?

@davideuler

davideuler commented Oct 27, 2023

24 GB is not enough. I ran it on an NVIDIA A10 and it failed with an OOM error:

    return self._call_impl(*args, **kwargs)
  File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/dreamer/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU 0 has a total capacty of 22.02 GiB of which 85.19 MiB is free. Process 22828 has 21.93 GiB memory in use. Of the allocated memory 19.16 GiB is allocated by PyTorch, and 299.45 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
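
For what it's worth, the error message itself suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of that mitigation is below (128 is only an illustrative value, not a tested recommendation); note that with only ~300 MiB reserved-but-unallocated here, fragmentation is probably not the real problem, and the run may simply need more than 22 GiB:

    import os

    # must be set before the CUDA caching allocator is initialized,
    # i.e. before the first CUDA allocation is made
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # imported after setting the environment variable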

@davideuler

davideuler commented Oct 28, 2023

It seems 40 GB of VRAM is enough. I ran it on an A100 40G successfully, and nvidia-smi shows 32-39 GB of VRAM in use.

 | Name       | Type                          | Params
-------------------------------------------------------------
0 | geometry   | ImplicitVolume                | 12.6 M
1 | material   | DiffuseWithPointLightMaterial | 0
2 | background | SolidColorBackground          | 0
3 | renderer   | NeRFVolumeRenderer            | 0
-------------------------------------------------------------
12.6 M    Trainable params
0         Non-trainable params
12.6 M    Total params
50.450    Total estimated model params size (MB)
[INFO] Validation results will be saved to outputs/zero123/[64, 128, 256]_1_clipdrop-background-removal.png_prog0@20231028-091058/save
[INFO] Loading Zero123 ...
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.53 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
100%|███████████████████████████████████████| 890M/890M [00:56<00:00, 16.6MiB/s]
[INFO] Loaded Zero123!
/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
/home/dreamer/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
Epoch 0: 174/? [01:52<00:00, 1.55it/s, train/loss=12.50]
Epoch 0: 175/? [01:52<00:00, 1.55it/s, train/loss=11.20]
Epoch 0: 200/? [02:10<00:00, 1.53it/s, train/loss=11.20]
