[Bug]: train_num_rays ZeroDivisionError: division by zero #47

Closed
Katehuuh opened this issue Nov 2, 2023 · 12 comments
Katehuuh commented Nov 2, 2023

Hello, I could be wrong about the config, but the last command (4. Mesh Extraction) gives me this error:

...

..Wonder3D\instant-nsr-pl\systems\neus_ortho.py", line 139, in training_step
    train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
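
For context, the failing line is instant-nsr-pl's dynamic ray-count update: it divides the target sample count by the total number of samples the renderer returned for this batch, which is zero here. A minimal guard sketch is shown below (an illustration only, not the fix adopted later in this thread, where the root cause turned out to be device handling on Windows):

    def update_train_num_rays(train_num_rays: int, train_num_samples: int, num_samples_full: int) -> int:
        """Guarded version of the dynamic ray-count update (sketch only)."""
        if num_samples_full <= 0:
            # No samples came back from the renderer (the symptom in this issue);
            # keep the previous ray count instead of dividing by zero.
            return train_num_rays
        return int(train_num_rays * (train_num_samples / num_samples_full))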
Katehuuh (Author) commented Nov 2, 2023

My installation (commit df03557, Windows, Python 3.10...):
git clone https://github.com/xxlong0/Wonder3D.git && cd Wonder3D
python -m venv venv && venv\Scripts\activate

mkdir ckpts\unet
curl -L -o ckpts\unet\diffusion_pytorch_model.bin https://huggingface.co/spaces/flamehaze1115/Wonder3D-demo/resolve/main/ckpts/unet/diffusion_pytorch_model.bin
curl -L -o ckpts\unet\config.json https://huggingface.co/spaces/flamehaze1115/Wonder3D-demo/resolve/main/ckpts/unet/config.json
curl -L -o ckpts\random_states_0.pkl https://huggingface.co/spaces/flamehaze1115/Wonder3D-demo/resolve/main/ckpts/random_states_0.pkl
curl -L -o ckpts\scaler.pt https://huggingface.co/spaces/flamehaze1115/Wonder3D-demo/resolve/main/ckpts/scaler.pt
curl -L -o ckpts\scheduler.bin https://huggingface.co/spaces/flamehaze1115/Wonder3D-demo/resolve/main/ckpts/scheduler.bin
mkdir sam_pt
curl -L -o sam_pt\sam_vit_h_4b8939.pth https://huggingface.co/spaces/flamehaze1115/Wonder3D-demo/resolve/main/sam_pt/sam_vit_h_4b8939.pth

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118

pip install einops omegaconf pytorch-lightning==1.9.5 torch_efficient_distloss nerfacc==0.3.3 PyMCubes trimesh fire diffusers==0.19.3 transformers bitsandbytes accelerate gradio rembg segment_anything chardet streamlit tensorboard tensorboardX

"%ProgramFiles%\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars32.bat" x64
pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch

# pip install https://huggingface.co/r4ziel/xformers_pre_built/resolve/main/triton-2.0.0-cp310-cp310-win_amd64.whl
# pull/34
# python gradio_app.py

accelerate launch --config_file 1gpu.yaml test_mvdiffusion_seq.py --config configs/mvdiffusion-joint-ortho-6views.yaml
cd instant-nsr-pl
python launch.py --config configs/neuralangelo-ortho-wmask.yaml --gpu 0 --train dataset.root_dir=path\to\Wonder3D\outputs\cropsize-192-cfg3.0 dataset.scene=owl
Full error output:
(venv) C:\Wonder3D\instant-nsr-pl>python launch.py --config configs/neuralangelo-ortho-wmask.yaml --gpu 0 --train dataset.root_dir=C:\Wonder3D\outputs\cropsize-192-cfg3.0 dataset.scene=owl
Global seed set to 42
C:\Wonder3D\venv\lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Using finite difference to compute gradients with eps=progressive
Using 16bit None Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
C:\Wonder3D\outputs\cropsize-192-cfg3.0\owl
(1024, 1024, 3)
the loaded normals are defined in the system of front view
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type             | Params
-------------------------------------------
0 | cos   | CosineSimilarity | 0
1 | model | NeuSModel        | 7.7 M
-------------------------------------------
7.7 M     Trainable params
0         Non-trainable params
7.7 M     Total params
15.371    Total estimated model params size (MB)
Epoch 0: : 0it [00:00, ?it/s]Update finite_difference_eps to 0.027204705103003882
Traceback (most recent call last):
  File "C:\Wonder3D\instant-nsr-pl\launch.py", line 125, in <module>
    main()
  File "C:\Wonder3D\instant-nsr-pl\launch.py", line 114, in main
    trainer.fit(system, datamodule=dm)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1112, in _run
    results = self._run_stage()
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1191, in _run_stage
    self._run_train()
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 213, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\batch\training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 202, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 249, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 370, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1356, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\core\module.py", line 1754, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\core\optimizer.py", line 169, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 234, in optimizer_step
    return self.precision_plugin.optimizer_step(
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\plugins\precision\native_amp.py", line 75, in optimizer_step
    closure_result = closure()
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 149, in __call__
    self._result = self.closure(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 135, in closure
    step_output = self._step_fn()
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 419, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1494, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\strategies\dp.py", line 134, in training_step
    return self.model(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\torch\nn\parallel\data_parallel.py", line 183, in forward
    return self.module(*inputs[0], **module_kwargs[0])
  File "C:\Wonder3D\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\overrides\data_parallel.py", line 77, in forward
    output = super().forward(*inputs, **kwargs)
  File "C:\Wonder3D\venv\lib\site-packages\pytorch_lightning\overrides\base.py", line 98, in forward
    output = self._forward_module.training_step(*inputs, **kwargs)
  File "C:\Wonder3D\instant-nsr-pl\systems\neus_ortho.py", line 139, in training_step
    train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]


kotaxyz commented Nov 2, 2023

I get the same error. May I ask what GPU you use?

Katehuuh (Author) commented Nov 2, 2023

I get the same error. May I ask what GPU you use?

4090

kotaxyz commented Nov 2, 2023

Cool, so it's not related to out-of-memory issues. I was worried about that because I have an RTX 3060 Ti.

kotaxyz commented Nov 2, 2023

Until the dev fixes it, you can use this Colab version:
https://colab.research.google.com/github/camenduru/Wonder3D-colab/blob/main/Wonder3D_mesh_colab.ipynb

xxlong0 (Owner) commented Nov 2, 2023

Hello. You may try the NeuS-based reconstruction if you run into problems with instant-nsr-pl.

xxlong0 (Owner) commented Nov 2, 2023

Hello, I could be wrong about the config, but the last command (4. Mesh Extraction) gives me this error:

...

..Wonder3D\instant-nsr-pl\systems\neus_ortho.py", line 139, in training_step
    train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

Could you please give more info about any edits you made?

Katehuuh (Author) commented Nov 3, 2023

Hello. You may try the NeuS-based reconstruction if you run into problems with instant-nsr-pl.

cd ./NeuS
python exp_runner.py --mode train --conf ./confs/wmask.conf --case owl --data_dir C:\path\to\Wonder3D\outputs\cropsize-192-cfg3.0

NeuS gives me pyparsing.exceptions.ParseSyntaxException: , found '=' (at char 1049), (line:60, col:14).

I hard-coded the config; it works, but I had to modify:
        self.conf = ConfigFactory.from_dict({
            "general": {
                "base_exp_dir": "./exp/neus/owl/",
                "recording": ["./", "./models"]
            },
            "dataset": {
                "data_dir": "./outputs/",
                "object_name": "owl",
                "object_viewidx": 1,
                "imSize": [256, 256],
                "load_color": True,
                "stage": "coarse",
                "mtype": "mlp",
                "normal_system": "front",
                "num_views": 6
            },
            "train": {
                "learning_rate": 5e-4,
                "learning_rate_alpha": 0.05,
                "end_iter": 1000,
                "batch_size": 512,
                "validate_resolution_level": 1,
                "warm_up_end": 500,
                "anneal_end": 0,
                "use_white_bkgd": True,
                "save_freq": 5000,
                "val_freq": 5000,
                "val_mesh_freq": 5000,
                "report_freq": 100,
                "color_weight": 1.0,
                "igr_weight": 0.1,
                "mask_weight": 1.0,
                "normal_weight": 1.0,
                "sparse_weight": 0.1
            },
            "model": {
                "nerf":{
                    "D" : 8,
                    "d_in" : 4,
                    "d_in_view" : 3,
                    "W" : 256,
                    "multires" : 10,
                    "multires_view" : 4,
                    "output_ch" : 4,
                    "skips":[4],
                    "use_viewdirs" : True
                },
                'sdf_network': {
                    'd_out':257, 
                    'd_in':3, 
                    'd_hidden':256, 
                    'n_layers':8, 
                    'skip_in':[4], 
                    'multires':6, 
                    'bias':0.5, 
                    'scale':1.0, 
                    'geometric_init':True, 
                    'weight_norm':True
                },
                'variance_network': {
                    'init_val':0.3
                },
                'rendering_network': {
                    'd_feature':256, 
                    'mode':'no_view_dir', 
                    'd_in':6, 
                    'd_out':3, 
                    'd_hidden':256, 
                    'n_layers':4, 
                    'weight_norm':True, 
                    'multires_view':0, 
                    'squeeze_out':True
                },
                'neus_renderer': {
                    'n_samples':64, 
                    'n_importance':64, 
                    'n_outside':0, 
                    'up_sample_steps':4, 
                    'perturb':1.0, 
                    'sdf_decay_param':100
                }
            }
        })
num_workers=os.cpu_count() // 2

I did not make any other edits to the config; the commands in my logs above are the only steps I took.
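
Presumably the hard-coded dict above replaces the pyhocon parse of ./confs/wmask.conf inside NeuS's exp_runner.py, which is where the ParseSyntaxException is raised. A minimal sketch of the substitution, assuming the stock NeuS Runner.__init__ layout (Wonder3D's copy may differ):

    from pyhocon import ConfigFactory

    # Abbreviated stand-in for the full dict shown above.
    CONF_DICT = {"general": {"base_exp_dir": "./exp/neus/owl/"}}

    # Original approach (fails while parsing the HOCON file):
    #   conf_text = open(conf_path).read()
    #   conf = ConfigFactory.parse_string(conf_text)
    conf = ConfigFactory.from_dict(CONF_DICT)
    print(conf.get_string('general.base_exp_dir'))  # ./exp/neus/owl/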

xxlong0 (Owner) commented Nov 3, 2023

C:\path\to\Wonder3D\outputs\cropsize-192-cfg3.0

Hello, I don't run into this problem on my side. Maybe you can check whether your Windows path format is correct.

Katehuuh closed this as completed Nov 3, 2023
luopeiyu commented Nov 6, 2023

I have met the same problem and finally fixed it.
The problem is an incompatibility with pytorch_lightning on Windows.
I went to the original instant-nsr-pl GitHub repository and found some issues describing a similar problem; the author has published a branch named "fix-data-win" for them, so I compared the changes from:

bennyguo/instant-nsr-pl@main...fix-data-win

I then removed all ".to(self.rank)" and "device=self.dataset.all_images.device" from the code, and added ".to(self.device)" to the data that needs to be sent to the GPU.

And finally, it works.
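
For reference, a hypothetical helper illustrating the kind of change described above (the real diff is in the compare link and touches the dataset and system files; this is only a sketch):

    import torch

    def to_module_device(batch: dict, device: torch.device) -> dict:
        """Move every tensor in a batch onto the training module's device.

        Illustrative only: the idea is that on Windows the dataset hands out
        CPU tensors (no .to(self.rank) or dataset-device pinning) and the
        LightningModule moves them with .to(self.device).
        """
        return {k: v.to(device) if torch.is_tensor(v) else v for k, v in batch.items()}

    # Usage inside a LightningModule hook (sketch):
    #   batch = to_module_device(batch, self.device)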

xxlong0 (Owner) commented Nov 6, 2023

I have met the same problem and finally fixed it. The problem is an incompatibility with pytorch_lightning on Windows. I went to the original instant-nsr-pl GitHub repository and found some issues describing a similar problem; the author has published a branch named "fix-data-win" for them, so I compared the changes from:

bennyguo/instant-nsr-pl@main...fix-data-win

I then removed all ".to(self.rank)" and "device=self.dataset.all_images.device" from the code, and added ".to(self.device)" to the data that needs to be sent to the GPU.

And finally, it works.

@luopeiyu Thanks very much for your information; we will check and try to update instant-nsr-pl.

472756921 commented

I have met the same problem and finally fixed it. The problem is an incompatibility with pytorch_lightning on Windows. I went to the original instant-nsr-pl GitHub repository and found some issues describing a similar problem; the author has published a branch named "fix-data-win" for them, so I compared the changes from:

bennyguo/instant-nsr-pl@main...fix-data-win

I then removed all ".to(self.rank)" and "device=self.dataset.all_images.device" from the code, and added ".to(self.device)" to the data that needs to be sent to the GPU.

And finally, it works.

Do you have a detailed example?
I tried it, but it didn't work...
