RhiResult(-3) error when running NGP example on Windows with Vulkan #7674

arrrmin · 2023-03-28T12:35:51Z

Describe the bug
I get a Dispatch error : RhiResult(-3) when trying to run taichi_ngp.py on Windows using Vulkan.

To Reproduce
Run the script taichi_ngp.py on Windows 11 with Python 3.10.9 and Taichi 1.5.0.
CPU: Ryzen 9 7900X
GPU: AMD 7900XT

Log/Screenshots

(taichi) C:\Users\armin\projects\external\taichi>C:/Users/armin/Programs/miniconda3/envs/taichi/python.exe c:/Users/armin/projects/external/taichi/python/taichi/examples/rendering/taichi_ngp.py
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[W 03/28/23 14:37:12.283 28600] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
[Taichi] Starting on arch=vulkan
Loading model from ./npy_models/lego.npy
[E 03/28/23 14:37:12.541 28600] [runtime.cpp:taichi::lang::gfx::GfxRuntime::launch_kernel@576] Dispatch error : RhiResult(-3)


Traceback (most recent call last):
  File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 1111, in <module>
    main(cmd_args)
  File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 1081, in main
    ngp.load_model(npy_file)
  File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 332, in load_model
    self.hash_embedding.from_numpy(
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\util.py", line 311, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\field.py", line 357, in from_numpy
    self._from_external_arr(arr)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\util.py", line 311, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\field.py", line 347, in _from_external_arr
    ext_arr_to_tensor(arr, self)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 1023, in wrapped
    return primal(*args, **kwargs)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 950, in __call__
    return self.runtime.compiled_functions[key](*args)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 853, in func__
    raise e from None
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 850, in func__
    t_kernel(launch_ctx)
RuntimeError: [runtime.cpp:taichi::lang::gfx::GfxRuntime::launch_kernel@576] Dispatch error : RhiResult(-3)

Additional comments

(taichi) C:\Users\armin\projects\external\taichi>ti diagnose
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

Taichi system diagnose:

python: 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)]
system: win32
executable: C:\Users\armin\Programs\miniconda3\envs\taichi\python.exe
platform: Windows-10-10.0.22621-SP0
architecture: 64bit WindowsPE
uname: uname_result(system='Windows', node='Armin-PC', release='10', version='10.0.22621', machine='AMD64')
locale: en_AT.cp1252
PATH: C:\Users\armin\Programs\miniconda3\envs\taichi;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\mingw-w64\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\usr\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Scripts;C:\Users\armin\Programs\miniconda3\envs\taichi\bin;C:\Users\armin\Programs\miniconda3\condabin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files\Git\cmd;C:\Users\armin\AppData\Local\Microsoft\WindowsApps;C:\Users\armin\AppData\Local\gitkraken\bin;C:\Users\armin\AppData\Local\Programs\Microsoft VS Code\bin;C:\Users\armin\Programs\miniconda3;C:\Users\armin\Programs\miniconda3\Scripts;C:\Users\armin\Programs\miniconda3\Library\bin;.;C:\Users\armin\AppData\Local\Programs\Julia-1.8.5\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Lib\site-packages\taichi\_lib
PYTHONPATH: ['C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\Scripts\\ti.exe', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\python310.zip', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\DLLs', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib\\site-packages']

`lsb_release` not available: [WinError 2] The system cannot find the file specified


import: <module 'taichi' from 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib\\site-packages\\taichi\\__init__.py'>

cc: False
cpu: True
metal: False
opengl: True
[W 03/28/23 14:28:12.986 29280] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
cuda: False
vulkan: True

`glewinfo` not available: [WinError 2] The system cannot find the file specified

`nvidia-smi` not available: [WinError 2] The system cannot find the file specified
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9

[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=x64

[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=opengl

[W 03/28/23 14:28:15.777 32208] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
[W 03/28/23 14:28:15.778 32208] [misc.py:adaptive_arch_select@772] Arch=[<Arch.cuda: 5>] is not supported, falling back to CPU
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=x64

[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

                                TAICHI EXAMPLES
+-----------------------------------------------------------------------------+
| 0: ad_gravity           | 25: laplace             | 50: physarum            |
| 1: circle_packing_image | 26: laplace_equation    | 51:                     |
|                         |                         | poisson_disk_sampling   |
| 2: comet                | 27: mandelbrot_zoom     | 52: print_offset        |
| 3: cornell_box          | 28: marching_squares    | 53: rasterizer          |
| 4: diff_sph             | 29: mass_spring_3d_ggui | 54: regression          |
| 5: euler                | 30: mass_spring_game    | 55: sdf_renderer        |
| 6: explicit_activation  | 31:                     | 56: simple_derivative   |
|                         | mass_spring_game_ggui   |                         |
| 7: export_mesh          | 32: mciso_advanced      | 57: simple_texture      |
| 8: export_ply           | 33: mgpcg               | 58: simple_uv           |
| 9: export_videos        | 34: mgpcg_advanced      | 59: snow_phaseField     |
| 10: fem128              | 35: minimal             | 60: stable_fluid        |
| 11: fem128_ggui         | 36: minimization        | 61: stable_fluid_ggui   |
| 12: fem99               | 37: mpm128              | 62: stable_fluid_graph  |
| 13: fractal             | 38: mpm128_ggui         | 63: taichi_bitmasked    |
| 14: fractal3d_ggui      | 39: mpm3d               | 64: taichi_dynamic      |
| 15: fullscreen          | 40: mpm3d_ggui          | 65: taichi_logo         |
| 16: game_of_life        | 41: mpm88               | 66: taichi_ngp          |
| 17: gui_image_io        | 42: mpm88_graph         | 67: taichi_sparse       |
| 18: gui_widgets         | 43: mpm99               | 68: texture_graph       |
| 19: implicit_fem        | 44:                     | 69: tutorial            |
|                         | mpm_lagrangian_forces   |                         |
| 20:                     | 45: nbody               | 70:                     |
| implicit_mass_spring    |                         | two_stream_instability  |
| 21:                     | 46: odop_solar          | 71: vortex_rings        |
| initial_value_problem   |                         |                         |
| 22: jacobian            | 47: oit_renderer        | 72: waterwave           |
| 23:                     | 48: patterns            |                         |
| karman_vortex_street    |                         |                         |
| 24: keyboard            | 49: pbf2d               |                         |
+-----------------------------------------------------------------------------+
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0
>>> Running time: 0.22s

Consider attaching this log when maintainers ask about system information.
>>> Running time: 6.75s


bobcao3 · 2023-03-28T16:10:38Z

-3 means "not supported", so I think we accidentally used an unsupported or non-activated feature...

We will look into it

bobcao3 · 2023-03-30T03:44:24Z

We have located the bug. It appears to be an issue with the maxComputeWorkGroupCount values on AMD GPUs. We should be able to fix this quickly.
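To make the limit concrete, here is a rough sketch of the arithmetic, not Taichi's actual dispatch logic. It assumes a 1D dispatch with one thread per element, a block size of 128 threads, and a per-dimension limit of 65535 workgroups, which is the minimum the Vulkan specification guarantees for maxComputeWorkGroupCount (a given device may report more). The function and constant names are hypothetical.

```python
# Assumption: 65535 is the minimum value the Vulkan specification guarantees
# for each component of maxComputeWorkGroupCount; a device may report more.
ASSUMED_MAX_WORKGROUP_COUNT = 65535

# Assumption: 128 threads per workgroup, chosen for illustration only.
ASSUMED_BLOCK_DIM = 128


def workgroups_needed(num_threads: int, block_dim: int = ASSUMED_BLOCK_DIM) -> int:
    """Workgroups required for a 1D dispatch (ceiling division)."""
    return (num_threads + block_dim - 1) // block_dim


def exceeds_limit(
    num_threads: int,
    block_dim: int = ASSUMED_BLOCK_DIM,
    limit: int = ASSUMED_MAX_WORKGROUP_COUNT,
) -> bool:
    """True if the dispatch would ask for more workgroups than the limit."""
    return workgroups_needed(num_threads, block_dim) > limit


if __name__ == "__main__":
    for n in (1_000_000, 100_000_000):
        print(n, workgroups_needed(n), exceeds_limit(n))
```

Under these assumptions, a loop over 100,000,000 elements needs far more workgroups than the guaranteed minimum, which would be consistent with the error only showing up for large fields.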

chenzhekl · 2023-09-10T14:36:24Z

Any updates for this issue? This seems to have affected Intel cards as well.

chenzhekl · 2023-09-15T13:22:48Z

Here is a minimal script to reproduce the issue:

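import numpy as np
import taichi as ti

# Pick any available GPU backend; the error shows up when this is Vulkan.
ti.init(arch=ti.gpu)

# 100 million float32 elements, large enough to trigger the error.
np_arr = np.ones((100000000,), dtype=np.float32)
ti_field = ti.field(dtype=ti.f32, shape=(np_arr.shape[0],))


@ti.kernel
def run(dst: ti.template(), src: ti.types.ndarray()):
    # Copy the NumPy array into the field, one element per loop iteration.
    for I in dst:
        dst[I] = src[I]


run(ti_field, np_arr)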

Once the length of np_arr exceeds a certain threshold, we get RhiResult(-3) on the Vulkan backend.

If this issue is low priority on your list, could you please advise me how I may contribute a PR for the issue? @bobcao3

Thanks

dme49 · 2024-06-20T10:13:56Z

No PR, sorry, but if somebody wants to pick this up (e.g. as part of PR #7333), the following patch fixes this for me (Linux, Mesa RADV Vulkan, AMD). The problem is in the const-range case of the SPIR-V range-for codegen, which can currently ask for an unbounded number of workgroups. This patch just applies a fixed cap that matches the dynamic case. I'm assuming the performance implications don't matter, but you may know better.

diff --git a/taichi/codegen/spirv/spirv_codegen.cpp b/taichi/codegen/spirv/spirv_codegen.cpp
index e1e1124fd..b9167f6e0 100644
--- a/taichi/codegen/spirv/spirv_codegen.cpp
+++ b/taichi/codegen/spirv/spirv_codegen.cpp
@@ -2000,7 +2000,17 @@ class TaskCodegen : public IRVisitor {
           ir_->i32_type(), stmt->begin_value, false);  // Named Constant
       total_elems = ir_->int_immediate_number(ir_->i32_type(), num_elems,
                                               false);  // Named Constant
-      task_attribs_.advisory_total_num_threads = num_elems;
+      // To avoid exceeding device limits, we must cap total_num_threads so
+      // that the eventual num_workgroups = total_num_threads/block_dim is in
+      // range.  Use the same kMaxNumThreadsGridStrideLoop cap as the dynamic
+      // case; that's probably a bit conservative for typical (num_elems,
+      // block_dim) combinations, but there's little to be gained by adapting
+      // to the actual limit even if that were readily available.
+      task_attribs_.advisory_total_num_threads = std::min(
+          kMaxNumThreadsGridStrideLoop, num_elems);
+      TI_DEBUG("num_elems={} block_dim={} -> advisory_total_num_threads={}",
+               num_elems, stmt->block_dim,
+               task_attribs_.advisory_total_num_threads);
     } else {
       spirv::Value end_expr_value;
       if (stmt->end_stmt) {
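
As a rough illustration of why capping the thread count need not skip any elements, the following pure-Python simulation models a grid-stride loop: each thread starts at its own index and advances by the total number of threads. This is a sketch of the general technique only, not Taichi's generated SPIR-V. The function names are hypothetical, and `max_threads` stands in for kMaxNumThreadsGridStrideLoop, whose actual value is not given in this thread.

```python
from typing import List, Tuple


def capped_thread_count(num_elems: int, max_threads: int) -> int:
    """Mirror the patch's std::min(cap, num_elems) in Python."""
    return min(max_threads, num_elems)


def grid_stride_indices(thread_id: int, total_threads: int, num_elems: int) -> range:
    """Indices handled by one thread: thread_id, thread_id + total_threads, ..."""
    return range(thread_id, num_elems, total_threads)


def simulate_copy(src: List[float], max_threads: int) -> Tuple[List[float], int]:
    """Copy src into a new list using a simulated grid-stride loop."""
    num_elems = len(src)
    total_threads = capped_thread_count(num_elems, max_threads)
    dst = [0.0] * num_elems
    # The "threads" run sequentially here; on a GPU they would run in parallel.
    for thread_id in range(total_threads):
        for i in grid_stride_indices(thread_id, total_threads, num_elems):
            dst[i] = src[i]
    return dst, total_threads


if __name__ == "__main__":
    data = [float(i) for i in range(1000)]
    copied, threads = simulate_copy(data, max_threads=64)
    print(threads, copied == data)
```

In this model the number of launched threads, and hence the number of workgroups, stays bounded by the cap regardless of the element count, while every index is still visited exactly once.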

bobcao3 self-assigned this Mar 30, 2023

bobcao3 added the bug and vulkan labels Mar 30, 2023

deehrlic mentioned this issue May 11, 2023 in PolyPhyHub/PolyPhy#46 (open): Calling taichi's zero_field function with default trace size causes Vulkan kernel error

chenzhekl mentioned this issue Sep 10, 2023 in Linyou/taichi-ngp-renderer#8 (open): Dispatch error when using Vulkan backend on Intel integrated graphics
