
bug: CUDA kernels not available on DGX Spark GB10 (sm_121) — GPU engine initializes but all ops fail #7

@dndungu


Problem

The GPU engine initializes successfully (cuda.Available() returns true and NewGPUEngine succeeds), but every kernel operation fails with "kernels not available":

{"level":"INFO","msg":"GPU engine initialized for training"}
{"level":"ERROR","msg":"training failed","err":"dlinear engine: add trend bias: add kernel: kernels not available"}

Environment

  • ztensor v0.3.0 (also tested v0.3.2-dev)
  • DGX Spark: NVIDIA GB10, sm_121, CUDA 13.0, Driver 580.142
  • Go 1.26.1 linux/arm64

Impact

The GPU sits idle at 0% utilization and all DL training falls back to the CPU. As a result, PatchTST and iTransformer take 10-100x longer than necessary on datasets of 28K+ rows.

The GB10 (Grace Blackwell) has compute capability 12.1 (sm_121). ztensor may need to ship SASS compiled for this architecture, or embed PTX so that the driver can JIT-compile the kernels for it at load time.
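To make the SASS/PTX distinction concrete: under CUDA's documented compatibility rules, a cubin (SASS) built for sm_XY runs only on devices with the same major version X and a minor version of at least Y, while PTX for compute_XY can be JIT-compiled by the driver for any device of compute capability X.Y or higher. Arch-specific targets such as sm_90a are not portable. The Go sketch below encodes just these rules; the target list in main is hypothetical and not taken from ztensor's actual build, and this does not establish why ztensor reports the error.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cc is a CUDA compute capability, e.g. cc{12, 1} for sm_121.
type cc struct{ major, minor int }

// parseTarget parses "sm_XY" (SASS/cubin) or "compute_XY" (PTX) targets.
// Arch-specific suffixes such as "sm_90a" fail to parse and are treated as
// non-portable.
func parseTarget(t string) (string, cc, error) {
	var kind, digits string
	switch {
	case strings.HasPrefix(t, "sm_"):
		kind, digits = "sass", strings.TrimPrefix(t, "sm_")
	case strings.HasPrefix(t, "compute_"):
		kind, digits = "ptx", strings.TrimPrefix(t, "compute_")
	default:
		return "", cc{}, fmt.Errorf("unknown target %q", t)
	}
	n, err := strconv.Atoi(digits)
	if err != nil || n < 10 {
		return "", cc{}, fmt.Errorf("unsupported target %q", t)
	}
	// Last digit is the minor version; the rest is the major (e.g. 121 -> 12.1).
	return kind, cc{n / 10, n % 10}, nil
}

// loadable reports whether code built for target can run on device dev:
//   - SASS for sm_XY: same major X and device minor >= Y.
//   - PTX for compute_XY: driver JIT for any device with capability >= X.Y.
func loadable(target string, dev cc) bool {
	kind, c, err := parseTarget(target)
	if err != nil {
		return false
	}
	switch kind {
	case "sass":
		return c.major == dev.major && c.minor <= dev.minor
	case "ptx":
		return c.major < dev.major || (c.major == dev.major && c.minor <= dev.minor)
	}
	return false
}

func main() {
	gb10 := cc{12, 1} // DGX Spark GB10, sm_121
	// Hypothetical embedded targets, for illustration only.
	for _, t := range []string{"sm_80", "sm_90", "sm_120", "compute_90", "sm_90a"} {
		fmt.Printf("%-11s loadable on sm_121: %v\n", t, loadable(t, gb10))
	}
}
```

With these rules, SASS for sm_80 or sm_90 alone cannot run on the GB10, while sm_120 SASS or compute_90 PTX could, which is why either shipping sm_12x SASS or embedding PTX would be plausible fixes.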
