
fix(compute): GPUEngine.Reshape honors dst argument (#81)#82

Merged
dndungu merged 1 commit into main from fix/issue-81-reshape-dst on Apr 10, 2026

Conversation

@dndungu dndungu commented Apr 10, 2026

Summary

  • Fixes GPUEngine.Reshape silently ignores dst argument (compute.Engine contract violation) #81. The zero-copy GPUStorage and Float16Storage fast-paths in GPUEngine.Reshape ignored the caller-provided dst and returned a fresh tensor aliasing the source storage. Callers that passed a pre-allocated dst and discarded the return value silently received stale pre-allocated storage — exactly the failure mode that froze PatchTST GPU training (loss=0.268357 across all epochs) on DGX GB10.
  • Fix: when dst is provided, mutate dst[0] to alias the reshaped view (SetStorage + SetShape + SetStrides) and return dst[0]. When no dst is provided, preserve current behavior. New aliasReshapeDst helper centralizes the dst-honoring logic for both Float16Storage and GPUStorage[T] fast-paths.

Evidence (from #79 / zerfoo/zerfoo#371 investigation)

First-batch GPU probe in PatchTST backward, captured on DGX GB10:

post-dFlat-matmul      ptr=0x...628c000 l2=0.0833596 allZero=false
post-dFlat-reshape-dX  ptr=0x...f244000 l2=0         allZero=true
encBwd:entry:dX        ptr=0x...f244000 l2=0         allZero=true

MatMul wrote real gradients into fc.dFlat. The very next call,
engine.Reshape(ctx, fc.dFlat, [1600,64], fc.dX), returned a fresh tensor
aliasing fc.dFlat but left fc.dX (the dst) untouched at its pre-allocated
zeros. The caller discarded the return value → encoderBackward saw all-zero
input → every downstream gradient was zero → frozen loss.

Tests

compute/gpu_reshape_dst_test.go:

  • TestGPUEngine_Reshape_HonorsDst: pre-allocates dst with a -999.0 poison pattern, runs Reshape with dst, asserts (a) ret == dst, (b) shape becomes [2,8], (c) dst data reflects src (1..16) not poison. This is the regression test for the silent-zero trap.
  • TestGPUEngine_Reshape_NoDst: preserves the no-dst fast-path behavior — returns a fresh tensor with correct shape and source data.

Both tests skip when CUDA is unavailable (consistent with other GPU tests in this package).

Test plan

  • go build ./... clean
  • go vet ./... clean
  • gofmt clean
  • go test ./... — full ztensor test suite passes on CPU host
  • CI green (incl. CUDA-enabled runner if configured)

Refs


Closes #81
Refs #79
Refs zerfoo/zerfoo#371
@dndungu dndungu merged commit 18a53fe into main Apr 10, 2026
1 check passed
@dndungu dndungu deleted the fix/issue-81-reshape-dst branch April 10, 2026 00:28
