feat(compute): GPU alloc/upload for fused encoder activation#88

Merged
dndungu merged 1 commit into main from feat/fused-encoder-alloc
Apr 10, 2026


@dndungu dndungu commented Apr 10, 2026

Summary

  • Add AllocDeviceFloat32 and CopyToDevice to FusedEncoderProvider interface
  • Implement on GPUEngine using pool allocation and runtime.Memcpy
  • Enables callers to create persistent GPU copies of CPU-backed weight tensors

Without these methods, the fused encoder path always falls back to the per-op path, because tensorDevPtr returns nil for the CPU-backed tensors that extractGPUParams creates.
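
For illustration, here is a rough sketch of how a caller might use the two new methods to turn a CPU-backed weight slice into a persistent device buffer. Only the method names AllocDeviceFloat32 and CopyToDevice come from this PR. The signatures, the DeviceUploader interface, the fakeDevice type (host memory standing in for the GPU pool and runtime.Memcpy), and the uploadWeights helper are all assumptions made for the example, not the actual code in this change.

```go
package main

import (
	"errors"
	"unsafe"
)

// DeviceUploader is a hypothetical subset of FusedEncoderProvider.
// The method names match the PR; the signatures are assumed.
type DeviceUploader interface {
	AllocDeviceFloat32(n int) (unsafe.Pointer, error)
	CopyToDevice(dst unsafe.Pointer, src []float32) error
}

// fakeDevice is a stand-in for GPUEngine. It keeps "device" buffers in host
// memory so the sketch runs without a GPU.
type fakeDevice struct {
	bufs map[unsafe.Pointer][]float32
}

func newFakeDevice() *fakeDevice {
	return &fakeDevice{bufs: map[unsafe.Pointer][]float32{}}
}

// AllocDeviceFloat32 reserves room for n float32 values and returns a handle.
func (d *fakeDevice) AllocDeviceFloat32(n int) (unsafe.Pointer, error) {
	if n <= 0 {
		return nil, errors.New("alloc: element count must be positive")
	}
	buf := make([]float32, n)
	p := unsafe.Pointer(&buf[0])
	d.bufs[p] = buf // keeping the slice here keeps the allocation alive
	return p, nil
}

// CopyToDevice copies src into the allocation identified by dst.
func (d *fakeDevice) CopyToDevice(dst unsafe.Pointer, src []float32) error {
	buf, ok := d.bufs[dst]
	if !ok {
		return errors.New("copy: unknown device pointer")
	}
	if len(src) > len(buf) {
		return errors.New("copy: source larger than allocation")
	}
	copy(buf, src)
	return nil
}

// uploadWeights allocates a device buffer sized for host and copies host
// into it, returning the device pointer for later use by the fused path.
func uploadWeights(p DeviceUploader, host []float32) (unsafe.Pointer, error) {
	ptr, err := p.AllocDeviceFloat32(len(host))
	if err != nil {
		return nil, err
	}
	if err := p.CopyToDevice(ptr, host); err != nil {
		return nil, err
	}
	return ptr, nil
}
```

With a non-nil pointer returned for each weight, a caller would no longer hit the nil result from tensorDevPtr that forces the fallback. How the real caller stores and frees these pointers is not shown in this PR.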

Test plan

  • go build ./... passes
  • go test ./compute/ passes

…rProvider

Enable callers to allocate persistent GPU buffers and upload weight data
for the fused encoder kernel. Without this, CPU-backed weight tensors
have no device pointer and the fused path always falls back to per-op.

- AllocDeviceFloat32: pool-managed GPU allocation
- CopyToDevice: host-to-device memcpy for float32 arrays
@dndungu dndungu merged commit 8d6c90b into main Apr 10, 2026
1 check passed
@dndungu dndungu deleted the feat/fused-encoder-alloc branch April 10, 2026 21:20