feat(compute): GPU alloc/upload for fused encoder activation by dndungu · Pull Request #88 · zerfoo/ztensor

dndungu · 2026-04-10T21:19:26Z

Summary

- Add AllocDeviceFloat32 and CopyToDevice to the FusedEncoderProvider interface
- Implement both on GPUEngine using pool allocation and runtime.Memcpy
- Enable callers to create persistent GPU copies of CPU-backed weight tensors

Without this, the fused encoder path always falls back because tensorDevPtr returns nil for CPU-backed tensors created by extractGPUParams.
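
The allocate-then-upload flow and the fallback condition described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the method signatures, the DeviceAllocator interface name, the fakeDevice type, and the uploadWeights and canUseFusedPath helpers are all assumptions, and device memory is simulated with host slices so the sketch runs without a GPU.

```go
package main

import (
	"errors"
	"fmt"
	"unsafe"
)

// DeviceAllocator is a hypothetical stand-in for the two methods this PR adds
// to FusedEncoderProvider. The real signatures are not shown in the PR text.
type DeviceAllocator interface {
	AllocDeviceFloat32(n int) (unsafe.Pointer, error)
	CopyToDevice(dst unsafe.Pointer, src []float32) error
}

// fakeDevice simulates device memory with host slices (assumption: the real
// GPUEngine uses a memory pool and a runtime memcpy instead).
type fakeDevice struct {
	buffers map[unsafe.Pointer][]float32
}

func newFakeDevice() *fakeDevice {
	return &fakeDevice{buffers: map[unsafe.Pointer][]float32{}}
}

// AllocDeviceFloat32 reserves room for n float32 values and returns a handle.
func (d *fakeDevice) AllocDeviceFloat32(n int) (unsafe.Pointer, error) {
	if n <= 0 {
		return nil, errors.New("alloc: n must be positive")
	}
	buf := make([]float32, n)
	ptr := unsafe.Pointer(&buf[0])
	d.buffers[ptr] = buf // the map keeps the buffer alive, like a persistent allocation
	return ptr, nil
}

// CopyToDevice copies host data into a previously allocated buffer.
func (d *fakeDevice) CopyToDevice(dst unsafe.Pointer, src []float32) error {
	buf, ok := d.buffers[dst]
	if !ok {
		return errors.New("copy: unknown device pointer")
	}
	if len(src) > len(buf) {
		return errors.New("copy: source larger than destination")
	}
	copy(buf, src)
	return nil
}

// uploadWeights creates a persistent device copy of a CPU-backed weight slice.
func uploadWeights(a DeviceAllocator, w []float32) (unsafe.Pointer, error) {
	ptr, err := a.AllocDeviceFloat32(len(w))
	if err != nil {
		return nil, fmt.Errorf("upload: %w", err)
	}
	if err := a.CopyToDevice(ptr, w); err != nil {
		return nil, fmt.Errorf("upload: %w", err)
	}
	return ptr, nil
}

// canUseFusedPath models the fallback rule: any nil device pointer forces
// the per-op path.
func canUseFusedPath(ptrs ...unsafe.Pointer) bool {
	for _, p := range ptrs {
		if p == nil {
			return false
		}
	}
	return true
}

func main() {
	dev := newFakeDevice()

	var cpuOnly unsafe.Pointer // a CPU-backed tensor has no device pointer
	fmt.Println("fused path before upload:", canUseFusedPath(cpuOnly))

	ptr, err := uploadWeights(dev, []float32{0.5, -1, 2})
	if err != nil {
		panic(err)
	}
	fmt.Println("fused path after upload:", canUseFusedPath(ptr))
}
```

In this sketch the upload happens once and the returned pointer is reused, which is the point of a persistent device copy: the weights do not need to be re-uploaded on every forward pass.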

Test plan

- go build ./... passes
- go test ./compute/ passes

…rProvider

Enable callers to allocate persistent GPU buffers and upload weight data for the fused encoder kernel. Without this, CPU-backed weight tensors have no device pointer and the fused path always falls back to per-op.

- AllocDeviceFloat32: pool-managed GPU allocation
- CopyToDevice: host-to-device memcpy for float32 arrays

dndungu merged commit 8d6c90b into main Apr 10, 2026
1 check passed

dndungu deleted the feat/fused-encoder-alloc branch April 10, 2026 21:20

dndungu merged 1 commit into main from feat/fused-encoder-alloc
