Use local memory in kernels ($1000 bounty) #557
Conversation
Forgive me if I ask a rudimentary question.
triton isn't supported anymore, you'd have to fix it.
@geohot Can you clarify if the goal is to call cuBLAS from tinygrad (e.g. with cupy) or custom GEMM kernel generation that is faster than cuBLAS? The former seems too straightforward, while the latter seems too complex.
Custom: tinygrad should generate a GEMM kernel that uses local memory and is faster than cuBLAS/PyTorch.
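For anyone exploring this, the core idea behind a local-memory GEMM is tiling: each workgroup stages a small tile of A and B in fast on-chip memory, accumulates a partial product, then moves to the next tile, so global memory is read far fewer times. Here is a minimal sketch of that blocking pattern in plain Python/NumPy (not tinygrad's codegen — the `TILE` size and function name are illustrative only; on a real GPU the tile copies would land in shared/local memory):

```python
import numpy as np

TILE = 32  # tile edge, analogous to a workgroup's local-memory tile


def tiled_gemm(A, B):
    """Blocked matrix multiply. Each (TILE x TILE) output block accumulates
    products of A-tiles and B-tiles, mimicking how a GPU kernel would stage
    tiles in local memory before accumulating in registers."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, TILE):
        for j in range(0, N, TILE):
            acc = np.zeros((min(TILE, M - i), min(TILE, N - j)), dtype=A.dtype)
            for k in range(0, K, TILE):
                # "load" tiles: on a GPU these copies would go to local memory
                a_tile = A[i:i + TILE, k:k + TILE]
                b_tile = B[k:k + TILE, j:j + TILE]
                acc += a_tile @ b_tile
            C[i:i + TILE, j:j + TILE] = acc
    return C
```

The bounty is about having tinygrad's kernel generator emit this structure (plus vectorization and loop unrolling) automatically, rather than hand-writing it.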
Changes made in
So stale now.
Claim the bounty by implementing this and having tinygrad generate GEMM kernels for NVIDIA that are faster than torch/cuBLAS.
Clean code only; it must be merged to claim the bounty.