
Use local memory in kernels ($1000 bounty) #557

Closed · wants to merge 93 commits
Conversation

geohot (Collaborator) commented Feb 14, 2023

Claim the bounty by implementing this and having tinygrad generate GEMM kernels for NVIDIA that are faster than torch/cuBLAS.

Clean code only, must be merged to claim bounty.

@geohot geohot changed the title Use local memory in kernels Use local memory in kernels ($1000 bounty) May 25, 2023
@geohot geohot added the bounty Active Bounty label May 25, 2023
sytandas (Contributor) commented May 27, 2023

Forgive me if I'm asking a rudimentary question.
When running tinygrad/accel/triton/ops_triton.py I get:
ImportError: cannot import name 'ExplicitExecAST' from 'tinygrad.ops'
In tinygrad/ops.py I can't find anything named 'ExplicitExecAST'.
What am I missing?

geohot (Collaborator, Author) commented May 27, 2023

triton isn't supported anymore, you'd have to fix it.

arandog commented May 28, 2023

@geohot Can you clarify whether the goal is to call cuBLAS from tinygrad (e.g. via cupy), or to have tinygrad generate custom GEMM kernels that are faster than cuBLAS? The former seems too straightforward, while the latter seems too complex.

sytandas (Contributor) commented May 28, 2023

Custom: tinygrad should generate GEMM kernels that use local memory and are faster than cuBLAS/PyTorch.
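For context, the "local memory" optimization the bounty asks for is the classic tiled GEMM: stage small sub-blocks of A and B in fast on-chip (local/shared) memory so each value is fetched from global memory once per tile rather than once per multiply-accumulate. Below is a minimal NumPy sketch of that blocking on the host side, just to illustrate the data-reuse pattern — it is not tinygrad code, and `TILE` and `tiled_matmul` are names made up for this example:

```python
import numpy as np

TILE = 16  # tile edge; on a GPU this would match the workgroup/block size

def tiled_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Block all three loops so each (TILE x TILE) sub-tile of A and B is
    reused TILE times. In a real GPU kernel these sub-tiles would be copied
    into local/shared memory once per k-step instead of being re-read from
    global memory for every output element."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, TILE):
        for j in range(0, N, TILE):
            # accumulator for one output tile (lives in registers on a GPU)
            acc = np.zeros((min(TILE, M - i), min(TILE, N - j)), dtype=A.dtype)
            for k in range(0, K, TILE):
                # these two slices are the "tiles" a kernel would stage
                # in local memory, one copy shared by the whole workgroup
                a_tile = A[i:i + TILE, k:k + TILE]
                b_tile = B[k:k + TILE, j:j + TILE]
                acc += a_tile @ b_tile
            C[i:i + TILE, j:j + TILE] = acc
    return C
```

The payoff on a GPU comes from the reuse factor: with TILE×TILE tiles, each global-memory load is amortized over TILE multiply-accumulates, which is what closes the gap with cuBLAS-style kernels.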

tinyb0t commented Jun 1, 2023

Changes made in tinygrad/:

------------------------------------------------------------
files                             insertions       deletions
------------------------------------------------------------
tinygrad/ast.py                            6               0
tinygrad/llops/ops_gpu.py                218              31
tinygrad/runtime/cuda.py                   9               6
tinygrad/runtime/metal.py                 43               2
tinygrad/runtime/opencl.py                13               1
tinygrad/shape/__init__.py                 1               1
------------------------------------------------------------
total                                    290              41
------------------------------------------------------------
lines added in the tinygrad folder: 249

geohot (Collaborator, Author) commented Jul 31, 2023

So stale now.

@geohot geohot closed this Jul 31, 2023
@geohot geohot deleted the fast_triton branch October 18, 2023 17:20