Use local memory in kernels ($1000 bounty) #557
Conversation
Forgive me if I ask a rudimentary question.
triton isn't supported anymore, you'd have to fix it.
@geohot Can you clarify if the goal is to call cuBLAS from tinygrad (e.g. with cupy) or custom GEMM kernel generation that is faster than cuBLAS? The former seems too straightforward, while the latter seems too complex.
Custom: tinygrad should generate a GEMM kernel that uses local memory and is faster than cuBLAS/PyTorch.
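For anyone exploring this, the core idea behind a local-memory GEMM is tiling: each workgroup stages a small tile of A and B in fast on-chip memory, accumulates a partial product, then moves to the next tile, so global memory is read far fewer times. Here is a minimal sketch of that blocking pattern in plain Python/NumPy (not tinygrad's codegen — the `TILE` size and function name are illustrative only; on a real GPU the tile copies would land in shared/local memory):

```python
import numpy as np

TILE = 32  # tile edge, analogous to a workgroup's local-memory tile


def tiled_gemm(A, B):
    """Blocked matrix multiply. Each (TILE x TILE) output block accumulates
    products of A-tiles and B-tiles, mimicking how a GPU kernel would stage
    tiles in local memory before accumulating in registers."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, TILE):
        for j in range(0, N, TILE):
            acc = np.zeros((min(TILE, M - i), min(TILE, N - j)), dtype=A.dtype)
            for k in range(0, K, TILE):
                # "load" tiles: on a GPU these copies would go to local memory
                a_tile = A[i:i + TILE, k:k + TILE]
                b_tile = B[k:k + TILE, j:j + TILE]
                acc += a_tile @ b_tile
            C[i:i + TILE, j:j + TILE] = acc
    return C
```

The bounty is about having tinygrad's kernel generator emit this structure (plus vectorization and loop unrolling) automatically, rather than hand-writing it.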
Changes made in
So stale now.
Claim the bounty by implementing this and having tinygrad generate GEMM kernels for NVIDIA that are faster than torch/cuBLAS.
Clean code only; it must be merged to claim the bounty.