
Move Reduceops into ast_parse - pop ctx method #4407

Closed
wants to merge 3 commits from the pop-ctx branch

Conversation

@0xtimmy (Contributor) commented May 3, 2024

This is the approach I settled on as the best way to properly order multiple reduceops.

It renders them in ast_parse by "undoing" any uops that belong to a prior loop context, rendering the full reduction, and then putting those uops back.

The "undoing" portion might be band-aid-y; the main point is to render the entire reduceop and then put it in the right place in the uop graph. Undoing and then redoing seemed more intuitive because it also removes any context from the rendering of the second reduceop (e.g. in x.std(), both reduce loops need to load x, but when the recursion hits the mean reduce, it will already have x in self.load_cache from the variance reduce).

render_reduceop could also build a sort of subgraph and then ast_parse could handle where to insert it, but that would require a bigger diff.
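
For readers following along, here is a minimal, self-contained sketch of the "undo / render / redo" idea described above. The names (LinearizerSketch, render_nested_reduceop, render_reduceop, load_cache, loop_ctx) are illustrative stand-ins, not the actual tinygrad linearizer API.

```python
# Minimal sketch of the "pop ctx" idea: stash the prior loop context,
# render the nested reduce cleanly, then restore the stashed state.
class LinearizerSketch:
    def __init__(self):
        self.load_cache = {}   # uops cached while rendering the current loop
        self.loop_ctx = []     # uops belonging to enclosing reduce loops

    def render_nested_reduceop(self, reduceop):
        # "undo": stash uops from the prior loop context so the inner reduce
        # renders without stale cached loads (e.g. in x.std(), the mean reduce
        # must re-load x rather than reuse the variance reduce's cached load)
        saved_cache, saved_ctx = self.load_cache, self.loop_ctx
        self.load_cache, self.loop_ctx = {}, []

        result = self.render_reduceop(reduceop)  # render the full reduction

        # "redo": put the stashed uops back so the outer rendering
        # continues exactly where it left off
        self.load_cache, self.loop_ctx = saved_cache, saved_ctx
        return result

    def render_reduceop(self, reduceop):
        # placeholder for the real rendering logic
        return reduceop
```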

github-actions bot commented May 3, 2024

Changes

Name                              Lines    Diff    Tokens/Line    Diff
------------------------------  -------  ------  -------------  ------
tinygrad/codegen/kernel.py          455      +0           18.5    +0.0
tinygrad/codegen/linearizer.py      360     +19           19.1    -0.2
tinygrad/codegen/uops.py            323      +0           17.2    +0.1


total line changes: +19

@Qazalin (Collaborator) left a comment


Tests are failing: https://github.com/tinygrad/tinygrad/actions/runs/8942779865/job/24566055877?pr=4407#step:18:465

I still think pre-sorting self.reduceops with a DFS is preferable.
A lot of what you're trying to handle in the linearizer is already taken care of when building the AST.
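
As context for the DFS suggestion, here is a hedged sketch of what pre-sorting reduceops could look like: a post-order DFS over the AST collects inner reduces before the reduces that consume them. LazyOp here is a stand-in namedtuple, not the real tinygrad class, and the "REDUCE" string check is a placeholder for the real ReduceOps test.

```python
from collections import namedtuple

# stand-in for tinygrad's LazyOp: an op tag plus a tuple of source nodes
LazyOp = namedtuple("LazyOp", ["op", "src"])

def toposort_reduceops(ast: LazyOp) -> list[LazyOp]:
    ordered, seen = [], set()

    def dfs(node: LazyOp):
        if id(node) in seen: return
        seen.add(id(node))
        for s in node.src:            # visit children first (post-order)
            dfs(s)
        if node.op == "REDUCE":       # placeholder check for a reduce node
            ordered.append(node)

    dfs(ast)
    return ordered                    # inner reduces come before outer ones

# usage: an outer reduce that consumes an inner reduce (std-like pattern)
x   = LazyOp("LOAD", ())
var = LazyOp("REDUCE", (x,))
std = LazyOp("REDUCE", (var, x))
assert toposort_reduceops(std) == [var, std]
```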

@0xtimmy (Contributor, Author) commented May 3, 2024

Hmm, fair enough, I'll see if I can put something together to do the DFS quickly.

@chaosagent (Contributor) commented
I'm working on parallel reduces (two reduceops in one loop); my understanding is that you are working on series reduces.

What do you think about self.reduceops = [[r1, r2], [r3], [r4, r5, r6]], where (r1, r2) and (r4, r5, r6) are parallel groups and the groups themselves run in series? The self.reduceops list would be toposorted upon linearizer creation; then we can just write for reduceops in self.reduceops: self.render_reduceop(reduceops, ...).
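
A tiny sketch of the proposed structure, using placeholder strings in place of real reduceops (in tinygrad these would roughly be LazyOp nodes with a ReduceOps op); render_reduceop here is only a stub standing in for rendering one shared loop per group.

```python
# placeholder reduceops
r1, r2, r3, r4, r5, r6 = "r1", "r2", "r3", "r4", "r5", "r6"

# proposed structure: inner lists are parallel groups (share one loop),
# the outer list is the series (toposorted) render order
reduceops = [[r1, r2], [r3], [r4, r5, r6]]

def render_reduceop(group):
    # stub: a real implementation would emit a single loop whose body
    # accumulates every reduce in the group
    print("one loop accumulating:", group)

for group in reduceops:          # series: one render_reduceop call per group
    render_reduceop(group)
```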

@0xtimmy (Contributor, Author) commented May 4, 2024

Yeah, I like that better. Here's the draft PR for all my linearizer changes: #4409

@0xtimmy 0xtimmy closed this May 4, 2024
@0xtimmy 0xtimmy deleted the pop-ctx branch May 4, 2024 13:49