
Move Reduceops into ast_parse - pop ctx method #4407

Closed
wants to merge 3 commits from the pop-ctx branch

Conversation

@0xtimmy (Contributor) commented May 3, 2024

This is the approach I settled on as the best way to properly order multiple reduceops.

It renders them in ast_parse by "undoing" any uops that belong to a prior loop context, rendering the full reduction, and then putting those uops back.

The "undoing" portion might be band-aid-y; the main point is to render the entire reduceop and then put it in the right place in the uop graph. Undoing and then redoing seemed more intuitive because it also removes any context from the rendering of the second reduceop (e.g. in x.std(), both reduce loops need to load x, but when the recursion hits the mean reduce, it will already have x in self.load_cache from the variance reduce).

render_reduceop could also build a sort of subgraph and then ast_parse could handle where to insert it, but that would require a bigger diff.
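
For readers following along, here is a minimal, self-contained sketch of the "undo / render / redo" idea described above. The names (LinearizerSketch, render_nested_reduceop, render_reduceop, load_cache, loop_ctx) are illustrative stand-ins, not the actual tinygrad linearizer API.

```python
# Minimal sketch of the "pop ctx" idea: stash the prior loop context,
# render the nested reduce cleanly, then restore the stashed state.
class LinearizerSketch:
    def __init__(self):
        self.load_cache = {}   # uops cached while rendering the current loop
        self.loop_ctx = []     # uops belonging to enclosing reduce loops

    def render_nested_reduceop(self, reduceop):
        # "undo": stash uops from the prior loop context so the inner reduce
        # renders without stale cached loads (e.g. in x.std(), the mean reduce
        # must re-load x rather than reuse the variance reduce's cached load)
        saved_cache, saved_ctx = self.load_cache, self.loop_ctx
        self.load_cache, self.loop_ctx = {}, []

        result = self.render_reduceop(reduceop)  # render the full reduction

        # "redo": put the stashed uops back so the outer rendering
        # continues exactly where it left off
        self.load_cache, self.loop_ctx = saved_cache, saved_ctx
        return result

    def render_reduceop(self, reduceop):
        # placeholder for the real rendering logic
        return reduceop
```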

github-actions bot commented May 3, 2024

Changes

Name                              Lines    Diff    Tokens/Line    Diff
------------------------------  -------  ------  -------------  ------
tinygrad/codegen/kernel.py          455      +0           18.5    +0.0
tinygrad/codegen/linearizer.py      360     +19           19.1    -0.2
tinygrad/codegen/uops.py            323      +0           17.2    +0.1


total line changes: +19

@Qazalin (Collaborator) left a comment


Tests are failing: https://github.com/tinygrad/tinygrad/actions/runs/8942779865/job/24566055877?pr=4407#step:18:465

I still think pre-sorting self.reduceops with a DFS is preferable.
A lot of what you're trying to handle in the linearizer is already taken care of when building the AST.
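
As context for the DFS suggestion, here is a hedged sketch of what pre-sorting reduceops could look like: a post-order DFS over the AST collects inner reduces before the reduces that consume them. LazyOp here is a stand-in namedtuple, not the real tinygrad class, and the "REDUCE" string check is a placeholder for the real ReduceOps test.

```python
from collections import namedtuple

# stand-in for tinygrad's LazyOp: an op tag plus a tuple of source nodes
LazyOp = namedtuple("LazyOp", ["op", "src"])

def toposort_reduceops(ast: LazyOp) -> list[LazyOp]:
    ordered, seen = [], set()

    def dfs(node: LazyOp):
        if id(node) in seen: return
        seen.add(id(node))
        for s in node.src:            # visit children first (post-order)
            dfs(s)
        if node.op == "REDUCE":       # placeholder check for a reduce node
            ordered.append(node)

    dfs(ast)
    return ordered                    # inner reduces come before outer ones

# usage: an outer reduce that consumes an inner reduce (std-like pattern)
x   = LazyOp("LOAD", ())
var = LazyOp("REDUCE", (x,))
std = LazyOp("REDUCE", (var, x))
assert toposort_reduceops(std) == [var, std]
```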

@0xtimmy (Contributor, Author) commented May 3, 2024

Hmm, fair enough, I'll see if I can put something together to do the DFS quickly.

@chaosagent (Contributor) commented
I'm working on parallel reduces (two reduceops in one loop); my understanding is that you are working on series reduces.

What do you think about self.reduceops = [[r1, r2], [r3], [r4, r5, r6]], where (r1, r2) and (r4, r5, r6) are parallel groups and the groups themselves run in series? The self.reduceops list would be toposorted upon linearizer creation; then we can just write for reduceops in self.reduceops: self.render_reduceop(reduceops, ...).
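
A tiny sketch of the proposed structure, using placeholder strings in place of real reduceops (in tinygrad these would roughly be LazyOp nodes with a ReduceOps op); render_reduceop here is only a stub standing in for rendering one shared loop per group.

```python
# placeholder reduceops
r1, r2, r3, r4, r5, r6 = "r1", "r2", "r3", "r4", "r5", "r6"

# proposed structure: inner lists are parallel groups (share one loop),
# the outer list is the series (toposorted) render order
reduceops = [[r1, r2], [r3], [r4, r5, r6]]

def render_reduceop(group):
    # stub: a real implementation would emit a single loop whose body
    # accumulates every reduce in the group
    print("one loop accumulating:", group)

for group in reduceops:          # series: one render_reduceop call per group
    render_reduceop(group)
```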

@0xtimmy (Contributor, Author) commented May 4, 2024

Yeah, I like that better. Here's the draft PR for all my linearizer changes: #4409

@0xtimmy 0xtimmy closed this May 4, 2024
@0xtimmy 0xtimmy deleted the pop-ctx branch May 4, 2024 13:49