The latest work in #2487 has left a lot of room for optimization. Here are some of the things I noted whilst working on it that we should address in a later version:
- Solve the many "incoming data arrows" in __DataOnlySteps in plan diagrams - the __DataOnlyStep can often depend on the same underlying step many times unnecessarily
- We should make `dataOnly` a dependency option rather than a step at creation time, and turn it into a step only when it's needed, rather than always creating a step and then optimizing it away when it's not needed, as we do currently.
- There's a lot more room to optimize data-only steps, but there's complexity around identifying when and how best to optimize them (without causing issues downstream). Current theory is that we can leave them in the plan diagram as is but "group" them; at execution time we can then combine them all together and run them through just one execute method (i.e. treat it as a larger batch size, the reverse of what we do for filtering errors out of the EVs passed into `execute()`) and then split them back up again into the ExecutionValues for storage. Really we should deduplicate where we can though, otherwise downstream plans cannot optimize as well.
- Polymorphic paths are possibly not needed; POSSIBLY we should consider only tracking the latest type. However, this is risky, because it may mean that we end up executing things we don't need to.
- We should revisit hoist and pushdown around polymorphic/polymorphicPartition layer plans - we only want to take on as much work as necessary, but we also want to enable solid deduplication
- I also have a note "should polymorphicPartition be a defer root" but I'm not 100% on what I was thinking there.
- `polymorphicPathList` is hideous; in particular, this line of code is almost unbearably awful: crystal/grafast/grafast/src/engine/LayerPlan.ts, lines 818 to 822 in eedd2fd
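
To make the "group, then run through one execute method, then split back up" theory from the data-only bullet above more concrete, here is a rough sketch. It does not use Grafast's real types or internals: `Batch`, `GroupMember` and `executeGrouped` are hypothetical names invented for illustration, and plain arrays stand in for ExecutionValues.

```typescript
// Hypothetical stand-in for a batch of values; NOT Grafast's ExecutionValue.
type Batch<T> = readonly T[];

// Hypothetical: one grouped data-only step and the inputs it would receive.
interface GroupMember<TIn> {
  id: string;
  inputs: Batch<TIn>;
}

// Concatenate every member's inputs into one larger batch, call `execute`
// once, then slice the results back out per member for storage.
function executeGrouped<TIn, TOut>(
  members: readonly GroupMember<TIn>[],
  execute: (combined: Batch<TIn>) => Batch<TOut>,
): Map<string, Batch<TOut>> {
  const combined: TIn[] = [];
  const ranges: Array<{ id: string; start: number; end: number }> = [];

  for (const member of members) {
    const start = combined.length;
    for (const value of member.inputs) {
      combined.push(value);
    }
    ranges.push({ id: member.id, start, end: combined.length });
  }

  const results = execute(combined);
  if (results.length !== combined.length) {
    throw new Error("execute must return exactly one result per input");
  }

  // Split the combined results back into per-member batches.
  const perMember = new Map<string, Batch<TOut>>();
  for (const { id, start, end } of ranges) {
    perMember.set(id, results.slice(start, end));
  }
  return perMember;
}

// Usage: two grouped members share a single execute call.
const grouped = executeGrouped(
  [
    { id: "a", inputs: [1, 2] },
    { id: "b", inputs: [3] },
  ],
  (values: Batch<number>) => values.map((v) => v * 10),
);
console.log(grouped.get("a"), grouped.get("b"));
```

This only covers the batching half of the idea; deduplicating identical members before combining them (so downstream plans can still optimize) would need to happen separately.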