Scale writers when write is partitioned #10791
Comments
+1 to this idea
Relevant thread: https://trinodb.slack.com/archives/CFLB9AMBN/p1642961339264200
This might be implemented by introducing a special exchange. This solution is suitable for pipeline mode. For Tardigrade, I think insert tasks for a partition would probably need to be scaled differently (cc @losipiuk @arhimondr)
For fault-tolerant execution it should be possible to split a partition dynamically. This mechanism could also be useful for handling skew in joins, though it is unclear when we will be able to start working on it.
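To make the dynamic-splitting idea concrete, here is a minimal sketch. The byte threshold, the even-split policy, and the `assign_tasks` helper are all illustrative assumptions, not Trino's actual design:

```python
from collections import defaultdict

# Hypothetical sketch of splitting an oversized write partition dynamically,
# in the spirit of the fault-tolerant-execution idea above. The threshold
# and even-split policy are assumptions for illustration only.
def assign_tasks(partition_sizes, max_task_bytes):
    assignments = defaultdict(list)  # task id -> list of (partition, bytes)
    task = 0
    for partition, size in partition_sizes.items():
        if size <= max_task_bytes:
            assignments[task].append((partition, size))
            task += 1
        else:
            # Split the hot partition into N roughly equal sub-partitions,
            # each small enough for a single task.
            n = -(-size // max_task_bytes)  # ceiling division
            for i in range(n):
                chunk = size // n + (1 if i < size % n else 0)
                assignments[task].append((f"{partition}#{i}", chunk))
                task += 1
    return assignments

sizes = {"2022-01-01": 10_000, "2022-01-02": 500}
tasks = assign_tasks(sizes, max_task_bytes=4_000)
# The hot partition is split into 3 sub-partitions handled by 3 tasks;
# the small partition keeps a single task.
```

Because each sub-partition is bounded by the threshold, no single task writes the entire hot partition, which is also why the same trick could help with join skew.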
Ran into this recently and thought it would be helpful to add some context on our case. We write to Iceberg using transforms on our partition columns (hour(), day(), etc.). When partition columns use transforms, Trino uses Iceberg's bucket function for partitioning, which means the planner cannot choose writer scaling or redistributed writes. This leads to writer skew when we write a single large partition. Here's a breakdown of what is happening:
Here's a diagram that maps out all code paths related to partitioned writes in Iceberg.
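The skew described in this comment can be reproduced with a toy model: because hash partitioning routes every row of a given partition key to the same writer, one large partition saturates a single writer. (The `writer_for` function below is illustrative only; it is not Trino's or Iceberg's actual bucket function.)

```python
import hashlib
from collections import Counter

# Toy model of hash-partitioned writes: every row of a given partition key
# is routed to the same writer. (Not Trino's or Iceberg's real hash.)
def writer_for(partition_key: str, num_writers: int) -> int:
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % num_writers

# One hot hourly partition plus a few small ones.
rows = ["2022-01-01T00"] * 1000 + ["2022-01-01T01"] * 10 + ["2022-01-01T02"] * 10

load = Counter(writer_for(key, num_writers=8) for key in rows)
# All 1000 rows of the hot partition land on one writer: classic writer skew.
```

Even with 8 writers available, the hot partition occupies exactly one of them, which is the behavior writer scaling is meant to fix.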
I'd like to work on this issue, and I'm curious about the adaptive exchange idea and how it differs from the current implementation of writer scaling. I've tried a change that forces writer scaling, and it works pretty well for Iceberg; however, we sometimes run into OOMs for Hive-based writes.
Support writer scaling when the write is partitioned, including writes to partitioned or bucketed Hive or Iceberg tables, and OPTIMIZE (which may force repartitioning, overriding the preferred-partitioning tuning knobs).
cc @sopel39
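One possible shape of the requested feature is per-partition fan-out: small partitions keep a single writer, while a partition detected as hot is spread round-robin across several writers. The sketch below is a hypothetical illustration of that policy; the names, the fixed `scale` factor, and the hot-partition set are assumptions, not Trino's implementation:

```python
from collections import Counter

# Hypothetical per-partition writer scaling: a "hot" partition is fanned out
# round-robin over `scale` consecutive writers; other partitions keep one
# writer. Illustrative only, not Trino's actual algorithm.
def scaled_writer(partition_key, row_index, num_writers, hot_partitions, scale=4):
    base = hash(partition_key) % num_writers
    if partition_key in hot_partitions:
        # Spread rows of a hot partition over `scale` consecutive writers.
        return (base + row_index % scale) % num_writers
    return base

rows = [("2022-01-01", i) for i in range(1000)] + [("2022-01-02", i) for i in range(10)]
load = Counter(scaled_writer(key, i, 8, {"2022-01-01"}) for key, i in rows)
# The hot partition now occupies 4 writers instead of 1, while the small
# partition still goes to a single writer.
```

The trade-off this thread raises still applies: fanning out a partition means more concurrent open files per partition, which is presumably where the Hive OOMs come from.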