Make ConnectorAwareNodeManager#getWorkerNodes not include coordinator #7007
core/trino-main/src/main/java/io/trino/connector/ConnectorAwareNodeManager.java
Wait, the original code is operating as expected. Don't check this in yet.
The filtering for the coordinator should be handled here:
https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/execution/scheduler/TopologyAwareNodeSelectorFactory.java#L150
And all throughout this class:
https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/execution/scheduler/NodeScheduler.java#L49
If you put this check in, it will make the node-scheduler.include-coordinator config option ineffective.
If one of those original handling sites is not working properly, we should see if we can fix it rather than introduce a new filtering location.
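To illustrate the concern, here is a minimal sketch using hypothetical stand-in types (CoordinatorFilterSketch, Node, allNodes, and schedulableNodes are invented for this example and are not Trino's classes). It assumes the node manager reports every node and that a single scheduler-side filter honors a flag mirroring node-scheduler.include-coordinator:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-ins for illustration; these are not Trino's classes.
final class CoordinatorFilterSketch {
    static final class Node {
        final String id;
        final boolean coordinator;

        Node(String id, boolean coordinator) {
            this.id = id;
            this.coordinator = coordinator;
        }
    }

    // Node-manager side: report every active node, with no coordinator filtering.
    static List<Node> allNodes() {
        return List.of(
                new Node("coordinator", true),
                new Node("worker-1", false),
                new Node("worker-2", false));
    }

    // Scheduler side: the single place where the includeCoordinator flag
    // (assumed here to mirror node-scheduler.include-coordinator) is applied.
    static List<Node> schedulableNodes(List<Node> nodes, boolean includeCoordinator) {
        return nodes.stream()
                .filter(node -> includeCoordinator || !node.coordinator)
                .collect(Collectors.toList());
    }
}
```

In this sketch, passing true keeps all three nodes schedulable and passing false keeps only the two workers. If allNodes() dropped the coordinator itself, the flag could no longer bring it back, which is the problem with adding a second filtering location.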
@erichwang
Oh you are right, I was looking at the other flow. But this does raise an important question about how this was originally being used. For example:
trino/plugin/trino-hive/src/main/java/io/trino/plugin/hive/rubix/TrinoClusterManager.java Line 52 in f6422d0
That one I wrote myself, and yet I somehow skipped the problem then ;).
Judging by the commit message that introduced
It seems that we don't want to return the coordinator as part of
trino/plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveNodePartitioningProvider.java Line 86 in 930b0b0
This code was introduced recently and it works correctly. It generates an arbitrary (but large enough) number for the bucket count. The number should be large enough to ensure parallelism across all worker nodes during insert.
In that case, we probably want to follow up afterwards to remove the coordinator filter so that it doesn't confuse future readers.
I see the Atop connector as similar to the Jmx connector, which is designed to return stats about the nodes themselves. As a consumer for this type of data, I would definitely prefer to see the coordinator included in the output as well in those cases (barring any technical challenges). Also note, the Jmx connector reaches out explicitly to allNodes when generating splits. Anyways, this is just my preference for what I believe to be useful, but the more important thing is to be consistent.
I see, it's not actually used for strict bucket calculation.
See the previous comments I made, but I don't believe any of those are blocking these changes.
@erichwang it's removed, see https://github.com/qubole/rubix/blob/3149b6385f6685f5fe934551126b6593f59da9c8/rubix-prestosql/src/main/java/com/qubole/rubix/prestosql/ClusterManagerNodeGetter.java#L29