
Make ConnectorAwareNodeManager#getWorkerNodes not include coordinator #7007

Merged
merged 1 commit into trinodb:master on Feb 26, 2021

Conversation

sopel39
Member

@sopel39 sopel39 commented Feb 23, 2021

No description provided.

@cla-bot cla-bot bot added the cla-signed label Feb 23, 2021
@sopel39 sopel39 added the bug Something isn't working label Feb 23, 2021
Contributor

@erichwang erichwang left a comment


Wait, the original code is operating as expected. Don't check this in yet.

The filtering for the coordinator should be handled here:
https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/execution/scheduler/TopologyAwareNodeSelectorFactory.java#L150

And all throughout this class:
https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/execution/scheduler/NodeScheduler.java#L49

If you put this check in, it will invalidate the effectiveness of the node-scheduler.include-coordinator config option.

If one of those original handling sites is not working properly, we should see whether we can fix it rather than introducing a new filtering location.
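For reference, the config option in question is a coordinator setting in etc/config.properties; whether the coordinator also executes work is controlled there (values below are illustrative):

```properties
# etc/config.properties on the coordinator
coordinator=true
# When false, the coordinator should not be scheduled work;
# node selection (and, after this PR, getWorkerNodes) respects this.
node-scheduler.include-coordinator=false
```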

@sopel39
Member Author

sopel39 commented Feb 24, 2021

@erichwang ConnectorAwareNodeManager is used before scheduling happens. It can be used by connectors to generate splits (which are later scheduled) and assign them to nodes. ConnectorAwareNodeManager wasn't honoring node-scheduler.include-coordinator at all and returned the coordinator as part of the getWorkerNodes result set.
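A minimal sketch, using hypothetical simplified types rather than Trino's actual classes, of the behavior this change gives getWorkerNodes when node-scheduler.include-coordinator=false:

```java
import java.util.Set;
import java.util.stream.Collectors;

public class WorkerNodeFilterSketch
{
    // Hypothetical simplified node type; Trino's real Node interface is richer.
    record Node(String id, boolean isCoordinator) {}

    // After this PR, getWorkerNodes() excludes the coordinator unless
    // node-scheduler.include-coordinator is enabled.
    static Set<Node> getWorkerNodes(Set<Node> allNodes, boolean includeCoordinator)
    {
        return allNodes.stream()
                .filter(node -> includeCoordinator || !node.isCoordinator())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args)
    {
        Set<Node> nodes = Set.of(new Node("coordinator", true), new Node("worker-1", false));
        // With include-coordinator=false, only the worker remains
        System.out.println(getWorkerNodes(nodes, false).size());
    }
}
```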

@erichwang
Contributor

erichwang commented Feb 24, 2021

Oh, you are right; I was looking at the other flow.

But this does raise an important question on how this was originally being used. For example:

@sopel39
Member Author

sopel39 commented Feb 25, 2021

@erichwang

I wrote that one myself, and yet I somehow skipped over the problem at the time ;). TrinoClusterManager requires a node list without the coordinator, so this change is OK. In fact, newer Rubix code no longer filters out the coordinator (assuming that getWorkerNodes works correctly, which it currently doesn't).

for (Node node : nodeManager.getWorkerNodes()) {

Judging by the commit message that introduced getWorkerNodes (previously it was getAllNodes()):

"Fix atop connector to work with dedicated coordinator. Currently, if node-scheduler.include-coordinator=false, a split is still generated for the coordinator, which will then not be able to process it, failing the query."

It seems that we don't want to return the coordinator as part of getWorkerNodes() when node-scheduler.include-coordinator=false.

return createBucketNodeMap(nodeManager.getRequiredWorkerNodes().size() * PARTITIONED_BUCKETS_PER_NODE);

This code was introduced recently and it works correctly. It generates an arbitrary (but large enough) number for the bucket count. The number should be large enough to ensure parallelism on all worker nodes during insert.
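A tiny illustration of the arithmetic in the quoted line; the value of PARTITIONED_BUCKETS_PER_NODE below is assumed for illustration only:

```java
public class BucketCountSketch
{
    // Assumed value for illustration; the real constant lives in Trino's source.
    static final int PARTITIONED_BUCKETS_PER_NODE = 32;

    // The bucket count scales with the worker count, so every worker
    // can be assigned several buckets during an insert, preserving parallelism.
    static int bucketCount(int workerCount)
    {
        return workerCount * PARTITIONED_BUCKETS_PER_NODE;
    }

    public static void main(String[] args)
    {
        System.out.println(bucketCount(4)); // 4 workers -> 128 buckets
    }
}
```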

@erichwang
Contributor

> I wrote that one myself, and yet I somehow skipped over the problem at the time ;). TrinoClusterManager requires a node list without the coordinator, so this change is OK. In fact, newer Rubix code no longer filters out the coordinator (assuming that getWorkerNodes works correctly, which it currently doesn't).

In that case, we probably want to follow up afterwards to remove the coordinator filter so that it doesn't confuse future readers.

> for (Node node : nodeManager.getWorkerNodes()) {
>
> Judging by the commit message that introduced getWorkerNodes (previously it was getAllNodes()):
>
> "Fix atop connector to work with dedicated coordinator. Currently, if node-scheduler.include-coordinator=false, a split is still generated for the coordinator, which will then not be able to process it, failing the query."
>
> It seems that we don't want to return the coordinator as part of getWorkerNodes() when node-scheduler.include-coordinator=false.

I see the Atop connector as similar to the Jmx connector, which is designed to return stats about the nodes themselves. As a consumer of this type of data, I would definitely prefer to see the coordinator included in the output as well (barring any technical challenges). Also note that the Jmx connector explicitly reaches out to allNodes when generating splits. Anyway, this is just my preference for what I believe to be useful; the more important thing is to be consistent.

> return createBucketNodeMap(nodeManager.getRequiredWorkerNodes().size() * PARTITIONED_BUCKETS_PER_NODE);
>
> This code was introduced recently and it works correctly. It generates an arbitrary (but large enough) number for the bucket count. The number should be large enough to ensure parallelism on all worker nodes during insert.

I see, it's not actually used for strict bucket calculation.

Contributor

@erichwang erichwang left a comment


See the previous comments I made, but I don't believe any of those are blocking these changes.

@sopel39
Member Author

sopel39 commented Feb 26, 2021

> In that case, we probably want to follow up afterwards to remove the coordinator filter so that it doesn't confuse future readers.

@erichwang it's removed, see https://github.com/qubole/rubix/blob/3149b6385f6685f5fe934551126b6593f59da9c8/rubix-prestosql/src/main/java/com/qubole/rubix/prestosql/ClusterManagerNodeGetter.java#L29

@sopel39 sopel39 merged commit 8fa406d into trinodb:master Feb 26, 2021
@sopel39 sopel39 deleted the ks/fix_coordinator branch February 26, 2021 11:43
@sopel39 sopel39 mentioned this pull request Feb 26, 2021
@martint martint added this to the 353 milestone Mar 5, 2021