Skip to content

Add a preferred tier selection strategy #18136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jul 9, 2025

Conversation

FrankChen021
Copy link
Member

Description

Use Case

If a Druid cluster is physically deployed in two AZs, we want the queries to be executed within one AZ as possible.
This means queries from brokers to historicals should be executed on the nodes in the same AZ as possible.
See the diagram as below:

image

This PR adds a new tier selector at broker side so that we can configure brokers to select historicals based on tier name(in this use case, the tier name can configured as the AZ name).
And if there're no servers in given tier(typically this is the case when historicals in one AZ are under maintance), brokers can still select historicals in another tier(AZ).

And to use this strategy, users also needs to make changes to their load rules to ensure segments are loaded in two tier(AZ)

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new preferred tier selection strategy for broker queries, enabling brokers to select historical servers within a preferred Availability Zone when available. Key changes include:

  • Adding new configuration bindings for the Preferred Tier Selector.
  • Implementing the PreferredTierSelectorStrategy class and associated configuration.
  • Updating tests and documentation to support the new strategy.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
services/src/main/java/org/apache/druid/cli/CliBroker.java Binds configuration for the new preferred tier strategy.
server/src/test/java/org/apache/druid/client/selector/TierSelectorStrategyTest.java Adds unit tests to validate preferred tier behavior.
server/src/main/java/org/apache/druid/client/selector/TierSelectorStrategy.java Updates JSON subtypes to include the preferred strategy.
server/src/main/java/org/apache/druid/client/selector/ServerSelectorStrategy.java
RandomServerSelectorStrategy.java
ConnectionCountServerSelectorStrategy.java
Updates method signatures to use Collection instead of Set.
server/src/main/java/org/apache/druid/client/selector/PreferredTierSelectorStrategy.java Implements the new preferred tier selection strategy.
server/src/main/java/org/apache/druid/client/selector/PreferredTieSelectorStrategyConfig.java Adds configuration for the preferred tier strategy.
docs/configuration/index.md Documents the new configuration options for preferred tier selection.
Comments suppressed due to low confidence (4)

server/src/main/java/org/apache/druid/client/selector/PreferredTierSelectorStrategy.java:41

  • The field name 'perferredTier' appears to be misspelled; consider renaming it to 'preferredTier' for clarity and consistency.
private final String perferredTier;

server/src/main/java/org/apache/druid/client/selector/PreferredTierSelectorStrategy.java:42

  • The variable 'priortyStrategy' is misspelled; it should be renamed to 'priorityStrategy' to accurately reflect its purpose.
private final TierSelectorStrategy priortyStrategy;

server/src/main/java/org/apache/druid/client/selector/PreferredTieSelectorStrategyConfig.java:28

  • The class name 'PreferredTieSelectorStrategyConfig' seems inconsistent with the feature's focus on tier selection; consider renaming it to 'PreferredTierSelectorStrategyConfig'.
public class PreferredTieSelectorStrategyConfig

docs/configuration/index.md:1795

  • The documentation for 'druid.broker.select.tier.preferred.priority' indicates possible values 'high' and 'low', yet the implementation expects 'highest' or 'lowest'. Please update the docs to reflect the correct values.
|`druid.broker.select.tier.preferred.priority`| `high`, `low` | Optional. If there're multiple candidates in a preferred tier...

@abhishekrb19
Copy link
Contributor

@FrankChen021 thanks for the PR! Curious, how is this different from broker tiering? Wouldn’t it be possible to achieve the same kind of preferred tier selection across AZs by configuring tiers on the brokers?

@FrankChen021
Copy link
Member Author

FrankChen021 commented Jun 13, 2025

@FrankChen021 thanks for the PR! Curious, how is this different from broker tiering? Wouldn’t it be possible to achieve the same kind of preferred tier selection across AZs by configuring tiers on the brokers?

  1. broker tiering requires the router to route the queries to different brokers, which introduces extra deployment/dependency. Since router is an optional component, there should be a way that the broker can handle routing.
  2. broker tiering requires the client to set some query context parameters for routing. in another word, It's the client that is responsible for routing, which is not the case here
  3. broker tiering does not provide a way to achieve fallback routing. If historical nodes in one AZ are down, we need to route queries to historicals nodes in another AZ

@capistrant capistrant added this to the 34.0.0 milestone Jun 25, 2025
Copy link
Contributor

@capistrant capistrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks reasonable to me. Left an assortment of minor comments

ServerSelectorStrategy change from Set --> Collection in param lists. I'd like clarification on why this is needed. It seems counter intuitive to me to allow duplicate servers passed to pick

docs: Left suggestions related to the docs. Feel free to use the suggestions or to replace them with your own words! I just want it to be more clear when these configs matter vs when they'd be ignored.

);
}

private void testPreferredTierSelectorStrategy(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a javadoc specifying a list of what things are being validated in here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@@ -38,26 +38,26 @@
public interface ServerSelectorStrategy
{
@Nullable
default <T> QueryableDruidServer pick(@Nullable Query<T> query, Set<QueryableDruidServer> servers, DataSegment segment)
default <T> QueryableDruidServer pick(@Nullable Query<T> query, Collection<QueryableDruidServer> servers, DataSegment segment)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me why we are removing the Set restriction here? Your PR doesn't seem to introduce a change that requires us to now allow duplicate servers passed to pick.. or does it? Maybe I'm missing something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the PreferredTierSelectorStrategy.pick, a List is used to save candidate servers. So the Set here needs to be changed to Collection so that both List and original Set work.

FrankChen021 and others added 4 commits July 1, 2025 11:53
Co-authored-by: Lucas Capistrant <capistrant@users.noreply.github.com>
Co-authored-by: Lucas Capistrant <capistrant@users.noreply.github.com>
Co-authored-by: Lucas Capistrant <capistrant@users.noreply.github.com>
Copy link
Contributor

@capistrant capistrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After pulling this down to my local and running the UTs you added on repeat, I got lots of test fails and feel like it points to the PR as is not acting completely how you want it to. Please see my comments and let me know if I'm on the right track or if there is some misunderstanding I'm having. Thanks!

new DruidServer("test3", "localhost", null, 0, ServerType.HISTORICAL, "tier1", 2),
client
);
QueryableDruidServer yetAnotherTierMediumPriority = new QueryableDruidServer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this introduce randomness to the tests adding a second server of equal priority? That is at least what it seemed like on my local. I got intermittent test failures for this test because the two servers with priority of 1 were flip flopping in actual return order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like two servers with same priority bring different results with multiple runs.
fixed by setting different priorities for this cases to make sure the test result is stable.

@cryptoe cryptoe requested a review from adarshsanjeev July 7, 2025 14:52
@@ -34,7 +34,7 @@
*/
public abstract class AbstractTierSelectorStrategy implements TierSelectorStrategy
{
private final ServerSelectorStrategy serverSelectorStrategy;
protected final ServerSelectorStrategy serverSelectorStrategy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be restored to private now that PreferredTierSelectorStrategy is delegating to highest or lowest priority strategy instead of calling server selector direct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted. thanks for pointing out

Copy link
Contributor

@capistrant capistrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one stand-a-lone comment on flipping an instance var back to private instead of protected (matching pre PR state). That would be cool to include before merging, but is a little bit of a nit.

@capistrant
Copy link
Contributor

Left one stand-a-lone comment on flipping an instance var back to private instead of protected (matching pre PR state). That would be cool to include before merging, but is a little bit of a nit.

@FrankChen021 if you could submit a 1 line follow up for future releases to include, that would be great

@capistrant capistrant merged commit e5469a0 into apache:master Jul 9, 2025
76 checks passed
@FrankChen021
Copy link
Member Author

@capistrant You merged this right before I reverted the change you pointed out😄. let me submit in a follow up pr.

@FrankChen021 FrankChen021 mentioned this pull request Jul 9, 2025
10 tasks
@cryptoe
Copy link
Contributor

cryptoe commented Jul 21, 2025

@FrankChen021 . Could you please update the release notes with the following information :

  • What config changes are required to enable this feature
  • How does the user benefit from this feature. (this is partly there)
  • Known issues/drawbacks.

@cryptoe cryptoe mentioned this pull request Jul 21, 2025
10 tasks
@FrankChen021
Copy link
Member Author

@FrankChen021 . Could you please update the release notes with the following information :

  • What config changes are required to enable this feature

To enable it, configure druid.broker.select.tier to perferred at broker side and configure historical nodes into different tiers by setting druid.server.tier

  • How does the user benefit from this feature. (this is partly there)

This is useful for across availability zone deployment. Brokers in one AZ select historicals in the same AZ by default but still keeps the ability to select historical nodes in another AZ if historicals in the same AZ are not available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants