-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1. Implement QueryPlanning and DataSource Constraints into Resolvers and Matchers #169
Conversation
1252271
to
785417f
Compare
6a1665d
to
c728200
Compare
c728200
to
472d905
Compare
private final LogicalTable logicalTable; | ||
private final Set<Interval> intervals; | ||
private final Set<LogicalMetric> logicalMetrics; | ||
private final Granularity minumumTimeGran; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling: minimum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be minimumGranularity, not timeGrain. Tables can't have the 'All' granularity, but requests can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approval contingent on trivial style fixes.
|
||
/** | ||
* Constructor saves metrics, dimensions, coarsest time grain, and logical table name (for logging). | ||
* | ||
* @param request The request whose dimensions are being matched on | ||
* @param query The query whose columns are being matched | ||
* @param requestConstraints contains the request constraints extracted from DataApiRequest and TemplateDruidQuery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spacing: requestConstraints (only one space, needs two)
@@ -18,16 +16,14 @@ | |||
* Choose the best fit Physical Table from a table group. | |||
* | |||
* @param candidateTables The tables being considered for match | |||
* @param apiRequest The ApiRequest for the query | |||
* @param query a partial query representation | |||
* @param requestConstraints contains the request constraints extracted from DataApiRequest and TemplateDruidQuery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spacing: two expected after param name
private final LogicalTable logicalTable; | ||
private final Set<Interval> intervals; | ||
private final Set<LogicalMetric> logicalMetrics; | ||
private final Granularity minumumTimeGran; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be minimumGranularity, not timeGrain. Tables can't have the 'All' granularity, but requests can.
|
||
public Granularity getMinimumTimeGran() { | ||
return minumumTimeGran; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getter name change minimumGranularity not time grain.
@@ -31,15 +30,15 @@ | |||
* Stores the request table name and intervals and creates a predicate to test a physical table based on request | |||
* intervals. | |||
* | |||
* @param request The Api Request being matched against | |||
* @param requestConstraints contains the request constraints extracted from DataApiRequest and TemplateDruidQuery | |||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spacing: two spaces after param name
private final PartialDataHandler partialDataHandler; | ||
private final VolatileIntervalsService volatileIntervalsService; | ||
|
||
/** | ||
* Builds a table comparator that compares tables based on how much data there is in their volatile intervals. | ||
* | ||
* @param query The query to send to druid, and that requires a table | ||
* @param apiRequest The data request | ||
* @param requestConstraints contains the request constraints extracted from DataApiRequest and TemplateDruidQuery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spacing: two spaces after param name
groupByQuery.getInnermostQuery() >> { innerQuery } | ||
groupByQuery.getMetricDimensions() >> [] | ||
dataSourceConstraint = Stub(QueryPlanningConstraint) | ||
dataSourceConstraint.getAllColumnNames() >> columnNames | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you're using Stub rather than the Mock we usually use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't care about cardinality on calls, a Stub
is better, since it allows for just the behavior stubbing parts, w/o asserting anything about how much it's called. Not having assertions that the test doesn't really care about allows for less-brittle tests.
472d905
to
cb3834f
Compare
cb3834f
to
1885c3e
Compare
@@ -25,7 +25,7 @@ | |||
public TableDataSource(ConcretePhysicalTable physicalTable) { | |||
super(DefaultDataSourceType.TABLE, Collections.singleton(physicalTable)); | |||
|
|||
this.name = physicalTable.getFactTableName(); | |||
this.name = physicalTable.getAvailability().getDataSourceNames().stream().findFirst().get().asName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're only taking in a ConcretePhysicalTable
, could we make ImmutableAvailability
(since it's tied to only 1 availability) have a getDataSourceName
method? It would simplify this constructor quite a bit, reduce it's assumptions, and let it leverage the type system a bit better.
And while we're at it, could we rename ImmutableAvailability
to ConcreteAvailability
to better match the other concepts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are implemented in the third PR #179
.map(Dimension::getApiName) | ||
.collect(Collectors.toSet()); | ||
Granularity minimumTableTimeGrain = requestConstraints.getMinimumGranularity(); | ||
Set<String> columnNames = requestConstraints.getAllColumnNames(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we actually inline these two variables into the places they are called? They seem to only be used for a trace
logging, and an error handler, which shouldn't be called in the common case, meaning that we generally won't need to compute these values.
.map(LogicalMetric::getName) | ||
.collect(Collectors.toCollection(LinkedHashSet::new)); | ||
Set<String> requestDimensionNames = requestConstraints.getAllDimensionNames(); | ||
Set<String> requestMetricNames = requestConstraints.getLogicalMetricNames(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can just inline these, now that they are simple
Granularity minimumTableTimeGrain, | ||
TemplateDruidQuery innerQuery | ||
QueryPlanningConstraint requestConstraints, | ||
Granularity minimumTableTimeGrain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can unwrap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this parameter is no longer needed, because it can get pulled from the requestConstraints
// Minimum grain at which the request can be aggregated from | ||
Granularity minimumTableTimeGrain = resolveAcceptingGrain.apply(apiRequest, query); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change means the instance variable resolveAcceptingGrain
on this class is no longer needed.
* @param minimumTableTimeGrain Minimum grain that we needed to meet | ||
* @param innerQuery Innermost query for the query we were trying to match | ||
*/ | ||
public void logMatchException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once this is more fully tightened, this method is able to be inlined into the only place it's used: the catch
block of the resolve
method. So, perhaps we inline this method as well?
|
||
BinaryOperator<PhysicalTable> betterTable = getBetterTableOperator(apiRequest, query); | ||
BinaryOperator<PhysicalTable> betterTable = getBetterTableOperator(requestConstraints); | ||
PhysicalTable bestTable = physicalTables.stream().reduce(betterTable).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can probably get packed down to just a single statement, if you wanted to tighten this class up even more.
this.filterDimensions = dataApiRequest.getFilterDimensions(); | ||
this.metricDimensions = templateDruidQuery.getMetricDimensions(); | ||
this.metricNames = templateDruidQuery.getDependentFieldNames(); | ||
this.apiFilters = dataApiRequest.getFilters(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should defensively copy these collections
} | ||
|
||
public Set<String> getRequestDimensionNames() { | ||
return getRequestDimensions().stream().map(Dimension::getApiName).collect(Collectors.toSet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're only using this for error handling. If that's the only use for this, can we drop this method?
If we're expecting to make use of this for query processing, perhaps we should either cache the result, or compute it on construction?
getRequestDimensions().stream(), | ||
getFilterDimensions().stream(), | ||
getMetricDimensions().stream() | ||
).flatMap(Function.identity()).collect(Collectors.toSet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Precompute or lazy/cache compute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more feedback. (I'm now into the tests)
} | ||
|
||
public Set<String> getAllColumnNames() { | ||
return Stream.of( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should precompute this in the constructor.
AggregatableDimensionsMatcher aggregatabilityMatcher = new AggregatableDimensionsMatcher(apiRequest, query); | ||
public List<PhysicalTableMatcher> getMatchers(QueryPlanningConstraint requestConstraints) { | ||
SchemaPhysicalTableMatcher schemaMatcher = | ||
new SchemaPhysicalTableMatcher(requestConstraints); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can unwrap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to keep a nice pattern here so that the pattern can be easily recognized. Let me know otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think inlineing all of them is actually much cleaner:
return Arrays.asList(
new SchemaPhysicalTableMatcher(requestConstraints),
new AggregatableDimensionsMatcher(requestConstraints),
new TimeAlignmentPhysicalTableMatcher(requestConstraints)
);
new TimeAlignmentPhysicalTableMatcher(requestConstraints); | ||
|
||
AggregatableDimensionsMatcher aggregatabilityMatcher = | ||
new AggregatableDimensionsMatcher(requestConstraints); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can just inline all of these, now that their signatures are much smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of them cannot, the Time Alignment one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me like they can:
return Arrays.asList(
new SchemaPhysicalTableMatcher(requestConstraints),
new AggregatableDimensionsMatcher(requestConstraints),
new TimeAlignmentPhysicalTableMatcher(requestConstraints)
);
😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, ic nice
comparators.add( | ||
new VolatileTimeComparator(apiRequest, query, partialDataHandler, volatileIntervalsService) | ||
); | ||
new PartialTimeComparator(requestConstraints, partialDataHandler)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can unwrap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to keep a nice pattern here so that the pattern can be easily recognized. Let me know otherwise.
this.intervals = dataApiRequest.getIntervals(); | ||
this.minimumGranularity = new RequestQueryGranularityResolver().apply(dataApiRequest, templateDruidQuery); | ||
this.requestGranularity = dataApiRequest.getGranularity(); | ||
this.logicalMetrics = dataApiRequest.getLogicalMetrics(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use defensive copies
} | ||
|
||
public Set<String> getLogicalMetricNames() { | ||
return getLogicalMetrics().stream().map(LogicalMetric::getName).collect(Collectors.toSet()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compute at construction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
QueryPlanningConstraint
to replace current interface of Matchers and Resolvers arguments during query planningDataSourceConstraint
to allow implementation ofPartitionedFactTable
's availability in the near futurefindMissingTimeGrainIntervals
method inPartialDataHandler
to take a set of columns instead ofDataApiRequest
andDruidAggregationQuery