
1. Implement QueryPlanning and DataSource Constraints into Resolvers and Matchers #169

Merged
merged 2 commits into master from UDCore on Mar 15, 2017

Conversation

@garyluoex (Collaborator) commented Feb 14, 2017

  • Added QueryPlanningConstraint to replace the current interface of Matcher and Resolver arguments during query planning (see the sketch after this list)
  • Added DataSourceConstraint to allow implementation of PartitionedFactTable's availability in the near future
  • Modified the findMissingTimeGrainIntervals method in PartialDataHandler to take a set of columns instead of a DataApiRequest and DruidAggregationQuery
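
For orientation, here is a rough sketch of how the new constraint object replaces the old argument pair in the planning components, pieced together from the diffs quoted in this review (the constructor shape is an assumption, not the exact committed signature):

// Old style: matchers and resolvers took the request and query directly
PhysicalTableMatcher matcher = new SchemaPhysicalTableMatcher(apiRequest, query);

// New style: the request/query pair is bundled into a single constraint object up front
QueryPlanningConstraint requestConstraints = new QueryPlanningConstraint(apiRequest, query);
PhysicalTableMatcher matcher = new SchemaPhysicalTableMatcher(requestConstraints);

// The constraint then exposes the derived views the planners need, e.g.
requestConstraints.getMinimumGranularity();
requestConstraints.getAllDimensionNames();
requestConstraints.getLogicalMetricNames();
requestConstraints.getAllColumnNames();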

@garyluoex garyluoex added the WIP label Feb 14, 2017
@garyluoex garyluoex changed the title Implement and Integrate DataSourceConstraint into Resolvers and Matchers Implement QueryPlanning and DataSourceConstraint into Resolvers and Matchers Feb 23, 2017
@garyluoex garyluoex changed the title Implement QueryPlanning and DataSourceConstraint into Resolvers and Matchers Implement QueryPlanning and DataSource Constraints into Resolvers and Matchers Feb 23, 2017
@garyluoex garyluoex changed the title Implement QueryPlanning and DataSource Constraints into Resolvers and Matchers 1. Implement QueryPlanning and DataSource Constraints into Resolvers and Matchers Feb 28, 2017
@garyluoex garyluoex force-pushed the UDCore branch 3 times, most recently from 6a1665d to c728200 on March 2, 2017 at 22:40
private final LogicalTable logicalTable;
private final Set<Interval> intervals;
private final Set<LogicalMetric> logicalMetrics;
private final Granularity minumumTimeGran;
Contributor

spelling: minimum

Contributor

This should be minimumGranularity, not timeGrain. Tables can't have the 'All' granularity, but requests can.

@michael-mclawhorn (Contributor) left a comment

Approval contingent on trivial style fixes.


/**
* Constructor saves metrics, dimensions, coarsest time grain, and logical table name (for logging).
*
* @param request The request whose dimensions are being matched on
* @param query The query whose columns are being matched
* @param requestConstraints contains the request constraints extracted from DataApiRequest and TemplateDruidQuery
Contributor

spacing: requestConstraints (only one space, needs two)

@@ -18,16 +16,14 @@
* Choose the best fit Physical Table from a table group.
*
* @param candidateTables The tables being considered for match
* @param apiRequest The ApiRequest for the query
* @param query a partial query representation
* @param requestConstraints contains the request constraints extracted from DataApiRequest and TemplateDruidQuery
Contributor

spacing: two expected after param name

private final LogicalTable logicalTable;
private final Set<Interval> intervals;
private final Set<LogicalMetric> logicalMetrics;
private final Granularity minumumTimeGran;
Contributor

This should be minimumGranularity, not timeGrain. Tables can't have the 'All' granularity, but requests can.


public Granularity getMinimumTimeGran() {
return minumumTimeGran;
}
Contributor

Getter name change too: minimumGranularity, not time grain.

@@ -31,15 +30,15 @@
* Stores the request table name and intervals and creates a predicate to test a physical table based on request
* intervals.
*
* @param request The Api Request being matched against
* @param requestConstraints contains the request constraints extracted from DataApiRequest and TemplateDruidQuery
*/
Contributor

spacing: two spaces after param name

private final PartialDataHandler partialDataHandler;
private final VolatileIntervalsService volatileIntervalsService;

/**
* Builds a table comparator that compares tables based on how much data there is in their volatile intervals.
*
* @param query The query to send to druid, and that requires a table
* @param apiRequest The data request
* @param requestConstraints contains the request constraints extracted from DataApiRequest and TemplateDruidQuery
Contributor

spacing: two spaces after param name

groupByQuery.getInnermostQuery() >> { innerQuery }
groupByQuery.getMetricDimensions() >> []
dataSourceConstraint = Stub(QueryPlanningConstraint)
dataSourceConstraint.getAllColumnNames() >> columnNames

Contributor

Any reason you're using Stub rather than the Mock we usually use?

Collaborator

If you don't care about call cardinality, a Stub is better, since it provides just the behavior-stubbing parts without asserting anything about how many times it's called. Leaving out assertions the test doesn't really care about makes for less brittle tests.

@@ -25,7 +25,7 @@
public TableDataSource(ConcretePhysicalTable physicalTable) {
super(DefaultDataSourceType.TABLE, Collections.singleton(physicalTable));

this.name = physicalTable.getFactTableName();
this.name = physicalTable.getAvailability().getDataSourceNames().stream().findFirst().get().asName();
Collaborator

Since we're only taking in a ConcretePhysicalTable, could we make ImmutableAvailability (since it's tied to only 1 availability) have a getDataSourceName method? It would simplify this constructor quite a bit, reduce its assumptions, and let it leverage the type system a bit better.

And while we're at it, could we rename ImmutableAvailability to ConcreteAvailability to better match the other concepts?
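
Roughly, the suggestion would let the constructor below collapse to something like this (a sketch only; getDataSourceName is the proposed method, and the reply notes the change landed in the follow-up PR #179):

public TableDataSource(ConcretePhysicalTable physicalTable) {
    super(DefaultDataSourceType.TABLE, Collections.singleton(physicalTable));

    // Proposed: a single-data-source availability exposes its one name directly,
    // instead of streaming over getDataSourceNames() and picking the first element
    this.name = physicalTable.getAvailability().getDataSourceName().asName();
}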

Collaborator Author

These changes are implemented in the third PR #179

.map(Dimension::getApiName)
.collect(Collectors.toSet());
Granularity minimumTableTimeGrain = requestConstraints.getMinimumGranularity();
Set<String> columnNames = requestConstraints.getAllColumnNames();
Collaborator

Can we actually inline these two variables into the places they are called? They seem to only be used for trace logging and an error handler, neither of which should run in the common case, meaning we generally won't need to compute these values.

.map(LogicalMetric::getName)
.collect(Collectors.toCollection(LinkedHashSet::new));
Set<String> requestDimensionNames = requestConstraints.getAllDimensionNames();
Set<String> requestMetricNames = requestConstraints.getLogicalMetricNames();
Collaborator

Can just inline these, now that they are simple

Granularity minimumTableTimeGrain,
TemplateDruidQuery innerQuery
QueryPlanningConstraint requestConstraints,
Granularity minimumTableTimeGrain
Collaborator

Can unwrap

Collaborator

Actually, this parameter is no longer needed, because it can get pulled from the requestConstraints

// Minimum grain at which the request can be aggregated from
Granularity minimumTableTimeGrain = resolveAcceptingGrain.apply(apiRequest, query);
Collaborator

This change means the instance variable resolveAcceptingGrain on this class is no longer needed.

* @param minimumTableTimeGrain Minimum grain that we needed to meet
* @param innerQuery Innermost query for the query we were trying to match
*/
public void logMatchException(
Collaborator

Once this is more fully tightened up, this method can be inlined into the only place it's used: the catch block of the resolve method. So perhaps we inline this method as well?


BinaryOperator<PhysicalTable> betterTable = getBetterTableOperator(apiRequest, query);
BinaryOperator<PhysicalTable> betterTable = getBetterTableOperator(requestConstraints);
PhysicalTable bestTable = physicalTables.stream().reduce(betterTable).get();
Collaborator

This can probably get packed down to just a single statement, if you wanted to tighten this class up even more.
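
For instance, the two lines could collapse into one expression along these lines (a sketch, not the committed code):

PhysicalTable bestTable = physicalTables.stream()
        .reduce(getBetterTableOperator(requestConstraints))
        .get();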

this.filterDimensions = dataApiRequest.getFilterDimensions();
this.metricDimensions = templateDruidQuery.getMetricDimensions();
this.metricNames = templateDruidQuery.getDependentFieldNames();
this.apiFilters = dataApiRequest.getFilters();
Collaborator

We should defensively copy these collections
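
A sketch of one way to do the defensive copies (the concrete collection types here are assumptions; the unmodifiable wrappers are optional):

// Copy the incoming collections so later mutation of the request/query objects
// cannot leak into this constraint
this.filterDimensions = Collections.unmodifiableSet(new LinkedHashSet<>(dataApiRequest.getFilterDimensions()));
this.metricDimensions = Collections.unmodifiableSet(new LinkedHashSet<>(templateDruidQuery.getMetricDimensions()));
this.metricNames = Collections.unmodifiableSet(new LinkedHashSet<>(templateDruidQuery.getDependentFieldNames()));
// ... and apiFilters likewise, with a copy suited to its type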

}

public Set<String> getRequestDimensionNames() {
return getRequestDimensions().stream().map(Dimension::getApiName).collect(Collectors.toSet());
Collaborator

We're only using this for error handling. If that's the only use for this, can we drop this method?

If we're expecting to make use of this for query processing, perhaps we should either cache the result, or compute it on construction?
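
One way to cache it, sketched under the assumption that the field is assigned once in the constructor (the field name is hypothetical):

// In the constructor, compute once from the already-stored request dimensions
this.requestDimensionNames = getRequestDimensions().stream()
        .map(Dimension::getApiName)
        .collect(Collectors.toSet());

// The getter then just returns the cached value
public Set<String> getRequestDimensionNames() {
    return requestDimensionNames;
}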

getRequestDimensions().stream(),
getFilterDimensions().stream(),
getMetricDimensions().stream()
).flatMap(Function.identity()).collect(Collectors.toSet());
Collaborator

Precompute or lazy/cache compute?
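
Either would work; a lazily cached variant might look roughly like the following (the method and field names, and the Set&lt;Dimension&gt; element type, are assumptions based on the snippet above):

private Set<Dimension> allDimensions;

public Set<Dimension> getAllDimensions() {
    // The inputs are fixed at construction, so compute on first use and reuse afterwards
    if (allDimensions == null) {
        allDimensions = Stream.of(
                getRequestDimensions().stream(),
                getFilterDimensions().stream(),
                getMetricDimensions().stream()
        ).flatMap(Function.identity()).collect(Collectors.toSet());
    }
    return allDimensions;
}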

@michael-mclawhorn michael-mclawhorn changed the base branch from master to JekyllSync March 13, 2017 21:35
@michael-mclawhorn michael-mclawhorn changed the base branch from JekyllSync to master March 13, 2017 21:35
@cdeszaq (Collaborator) left a comment

Some more feedback. (I'm now into the tests)

}

public Set<String> getAllColumnNames() {
return Stream.of(
Collaborator

We should precompute this in the constructor.

AggregatableDimensionsMatcher aggregatabilityMatcher = new AggregatableDimensionsMatcher(apiRequest, query);
public List<PhysicalTableMatcher> getMatchers(QueryPlanningConstraint requestConstraints) {
SchemaPhysicalTableMatcher schemaMatcher =
new SchemaPhysicalTableMatcher(requestConstraints);
Collaborator

Can unwrap

Collaborator Author

Trying to keep a nice pattern here so that the pattern can be easily recognized. Let me know otherwise.

Collaborator

I think inlining all of them is actually much cleaner:

return Arrays.asList(
        new SchemaPhysicalTableMatcher(requestConstraints),
        new AggregatableDimensionsMatcher(requestConstraints),
        new TimeAlignmentPhysicalTableMatcher(requestConstraints)
);

new TimeAlignmentPhysicalTableMatcher(requestConstraints);

AggregatableDimensionsMatcher aggregatabilityMatcher =
new AggregatableDimensionsMatcher(requestConstraints);
Collaborator

Can just inline all of these, now that their signatures are much smaller.

Collaborator Author

One of them cannot, the Time Alignment one.

Collaborator

It looks to me like they can:

return Arrays.asList(
        new SchemaPhysicalTableMatcher(requestConstraints),
        new AggregatableDimensionsMatcher(requestConstraints),
        new TimeAlignmentPhysicalTableMatcher(requestConstraints)
);

😉

Collaborator Author

Oh, I see. Nice.

comparators.add(
new VolatileTimeComparator(apiRequest, query, partialDataHandler, volatileIntervalsService)
);
new PartialTimeComparator(requestConstraints, partialDataHandler));
Collaborator

Can unwrap

Collaborator Author

Trying to keep a nice pattern here so that the pattern can be easily recognized. Let me know otherwise.

this.intervals = dataApiRequest.getIntervals();
this.minimumGranularity = new RequestQueryGranularityResolver().apply(dataApiRequest, templateDruidQuery);
this.requestGranularity = dataApiRequest.getGranularity();
this.logicalMetrics = dataApiRequest.getLogicalMetrics();
Collaborator

Should use defensive copies

}

public Set<String> getLogicalMetricNames() {
return getLogicalMetrics().stream().map(LogicalMetric::getName).collect(Collectors.toSet());
Collaborator

Compute at construction

@cdeszaq (Collaborator) left a comment

👍

@michael-mclawhorn michael-mclawhorn merged commit 741d316 into master Mar 15, 2017
@garyluoex garyluoex deleted the UDCore branch March 17, 2017 00:05