Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce makers #127

Merged
merged 10 commits into from
Jan 3, 2017
Merged

Reduce makers #127

merged 10 commits into from
Jan 3, 2017

Conversation

michael-mclawhorn
Copy link
Contributor

Big simplification of makers

Copy link
Contributor

@archolewa archolewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the use of a Function to cut down on the boilerplate. Had a few suggestsions here and there, but overall this looks great.

@@ -35,6 +38,10 @@
* takes aggregated data from a finer time grain (i.e. DefaultTimeGrain.DAY) and computes an average across a coarser
* time grain (i.e. DefaultTimeGrain.WEEK). For example, given the total number of visitors to www.example.com for each
* day of the week, we can compute the average number of daily visitors to example.com for the entire week.
* <p>
* A nested average requires the following columns, an inner query constant, and outer query sum of that constant,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a colon after following columns instead of a comma?

Also this paragraph is extremely dense, and I couldn't keep my eyes from glazing as I looked at it. I would suggest something like:

  1. An inner query constant c to sum, and the outer query sum of c: cs.
  2. An inner query (post) aggregation a to sum, and an outer query sum of a: as
  3. An outer query post aggregation that computes as / cs.

newInnerPostAggregations.add((PostAggregation) sourceMetric);
}
Set<PostAggregation> newInnerPostAggregations = !(sourceMetric instanceof PostAggregation) ?
Collections.emptySet() : ImmutableSet.of((PostAggregation) sourceMetric);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would flip these conditions, and make the check a positive check rather than a negative one.

return originalSourceMetric;
return !(originalSourceMetric instanceof SketchAggregation) ?
originalSourceMetric :
sketchConverter.asSketchEstimate((SketchAggregation) originalSourceMetric);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would flip these conditions, and make the check a positive check rather than a negative one.

}
return result;
return originalAggregations.stream()
.map(agg -> ! agg.isSketch() ? agg : sketchConverter.asInnerSketch((SketchAggregation) agg))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would flip these conditions, and make the check a positive check rather than a negative one.

return result;
return originalAggregations.stream()
.map(agg -> ! agg.isSketch() ? agg : sketchConverter.asInnerSketch((SketchAggregation) agg))
.collect(Collectors.toCollection(LinkedHashSet::new));
}

/**
* Create an Aggregation for summing on a metric from an inner query.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably toss in a blank line and a <p> tag here, to separate the summary from the more detailed description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That hasn't been our rule. The summary ends with a period and newline. Further commentary may be offset, but need not be. The sentence end will be caught by the javadoc tool and pulled out to a summary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When one is adding an extra line or two to the summary, I'm fine with that. But since this is a rather detailed paragraph, I was just thinking that it would look better if it was offset into its own paragraph from the summary, in the method's more detailed view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough

@@ -2,22 +2,20 @@
// Licensed under the terms of the Apache license. Please see LICENSE.md file distributed with this work for terms.
package com.yahoo.bard.webservice.data.config.metric.makers;

import static java.util.Collections.emptySet;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the static import? A static import makes sense to me if the class name is really long and we're using a function/constant a lot, but Collections isn't that long, and emptySet is only used once. So I would argue that the readability benefits of Collections.emptySet() outweighs the benefits from reducing verbosity.

query,
NO_OP_MAPPER,
metricName
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be wrapped?

*
* @return The aggregation used by this metric
*/
private Aggregation assertDependentIsAggregationMetric(final LogicalMetric sourceMetric) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final not necessary on final LogicalMetric sourceMetric.

.map(metrics::get)
.map(LogicalMetric::getTemplateDruidQuery)
.reduce(TemplateDruidQuery::merge)
.orElseGet(() -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use orElseThrow here instead.


/**
* Metric maker which only creates aggregations. They do not depend on other metrics and cannot be
* validated against the metric dictionary
*/
public abstract class RawAggregationMetricMaker extends MetricMaker {

private static final int DEPENDENT_METRICS_REQUIRED = 1;

private final BiFunction<String, String, Aggregation> aggregationFactory;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awww yeah...

@archolewa
Copy link
Contributor

👍

@archolewa
Copy link
Contributor

👍 Reafirming my plus one.

}

/**
* Copy a set of aggregations, replacing any sketch aggregations with sketchMerge aggregations.
* This is an artifact of earlier sketch code which didn't have a single sketch type that could be used without
* finalization in inner queries.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is an artifact and is no longer needed, should we deprecate it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not until we remove that deprecated code, I believe.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case, can we @Deprecate this method as well, so that when we do the next sweep to remove deprecated code, this will be flagged and we can remove it?

* @param resultSetMapper The mapping function to be applied to the result that is returned by the query that is
* built from the LogicalMetric which is built by this maker.
*
* @deprecated to override default mapping, use the Function constructor
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're deprecating this, we should also:

  1. Convert the test using this constructor to use the non-deprecated version
  2. Also deprecate the ColumnMapper interface

Alternatively, if we think there's value in the existence of the ColumnMapper interface, and the JavaDoc on it's withColumName method, we should keep it...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecating

*
* @param metricDictionary The dictionary used to resolve dependent metrics when building the LogicalMetric
* @param function The arithmetic operation performed by the LogicalMetrics constructed by this maker
* @param resultSetMapper The function to be applied to the result that is returned by the query
* @param resultSetMapperBuilder A builder for a function to be applied to the result that is returned by the query
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're making this change, we should take the opportunity to make the param comment more clear / easier to understand. As it is, this doc doesn't really say anything about what this param is for / expected to do.

@@ -58,32 +83,32 @@ public ArithmeticMaker(
*/
public ArithmeticMaker(MetricDictionary metricDictionary, ArithmeticPostAggregationFunction function) {
// TODO: Deprecate me, mappers should always be specified at creation time, not implicitly
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're changing a bunch of things anyways, should we deprecate this method as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I actually agree with this TODO. Does anyone know why this is asked for?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind this TODO is that we shouldn't apply a SketchRoundUpMapper to everything that uses the ArithmeticMaker, and instead require those who want that to add the mapper they want explicitly.

For example, someone might want to add 2 estimated sketches and have the result be a floating-point, rather than an integer. The Maker should not be opinionated.

this(
metricDictionary,
function,
(Function<String, ResultSetMapper>) (String name) -> new SketchRoundUpMapper(name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can replace lambda with method reference

given: "The name of the metric the maker depends on, and the maker itself"

Aggregation sumAggregation = new LongSumAggregation("tmp_metric", DEPENDENT_METRIC_NAME)
LogicalMetric metric = longSumMaker.make("longSum", DEPENDENT_METRIC_NAME)
Dimension dim = new KeyValueStoreDimension("d", "des", new LinkedHashSet<DimensionField>(), Mock(KeyValueStore), Mock(SearchProvider))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified using a set literal, rather than constructing a new LHS:

Dimension dim = new KeyValueStoreDimension("d", "des", [], Mock(KeyValueStore), Mock(SearchProvider))

/**
*
* Created by mclawhor on 12/19/16.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not useful

def "Expected numeric aggregation is produced for #makerClass.getSimpleName()"() {
setup:
RawAggregationMetricMaker maker = makerClass.newInstance()
expect:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blank line between blocks

def "Expected sketch aggregation is produced for #makerClass.getSimpleName()"() {
setup:
RawAggregationMetricMaker maker = makerClass.newInstance((MetricDictionary) null, 5)
expect:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blank line between blocks

public static String FIELD_NAME = "BAR"

@Unroll
def "Expected numeric aggregation is produced for #makerClass.getSimpleName()"() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the call to getSimpleName() is needed, since I believe Groovy defaults to that. That said, I also don't think method calls will work correctly with unrolling, only property-style accessors, so this should be #makerClass.simpleName

(Applies to other test as well)


- [Deprecated MetricMaker.getDependentQuery lookup method in favor of simpler direct access](https://github.com/yahoo/fili/pull/124)

- [Default DimensionColumn name to use apiName instead of physicalName](https://github.com/yahoo/fili/pull/115)
* Deprecated `TableUtils::getColumnNames(DataApiRequest, DruidAggregationQuery, PhysicalTable)` returning dimension physical name,
* Deprecated `TableUtils::getColumnNames(DataApiRequest, DruidAggregationQuery, PhysicalTable)` returning dimension physical ltername,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo?

@@ -58,32 +83,32 @@ public ArithmeticMaker(
*/
public ArithmeticMaker(MetricDictionary metricDictionary, ArithmeticPostAggregationFunction function) {
// TODO: Deprecate me, mappers should always be specified at creation time, not implicitly
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind this TODO is that we shouldn't apply a SketchRoundUpMapper to everything that uses the ArithmeticMaker, and instead require those who want that to add the mapper they want explicitly.

For example, someone might want to add 2 estimated sketches and have the result be a floating-point, rather than an integer. The Maker should not be opinionated.

*/
public class FilteredAggregationMaker extends MetricMaker {

protected static final Logger LOG = LoggerFactory.getLogger(ConstantMaker.class);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong class

… the original NumberFormatException (Config Behavior Change)
…ssing in a metric instead of binding to a single metric aggregation.

Made the FilteredAggregation constructor use the column name as as the metric, as per convention elsewhere.
this(
metricDictionary,
function,
(Function<String, ResultSetMapper>) (ignore) -> NO_OP_MAPPER
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parens around ignore (ie. the lambda input variable) are not technically needed, since there's no type on it and there's only 1 variable.

/NotABlocker


/**
* The metric dictionary used to resolve dependent metrics.
*/
protected final MetricDictionary metricDictionary;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can/should we kick these over to be private instead of protected? My gut says we should generally not encourage extension, since it makes it harder to change internal variables like these...

/NotABlocker

@@ -27,7 +27,7 @@

private static final int DEPENDENT_METRICS_REQUIRED = 2;

final SketchSetOperationPostAggFunction function;
private final SketchSetOperationPostAggFunction function;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should call this change out in the change log. It's unlikely this was extended, but just in case...

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small things that could be tightened up, but nothing major. 👍

@michael-mclawhorn michael-mclawhorn deleted the ReduceMakers branch January 3, 2017 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants