Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partition table config #259

Merged
merged 11 commits into from
May 10, 2017
Merged

Partition table config #259

merged 11 commits into from
May 10, 2017

Conversation

michael-mclawhorn
Copy link
Contributor

Added dimension value based partition feature.

Configuration of dimension value partitions is done by supplying a map of table names to maps of dimension names to maps of dimension values.
The entry for a given table name describes a set of filters, by dimension. If each dimension successfully applies when combined with the restrictions on a table, then the partition is considered to pertain to the query. A partition which pertains is used for restricting availability.

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs changelog

Copy link
Collaborator

@garyluoex garyluoex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small things.

ZonedTimeGrain timeGrain,
Set<FieldName> metricNames,
Set<? extends DimensionConfig> dimensionConfigs,
Map<TableName, Map<String, Set<String>>> tablePartDefinitions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a DimensionName now =)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll come back to this.

@@ -35,7 +37,8 @@ public PhysicalTableSchema(
super(timeGrain, columns);
this.timeGrain = timeGrain;

this.logicalToPhysicalColumnNames = Collections.unmodifiableMap(logicalToPhysicalColumnNames);
this.logicalToPhysicalColumnNames = Collections.unmodifiableMap(new LinkedHashMap<>
(logicalToPhysicalColumnNames));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we wrap this better?

return Objects.equals(metricNames, that.metricNames)
&& Objects.equals(availabilitiesToMetricNames, that.availabilitiesToMetricNames);
}
return super.equals(obj);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using super's equal if two objects are not the same?

public static final PaginationParameters ONE_RESULT = new PaginationParameters(
1,
1
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can unwrap?

searchProvider.hasAnyRows(_) >> searchRowsExist

expect: "filter matches when there are no constraints or they exist and return rows from search provider"
expected == filter.emptyConstraintOrAnyRows(dimension, constraintMap)
Copy link
Collaborator

@garyluoex garyluoex Apr 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expected should be on the right hand side. IDE will display result on the left hand side as actual and right hand side as expected I think.

}

expect: "True as long as the dimension that doesnt return rows is not constrained"
dimensionIdFilter.apply(constraint) == success
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably rename success to something like, expected or participate

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing too big

* @param timeGrain Zoned time grain of the table
* @param metricNames The Set of metric names on the table
* @param dimensionConfigs Set of dimensions on the table as dimension configs
* @param tablePartDefinitions A map from table names to the dimension value matches for that table
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we expand a bit here on what the components of the Map are? Map<String, Set<String>> doesn't tell us what the key and value should be.

*/
public DefaultingVolatileIntervalsService(
DefaultingDictionary<PhysicalTable, VolatileIntervalsFunction> intervalsFunctions
) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing this constructor outright instead of deprecating it? (call out in changelog)

*
* @return true if at least one row is returned.
*/
default boolean hasAnyRows(Set<ApiFilter> filters) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call out addition in changelog.

@@ -139,4 +140,31 @@ public String getPhysicalColumnName(String logicalName) {
protected void setAvailability(Availability availability) {
this.availability = availability;
}

@Override
public boolean equals(final Object obj) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final not needed


@Override
public boolean equals(final Object obj) {
if (obj instanceof BasePhysicalTable) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check instance equality 1st?

TableName.of("partition"),
DAY_UTC,
Collections.emptySet(),
Collections.emptySet(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[] as Set

PartitionCompositeTable expected = new PartitionCompositeTable(
TableName.of("partition"),
DAY_UTC,
(Set<Column>) Collections.emptySet(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[] as Set

keyField.name >> keyFieldFieldName
nonKeyField.name >> nonKeyFieldName
}
def "healthy initialization returns correct state"() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line between methods

import com.yahoo.bard.webservice.util.SimplifiedIntervalList

import org.joda.time.DateTime
import org.joda.time.Interval

import spock.lang.Specification

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep

@@ -59,7 +63,7 @@ class DefaultingVolatileIntervalsServiceSpec extends Specification {
allGrainQuery.intervals >> fullRange
allGrainQuery.granularity >> AllGranularity.INSTANCE

intervalsFunctions = new DefaultingDictionary<>(defaultFunction)
intervalsFunctions = new HashMap<>()
intervalsFunctions.put(table2, mockFunction2)
intervalsFunctions.put(table3, mockFunction3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use map literals, now that this is a simple Map?

* @param timeGrain Zoned time grain of the table
* @param metricNames The Set of metric names on the table
* @param dimensionConfigs Set of dimensions on the table as dimension configs
* @param tablePartDefinitions A map from table names to a map of dimension names to sets of values for those
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 spaces between name and description


@Override
public Set<TableName> getDependentTableNames() {
return tablePartDefinitions.keySet();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest putting tablePartDefinitions.keySet(); to a variable so we don't have to call .keySet() multiple times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is keySet expensive? I'm not sure it makes sense to keep a copy of the keySet and the original collection. Also, I think the builder might modify the tablePartDefinitions class during construction and having a copy of the keyset might result in inconsistency.

entry -> dictionaries.getPhysicalDictionary().get(entry.getKey().asName()),
entry -> new DimensionIdFilter(toDimensionValuesMap(
entry.getValue(),
dictionaries.getDimensionDictionary()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be better to assign dictionaries.getDimensionDictionary() and dictionaries.getPhysicalDictionary() to variables?

* @param dimensionNameMap The configuration map from dimension names to sets of dimension key values.
* @param dimensionDictionary The dictionary of dimensions to use for binding.
*
* @return A map of dimensions to dimension key values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdeszaq Lower case "A"?

&& Objects.equals(availability, that.availability);
} else {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else {
             return false;
         }

can be simplified to

return false;

dimensionKeySelectFilters = dimensionMappingValues.entrySet().stream()
.map(entry -> new AbstractMap.SimpleEntry<>(
entry.getKey(),
new ApiFilter(entry.getKey(), entry.getKey().getKey(), FilterOperation.in, entry.getValue())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are getting key multiple times, could we assign entry.getKey() to a variable?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's likely not worth it here, since this is a fairly simple constructor, and the call is likely very cheap. The added complexity isn't worth it, imho.

* @param dimension The dimension whose rows are being tested on.
* @param constraintFilters The api filters from the constraint
*
* @return True if for this dimension there are rows matching the query filters AND the embedded filters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once space between "return" and "True". "true" should be lowercased


@Override
public boolean equals(Object obj) {
if (obj instanceof DimensionIdFilter) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add this == obj check?

* @param value The value being appended to the set.
* @param <T> The type of the set
*
* @return The new set containing the additional value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lowercase "The"


TableName dataSource1 = TableName.of("part1_ds1")
TableName dataSource2 = TableName.of("part1_ds2")
TableName dataSource3 = TableName.of("part2_ds1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is part2Ds1 better for style?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these aren't java identifiers I think readability is the biggest concern and this way is clearly more readable.

Equals optimizations
@cdeszaq cdeszaq merged commit 872379b into master May 10, 2017
@cdeszaq cdeszaq deleted the PartitionTableConfig branch May 10, 2017 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants