Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge identical transforms at forks #4029

Merged
merged 11 commits into from Aug 2, 2018
Merged

Conversation

invokesus
Copy link
Member

No description provided.

@domoritz domoritz changed the title [WIP] Merge identical transforms at forks Merge identical transforms at forks Jul 16, 2018
@invokesus invokesus force-pushed the inv/merge-identical-transforms branch 2 times, most recently from 546be4c to bb4def7 Compare July 18, 2018 19:24
@invokesus invokesus force-pushed the inv/merge-identical-transforms branch 3 times, most recently from d2e2cde to 201f48b Compare July 26, 2018 21:03
@invokesus
Copy link
Member Author

Fixes #2177 and one of the problems @kanitw mentioned in #3797.

[
  {"data": "data_0", "field": "Origin"},
  {"data": "data_1", "field": "Origin"}
]

now is

{"data": "source_0", "field": "mean_Miles_per_Gallon"}

@@ -70,16 +70,6 @@
"name": "data_3",
"source": "data_1",
"transform": [{"type": "filter", "expr": "datum.label !== 'PoleStar'"}]
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future PRs, we should try to see how we can make data_2 and data_3 merge for this example. (No need for this PR.)

@@ -22,4 +22,9 @@ export class FilterNode extends DataFlowNode {
expr: this.expr
};
}

public hash() {
// FIX include something from model in hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I thought we agreed to decouple SelectionFilter from ExpressionFilter.
So we only merge ExpressionFilter, not SelectionFilter.

That said, the current code means that it will always be merged. The question is whether this is the right behavior. Maybe it's fine.

To test it, create an spec where the same selection is applied two layers and see if merging the results still produce correct output. (Maybe concat one unit spec with a layered spec then apply selection from the unit spec to both layers in the other spec.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, modify interactive_stocks_nearest_index to use timeUnit would demonstrate if this works well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sad news is that adding timeUnit to interactive_stocks_nearest_index doesn't even work, but the diff that this PR is causing to that version seems reasonable.

Thus, I wonder if we should ignore this case for now.

@domoritz any thoughts?

In any case, @invokesus we better remove // FIX include something from model in hash. I don't think we should add stuff from the model to the hash, but rather get rid of model from Filter #4084

@@ -161,6 +161,7 @@ export function optimizeDataflow(dataComponent: DataComponent) {

roots.forEach(moveFacetDown);
roots.forEach(mergeParse);
roots.forEach(optimizers.mergeIdenticalTransforms);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why is the call pattern different from others (mergeParse isn't in optimizers?)

cc: @domoritz

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mergeParse should be in optimizers. It's not super clear because in a way it's necessary for correctness as well.

@kanitw
Copy link
Member

kanitw commented Jul 27, 2018

LGTM, but I'll let @domoritz approve.

@invokesus invokesus force-pushed the inv/merge-identical-transforms branch from 97e3ac4 to 0b2f712 Compare July 27, 2018 10:00
@invokesus
Copy link
Member Author

The coverage test is failing but I don't think it makes sense to write new tests for the concerned files.

@invokesus invokesus force-pushed the inv/merge-identical-transforms branch from 607f032 to 0b2f712 Compare July 27, 2018 15:28
const hashes = transforms.map(x => x.hash());
const buckets = {};
for (let i = 0; i < hashes.length; i++) {
if (buckets[hashes[i]] === undefined) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buckets[hashes[i]] = (buckets[hashes[i]] || []).push(transforms[i])?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

push doesn't return the array only the element pushed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could use concat but whatever.

@@ -161,6 +161,7 @@ export function optimizeDataflow(dataComponent: DataComponent) {

roots.forEach(moveFacetDown);
roots.forEach(mergeParse);
roots.forEach(optimizers.mergeIdenticalTransforms);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to start from the leaves?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think so. But I thought it would be better to have the top-down version working with the mergeParse(which is also top-down) and refactor them both to bottom-up when I've got the multiple pass thing working.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, just don't forget

@@ -144,6 +144,9 @@ export class BinNode extends DataFlowNode {
return out;
}

public hash() {
return hash(this.bins);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there be problems since the node type is not part of the hash?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll include it in the hash.
What's the best way though? Adding a nodeType key?

Maybe something like
return hash({nodeType: "BinNode", ...this.bins});

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just type + hash

@@ -59,6 +59,10 @@ export class WindowTransformNode extends DataFlowNode {
return windowFieldDef.as || vgField(windowFieldDef);
}

public hash() {
return hash(this.transform);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it makes sense to have tests for these hash functions so we can read them in the diff and see whether they look good?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting tests like
i) hash(node) === hash(node.clone())
or
ii) hash(node) === '__hash_value__'
I see the benefit that test ii might provide, but the value will have to be updated each time we change things.
However, I think I should write some tests for the hash function itself with probable inputs in util.test.ts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latter

@invokesus invokesus force-pushed the inv/merge-identical-transforms branch 2 times, most recently from 0882e01 to e6e6f03 Compare July 28, 2018 07:38
@@ -191,6 +192,10 @@ export class StackNode extends DataFlowNode {
}, {});
}

public hash() {
return 'StackNode' + hash(this._stack);
Copy link
Member

@domoritz domoritz Jul 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stack ${hash(this._stack)}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

colon?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care but @kanitw suggested that we already use colons in the hash. It doesn't really matter so I'd say space is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks nicer in the test without colon lol -- not that it's a big deal.

@invokesus invokesus force-pushed the inv/merge-identical-transforms branch from ff19493 to d16240b Compare August 1, 2018 22:08
@invokesus invokesus force-pushed the inv/merge-identical-transforms branch from d16240b to ea53045 Compare August 1, 2018 22:15
@invokesus invokesus force-pushed the inv/merge-identical-transforms branch from 07632f9 to e7ebd7a Compare August 1, 2018 22:32
@invokesus
Copy link
Member Author

Let me know if I missed anything.

@domoritz
Copy link
Member

domoritz commented Aug 1, 2018

Remove "Node" from hash

@invokesus invokesus force-pushed the inv/merge-identical-transforms branch from 608f261 to eca0d12 Compare August 1, 2018 23:19
@domoritz domoritz merged commit 1dad6bf into master Aug 2, 2018
@domoritz domoritz deleted the inv/merge-identical-transforms branch August 2, 2018 16:55
@domoritz
Copy link
Member

domoritz commented Aug 2, 2018

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants