Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cascading3] Adds a checkpoint when hashjoin is merged with one of its sides #1557

Merged
merged 4 commits into from
May 9, 2016

Conversation

rubanm
Copy link
Contributor

@rubanm rubanm commented May 6, 2016

part of #1465

Fixes the outstanding issue from #1521 where merging a hashjoin with one of its own sides is disallowed by Cascading. This adds a Checkpoint to the join result as a workaround.

[info]   cascading.flow.planner.PlannerException: [Hfs["TextDelimited[[0]...][run() @ org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:466)]
  may not merge accumulated and streamed in same pipeline: Merge(_pipe_2+_pipe_3)[by: _pipe_2:[{1}:0] _pipe_3:[{1}:0]]
[info]   at cascading.flow.planner.rule.RuleExec.performAssertion(RuleExec.java:390)

https://github.com/cwensel/cascading/blob/wip-3.1/cascading-hadoop/src/main/shared/cascading/flow/hadoop/planner/rule/assertion/DualStreamedAccumulatedMergePipelineAssert.java

cc @johnynek @piyushnarang

@johnynek
Copy link
Collaborator

johnynek commented May 6, 2016

looks good to me. Would love to see this tested at Twitter where there are many hashjoins before a full release.

* If p1 represents a hashjoin, this method checks if
* p2 is one of the two sides in that hashjoin.
*/
def isHashJoinedWithPipe(p1: Pipe, p2: Pipe): Boolean = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename p1 to hashJoinPipe and p2 to hashJoinOperandPipe (or something on those lines?). Tends to be hard to keep track of p1, p2..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

@piyushnarang
Copy link
Collaborator

minor var rename but looks good to me :shipit:

@rubanm
Copy link
Contributor Author

rubanm commented May 9, 2016

Merging this. Will be testing it on pilot jobs before a full release.

@rubanm rubanm merged commit cb04163 into cascading3 May 9, 2016
@rubanm rubanm deleted the rubanm/cascading3/hashjoin_merge branch May 9, 2016 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants