-
Notifications
You must be signed in to change notification settings - Fork 703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snapshot a pipe in the REPL #918
Changes from 30 commits
3d737f8
19ec513
d837f7e
5e347d1
b6b5ebf
4924b3b
66f87ea
f13ff17
af8b3ff
bd9c0f4
ea3b9a3
e262328
6ac5051
c844f41
acbcb2b
2733a88
3b85bb2
3f6920e
9102d4c
932ce34
36e55d9
e9e7a1c
1f015ed
f266f89
68049e8
63ba01d
c01a188
759bd86
a01ef97
48693dc
fa2bf25
70a46c7
4b53fd3
875af09
70fa0a1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
/* | ||
Copyright 2014 Twitter, Inc. | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
*/ | ||
package com.twitter.scalding | ||
|
||
import cascading.flow.FlowDef | ||
import cascading.pipe.Pipe | ||
|
||
/** | ||
* This is an enrichment-pattern class for cascading.flow.FlowDef. | ||
* The rule is to never use this class directly in input or return types, but | ||
* only to add methods to FlowDef. | ||
*/ | ||
class RichFlowDef(val fd: FlowDef) { | ||
// RichPipe and RichFlowDef implicits | ||
import Dsl._ | ||
|
||
def copy: FlowDef = { | ||
val newFd = new FlowDef | ||
newFd.merge(fd) | ||
newFd | ||
} | ||
|
||
private[scalding] def mergeMisc(o: FlowDef) { | ||
fd.addTags(o.getTags) | ||
fd.addTraps(o.getTraps) | ||
fd.addCheckpoints(o.getCheckpoints) | ||
fd.setAssertionLevel(o.getAssertionLevel) | ||
fd.setName(o.getName) | ||
fd | ||
} | ||
|
||
/** | ||
* Mutate current flow def to add all sources/sinks/etc from given FlowDef | ||
*/ | ||
def merge(o: FlowDef): FlowDef = { | ||
fd.addSources(o.getSources) | ||
fd.addSinks(o.getSinks) | ||
fd.addTails(o.getTails) | ||
fd.mergeMisc(o) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. don't like this. Just add all of it here. No need for two methods. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see, you call it below. Nevermind... Can you add a comment avoid to explain the strange choice? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should return Unit to make it clear that there is mutation here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also, |
||
fd | ||
} | ||
|
||
/** | ||
* New flow def with only sources upstream from tails. | ||
*/ | ||
def withoutUnusedSources: FlowDef = { | ||
import collection.JavaConverters._ | ||
|
||
// find all heads reachable from the tails (as a set of names) | ||
val heads = fd.getTails.asScala | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. let's break this out:
|
||
.flatMap(_.getHeads).map(_.getName).toSet | ||
// add taps associated with heads to localFlow | ||
val filteredSources = fd.getSources.asScala.filterKeys(src => heads.contains(src)).asJava | ||
|
||
val newFd = fd.copy | ||
newFd.getSources.clear() | ||
newFd.addSources(filteredSources) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does this really work? What about checkpoints that point to sources? They will still be in there, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know about Cascading's Checkpoints. Does Scalding use them at all? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I see, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By the way, I know this patch is a lot of work, but in fact I think it will I think this can actually fix a design flaw in scalding: that using pipe Especially with TypedPipe, when this code is all done, even outside of the The trick is: keep a local FlowDef when we make a TypedPipe, and at write I think this can keep TypedPipe referentially transparent. On Friday, June 27, 2014, Brandon Holt notifications@github.com wrote:
Oscar Boykin :: @posco :: http://twitter.com/posco |
||
|
||
newFd | ||
} | ||
|
||
/** | ||
* FlowDef that only includes things upstream from the given Pipe | ||
*/ | ||
def onlyUpstreamFrom(pipe: Pipe): FlowDef = { | ||
val newFd = new FlowDef | ||
// don't copy any sources/sinks | ||
newFd.mergeMisc(fd) | ||
|
||
val sourceTaps = fd.getSources | ||
val newSrcs = newFd.getSources | ||
|
||
pipe.upstreamPipes | ||
.filter(_.getPrevious.length == 0) // implies _ is a head | ||
.foreach { head => | ||
if (!newSrcs.containsKey(head.getName)) | ||
newFd.addSource(head, sourceTaps.get(head.getName)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we need to check for Checkpoints in these pipes here, otherwise scalding's At the very least, add a TODO for that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a TODO for this. Not sure what's supposed to happen -- I tried adding a checkpoint, but it seems to go into the flow as just another pipe (doesn't get added to the flowDef ( |
||
} | ||
|
||
val sinks = fd.getSinks | ||
if (sinks.containsKey(pipe.getName)) { | ||
newFd.addTailSink(pipe, sinks.get(pipe.getName)) | ||
} | ||
|
||
newFd | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like the add* methods FlowDef in cascading are not idempotent:
https://github.com/cwensel/cascading/blob/wip-2.6/cascading-core/src/main/java/cascading/flow/FlowDef.java#L153
Is that going to be a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case,
addSources
does look idempotent: it's just adding what is already in one map into another.As for
addSource(Pipe)
, it shouldn't be any different -- my understanding is that head pipes have the same name as their source, soaddSource(Pipe,...)
should be equivalent toaddSource(name: String)
, which is simply adding to a map.In any case, all we're doing is re-adding references to the same pipes, so the
IllegalArgumentException
can't fire (unless something else broke the flow) because we know we already have a valid pipe with 1 head.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, bad link:
https://github.com/cwensel/cascading/blob/wip-2.6/cascading-core/src/main/java/cascading/flow/FlowDef.java#L128
addSource(String, Tap) is not adding to a map, it is making sure that the key is not already present. Am I missing something?
I am not sure your code ever does this. Does it? But, ultimately, we want
Equiv.equiv(x.mergeFrom(x), x)
as a law, whereEquiv[FlowDef]
is set/map equality of all the structure of the flow.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as you're copying over all sources, as in
mergeFrom
, you should get map equality forgetSources
and the rest. The only place where you might have a problem is if you build up your own set of sources, in which case you have to ensure this property yourself -- as we do here inonlyUpstreamFlow
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we can merge with this now, but we need to add an issue to fix it.
mergeFrom should be idempotent, or it will be very difficult to use in interesting cases, specifically for the case I have in mind for making TypedPipe referentially transparent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. So you're proposing making a deterministic choice when keys collide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the keys can collide with the same value, in that case there is no choice to make. You can still error if there are keys that collide, but the values are not equal.