New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove FlowState and use a proxy pattern for that state #1001
Conversation
re Tap.id being private, you might find this useful its to prevent confusion with the instance level getIdentifier etc, and its really not a user level value (unless you really want a guid on the Tap instance). re ProxyTap, not sure how to surface that into java without a new dependency (or a bunch of code). we use javassist currently for such things (Lingual uses it for something similar I think, another reason to factor stuff out of that project) ckw On Aug 5, 2014, at 4:47 PM, P. Oscar Boykin notifications@github.com wrote:
Chris K Wensel |
* This sucks, but I can't see how to make the id the same, and without it | ||
* this proxy breaks when proxies return this | ||
*/ | ||
@transient private[this] val idF = classOf[Tap[_, _, _]].getDeclaredField("id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the purpose of this to make idF available internallyk, or to somehow change/manipulate it? This matters because any change to the id won't be represented in the underlying Tap. We can just surface the info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the purpose of this to make idF available internallyk, or to somehow change/manipulate it? This matters because any change to the id won't be represented in the underlying Tap. We can just surface the info
you cannot change the value of ID in a Tap. the field 'id' has been made final starting in 2.5.6 and is no longer lazily initialized.
Cascading relies heavily on these IDs for planning and resolving resources cluster side. An id changing on a Tap could cause silent failures during processing that would be very hard to detect. This is especially true in the 3.0 planner.
we are happy to address adding user meta-data to the Tap object itself and ship it in 2.6.
probably best you provide and example of what you think would be the minimum level of functionality.
fwiw, access to the value is through the static method Tap.id( tap )
ckw
Chris K Wensel
chris@concurrentinc.com
http://concurrentinc.com
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the following facts:
-
We are not "changing" and ids, we are initializing them in the constructor to be the same as the Id of the proxy. Can you comment as to why that is dangerous? It seems the tests pass?
-
As for adding Metadata, I would actually just like the class I have here, ProxyTap. I can subclass proxytap and attach any data I like to a Tap passed to be by a user. The things I need to attach to the tap are: Source, TypedPipe going in or out of that Source, the scalding Mode this tap was associated with, and possibly more. So, I want to be able to have a
class MetadataTap<Meta, Config, Input, Output> extends Tap<Config, Input, Output> {
public MetadataTap(Meta metadata, Tap<Config, Input, Output> wrapping)
Tap<Config, Input, Output> getWrappedTap();
Meta getMetaData();
}
That would be great if I had that. This is what I have implemented in two steps (proxytap and a subclass of that I use when making the scalding graphs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the following facts:
- We are not "changing" and ids, we are initializing them in the constructor to be the same as the Id of the proxy. Can you comment as to why that is dangerous? It seems the tests pass?
as of 2.5.6 the id is immediately initialized on construction and set as final. you can pull it up to the proxy (which i think sounds like a decorator), not the reverse.
this was a mistake on my part to not make it this way initially. my apologies for any confusion.
- As for adding Metadata, I would actually just the class I have here, ProxyTap. I can subclass proxytap and attach any data I like to a Tap passed to be by a user. The things I need to attach to the tap are: Source, TypedPipe going in or out of that Source, the scalding Mode this tap was associated with, and possibly more. So, I want to be able to have a
class MetadataTap<Meta, Config, Input, Output> extends Tap<Config, Input, Output> {
public MetadataTap(Meta metadata, Tap<Config, Input, Output> wrapping)
Tap<Config, Input, Output> getWrappedTap();
Meta getMetaData();
}
That would be great if I had that. This is what I have implemented in two steps (proxytap and a subclass of that I use when making the scalding graphs).
as of 3.0, we have already changed the type signature of Hfs to accommodate differing hdfs based platforms. we could go as far as introducing a new Meta type parameter to Tap. of course this complicates the belief by the raw Java folk Tap/Scheme should not have type parameters because they don't believe in nesting parent child relationships. which is a core pattern in the Tap framework (MultiSource/Sink, PartitionTap, etc)
having a Meta decorator is also an option. this promotes the nesting parent child relationship model to preserve type safety. this option lets us get the feature into 2.6
what might also be interesting is that we have introduced java level annotations that can be added to any property on a core java object (Tap/Scheme/SubAssembly/Operation) to help manage those values. Flow now also has a flow descriptor that can hold arbitrary information (like what SQL statements is this Flow running).
if the annotations are on the Meta object, they can be utilized as well.
does Pipe/SubAssembly need a Meta decorator?
Chris K Wensel
chris@concurrentinc.com
http://concurrentinc.com
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have Pipe have a decorator, but we don't use SubAssemblies so far in scalding (other than the AggregateBy for some code in the Fields-API of scalding).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets start with a MetaTap, and I can try a MetaPipe later.
On Aug 6, 2014, at 1:02 PM, P. Oscar Boykin notifications@github.com wrote:
In scalding-core/src/main/scala/com/twitter/scalding/cascading_interop/ProxyTap.scala:
+import java.util.{ Set => JSet }
+
+/**
- * This is for when you might want to attach some additional state or behavior
- * onto an existing tap instance (for instance, one returned from createTap
- * in scalding source
- */
+abstract class ProxyTap[C, I, O, +T <: Tap[C, I, O]] extends Tap[C, I, O] {- def proxy: T
- /**
- * This sucks, but I can't see how to make the id the same, and without it
- * this proxy breaks when proxies return this
- */
- @transient private[this] val idF = classOf[Tap[_, _, _]].getDeclaredField("id")
It would be nice to have Pipe have a decorator, but we don't use SubAssemblies so far in scalding (other than the AggregateBy for some code in the Fields-API of scalding).—
Reply to this email directly or view it on GitHub.
Chris K Wensel
chris@concurrentinc.com
http://concurrentinc.com
Looks like the reducer estimate test is failing. Is it known to be flaky? Do we have an issue there? Otherwise this looks fine to me, unless we are going to try to tackle it in cascading? |
Closing this as it seems concurrent will provide a ProxyTap/MetadataTap which they can verify does not break any planner/monitoring logic. |
This is a pretty big change as it causes all read taps to be wrapped by a proxy, but it does remove a big wart: the global static FlowState map.
Thoughts on this one?
@cwensel Note the ProxyTap here. I had to resort to reflection to make this work due to the .id in Taps being private. It would be nice to have this ProxyTap idea supporting in cascading so that we can attach extra information to Taps we get at runtime.