Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove FlowState and use a proxy pattern for that state #1001

Closed
wants to merge 1 commit into from

Conversation

johnynek
Copy link
Collaborator

@johnynek johnynek commented Aug 5, 2014

This is a pretty big change as it causes all read taps to be wrapped by a proxy, but it does remove a big wart: the global static FlowState map.

Thoughts on this one?

@cwensel Note the ProxyTap here. I had to resort to reflection to make this work due to the .id in Taps being private. It would be nice to have this ProxyTap idea supporting in cascading so that we can attach extra information to Taps we get at runtime.

@cwensel
Copy link

cwensel commented Aug 5, 2014

re Tap.id being private, you might find this useful
https://github.com/cwensel/cascading/blob/wip-2.6/cascading-core/src/main/java/cascading/tap/Tap.java#L101-101

its to prevent confusion with the instance level getIdentifier etc, and its really not a user level value (unless you really want a guid on the Tap instance).

re ProxyTap, not sure how to surface that into java without a new dependency (or a bunch of code). we use javassist currently for such things (Lingual uses it for something similar I think, another reason to factor stuff out of that project)

ckw

On Aug 5, 2014, at 4:47 PM, P. Oscar Boykin notifications@github.com wrote:

This is a pretty big change as it causes all read taps to be wrapped by a proxy, but it does remove a big wart: the global static FlowState map.

Thoughts on this one?

@cwensel Note the ProxyTap here. I had to resort to reflection to make this work due to the .id in Taps being private. It would be nice to have this ProxyTap idea supporting in cascading so that we can attach extra information to Taps we get at runtime.

You can merge this Pull Request by running

git pull https://github.com/twitter/scalding tapproxy
Or view, comment on, or merge it at:

#1001

Commit Summary

Remove FlowState and use a proxy pattern for that state
File Changes

M scalding-core/src/main/scala/com/twitter/scalding/Execution.scala (33)
D scalding-core/src/main/scala/com/twitter/scalding/FlowState.scala (104)
M scalding-core/src/main/scala/com/twitter/scalding/Job.scala (10)
M scalding-core/src/main/scala/com/twitter/scalding/Mode.scala (6)
M scalding-core/src/main/scala/com/twitter/scalding/RichFlowDef.scala (20)
M scalding-core/src/main/scala/com/twitter/scalding/Source.scala (52)
A scalding-core/src/main/scala/com/twitter/scalding/cascading_interop/ProxyTap.scala (138)
Patch Links:

https://github.com/twitter/scalding/pull/1001.patch
https://github.com/twitter/scalding/pull/1001.diff

Reply to this email directly or view it on GitHub.

Chris K Wensel
chris@concurrentinc.com
http://concurrentinc.com

* This sucks, but I can't see how to make the id the same, and without it
* this proxy breaks when proxies return this
*/
@transient private[this] val idF = classOf[Tap[_, _, _]].getDeclaredField("id")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the purpose of this to make idF available internallyk, or to somehow change/manipulate it? This matters because any change to the id won't be represented in the underlying Tap. We can just surface the info.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the purpose of this to make idF available internallyk, or to somehow change/manipulate it? This matters because any change to the id won't be represented in the underlying Tap. We can just surface the info

you cannot change the value of ID in a Tap. the field 'id' has been made final starting in 2.5.6 and is no longer lazily initialized.

Cascading relies heavily on these IDs for planning and resolving resources cluster side. An id changing on a Tap could cause silent failures during processing that would be very hard to detect. This is especially true in the 3.0 planner.

we are happy to address adding user meta-data to the Tap object itself and ship it in 2.6.

probably best you provide and example of what you think would be the minimum level of functionality.

fwiw, access to the value is through the static method Tap.id( tap )

ckw

Chris K Wensel
chris@concurrentinc.com
http://concurrentinc.com

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the following facts:

  1. We are not "changing" and ids, we are initializing them in the constructor to be the same as the Id of the proxy. Can you comment as to why that is dangerous? It seems the tests pass?

  2. As for adding Metadata, I would actually just like the class I have here, ProxyTap. I can subclass proxytap and attach any data I like to a Tap passed to be by a user. The things I need to attach to the tap are: Source, TypedPipe going in or out of that Source, the scalding Mode this tap was associated with, and possibly more. So, I want to be able to have a

class MetadataTap<Meta, Config, Input, Output> extends Tap<Config, Input, Output> {
  public MetadataTap(Meta metadata, Tap<Config, Input, Output> wrapping)
  Tap<Config, Input, Output> getWrappedTap();
  Meta getMetaData();
}

That would be great if I had that. This is what I have implemented in two steps (proxytap and a subclass of that I use when making the scalding graphs).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the following facts:

  1. We are not "changing" and ids, we are initializing them in the constructor to be the same as the Id of the proxy. Can you comment as to why that is dangerous? It seems the tests pass?

as of 2.5.6 the id is immediately initialized on construction and set as final. you can pull it up to the proxy (which i think sounds like a decorator), not the reverse.

this was a mistake on my part to not make it this way initially. my apologies for any confusion.

  1. As for adding Metadata, I would actually just the class I have here, ProxyTap. I can subclass proxytap and attach any data I like to a Tap passed to be by a user. The things I need to attach to the tap are: Source, TypedPipe going in or out of that Source, the scalding Mode this tap was associated with, and possibly more. So, I want to be able to have a

class MetadataTap<Meta, Config, Input, Output> extends Tap<Config, Input, Output> {
public MetadataTap(Meta metadata, Tap<Config, Input, Output> wrapping)
Tap<Config, Input, Output> getWrappedTap();
Meta getMetaData();
}
That would be great if I had that. This is what I have implemented in two steps (proxytap and a subclass of that I use when making the scalding graphs).

as of 3.0, we have already changed the type signature of Hfs to accommodate differing hdfs based platforms. we could go as far as introducing a new Meta type parameter to Tap. of course this complicates the belief by the raw Java folk Tap/Scheme should not have type parameters because they don't believe in nesting parent child relationships. which is a core pattern in the Tap framework (MultiSource/Sink, PartitionTap, etc)

having a Meta decorator is also an option. this promotes the nesting parent child relationship model to preserve type safety. this option lets us get the feature into 2.6

what might also be interesting is that we have introduced java level annotations that can be added to any property on a core java object (Tap/Scheme/SubAssembly/Operation) to help manage those values. Flow now also has a flow descriptor that can hold arbitrary information (like what SQL statements is this Flow running).

http://docs.concurrentinc.com/cascading/2.6/cascading-core/cascading/management/annotation/package-summary.html

if the annotations are on the Meta object, they can be utilized as well.

does Pipe/SubAssembly need a Meta decorator?

Chris K Wensel
chris@concurrentinc.com
http://concurrentinc.com

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have Pipe have a decorator, but we don't use SubAssemblies so far in scalding (other than the AggregateBy for some code in the Fields-API of scalding).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets start with a MetaTap, and I can try a MetaPipe later.

On Aug 6, 2014, at 1:02 PM, P. Oscar Boykin notifications@github.com wrote:

In scalding-core/src/main/scala/com/twitter/scalding/cascading_interop/ProxyTap.scala:

+import java.util.{ Set => JSet }
+
+/**

  • * This is for when you might want to attach some additional state or behavior
  • * onto an existing tap instance (for instance, one returned from createTap
  • * in scalding source
  • */
    +abstract class ProxyTap[C, I, O, +T <: Tap[C, I, O]] extends Tap[C, I, O] {
  • def proxy: T
  • /**
  • * This sucks, but I can't see how to make the id the same, and without it
  • * this proxy breaks when proxies return this
  • */
  • @transient private[this] val idF = classOf[Tap[_, _, _]].getDeclaredField("id")
    It would be nice to have Pipe have a decorator, but we don't use SubAssemblies so far in scalding (other than the AggregateBy for some code in the Fields-API of scalding).


Reply to this email directly or view it on GitHub.

Chris K Wensel
chris@concurrentinc.com
http://concurrentinc.com

@jcoveney
Copy link
Contributor

jcoveney commented Aug 6, 2014

Looks like the reducer estimate test is failing. Is it known to be flaky? Do we have an issue there? Otherwise this looks fine to me, unless we are going to try to tackle it in cascading?

@johnynek
Copy link
Collaborator Author

johnynek commented Aug 7, 2014

Closing this as it seems concurrent will provide a ProxyTap/MetadataTap which they can verify does not break any planner/monitoring logic.

@johnynek johnynek closed this Aug 7, 2014
@johnynek johnynek reopened this Aug 7, 2014
@johnynek johnynek closed this Aug 7, 2014
@caniszczyk caniszczyk deleted the tapproxy branch May 18, 2015 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants