-
Notifications
You must be signed in to change notification settings - Fork 703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cascading 2.6 tracing #1156
Cascading 2.6 tracing #1156
Conversation
This commit set's a tracing boundary so that DocumentServices such as Driven don't show "RichPipe" as being the originator of Cascading nodes. Instead user code should now be the origin. The change to existing Scalding code is kept minimal: the Job class calls an init method. If other entry points are common it's easy enough to add them. The call into Cascading is implemented via reflection so users can continue to use older versions of Cascading without breakage.
Thanks for the PR. Just an FYI we are going to be pretty slow about taking the major cascading bump as it'll take lots of testing. |
@@ -77,6 +77,7 @@ object Job { | |||
* these functions can be combined Monadically using algebird.monad.Reader. | |||
*/ | |||
class Job(val args: Args) extends FieldConversions with java.io.Serializable { | |||
Tracing.init() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't seem like it should be in the default job, something can be opted into by extending a trait instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that Job extend a trait that initializes Tracing? Or that end users should explicitly request tracing. If the later I disagree. This functionality isn't something people are likely to discover.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The former, they should extend a trait. If they use driven they will know its there. Why have tracing code running for everyone as this would have?
(Indeed it seems like tracing here should be a trait anyway that gets extended from. also can allow its methods easier overridden)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A document service is a plugin deployed at runtime. The programmer may not have any idea that one is deployed: that may be controlled by a separate admin.
The tracing code change is just registering a regex representing the boundary of the internals of Scalding. It runs once at start up and has no impact on users.
Understood. Let me know if there's anything I can do to help. |
*/ | ||
private val defaultRegex = """^com\.twitter\.scalding\.(?!Tool|Job|ExecutionContext).*""" | ||
|
||
register() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this here and not in init?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a very cheap way to create one-time-init semantics. I.e. if it ever becomes necessary to put the init() call in other places in Scalding, multiple calls to init() won't actually do anything.
Okay, this seems safe enough, and good for Driven users. |
When Scalding talked to document services like Driven it was sending line numbers that essentially mark the boundary between Scalding and Cascading. But that's not very useful to end users. A change was made to Cascading 2.6 that allows higher level APIs to indicate where a document service should consider the boundary. This PR bumps Scalding up to use Cascading 2.6 and to participate in the protocol for registering an API boundary.