Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cascading 2.6 tracing #1156

Merged
merged 2 commits into from
Jan 21, 2015
Merged

Conversation

JamesIry
Copy link
Contributor

@JamesIry JamesIry commented Jan 9, 2015

When Scalding talked to document services like Driven it was sending line numbers that essentially mark the boundary between Scalding and Cascading. But that's not very useful to end users. A change was made to Cascading 2.6 that allows higher level APIs to indicate where a document service should consider the boundary. This PR bumps Scalding up to use Cascading 2.6 and to participate in the protocol for registering an API boundary.

This commit set's a tracing boundary so that DocumentServices such as
Driven don't show "RichPipe" as being the originator of Cascading nodes.
Instead user code should now be the origin. The change to existing
Scalding code is kept minimal: the Job class calls an init method. If
other entry points are common it's easy enough to add them. The call
into Cascading is implemented via reflection so users can continue to
use older versions of Cascading without breakage.
@ianoc
Copy link
Collaborator

ianoc commented Jan 9, 2015

Thanks for the PR. Just an FYI we are going to be pretty slow about taking the major cascading bump as it'll take lots of testing.

@@ -77,6 +77,7 @@ object Job {
* these functions can be combined Monadically using algebird.monad.Reader.
*/
class Job(val args: Args) extends FieldConversions with java.io.Serializable {
Tracing.init()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem like it should be in the default job, something can be opted into by extending a trait instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting that Job extend a trait that initializes Tracing? Or that end users should explicitly request tracing. If the later I disagree. This functionality isn't something people are likely to discover.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The former, they should extend a trait. If they use driven they will know its there. Why have tracing code running for everyone as this would have?

(Indeed it seems like tracing here should be a trait anyway that gets extended from. also can allow its methods easier overridden)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A document service is a plugin deployed at runtime. The programmer may not have any idea that one is deployed: that may be controlled by a separate admin.

The tracing code change is just registering a regex representing the boundary of the internals of Scalding. It runs once at start up and has no impact on users.

@JamesIry
Copy link
Contributor Author

JamesIry commented Jan 9, 2015

Understood. Let me know if there's anything I can do to help.

*/
private val defaultRegex = """^com\.twitter\.scalding\.(?!Tool|Job|ExecutionContext).*"""

register()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here and not in init?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a very cheap way to create one-time-init semantics. I.e. if it ever becomes necessary to put the init() call in other places in Scalding, multiple calls to init() won't actually do anything.

@johnynek
Copy link
Collaborator

Okay, this seems safe enough, and good for Driven users.

johnynek added a commit that referenced this pull request Jan 21, 2015
@johnynek johnynek merged commit e0e2a3b into twitter:develop Jan 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants