Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement LoggingAsync for GRPC Worker Services #14604

Merged
merged 11 commits into from Jan 25, 2018
Merged

Implement LoggingAsync for GRPC Worker Services #14604

merged 11 commits into from Jan 25, 2018

Conversation

xldrx
Copy link
Member

@xldrx xldrx commented Nov 16, 2017

This allow RecvTensor events to show up in StepStats and in turn in Chrome Tracing format.

@tensorflow-jenkins
Copy link
Collaborator

Can one of the admins verify this patch?

@jhseu jhseu requested a review from mrry November 16, 2017 04:21
@jhseu jhseu self-assigned this Nov 16, 2017
@xldrx
Copy link
Member Author

xldrx commented Nov 16, 2017

@pbar @mrry @suharshs Could you please take a look at this?

@mrry mrry requested review from prb12 and removed request for mrry November 16, 2017 15:05
@mrry
Copy link
Contributor

mrry commented Nov 16, 2017

Reassigning this to @prb12, since he knows the ins and outs of the logging path better than I do. A few quick observations though:

  • Instead of modifying the BaseRendezvousMgr and related classes to capture this information, please use the existing stubs in GrpcWorkerCache:

    void SetLogging(bool v) override { logger_.SetLogging(v); }
    void ClearLogs() override { logger_.ClearLogs(); }
    bool RetrieveLogs(int64 step_id, StepStats* ss) override {
    return logger_.RetrieveLogs(step_id, ss);
    }

  • The changes in master_session.cc are potentially controversial, because they could cause a lot of additional logging to be generated when FULL_TRACE is specified. We might need an additional TraceLevel for this case.

  • Run clang-format -style=Google over the code to take care of style issues (which should trigger a failure in the presubmit).

- Revert changes to *_rendezvous_mgr
- Implement logging primitives in session_mgr instead
- Implement ClearLogs
- Fixed C++ formating
@xldrx
Copy link
Member Author

xldrx commented Nov 23, 2017

Thanks @mrry for the feed back:

  • I changed the implementation to use SessionMgr instead of *RendezvousMgr.
    Note that unlike other worker rpcs, LoggingRequest in LoggingAsync rpc does not have any session handle, therefore all sessions should be touched. Both implementations (previous one, and this one) iterates over all sessions (previously through RendezvousMgr now SessionMgr), then uses GrpcWorkerCache calls in each session.

  • @pbar: Thoughts on changes in master_session.cc?

@sb2nov
Copy link
Contributor

sb2nov commented Nov 29, 2017

@xldrx can you look into the merge conflicts.

@xldrx
Copy link
Member Author

xldrx commented Dec 16, 2017

Hey @sb2nov,
It seems that Paul is swamped. Could you review this PR?

@yifeif yifeif added awaiting review Pull request awaiting review stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Dec 20, 2017
@drpngx drpngx requested a review from sb2nov December 27, 2017 01:40
@sb2nov sb2nov removed their request for review January 7, 2018 00:56
@sb2nov
Copy link
Contributor

sb2nov commented Jan 7, 2018

@xldrx I don't know enough to review this right now. @prb12 ping.

session_mgr->SetLogging(request->rpc_logging());
if (request->clear()) {
session_mgr->ClearLogs();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both clear and fetch_step_id are used the semantics are unclear, but I would have thought it made more sense to retrieve any previously collected data before clearing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great observation. Fixed.

Copy link
Member

@prb12 prb12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addiing this!

With regard to an additional TRACE_LEVEL - I don't think that this is likely to be a problem.

I'm assuming this doesn't have a noticeable performance impact when tracing is not enabled?

One minor question about the case where clear and retrieve are done at the same time.

@googlebot
Copy link

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

@googlebot
Copy link

CLAs look good, thanks!

@xldrx
Copy link
Member Author

xldrx commented Jan 15, 2018

@prb12 Thanks for the review.

I'm assuming this doesn't have a noticeable performance impact when tracing is not enabled?

This PR does nothing when tracing is not enabled.

One minor question about the case where clear and retrieve are done at the same time.

Great observation. Fixed.

@prb12 prb12 added the kokoro:force-run Tests on submitted change label Jan 19, 2018
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jan 19, 2018
@rmlarsen rmlarsen added awaiting testing (then merge) kokoro:force-run Tests on submitted change and removed awaiting review Pull request awaiting review stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Jan 23, 2018
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jan 23, 2018
@rmlarsen
Copy link
Member

@xldrx This appears to have numerous broken tests. Can you please take a look?

@rmlarsen rmlarsen added the stat:awaiting response Status - Awaiting response from author label Jan 24, 2018
@prb12 prb12 added the kokoro:force-run Tests on submitted change label Jan 24, 2018
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jan 24, 2018
@rmlarsen rmlarsen added the kokoro:force-run Tests on submitted change label Jan 25, 2018
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jan 25, 2018
@rmlarsen rmlarsen removed the stat:awaiting response Status - Awaiting response from author label Jan 25, 2018
@rmlarsen
Copy link
Member

@xldrx Thanks for the contribution!

@rmlarsen rmlarsen merged commit aa1e754 into tensorflow:master Jan 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants