Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tf.data server] prevent restart of the dispatcher server w/o fault-tolerance, in the test bed #43714

Merged
merged 3 commits into from Oct 5, 2020
Merged

Conversation

kvignesh1420
Copy link
Member

This PR is a follow up of #43691 and prevents a restart of the dispatcher if it has been configured without fault-tolerance. Failing to do so will lead to a perpetual stream of warn messages on a dispatcher restart and unnecessarily extend the testing time without an immediate exit.

@aaudiber until the enhancements are implemented to handle such scenarios (as per our discussion), this will help us in handling the upcoming test cases.

@google-ml-butler google-ml-butler bot added the size:S CL Change Size: Small label Oct 1, 2020
Copy link
Contributor

@aaudiber aaudiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kvignesh1420

"""Stops `dispatcher` and creates a new dispatcher with the same port."""
"""Stops `dispatcher` and creates a new dispatcher with the same port.

When the dispatcher is restarted with `fault-tolerant_mode=False`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scenario is too specific, can we make the docstring more general, e.g. "Restarting is only supported when the dispatcher is configured with fault_tolerant_mode=True"?

@gbaned gbaned self-assigned this Oct 2, 2020
@gbaned gbaned added the comp:data tf.data related issues label Oct 2, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Oct 2, 2020
@kvignesh1420
Copy link
Member Author

Made the change, @aaudiber.

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Oct 2, 2020
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Oct 2, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Oct 2, 2020
@tensorflow-copybara tensorflow-copybara merged commit f7d8a38 into tensorflow:master Oct 5, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Oct 5, 2020
@kvignesh1420
Copy link
Member Author

@aaudiber, I had a quick question regarding the updates to the workers based on the task list.
Reference:

// TODO(aaudibert): Instead of polling, have dispatcher send updates when

How would we identify whether the task list changed or not if we don't poll for changes?
Please let me know.

@kvignesh1420 kvignesh1420 deleted the tfdataserver-restarts branch October 31, 2020 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes comp:data tf.data related issues ready to pull PR ready for merge process size:S CL Change Size: Small
Projects
PR Queue
  
Merged
Development

Successfully merging this pull request may close these issues.

None yet

6 participants