[tf.data server] prevent restart of the dispatcher server w/o fault-tolerance, in the test bed #43714
Thanks @kvignesh1420
"""Stops `dispatcher` and creates a new dispatcher with the same port."""
"""Stops `dispatcher` and creates a new dispatcher with the same port.
When the dispatcher is restarted with `fault-tolerant_mode=False`,
This scenario is too specific. Can we make the docstring more general, e.g. "Restarting is only supported when the dispatcher is configured with `fault_tolerant_mode=True`"?
Made the change, @aaudiber.
@aaudiber, I had a quick question regarding the updates to the workers based on the task list.
How would we identify whether the task list has changed if we don't poll for changes?
This PR is a follow-up to #43691 and prevents a restart of the dispatcher if it has been configured without fault tolerance. Without this check, a dispatcher restart leads to a perpetual stream of `warn` messages and unnecessarily extends the testing time instead of exiting immediately.
@aaudiber, until the enhancements are implemented to handle such scenarios (as per our discussion), this will help us handle the upcoming test cases.
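For readers following along, here is a minimal sketch of the kind of guard described above. It is not the code from this PR: `DispatcherConfig`, `Dispatcher`, and the error message are hypothetical stand-ins written in plain Python so the example runs without TensorFlow. The real test bed may structure this differently.

```python
from dataclasses import dataclass


@dataclass
class DispatcherConfig:
    """Hypothetical stand-in for a dispatcher configuration."""
    port: int = 0
    fault_tolerant_mode: bool = False


class Dispatcher:
    """Hypothetical stand-in for a running dispatcher server."""

    def __init__(self, config):
        self.config = config
        self.running = True

    def stop(self):
        self.running = False


def restart_dispatcher(dispatcher):
    """Stops `dispatcher` and creates a new dispatcher with the same port.

    Restarting is only supported when the dispatcher is configured with
    `fault_tolerant_mode=True`.
    """
    # Fail fast instead of letting workers log warnings indefinitely
    # against a dispatcher that has lost its state.
    if not dispatcher.config.fault_tolerant_mode:
        raise ValueError(
            "Cannot restart a dispatcher configured without fault tolerance.")
    dispatcher.stop()
    # Reuse the same config so the new dispatcher keeps the same port.
    return Dispatcher(dispatcher.config)
```

Raising immediately makes a misconfigured test exit with a clear error rather than run until it times out.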