Currently, there are a couple of timeouts involved in worker/parameter server registration: `tony.task.registration-timeout-sec` (default 300 sec).

If there are large container scheduling or start-up delays, jobs can fail because of these timeouts. We should remove them entirely. Once they are gone, we won't need the `tony.task.registration-retry-count` property either.
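To illustrate the failure mode, here is a minimal sketch in Python. It is a toy model, not TonY's actual registration code: the function name `all_tasks_registered` and the delay values are hypothetical, and only the 300-second default comes from the issue. It assumes a job fails as soon as any task's start-up delay exceeds the registration timeout, and it models the proposed change by passing `None` for the timeout.

```python
from typing import Iterable, Optional


def all_tasks_registered(startup_delays_sec: Iterable[float],
                         registration_timeout_sec: Optional[float]) -> bool:
    """Return True if every task registers before the deadline.

    Passing registration_timeout_sec=None models the proposal in this
    issue: there is no deadline, so late tasks are simply waited for.
    """
    if registration_timeout_sec is None:
        return True  # no timeout: slow containers just register late
    return all(delay <= registration_timeout_sec for delay in startup_delays_sec)


# Hypothetical container start-up delays (seconds) on a busy cluster.
delays = [40.0, 120.0, 450.0]

# With the 300-second default, the 450-second container misses the deadline.
with_default_timeout = all_tasks_registered(delays, 300)

# With the timeout removed, the same delays no longer fail the job.
without_timeout = all_tasks_registered(delays, None)
```

Under these assumptions, a single slow container is enough to fail an otherwise healthy job, which is the case the issue wants to eliminate.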