Reject task on worker still starting up #21921

Merged 4 commits on May 13, 2024

Commits on May 10, 2024

  1. 8bd7461
  2. a7311cf
  3. Fix StartupStatus in tests

    `StartupStatus.startupComplete` was set in production code, but not in
    tests.
    findepi committed May 10, 2024
    b591db1
  4. Reject task on worker still starting up

    When a worker node is restarted after a crash, the coordinator may still
    be unaware of the situation and may attempt to schedule tasks on it.
    Ideally the coordinator should not schedule tasks on a worker that is
    not ready, but in pipelined execution there is currently no way to move
    a task. Accepting a request too early will likely lead to some failure
    and an HTTP 500 (INTERNAL_SERVER_ERROR) response, which the coordinator
    won't retry. Send 503 (SERVICE_UNAVAILABLE) instead so that the request
    is retried.
    findepi committed May 10, 2024
    8fbbd54
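
The behavior described in the last commit message can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `StartupGate`, `TaskEndpointSketch`, `createOrUpdateTask`, and `isRetryable` are hypothetical names, and the handler returns a bare status code instead of a real HTTP response. It only shows the idea of answering 503 until startup has completed, and of a caller treating 503 (but not 500) as retryable.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the worker's startup flag (not the project's actual class).
final class StartupGate {
    private final AtomicBoolean startupComplete = new AtomicBoolean(false);

    // Called once the worker has finished initializing.
    void markStartupComplete() {
        startupComplete.set(true);
    }

    boolean isStartupComplete() {
        return startupComplete.get();
    }
}

// Hypothetical task endpoint that returns plain HTTP status codes.
final class TaskEndpointSketch {
    static final int OK = 200;
    static final int INTERNAL_SERVER_ERROR = 500;
    static final int SERVICE_UNAVAILABLE = 503;

    private final StartupGate gate;

    TaskEndpointSketch(StartupGate gate) {
        this.gate = gate;
    }

    // Reject the request with 503 while the worker is still starting up,
    // rather than accepting it and failing later with a 500.
    int createOrUpdateTask(String taskId) {
        if (!gate.isStartupComplete()) {
            return SERVICE_UNAVAILABLE;
        }
        // Assumption: real task creation would happen here.
        return OK;
    }

    // Caller-side view, as described in the commit message:
    // 503 is retried, 500 is not.
    static boolean isRetryable(int status) {
        return status == SERVICE_UNAVAILABLE;
    }
}
```

With this shape, a request that arrives before `markStartupComplete()` is rejected cheaply and the caller can retry it, instead of hitting a partially initialized worker.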