gRPC health check may say the server is unhealthy even if it's responding successfully to GetSystemInfo #5015

@josh-berry

Description

For full context, see the discussion on this PR: temporalio/cli#368 (comment)

Expected Behavior

If GetSystemInfo returns successfully, the gRPC health check should also pass.

Actual Behavior

For a period of up to about 1 second after GetSystemInfo succeeds, the gRPC health check may fail (returning NOT_SERVING), falsely indicating that gRPC is down when it's not.

This was causing frequent intermittent failures in the CLI CI/CD pipeline until we worked around it in temporalio/cli#368.

Steps to Reproduce the Problem

  1. Launch the server
  2. Immediately try to connect to it using the Go SDK. (The Go SDK will wait for a successful GetSystemInfo response before returning a client object to the caller.)
  3. Once the Go SDK returns a client object, immediately use the client object to perform a health check of the server.
  4. Intermittently, the health check will fail.

Specifications

  • Version: 1.22.0
  • Platform: Seen on all platforms
