Introduce grpc health check in etcd client #16276

Closed as not planned
@chaochn47

Description

What would you like to be added?

Background

gRPC health checks probe whether a server is able to handle RPCs. A server may reply “unhealthy” because it is not ready to take requests, because it is shutting down, or for some other reason. The client can act accordingly if no response arrives within some time window or if the response reports the server as unhealthy.

ref.

  1. https://github.com/grpc/grpc/blob/master/doc/health-checking.md
  2. https://github.com/grpc/proposal/blob/master/A17-client-side-health-checking.md

#8121 added a basic gRPC health service, on the server side only, starting with etcd v3.3.

// server should register all the services manually
// use empty service name for all etcd services' health status,
// see https://github.com/grpc/grpc/blob/master/doc/health-checking.md for more
hsrv := health.NewServer()
hsrv.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)
healthpb.RegisterHealthServer(grpcServer, hsrv)

Problem

In a scenario with multiple etcd server endpoints, the etcd client fails over to another endpoint only when the existing connection/channel is not in the Ready state. However, the etcd client does not know whether the etcd server can actually handle RPCs.

For example:

  1. defrag stops the process completely for noticeable duration #9222
  2. Removed etcd member failed to stop on stuck disk #14338
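
The gap described above can be illustrated with a small model, again using only the Go standard library. This is a sketch under stated assumptions, not the etcd client's balancer: `Endpoint`, `PickByConnectivity`, and `PickHealthy` are hypothetical names, and the endpoint addresses are made up. It contrasts selection based on connection state alone with selection that also requires a healthy status.

```go
package main

import "fmt"

// ConnState is a simplified model of a channel's connectivity state.
type ConnState int

const (
	Idle ConnState = iota
	Connecting
	Ready
	TransientFailure
)

// Health is a simplified model of the status a health check would report.
type Health int

const (
	HealthUnknown Health = iota
	HealthServing
	HealthNotServing
)

// Endpoint is a hypothetical view of one etcd server as seen by a client.
type Endpoint struct {
	Addr   string
	Conn   ConnState
	Health Health
}

// PickByConnectivity returns the first endpoint whose connection is Ready.
// It models failover that looks at connection state only.
func PickByConnectivity(eps []Endpoint) (string, bool) {
	for _, ep := range eps {
		if ep.Conn == Ready {
			return ep.Addr, true
		}
	}
	return "", false
}

// PickHealthy returns the first endpoint that is both Ready and Serving.
// It models failover that also takes a health check result into account.
func PickHealthy(eps []Endpoint) (string, bool) {
	for _, ep := range eps {
		if ep.Conn == Ready && ep.Health == HealthServing {
			return ep.Addr, true
		}
	}
	return "", false
}

func main() {
	eps := []Endpoint{
		// Connected, but unable to serve (for example, busy defragmenting).
		{Addr: "etcd-1:2379", Conn: Ready, Health: HealthNotServing},
		{Addr: "etcd-2:2379", Conn: Ready, Health: HealthServing},
	}

	addr, _ := PickByConnectivity(eps)
	fmt.Println(addr) // etcd-1:2379

	addr, _ = PickHealthy(eps)
	fmt.Println(addr) // etcd-2:2379
}
```

In this model the connectivity-only picker keeps sending requests to the first endpoint because its connection is still Ready, while the health-aware picker moves on to the second one.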

This needs a comprehensive design and testing.

Placeholder Google doc, "etcd client grpc health check", copied from the KEP template.

Why is this needed?

Improve etcd availability by failing over to other healthy etcd endpoints.
