[vtadmin-api] vtctld proxy dialer should check that gRPC connection is ready #9422

Closed
doeg opened this issue Dec 17, 2021 · 0 comments · Fixed by #9915

doeg commented Dec 17, 2021

Currently, the Dial function of vtadmin-api's vtctld proxy only reinitializes its vtctld connection if (a) it does not have a cached connection (i.e., it's the first time Dial has been called), or (b) the vtctld connection has been explicitly closed.

As a result, if any of the vtctlds to which VTAdmin is connected go away (are deprovisioned, the vtctld service crashes, etc.), then any vtadmin-api endpoint that depends on the vtctlds (keyspaces, workflows, schemas, etc.) will time out.

Ideally, Dial should also check that the gRPC connection is "ready for work", i.e., its connectivity state is ready/idle. If it is in a failure state, then the Dial function should close the connection and rediscover a new vtctld.
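
A minimal sketch of that check, as a pure function over the connection state. The State type below is a local stand-in that mirrors the names of grpc-go's connectivity.State values so the example compiles without the gRPC module; needsRedial is a hypothetical helper, not something in the vtadmin codebase. Treating Connecting as usable is an assumption of this sketch.

```go
package main

import "fmt"

// State is a stand-in for grpc-go's connectivity.State, redefined locally
// so this sketch has no external dependencies.
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
	Shutdown
)

// needsRedial reports whether a cached connection in state s should be
// closed and replaced. Idle and Ready count as "ready for work", as the
// issue describes. Connecting is also treated as usable here (an
// assumption) so that a connection still being established is not torn down.
func needsRedial(s State) bool {
	switch s {
	case Idle, Ready, Connecting:
		return false
	default: // TransientFailure, Shutdown
		return true
	}
}

func main() {
	fmt.Println(needsRedial(Ready))            // false
	fmt.Println(needsRedial(TransientFailure)) // true
}
```

In real code the state would presumably come from the cached connection's GetState method rather than a local enum.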

This is fairly easy to reproduce locally:

  1. Bring up two vtctlds (e.g., using vtctld-up.sh)
  2. Update VTAdmin's discovery.json file to include both:
{
    "vtctlds": [
        {
            "host": {
                "fqdn": "localhost:15000",
                "hostname": "localhost:15999"
            }
        },
        {
            "host": {
                "fqdn": "localhost:19000",
                "hostname": "localhost:19999"
            }
        }
    ],
    "vtgates": [
        {
            "host": {
                "hostname": "localhost:15991"
            }
        }
    ]
}
  3. Bring up vtadmin-api with ./scripts/vtadmin-up.sh
  4. Make a request against http://localhost:14200/api/keyspaces, which will call Dial and discover one of the two vtctlds. Output, with additional logging added to show this:
I1217 14:04:51.390344   53482 config.go:122] [rbac]: loaded authorizer with 1 rules
I1217 14:04:51.390402   53482 config.go:146] [rbac]: no authenticator implementation specified
I1217 14:04:51.396443   53482 server.go:240] server vtadmin listening on :14200
I1217 14:04:56.160507   53482 proxy.go:140] Discovering vtctld to dial...
I1217 14:04:56.160575   53482 proxy.go:147] Discovered vtctld localhost:19999; dialing...
I1217 14:04:56.161394   53482 proxy.go:173] Established connection to vtctld localhost:19999
  5. kill -9 whichever vtctld it established a connection to
  6. Make another request against http://localhost:14200/api/keyspaces so that Dial is called again
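
For reference, the discovery file used in the steps above can be decoded with a few small structs. This is only a sketch: the type and function names are hypothetical and only the fields shown in the example file are modelled. The dial address is taken from "hostname", since that is the value that appears in the "Discovered vtctld" log line.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// host, entry and discoveryFile are hypothetical types that mirror the
// JSON shape of the example discovery file.
type host struct {
	FQDN     string `json:"fqdn"`
	Hostname string `json:"hostname"`
}

type entry struct {
	Host host `json:"host"`
}

type discoveryFile struct {
	Vtctlds []entry `json:"vtctlds"`
	Vtgates []entry `json:"vtgates"`
}

// vtctldAddrs returns the hostname of every vtctld listed in raw.
func vtctldAddrs(raw []byte) ([]string, error) {
	var f discoveryFile
	if err := json.Unmarshal(raw, &f); err != nil {
		return nil, fmt.Errorf("parse discovery file: %w", err)
	}
	addrs := make([]string, 0, len(f.Vtctlds))
	for _, e := range f.Vtctlds {
		addrs = append(addrs, e.Host.Hostname)
	}
	return addrs, nil
}

// sample is the discovery file from the reproduction steps, compacted.
const sample = `{
  "vtctlds": [
    {"host": {"fqdn": "localhost:15000", "hostname": "localhost:15999"}},
    {"host": {"fqdn": "localhost:19000", "hostname": "localhost:19999"}}
  ],
  "vtgates": [
    {"host": {"hostname": "localhost:15991"}}
  ]
}`

func main() {
	addrs, err := vtctldAddrs([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs) // [localhost:15999 localhost:19999]
}
```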

At this point, the ideal behaviour is that vtadmin-api detects that the vtctld is no longer available, closes the gRPC connection, and then rediscovers the other vtctld.
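
The detect, close, rediscover flow could look roughly like the following. Everything here is a simplified stand-in: conn, proxy, discover and runDemo are hypothetical, the "healthy" flag replaces a real connectivity-state check, and discovery is made deterministic (first candidate that is not down) purely so the sketch is easy to follow.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a hypothetical stand-in for a gRPC client connection.
type conn struct {
	addr    string
	healthy bool // stand-in for "connectivity state is ready/idle"
	closed  bool
}

func (c *conn) Close() { c.closed = true }

// proxy caches a single vtctld connection between Dial calls.
type proxy struct {
	cached *conn
	addrs  []string        // candidate vtctlds, as a discovery file might list them
	down   map[string]bool // simulates vtctlds that have gone away
}

// discover returns the first candidate that is not marked down.
func (p *proxy) discover() (string, error) {
	for _, a := range p.addrs {
		if !p.down[a] {
			return a, nil
		}
	}
	return "", errors.New("no vtctld available")
}

// Dial reuses a healthy cached connection; otherwise it closes the stale
// one and rediscovers a vtctld.
func (p *proxy) Dial() error {
	if p.cached != nil {
		if p.cached.healthy {
			return nil
		}
		p.cached.Close()
		p.cached = nil
	}
	addr, err := p.discover()
	if err != nil {
		return fmt.Errorf("dial vtctld: %w", err)
	}
	p.cached = &conn{addr: addr, healthy: true}
	return nil
}

// runDemo dials, simulates killing the connected vtctld, and dials again.
func runDemo() (first, second string, err error) {
	p := &proxy{
		addrs: []string{"localhost:19999", "localhost:15999"},
		down:  map[string]bool{},
	}
	if err = p.Dial(); err != nil {
		return "", "", err
	}
	first = p.cached.addr
	p.down[first] = true     // the vtctld is killed
	p.cached.healthy = false // its connection enters a failure state
	if err = p.Dial(); err != nil {
		return first, "", err
	}
	return first, p.cached.addr, nil
}

func main() {
	first, second, err := runDemo()
	fmt.Println(first, second, err)
}
```

The second Dial call ends up on the surviving vtctld instead of reusing the dead connection.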

What currently happens is that the gRPC connection just retries forever:

W1217 14:06:34.506722   53482 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:19999 localhost:19999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:19999: connect: connection refused". Reconnecting...
W1217 14:06:39.317060   53482 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:19999 localhost:19999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:19999: connect: connection refused". Reconnecting...
W1217 14:06:45.755668   53482 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:19999 localhost:19999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:19999: connect: connection refused". Reconnecting...
I1217 14:06:50.956459   53482 client.go:86] WaitForReady ClientConn status: TRANSIENT_FAILURE
@doeg doeg added the Type: Bug label Dec 17, 2021
@doeg doeg self-assigned this Dec 17, 2021
@doeg doeg added the Component: VTAdmin VTadmin interface label Dec 17, 2021