Fix: reserved connection retry logic when vttablet or mysql is down #10005

Merged: 5 commits, Mar 31, 2022

@harshit-gangal harshit-gangal commented Mar 29, 2022

Description

This PR fixes how VTGate handles a reserved connection when the vttablet or the underlying MySQL process is down.
VTGate needs to recognize the relevant error codes and messages so that it can redirect the query to another available tablet.

Issue 1:
MySQL is down: error code 2002/2003, returned when the underlying MySQL is not accepting connections.

Issue 2: 
Vttablet is down: ERROR 1105 (HY000): tablet: cell:"zone1" uid:100 is either down or nonexistent

Issue 3:
Vttablet tx engine is closed: tx engine can't accept new connections in state NotServing

In all of the above scenarios, the reserved connection currently held should be dropped and a new one created by calling the ReserveExecute API on the tablet.
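The drop-and-reserve behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `tabletError`, `resetMarkers`, `needsNewReservedConn`, `session`, and `executeWithRetry` are hypothetical names, the `(errno ...)` substrings are an assumption about how the MySQL error code appears in the message, and the real VTGate works with Vitess error codes and query service types rather than plain strings. The sketch also assumes that a session with an open transaction must not be silently moved to a new connection.

```go
package main

import "strings"

// tabletError is a hypothetical stand-in for an error returned by a tablet.
type tabletError string

func (e tabletError) Error() string { return string(e) }

// resetMarkers are illustrative substrings for the three scenarios in the
// description. The "(errno ...)" forms are an assumed rendering of the MySQL
// error code; the other two come from the quoted messages.
var resetMarkers = []string{
	"(errno 2002)",
	"(errno 2003)",
	"is either down or nonexistent",
	"tx engine can't accept new connections",
}

// needsNewReservedConn reports whether err looks like one of the failures
// after which the held reserved connection can no longer be used.
func needsNewReservedConn(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	for _, m := range resetMarkers {
		if strings.Contains(msg, m) {
			return true
		}
	}
	return false
}

// session tracks the reserved connection ID; 0 means none is held.
type session struct {
	reservedID    int64
	inTransaction bool
}

// executeWithRetry runs exec on the held reserved connection. If that fails
// with a reset-worthy error and no transaction is open, it drops the ID and
// obtains a fresh reserved connection through reserveExec.
func executeWithRetry(s *session, exec func(reservedID int64) error, reserveExec func() (int64, error)) error {
	if s.reservedID != 0 {
		err := exec(s.reservedID)
		if err == nil || s.inTransaction || !needsNewReservedConn(err) {
			return err
		}
		s.reservedID = 0 // drop the unusable reserved connection
	}
	id, err := reserveExec()
	if err != nil {
		return err
	}
	s.reservedID = id
	return nil
}
```

Matching on message text keeps the sketch short; a production check would more likely combine an error code with a pattern, as `requireNewQS` does in the diff.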

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

if err != nil {
	return nil, err
}
qs = rs.Gateway
Review comment (Member):

Is the idea here that we route it through the gateway again and it will pick a tablet for us?
That's clever 💯
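A rough sketch of the idea in this comment: when the tablet that held the reserved connection cannot be found, hand the query to the gateway, which picks a healthy tablet itself. Everything below is hypothetical and heavily simplified (`queryService`, `tablet`, `gateway`, `sessionState`, `pickQueryService`, and the tablet aliases are illustration-only names, not the Vitess types). It assumes the fallback is allowed only for a reserved connection with no open transaction, as the commit message "in reserved conn with no transaction" suggests.

```go
package main

// queryService is a pared-down, hypothetical stand-in for a query endpoint.
type queryService interface {
	Execute(query string) (string, error)
}

// lookupError is a simple error type for this sketch.
type lookupError string

func (e lookupError) Error() string { return string(e) }

// tablet answers queries only while it is serving.
type tablet struct {
	alias   string
	serving bool
}

func (t *tablet) Execute(query string) (string, error) {
	if !t.serving {
		return "", lookupError("tablet " + t.alias + " is not serving")
	}
	return t.alias, nil // report which tablet handled the query
}

// gateway knows every tablet and can route to any healthy one.
type gateway struct {
	tablets []*tablet
}

// byAlias returns the named tablet if it is known and serving.
func (g *gateway) byAlias(alias string) (queryService, error) {
	for _, t := range g.tablets {
		if t.alias == alias && t.serving {
			return t, nil
		}
	}
	return nil, lookupError("tablet " + alias + " is either down or nonexistent")
}

// Execute lets the gateway choose the first healthy tablet.
func (g *gateway) Execute(query string) (string, error) {
	for _, t := range g.tablets {
		if t.serving {
			return t.Execute(query)
		}
	}
	return "", lookupError("no healthy tablet available")
}

// sessionState records which tablet, if any, the session is pinned to.
type sessionState struct {
	alias         string
	inReserved    bool
	inTransaction bool
}

// pickQueryService prefers the pinned tablet and falls back to the gateway
// only when the session holds a reserved connection without a transaction.
func pickQueryService(g *gateway, s *sessionState) (queryService, error) {
	if s.alias == "" {
		return g, nil
	}
	qs, err := g.byAlias(s.alias)
	if err == nil {
		return qs, nil
	}
	if !s.inReserved || s.inTransaction {
		return nil, err
	}
	s.alias = "" // forget the dead tablet so the gateway can choose
	return g, nil
}
```

With tablets `zone1-100` (down) and `zone1-101` (serving), a session pinned to `zone1-100` with only a reserved connection gets the gateway back and its query lands on `zone1-101`; the same session inside a transaction gets the lookup error instead.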

}

func requireNewQS(err error, target *querypb.Target) bool {
	code := vterrors.Code(err)
	msg := err.Error()
	return (code == vtrpcpb.Code_FAILED_PRECONDITION && vterrors.RxWrongTablet.MatchString(msg)) ||
		(code == vtrpcpb.Code_CLUSTER_EVENT && ((target != nil && target.TabletType == topodatapb.TabletType_PRIMARY) || vterrors.RxOp.MatchString(msg)))
	if (code == vtrpcpb.Code_FAILED_PRECONDITION && vterrors.RxWrongTablet.MatchString(msg)) ||
Review comment (Collaborator):

nit: maybe this could benefit from the same treatment as ☝️
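For readers following the predicate in the `requireNewQS` excerpt, here is a sketch of the same boolean logic with each condition given a name. It is not the change the reviewer refers to, which is not visible in this excerpt. The `code` and `tabletType` enums are stand-ins for the Vitess protobuf types, and the two regular expressions are placeholders: the real `vterrors.RxWrongTablet` and `vterrors.RxOp` patterns may differ.

```go
package main

import "regexp"

// code is a stand-in for the RPC error code enum used in the excerpt.
type code int

const (
	codeFailedPrecondition code = iota + 1
	codeClusterEvent
)

// tabletType is a stand-in for the target's tablet type.
type tabletType int

const (
	tabletPrimary tabletType = iota + 1
	tabletReplica
)

// Placeholder patterns for illustration only; they are assumptions, not the
// actual vterrors regular expressions.
var (
	rxWrongTablet = regexp.MustCompile(`wrong tablet type`)
	rxOp          = regexp.MustCompile(`operation not allowed in state`)
)

// requireNewQSSketch mirrors the structure of the original return expression:
// a wrong-tablet precondition failure, or a cluster event that either targets
// a primary or matches the operation-not-allowed pattern.
func requireNewQSSketch(c code, msg string, target *tabletType) bool {
	wrongTablet := c == codeFailedPrecondition && rxWrongTablet.MatchString(msg)
	primaryTarget := target != nil && *target == tabletPrimary
	clusterEvent := c == codeClusterEvent && (primaryTarget || rxOp.MatchString(msg))
	return wrongTablet || clusterEvent
}
```

Naming the sub-conditions leaves the truth table unchanged and makes it easier to attach a comment or a log line to each case.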

@harshit-gangal harshit-gangal merged commit 2b7e2ef into vitessio:main Mar 31, 2022
@harshit-gangal harshit-gangal deleted the fix-rconn-retry branch March 31, 2022 06:36
@frouioui frouioui mentioned this pull request Apr 4, 2022
24 tasks
harshit-gangal added a commit to planetscale/vitess that referenced this pull request Apr 7, 2022
…itessio#10005)

* fix: when mysql is down and vttablet is up

Signed-off-by: Harshit Gangal <harshit@planetscale.com>

* fix: in reserved conn with no transaction if the queryservice i.e vttablet is down then it should go through gateway

Signed-off-by: Harshit Gangal <harshit@planetscale.com>

* refactor: execute and streamExecute should be able to reset the shard if needed and not executeLock

Signed-off-by: Harshit Gangal <harshit@planetscale.com>

* test: e2e test for bug fix

Signed-off-by: Harshit Gangal <harshit@planetscale.com>

* fix: when mysql is down, query should go through gateway to elect new healthy tablet for the given tablet_type

Signed-off-by: Harshit Gangal <harshit@planetscale.com>
frouioui pushed a commit that referenced this pull request Apr 7, 2022
…10005) (#10052)
