Bug Report: Premature buffering stop during concurrent external reparenting can lead to query failures #16438

arthurschreiber opened this issue Jul 22, 2024 · 1 comment


Overview of the Issue

With the keyspace_events buffer implementation, buffering sometimes stops before the failover has actually been detected and processed by the healthcheck stream, causing buffered queries to be sent to the demoted primary.

Here's the log output from one vtgate process:

{"_time":"2024-07-22T13:23:30.708+00:00","message":"Starting buffering for shard: <redacted>/20-30 (window: 5s, size: 1000, max failover duration: 5s) (A failover was detected by this seen error: vttablet: rpc error: code = Code(17) desc = The MySQL server is running with the --super-read-only option so it cannot execute this statement (errno 1290) (sqlstate HY000) (CallerID: issues_pull_requests_rw_1).)"}
{"_time":"2024-07-22T13:23:33.840+00:00","message":"Starting buffering for shard: <redacted>/30-40 (window: 5s, size: 1000, max failover duration: 5s) (A failover was detected by this seen error: vttablet: rpc error: code = Code(17) desc = The MySQL server is running with the --super-read-only option so it cannot execute this statement (errno 1290) (sqlstate HY000) (CallerID: issues_pull_requests_rw_1).)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"Adding 1 to PrimaryPromoted counter for target: keyspace:\"<redacted>\" shard:\"20-30\" tablet_type:REPLICA, tablet: <redacted>-0171233832, tabletType: PRIMARY"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/80-90 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/c0-d0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/b0-c0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/e0-f0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/-10 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/f0- resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/60-70 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/40-50 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/d0-e0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/50-60 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/90-a0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/10-20 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/a0-b0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/70-80 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/20-30 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"Stopping buffering for shard: <redacted>/20-30 after: 3.1 seconds due to: a primary promotion has been detected. Draining 50 buffered requests now."}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"disruption in shard <redacted>/30-40 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"Stopping buffering for shard: <redacted>/30-40 after: 0.0 seconds due to: a primary promotion has been detected. Draining 1 buffered requests now."}
{"_time":"2024-07-22T13:23:33.856+00:00","message":"Draining finished for shard: <redacted>/30-40 Took: 195.979µs for: 1 requests."}
{"_time":"2024-07-22T13:23:33.960+00:00","message":"FailoverTooRecent-<redacted>/30-40: NOT starting buffering for shard: <redacted>/30-40 because the last failover which triggered buffering is too recent (104.215357ms < 1m0s). (A failover was detected by this seen error: Code: CLUSTER_EVENT"}
{"_time":"2024-07-22T13:23:34.016+00:00","message":"Draining finished for shard: <redacted>/20-30 Took: 159.349351ms for: 50 requests."}
{"_time":"2024-07-22T13:23:34.642+00:00","message":"not marking healthy primary <redacted>-0171231759 as Up for <redacted>/20-30 because its PrimaryTermStartTime is smaller than the highest known timestamp from previous PRIMARYs <redacted>-0171233832: -62135596800 < 1721654613 "}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"Adding 1 to PrimaryPromoted counter for target: keyspace:\"<redacted>\" shard:\"30-40\" tablet_type:REPLICA, tablet: <redacted>-0171233041, tabletType: PRIMARY"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.743+00:00","message":"keyspace event resolved: <redacted>/<redacted> is now consistent (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/50-60 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/60-70 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/40-50 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/d0-e0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/70-80 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/90-a0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/10-20 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/a0-b0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/30-40 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/20-30 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/f0- resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/80-90 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/c0-d0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/b0-c0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/e0-f0 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:36.744+00:00","message":"disruption in shard <redacted>/-10 resolved (serving: true)"}
{"_time":"2024-07-22T13:23:37.282+00:00","message":"not marking healthy primary <redacted>-0171232808 as Up for <redacted>/30-40 because its PrimaryTermStartTime is smaller than the highest known timestamp from previous PRIMARYs <redacted>-0171233041: -62135596800 < 1721654616 "}
{"_time":"2024-07-22T13:23:38.961+00:00","message":"FailoverTooRecent-<redacted>/30-40: skipped 6 log messages"}
{"_time":"2024-07-22T13:23:38.961+00:00","message":"Execute: skipped 3 log messages"}

I think what's happening here is that the primaries of the 20-30 and 30-40 shards went into read-only mode at roughly the same time due to the external failover, which in turn caused buffering to start on both shards in quick succession.

Once the primary failover on shard 20-30 completed and Vitess was notified about the new primary via a TabletExternallyReparented call, the whole keyspace was detected as consistent again, including the 30-40 shard, which was still in the midst of its external failover. This stopped buffering on both the 20-30 and the 30-40 shards, even though the 30-40 shard had not been failed over yet.

Write queries against the 30-40 shard then failed noticeably until the external failover finished.
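To make the failure mode concrete, here is a minimal sketch (plain Go, not Vitess code; the `shardState` type and both flags are hypothetical) of how a keyspace-wide consistency signal that looks only at serving state ends up stopping buffering on a shard whose failover is still in flight:

```go
package main

import "fmt"

// shardState is what a hypothetical keyspace-event watcher tracks per shard.
type shardState struct {
	serving      bool // the primary reports itself as healthy and serving
	failoverDone bool // whether the external failover has actually completed
}

func main() {
	shards := map[string]*shardState{
		// 20-30 has finished its failover; the new primary is serving.
		"20-30": {serving: true, failoverDone: true},
		// 30-40 is still mid-failover, but its demoted primary also
		// reports serving=true: it is up, just read-only.
		"30-40": {serving: true, failoverDone: false},
	}

	// The watcher consults only the serving flag, so the keyspace looks
	// consistent as soon as every primary reports serving...
	consistent := true
	for _, s := range shards {
		if !s.serving {
			consistent = false
		}
	}

	// ...and buffering stops on every shard, including 30-40, whose
	// drained writes will hit a still-read-only primary and fail.
	if consistent {
		for name, s := range shards {
			fmt.Printf("stopping buffering for %s (failover done: %v)\n", name, s.failoverDone)
		}
	}
}
```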

Reproduction Steps

N/A

Binary Version

v17+

Operating System and Environment details

N/A

Log Fragments

N/A
@arthurschreiber (Contributor, Author)

@deepthi @vmg This wasn't an issue in v17 and earlier with --buffer_implementation=healthcheck, but that implementation was deprecated and removed in v18.

I'm a bit at a loss as to how this could be fixed. Buffering starts because the vtgate notices the vttablet is in read-only mode (but still serving), yet keyspace events don't know about this and instead make decisions based solely on the serving state of the primary, which in this case happily reports that it's up and healthy even though it's read-only during the external failover.
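One possible direction, sketched below with hypothetical types (this is an illustration, not a proposed patch): make the per-shard stop condition require evidence that this shard's own primary changed, for example a PrimaryTermStartTime (which already appears in the logs above) newer than the moment buffering began, rather than relying on keyspace-wide serving state alone.

```go
package main

import (
	"fmt"
	"time"
)

// shardHealth is a hypothetical view of a single shard's primary.
type shardHealth struct {
	serving          bool
	primaryTermStart time.Time // when the current primary's term began
}

// shouldStopBuffering stops buffering for a shard only if it is serving AND
// its primary term began after buffering started, i.e. a promotion actually
// happened on this shard, not just somewhere else in the keyspace.
func shouldStopBuffering(h shardHealth, bufferingStarted time.Time) bool {
	return h.serving && h.primaryTermStart.After(bufferingStarted)
}

func main() {
	bufferingStarted := time.Now()

	// 30-40 before its failover completes: the demoted primary still
	// reports serving, but its term predates buffering, so keep buffering.
	demoted := shardHealth{serving: true, primaryTermStart: bufferingStarted.Add(-time.Hour)}
	fmt.Println("stop 30-40?", shouldStopBuffering(demoted, bufferingStarted)) // false

	// After TabletExternallyReparented, the new primary's term starts later
	// than buffering did, so it is now safe to drain.
	promoted := shardHealth{serving: true, primaryTermStart: bufferingStarted.Add(3 * time.Second)}
	fmt.Println("stop 30-40?", shouldStopBuffering(promoted, bufferingStarted)) // true
}
```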
