Feature Request: allow VTOrc to start with recoveries disabled #18007

timvaillancourt · 2025-03-22T21:37:07Z

Feature Description

As a user of VTOrc, I would like to be able to start VTOrc with all recoveries disabled. This is currently possible using the HTTP API, but there is a short window between VTOrc starting up and the API being called during which recoveries are enabled. The API call also needs to be made separately for each instance.

This issue proposes adding a new flag, --allow-recovery, to achieve this.
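A minimal Go sketch of the idea, assuming a process-wide switch that is seeded from the proposed --allow-recovery flag before any analysis runs. The names recoveriesEnabled, analysis and handleAnalysis are hypothetical and do not reflect VTOrc's actual code, and the default of true is an assumption chosen to keep today's behavior. The sketch uses the standard library flag package to stay self-contained; the real implementation in the linked PR may look quite different.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"sync/atomic"
)

// recoveriesEnabled is a hypothetical process-wide switch. It is seeded from
// the flag before any analysis runs, so there is no startup window in which
// recoveries are enabled by default.
var recoveriesEnabled atomic.Bool

// analysis is a hypothetical, simplified stand-in for a detected problem.
type analysis struct {
	Keyspace string
	Shard    string
	Problem  string
}

// handleAnalysis decides whether to act on a detected problem. When
// recoveries are disabled it only logs what it would have done, which gives
// the "dry-run" / "discover-only" behavior discussed in this issue.
func handleAnalysis(a analysis) string {
	if !recoveriesEnabled.Load() {
		msg := fmt.Sprintf("recoveries disabled: would have run recovery for %s on %s/%s",
			a.Problem, a.Keyspace, a.Shard)
		log.Print(msg)
		return msg
	}
	msg := fmt.Sprintf("running recovery for %s on %s/%s", a.Problem, a.Keyspace, a.Shard)
	log.Print(msg)
	return msg
}

func main() {
	// Hypothetical flag definition; default true preserves existing behavior.
	allowRecovery := flag.Bool("allow-recovery", true, "whether VTOrc may run recoveries (sketch)")
	flag.Parse()

	// Seed the switch from the flag before discovery/analysis starts.
	recoveriesEnabled.Store(*allowRecovery)

	// Illustrative keyspace, shard and problem names only.
	handleAnalysis(analysis{Keyspace: "commerce", Shard: "0", Problem: "DeadPrimary"})
}
```

Starting the process with --allow-recovery=false would log the "would have run" message instead of acting. Seeding the switch before discovery begins is what closes the startup window described above, and a runtime toggle (such as the existing HTTP API) could still flip the same switch later.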

Use Case(s)

A user who wants VTOrc to run without performing any recoveries.

We run a patch that adds this functionality, and it was instrumental in the rollout of VTOrc in Slack's production environment. This feature allowed us to validate (and later optimize) topo and discovery performance in advance of switching over to VTOrc. This "dry-run-like" mode also allowed us to gain confidence in what VTOrc would do when enabled for the first time on a keyspace.


deepthi · 2025-03-23T09:58:13Z

I'm curious to hear whether other people see the need for this. How exactly did you use the dry-run-like mode?

timvaillancourt · 2025-03-24T21:48:01Z

@deepthi I think this is mostly useful for the initial migration to VTOrc, for performance tuning an existing install, or for setting up a new keyspace/cluster.

In our case, the "dry-run" (or perhaps "discover-only") mode was used to:

- Understand what problems VTOrc would fix (if it were made active) while our old solution remained in charge. Using the logs, this helped us gain confidence in which existing problems would be solved when VTOrc took over, how many reparents might happen all at once when it starts (potentially risking scatters and/or the topo), etc.
  - VTOrc logs explain what VTOrc "would have done" when recoveries are disabled
- Verify topo and discovery performance before VTOrc takes over, in case it struggles with the workload. When we first cut a large keyspace over to VTOrc, before the v22 optimizations (and this functionality), the instance struggled significantly and would have taken a long time to respond to unplanned events. We intentionally avoided having two systems active at once.
  - During the "dry-run" style tuning I used metrics emitted by VTOrc to measure performance wins, the most important being DiscoveriesInstancePollSecondsExceeded, queue sizes and CPU capacity
  - Another use case: I plan to run a dedicated VTOrc for further performance tuning, pointed at the same --clusters_to_watch as our busiest VTOrc pool/group, but with recoveries disabled. This will allow us to compare the impact of future fixes
  - I don't expect many VTOrc users to submit performance fixes, but they may want to tune the existing flags and/or the number of tablets before cutover

timvaillancourt added the Needs Triage label Mar 22, 2025

timvaillancourt linked a pull request Mar 22, 2025 that will close this issue: vtorc: allow recoveries to be disabled from startup #18005 (Open)

frouioui added Type: Feature Component: VTorc and removed Needs Triage labels Mar 24, 2025
