Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect -disable_active_reparents in backup/restore #7576

Merged
merged 2 commits into from Mar 10, 2021

Conversation

deepthi
Copy link
Member

@deepthi deepthi commented Mar 1, 2021

Description

When the vttablet flag -disable_active_reparents is set, then vitess should not be managing the mysql level replication.
However, it turns out that this was not being respected by backup/restore.
This PR fixes that by adding checks of the flag before restarting replication after a backup/restore. In terms of backup, this is only applicable to offline backups.
In addition, we put the vttablet into orchestrator maintenance mode during backup/restore. We require replication to be off and we do not want orchestrator noticing that it is off and enabling it again while a backup or restore is in progress.

Related Issue(s)

Fixes #7657

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

…t. Also, Begin and End Maintenance on orchestrator

Signed-off-by: deepthi <deepthi@planetscale.com>
…. Also, Begin and End Maintenance on orchestrator

Signed-off-by: deepthi <deepthi@planetscale.com>
@deepthi deepthi requested review from aquarapid and enisoc and removed request for rohit-nayak-ps March 1, 2021 22:13
@deepthi deepthi added this to In progress in Cluster Management Mar 1, 2021
Cluster Management automation moved this from In progress to Reviewer approved Mar 10, 2021
Copy link
Contributor

@aquarapid aquarapid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@deepthi deepthi merged commit ff90ed1 into vitessio:master Mar 10, 2021
Cluster Management automation moved this from Reviewer approved to Done Mar 10, 2021
@deepthi deepthi deleted the ds-restore-with-orc branch March 10, 2021 20:00
return vterrors.Wrap(err, "MysqlDaemon.SetMaster failed")
}

// If active reparents are disabled, we don't restart replication. So it makes no sense to wait for an update on the replica.
// Return immediately.
if !*mysqlctl.DisableActiveReparents {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be the opposite?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing this out. It would have been a regression. I have opened #7703 to fix this.

if tm.orc == nil {
return
}
if err := tm.orc.BeginMaintenance(tm.Tablet(), "vttablet has been told to Restore"); err != nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from our experiment, tm.Tablet() seems to give an out dated state and we are getting cannot find mysql port error, we ended up using tm.tmState.Tablet(). Let me know if you also experience it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tm.Tablet() simply returns tm.tmState.Tablet() so they should be equivalent. I do see that it is possible for a race condition to occur between checkMysql and handleRestore.

err := tm.restoreDataLocked(ctx, logger, waitForBackupInterval, deleteBeforeRestore)
// Tell Orchestrator we're no longer stopped on purpose.
// Do this in the background, as it's best-effort.
go func() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only EndMaintenance if restoreDataLocked returns no error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable.

@@ -124,6 +134,16 @@ func (tm *TabletManager) Backup(ctx context.Context, concurrency int, logger log
}
returnErr = err
}
// Tell Orchestrator we're no longer stopped on purpose.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zhannan what about here? Should this also be called only if err == nil?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

Backup/restore operations do not respect -disable_active_reparents flag
5 participants