Migration testing in CI #7344

Open
deepthi opened this issue Jan 21, 2021 · 8 comments

@deepthi
Member

deepthi commented Jan 21, 2021

Feature Description

In #7294, we automated upgrading and downgrading the build version by 1 major version. There are some additional scenarios that should be tested to ensure that we comply with VEP-3:

  • Ability to restore backups taken with the previous version after upgrade/downgrade
  • Ability for a cluster to operate correctly in the midst of an upgrade / downgrade. E.g. vtctld and vttablets are still running but vtgate is restarted with the new version.

Use Case(s)

  • Documented in VEP-3
@shlomi-noach
Contributor

I will work on the 2nd bullet, i.e. the halfway-upgrade scenario.

E.g. vtctld and vttablets are still running but vtgate is restarted with the new version.

What is the complete list, in descending order of importance, of half-upgrade setups I should investigate?

@deepthi
Member Author

deepthi commented Jan 22, 2021

  1. Upgrade vtgate only
  2. Upgrade vtctld only
  3. Upgrade all vttablets
  4. Upgrade a subset of vttablets
  5. Combination of 1 or 2 with 3 or 4
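
The list above can be expanded into an explicit set of setups to test. The following is only a minimal sketch: the labels are hypothetical strings chosen for illustration, not Vitess identifiers, and it assumes item 5 means every pairing of item 1 or 2 with item 3 or 4.

```python
from itertools import product

# Hypothetical labels for the setups listed above; not Vitess identifiers.
CONTROL_PLANE = ["vtgate", "vtctld"]                 # items 1 and 2
TABLETS = ["all vttablets", "subset of vttablets"]   # items 3 and 4


def half_upgrade_setups():
    """Return each setup as a tuple of the components running the new version."""
    singles = [(c,) for c in CONTROL_PLANE + TABLETS]  # items 1 to 4
    combos = list(product(CONTROL_PLANE, TABLETS))     # item 5
    return singles + combos


if __name__ == "__main__":
    for setup in half_upgrade_setups():
        print(" + ".join(setup))
```

Under that reading there are four single-component setups plus four combinations, which gives a sense of how many runs a CI matrix would need.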

@shlomi-noach
Contributor

Cool, thank you! The approach I will take is to always take the whole cluster down, then start it again on top of the existing data with mixed binaries. This means the approach will not test, e.g., taking down only vtgate and bringing up a newer vtgate. We can strive to do that, too, at some point in the future. For now, the approach I'm suggesting is the easiest.

I think we may be able to squeeze all of these tests into a single CI workflow within a reasonable runtime, seeing that most of the time is spent on setup/build. But we may need to split them into two different CI workflows.
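
As a rough illustration of the stop-everything-then-restart approach described above, the plan for one mixed-version run could be written down as an ordered list of steps. This is a sketch under stated assumptions: `restart_plan`, the component names, the start/stop order, and the `"old"`/`"new"` labels are all hypothetical and do not reflect the actual CI scripts.

```python
def restart_plan(upgraded, components=("vtctld", "vttablet", "vtgate")):
    """Build the steps for one mixed-version run.

    The whole cluster is stopped first, then every component is started
    again on top of the existing data, using the new binary only for the
    components named in `upgraded`. The ordering here is an assumption.
    """
    # Stop everything, in reverse start order.
    steps = [("stop", c, None) for c in reversed(components)]
    # Restart each component with the old or new binary.
    for c in components:
        version = "new" if c in upgraded else "old"
        steps.append(("start", c, version))
    return steps


if __name__ == "__main__":
    for step in restart_plan({"vtgate"}):
        print(step)
```

Note that, as stated above, this never restarts a single component while the others keep running.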

@deepthi
Member Author

deepthi commented Jan 25, 2021

That approach should be fine.

@shlomi-noach
Contributor

See #7368 for partial upgrade tests

@frouioui
Member

Pull request #8471 relates to this issue.

@deepthi
Member Author

deepthi commented Oct 22, 2021

Some more details on how we currently do manual testing:

  • backup and restore
    • create a cluster which has version n of all components
    • insert some data
    • take a backup
    • downgrade vttablets to previous version n-1
    • restore a replica from backup
    • take a backup
    • upgrade tablets to n
    • restore a replica from backup
  • cluster management
    • create a cluster with version n
    • downgrade tablets to version n-1
    • run PlannedReparentShard to change the primary
    • upgrade tablets to version n
    • downgrade vtctld to version n-1
    • run PlannedReparentShard again
    • need a similar test for EmergencyReparentShard
    • probably also one for InitShardPrimary (not required, since there is an existing cluster)
  • query serving
    • create a cluster with version n
    • insert some data
    • downgrade vtgate to n-1
    • queries should work
    • upgrade vtgate to n
    • downgrade tablets to n-1
    • queries should still work
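
One way to turn these manual steps into something automatable is to encode a scenario as data and compute which version mix each check actually exercises. The sketch below does this for the query serving scenario only. The step encoding, the `checkpoints` helper, and the use of `0` for version n and `-1` for n-1 are assumptions made for illustration; the "insert some data" step is left out because it does not change any versions.

```python
# Versions are relative: 0 means n, -1 means n-1.
QUERY_SERVING = [
    ("set", "vtgate", -1),                   # downgrade vtgate to n-1
    ("check", "queries should work"),
    ("set", "vtgate", 0),                    # upgrade vtgate to n
    ("set", "vttablet", -1),                 # downgrade tablets to n-1
    ("check", "queries should still work"),
]


def checkpoints(steps, components=("vtgate", "vtctld", "vttablet")):
    """Return (label, version mix) for every check step in the scenario."""
    versions = {c: 0 for c in components}  # cluster starts at version n everywhere
    seen = []
    for step in steps:
        if step[0] == "set":
            _, component, version = step
            versions[component] = version
        else:
            # Copy the dict so later steps do not mutate recorded checkpoints.
            seen.append((step[1], dict(versions)))
    return seen


if __name__ == "__main__":
    for label, mix in checkpoints(QUERY_SERVING):
        print(label, mix)
```

The other two scenarios could be encoded the same way, with backup, restore, and reparent steps added as further step kinds.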

@deepthi deepthi assigned frouioui and unassigned shlomi-noach Nov 12, 2021
@frouioui frouioui removed the LFX label Nov 30, 2021
@frouioui
Member

I have opened #9300 and am starting the implementation of the matrix detailed in #7344 (comment).
