[reparentutil / wrangler] Extract PlannedReparentShard logic from wrangler to PlannedReparenter struct #7501

Merged: 7 commits, Feb 18, 2021

Contributor

@ajm188 ajm188 commented Feb 16, 2021

Description

This PR extracts the PRS logic from ./go/vt/wrangler to a dedicated PlannedReparenter struct in ./go/vt/vtctl/reparentutil, to be shared between the legacy and new vtctl APIs in my next PR, similar to what I did for ERS in #7464.

Some notes on commits

All the commits that begin with "wip" just add tests. That is how I committed changes as I went, so it may be useful to step through them in order, but before merging I'd like to squash those commits, and only those, into a single "Add all the tests" commit.

Related Issue(s)

Checklist

  • Should this PR be backported? No
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

… PRS

This will eventually replace the equivalent logic in wrangler, which can
then delegate into this module, along with `VtctldServer` when we add
PRS to that API.

Signed-off-by: Andrew Mason <amason@slack-corp.com>
This allows conveniently setting up a shard with multiple tablets
claiming to be MASTER, while the shard is still correctly configured to
have _a_ serving master. Previously, those were mutually exclusive setups
with `AddTablets`.

Signed-off-by: Andrew Mason <amason@slack-corp.com>
Contributor

doeg commented Feb 17, 2021

I can review this from a Go perspective but, as with #7464, I defer to experts on the Vitess-y parts. @deepthi, @setassociative, and/or @rohit-nayak-ps would you be able to give this one a look too?

Contributor Author

ajm188 commented Feb 17, 2021

Test failure is coming from a PRS endtoend test, but looks like a port reservation issue. Going to kick it again and also try to repro locally

I0217 02:51:04.937384    1595 vtctlclient_process.go:149] Executing vtctlclient with command: vtctlclient -server localhost:23806 PlannedReparentShard -keyspace_shard ks/0 -new_master zone1-0000000102
    reparent_test.go:348: 
        	Error Trace:	reparent_test.go:348
        	Error:      	"E0217 02:51:12.928862    4210 main.go:67] E0217 02:51:12.928491 planned_reparenter.go:622] some replicas failed to reparent; retry PlannedReparentShard with the same new primary alias ([zone1-0000000102 tablet zone1-0000000103 failed SetMaster(zone1-0000000102): rpc error: code = Unknown desc = TabletManager.SetMaster on zone1-0000000103 error: net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000): net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000): rpc error: code = Unknown desc = TabletManager.SetMaster on zone1-0000000103 error: net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000): net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000)]) to retry failed replicas: %!v(MISSING)
        	            	PlannedReparentShard Error: rpc error: code = Unknown desc = some replicas failed to reparent; retry PlannedReparentShard with the same new primary alias (zone1-0000000102) to retry failed replicas: tablet zone1-0000000103 failed SetMaster(zone1-0000000102): rpc error: code = Unknown desc = TabletManager.SetMaster on zone1-0000000103 error: net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000): net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000): rpc error: code = Unknown desc = TabletManager.SetMaster on zone1-0000000103 error: net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000): net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000)
        	            	E0217 02:51:12.929834    4210 main.go:72] remote error: rpc error: code = Unknown desc = some replicas failed to reparent; retry PlannedReparentShard with the same new primary alias (zone1-0000000102) to retry failed replicas: tablet zone1-0000000103 failed SetMaster(zone1-0000000102): rpc error: code = Unknown desc = TabletManager.SetMaster on zone1-0000000103 error: net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000): net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000): rpc error: code = Unknown desc = TabletManager.SetMaster on zone1-0000000103 error: net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000): net.Dial(/home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock) to local server failed: dial unix /home/runner/work/vitess/vitess/vtdataroot/vt_731717628/vtroot_23801/vt_0000000103/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000)
        	            	" does not contain "tablet zone1-0000000103 SetMaster failed"
        	Test:       	TestReparentWithDownReplica

@ajm188 ajm188 requested a review from sougou as a code owner February 17, 2021 20:54
@ajm188 ajm188 force-pushed the am_planned_reparenter branch 2 times, most recently from 1016d29 to 02040df Compare February 17, 2021 23:26
Contributor

@doeg doeg left a comment

This change seems straightforward and looks good to me! (Apart from the failing test, which... seems unrelated? Ugh.) I do suggest waiting for another +1 from one of the other reviewers, though. :)

Truly a heroic amount of faking + testing. 🏆 Seriously good work.

Member

@deepthi deepthi left a comment

Very nice refactor. This is so much more readable than what we had before. 💯

)
}

// prelightChecks checks some invariants that pr.reparentShardLocked() depends
Member

typo: preflightChecks

Contributor Author

"prelight" 😂

nice catch, should be fixed, and i've cleaned up the commit history as well now

- add preflightChecks tests
- add test cases for `performPartialPromotionRecovery`
- remove unused arg, add test cases for `performPotentialPromotion`
- add missing call to waitgroup add
- add test cases for `reparentTablets`
- add tests for graceful promotion
- add the rest of the PlannedReparenter tests

Signed-off-by: Andrew Mason <amason@slack-corp.com>
@deepthi deepthi merged commit bba94ad into vitessio:master Feb 18, 2021
@askdba askdba added this to the v10.0 milestone Feb 22, 2021
@ajm188 ajm188 deleted the am_planned_reparenter branch March 4, 2021 16:32
@ajm188 ajm188 added this to In progress in Vtctld Service via automation May 23, 2021
@ajm188 ajm188 moved this from In progress to Done in Vtctld Service May 23, 2021