Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replica selector refactor #1142

Merged
merged 76 commits into from
Mar 11, 2024
Merged

replica selector refactor #1142

merged 76 commits into from
Mar 11, 2024

Conversation

crazycs520
Copy link
Contributor

@crazycs520 crazycs520 commented Jan 26, 2024

close #1144

What changed and how does it work?

Since the old version replicaSelector is hard to test and understand, This PR introduces a new replica selector implementation replicaSelectorV2, which is less code and easer to understand.

For rollback, I don't remove the old version replicaSelector, and add a new config EnableReplicaSelectorV2 whose default value is true, which means use replicaSelectorV2 by default, set it's value to false will use old version replicaSelector. After replicaSelectorV2 is stable, maybe we can consider remove old version replicaSelector and config EnableReplicaSelectorV2.

How to test?

  • All the old tests are pass.

  • I add many test case in replica_selector_test.go to make sure EnableReplicaSelectorV2 has the same behavior with old version replicaSelector.

    • TestReplicaReadAccessPathByCase
    • TestReplicaReadAccessPathByCase2
    • TestReplicaReadAccessPathByBasicCase
    • TestReplicaReadAccessPathByLeaderCase
    • TestReplicaReadAccessPathByFollowerCase
    • TestReplicaReadAccessPathByMixedAndPreferLeaderCase
    • TestReplicaReadAccessPathByStaleReadCase
    • TestReplicaReadAccessPathByTryIdleReplicaCase
    • TestReplicaReadAccessPathByFlashbackInProgressCase
    • TestReplicaReadAccessPathByProxyCase
    • TestReplicaReadAccessPathByLearnerCase
  • I write a TestReplicaReadAccessPathByGenError test to generate various combinations of region errors to verify that EnableReplicaSelectorV2 has the same behavior with old version replicaSelector. From the following log, you can see this test generated a total 2200723 cases.

    [2024/03/05 10:27:52.562 +08:00] [INFO] [replica_selector_test.go:2511] ["TestReplicaReadAccessPathByGenError Finished"] [total-case=2200723] [valid-case=857630] [invalid-case=1343093]
    
  • The test coverage of replicaSelectorV2 is more than 98%.

image

Advice when reviewing PR

If you have any questions about the logic about replicaSelector, you can try to modify the logic code, and then run the test to see whether the test results are as expected.

cd internal/locate
go test -run=TestReplicaReadAccessPathBy.*Case   # run some test cases, this is fast and less than 1s on my m1 mbp.
go test > test.log   # run all pkg test, this may cost 230s on my m1 mbp.

Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
…ad request

Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Copy link
Contributor

@zyguan zyguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. We may need to do some basic benchmarks and stablility tests to make sure there is no obvious regression.

internal/locate/region_cache.go Outdated Show resolved Hide resolved
internal/locate/replica_selector.go Show resolved Hide resolved
internal/locate/replica_selector.go Outdated Show resolved Hide resolved
internal/locate/region_request.go Outdated Show resolved Hide resolved
internal/locate/replica_selector.go Show resolved Hide resolved
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
internal/locate/region_cache.go Outdated Show resolved Hide resolved
) (*replicaSelectorV2, error) {
cachedRegion := regionCache.GetCachedRegionWithRLock(regionID)
if cachedRegion == nil || !cachedRegion.isValid() {
return nil, errors.New("cached region invalid")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. It's fine to me for this PR to keep the logic consistent, but actually I think it doesn't make sense to let this error be recorded as an rpcError, which marks the RPC's result undetermined:

			// (in getRPCContext)
			if err != nil {
				s.rpcError = err
				return nil, nil
			}

I think it would be good if the error handling can be refined in the future

internal/locate/replica_selector.go Outdated Show resolved Hide resolved
internal/locate/replica_selector.go Show resolved Hide resolved
internal/locate/replica_selector.go Outdated Show resolved Hide resolved
metrics.TiKVReplicaSelectorFailureCounter.WithLabelValues("exhausted").Inc()
for _, r := range replicas {
if r.deadlineErrUsingConfTimeout {
// when meet deadline exceeded error, do fast retry without invalidate region cache.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it sure that the deadlineErrUsingConfTimeout is produced during the current attempt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The logic is if have meet deadlineErrUsingConfTimeout error in this round, don't invalidate region to let up layer to remove kv_read_timeout then do fast retry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is same with the old version.

Comment on lines 341 to 342
// when has match labels, prefer leader than not-matched peers.
score += scoreOfPreferLeader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Should a not-slow normal follower (NotSlow + NormalPeer -> 2) have the same priority as a slow leader in prefer-leader mode (PreferLeader -> 2) ?

  • Is there an assumption that in cases that leader must be expected (such as the first attempt in prefer-leader mode and the second attempt in stale read mode) must be done with the ReplicaSelectLeaderStrategy instead of using the calculateScore? If so, I think there should be some comments to explain that this function doesn't handle those cases.

  • If LabelMatch always has higher priority than PreferLeader, and PreferLeader always has higher priority than NormalPeer and NotSlow, is it possible that we use different bits for these factors, so that the more significant bit a factor is at, the more dominant status the factor is? For factors that has the same priority, we reserve more bits for them. For example, we can have such definition:

    const (
    	flagLabelMatches = 1 << 3
    	flagPreferLeader = 1 << 2
    	flagNormalPeer = 1
    	flagNotSlow = 1
    	
    	// So that we have definition of the final score:
    	// MSB                                                                               LSB
    	// [unused bits][1 bit: LabelMatches][1 bit: PreferLeader][2 bits: NormalPeer + NotSlow]
    )

    I'm not sure if this is possible to give the same result as your current code, but I think in this way, we can have less cases like:

    • Factor A < Factor C
    • Factor B < Factor C
    • Factor A + Factor B > Factor C

    which I'm afraid may make the priority calculation less comprehensible. Another benefit is that we can mostly know how a score comes from by reading its bits, which may help in diagnosing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, done as your advice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Is there an assumption that in cases that leader must be expected (such as the first attempt in prefer-leader mode and the second attempt in stale read mode) must be done with the ReplicaSelectLeaderStrategy instead of using the calculateScore? If so, I think there should be some comments to explain that this function doesn't handle those cases.

I'm not very sure. If so, please consider adding some comments to note that to avoid misunderstanding of the scoring rule.

Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
internal/locate/region_request.go Outdated Show resolved Hide resolved
internal/locate/replica_selector.go Outdated Show resolved Hide resolved
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Copy link
Contributor

@MyonKeminta MyonKeminta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM but I'm not sure if I've understood it totally... I need some more time

Comment on lines +238 to +239
sender := NewRegionRequestSender(s.cache, s.regionRequestSender.client)
sender.regionCache.enableForwarding = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to create a new sender instead of using the prepared one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried that sharing the same sender for each test might have concurrency issues, so I create a new sender for each test.

Copy link
Contributor

@MyonKeminta MyonKeminta Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I assume that the fields initialized in SetupTest should be done for each single test

// SetupTestSuite has a SetupTest method, which will run before each
// test in the suite.
type SetupTestSuite interface {
	SetupTest()
}

@@ -633,7 +645,7 @@ func (state *accessByKnownProxy) onSendFailure(bo *retry.Backoffer, selector *re
}

func (state *accessByKnownProxy) onNoLeader(selector *replicaSelector) {
selector.state = &invalidLeader{}
selector.state = &tryFollower{leaderIdx: state.leaderIdx, lastIdx: state.leaderIdx, fromAccessKnownLeader: true}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is it that it won't trigger upper-layer's backoff in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for #1207

Since I need to test replicaSelectorV2 has some beha behavior with old replicaSelector, so I just change the old replicaSelector behavior here. If I don't change this, I'll have to modify the replicaSelectorV2 code to accommodate this strange behavior.

internal/locate/replica_selector.go Outdated Show resolved Hide resolved
Comment on lines 341 to 342
// when has match labels, prefer leader than not-matched peers.
score += scoreOfPreferLeader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Is there an assumption that in cases that leader must be expected (such as the first attempt in prefer-leader mode and the second attempt in stale read mode) must be done with the ReplicaSelectLeaderStrategy instead of using the calculateScore? If so, I think there should be some comments to explain that this function doesn't handle those cases.

I'm not very sure. If so, please consider adding some comments to note that to avoid misunderstanding of the scoring rule.

internal/locate/region_request.go Show resolved Hide resolved
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Copy link
Contributor

@MyonKeminta MyonKeminta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

internal/locate/replica_selector.go Show resolved Hide resolved
internal/locate/replica_selector.go Outdated Show resolved Hide resolved
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Copy link
Contributor

@MyonKeminta MyonKeminta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: crazycs520 <crazycs520@gmail.com>
Copy link
Contributor

@you06 you06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM

internal/locate/replica_selector_test.go Outdated Show resolved Hide resolved
internal/locate/replica_selector_test.go Outdated Show resolved Hide resolved
internal/locate/replica_selector_test.go Outdated Show resolved Hide resolved
Signed-off-by: crazycs520 <crazycs520@gmail.com>
@cfzjywxk cfzjywxk merged commit 8d6a95f into tikv:master Mar 11, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

replica selector refactor
5 participants