Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pd: cancel call when refreshing client #2669

Merged
merged 5 commits into from Jan 12, 2018

Conversation

@BusyJay
Copy link
Contributor

commented Jan 10, 2018

This is a quick fix for possible stale heartbeat streaming call.

Streaming call won't report error when messages are dropped silently in network. However there are sync requests in pd client can detect this error by timeout. When the error is detected, client will be refreshed. At that time, streaming call created by old client should be canceled automatically. But this doesn't happen as expected, so this pr cancels it explicitly.

We need to add a test case to tested this with iptables.

The issue needs to be investaged further too. /cc pingcap/grpc-rs#150

@siddontang

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2018

can we use failpoint to reproduce it? E.g, force breaking the loop and take the receiver.

@siddontang siddontang added this to the 2018 Q1 milestone Jan 11, 2018

@overvenus

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2018

@siddontang

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2018

Do we need to call cancel on raft client?

I think we need. Maybe we should add keep timeout too, but @disksing meets some problems with it.

@BusyJay

This comment has been minimized.

Copy link
Contributor Author

commented Jan 11, 2018

I don't think so. If client is refreshed in raft client, then the streaming call is dropped and recreated immediately.

@BusyJay

This comment has been minimized.

Copy link
Contributor Author

commented Jan 11, 2018

E.g, force breaking the loop and take the receiver.

I don't get it. What loop to be break? And how and why to take the receiver?

@siddontang

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2018

@BusyJay

This comment has been minimized.

Copy link
Contributor Author

commented Jan 11, 2018

I don't see any connections between what you mention and the bug trying to fix here. Actually the poll method is never called again until cancel is called.

@siddontang

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2018

Got it.

@BusyJay

This comment has been minimized.

Copy link
Contributor Author

commented Jan 11, 2018

PTAL

@siddontang

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2018

LGTM

@overvenus
Copy link
Contributor

left a comment

lgtm

@BusyJay

This comment has been minimized.

Copy link
Contributor Author

commented Jan 12, 2018

/run-integration-tests

@overvenus

This comment has been minimized.

Copy link
Contributor

commented Jan 12, 2018

/rebuild

@BusyJay

This comment has been minimized.

Copy link
Contributor Author

commented Jan 12, 2018

/run-integration-tests

@ngaut ngaut merged commit bf9ffda into tikv:master Jan 12, 2018

6 checks passed

ci/circleci Your tests passed on CircleCI!
Details
jenkins-ci-tikv/build Jenkins job succeeded.
Details
jenkins-ci-tikv/integration-common-test Jenkins job succeeded.
Details
jenkins-ci-tikv/integration-compatibility-test Jenkins job succeeded.
Details
jenkins-ci-tikv/integration-ddl-test Jenkins job succeeded.
Details
license/cla Contributor License Agreement is signed.
Details

@BusyJay BusyJay deleted the BusyJay:fix-hb-receiver-fresh branch Jan 12, 2018

overvenus added a commit to overvenus/tikv that referenced this pull request Jan 15, 2018
pd: cancel call when refreshing client (tikv#2669)
* pd: cancel call when refreshing client
overvenus added a commit that referenced this pull request Jan 18, 2018
Cherry pick several bug fixes (#2688)
* Cargo: update prometheus to v0.3.7 (#2684)

* Cargo: update hyper to v0.9.18 (#2686)

* pd: cancel call when refreshing client (#2669)

* ci-build/test.sh: add execute permission (#2472)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.