
cdc: back pressure scan speed (#10142) #10151

Merged
merged 10 commits into from May 31, 2021

Conversation

ti-srebot
Contributor

@ti-srebot ti-srebot commented May 11, 2021

cherry-pick #10142 to release-5.0
You can switch your code base to this Pull Request by using git-extras:

# In tikv repo:
git pr https://github.com/tikv/tikv/pull/10151

After applying modifications, you can push your changes to this PR via:

git push git@github.com:ti-srebot/tikv.git pr/10151:release-5.0-dfa14c5d23e8

What problem does this PR solve?

This is one of a series of PRs that aim to fix the TiKV CDC OOM issue (too many pending send messages are buffered in memory).

To resolve the issue, we have two strategies:

  1. Limit the speed of producing messages during the initial scan stage. cdc: limit scan speed #9948
  2. Set an upper bound on pending send messages. cdc: flow control #9885

The series of PRs adopts both strategies.
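Strategy 1 can be pictured as a token bucket that the initial scan must draw from before it emits data. The sketch below is a hypothetical illustration, not the speed limiter cherry-picked in #10108: `TokenBucket` and `try_consume` are invented names, one token is assumed to stand for one byte of scan output, and time is passed in explicitly (in seconds) to keep the example deterministic.

```rust
/// Hypothetical token-bucket limiter for illustration only; it is not the
/// speed limiter used by TiKV. One token stands for one byte of scan output.
pub struct TokenBucket {
    capacity: f64,    // maximum burst size, in bytes
    rate: f64,        // refill rate, in bytes per second
    tokens: f64,      // bytes that may be emitted right now
    last_refill: f64, // time of the last refill, in seconds
}

impl TokenBucket {
    /// Start with a full bucket at time 0.
    pub fn new(capacity: f64, rate: f64) -> Self {
        TokenBucket { capacity, rate, tokens: capacity, last_refill: 0.0 }
    }

    /// Refill for the time elapsed since the last call, then try to take
    /// `bytes` tokens. Returns `true` if the scan may emit `bytes` at `now`.
    /// Assumes `now` never goes backwards.
    pub fn try_consume(&mut self, bytes: f64, now: f64) -> bool {
        let elapsed = (now - self.last_refill).max(0.0);
        // Refill proportionally to elapsed time, never beyond the burst size.
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= bytes {
            self.tokens -= bytes;
            true
        } else {
            false
        }
    }
}
```

A caller that gets `false` would wait and retry. Refilling in proportion to elapsed time caps the sustained scan speed at `rate` while still allowing a burst of up to `capacity`.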

This is part of #9885. It applies back pressure to the scan speed by replacing an unbounded channel with a bounded channel.
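The effect of swapping an unbounded channel for a bounded one can be sketched with the Rust standard library's `std::sync::mpsc::sync_channel`. This is a minimal illustration under assumptions, not TiKV's cdc channel: `BoundedSink`, `bounded`, and the `SendError` enum are hypothetical names (the `Congested` variant only echoes the rename in the commit list). A blocking `send` parks the scanning thread while the buffer is full, so the scan cannot outrun the consumer; `try_send` reports congestion instead of blocking.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical error type; the real TiKV cdc channel may differ.
#[derive(Debug, PartialEq)]
pub enum SendError {
    Congested,    // the bounded buffer is full: the producer should slow down
    Disconnected, // the receiving side has been dropped
}

/// A bounded sink: at most `cap` events can be pending at once.
pub struct BoundedSink<T> {
    tx: SyncSender<T>,
}

/// Create a sink/receiver pair whose buffer holds at most `cap` events.
pub fn bounded<T>(cap: usize) -> (BoundedSink<T>, Receiver<T>) {
    let (tx, rx) = sync_channel(cap);
    (BoundedSink { tx }, rx)
}

impl<T> BoundedSink<T> {
    /// Non-blocking send: reports `Congested` instead of buffering more.
    pub fn try_send(&self, event: T) -> Result<(), SendError> {
        self.tx.try_send(event).map_err(|e| match e {
            TrySendError::Full(_) => SendError::Congested,
            TrySendError::Disconnected(_) => SendError::Disconnected,
        })
    }

    /// Blocking send: waits while the buffer is full, which throttles the
    /// producer (e.g. an initial scan) to the speed of the consumer.
    pub fn send(&self, event: T) -> Result<(), SendError> {
        self.tx.send(event).map_err(|_| SendError::Disconnected)
    }
}
```

With a capacity of 1, a producer pushing 10 events through `send` is held back until the receiver drains each one, so memory held by pending events is bounded by the capacity rather than by how fast the scan runs.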

Also, this PR cherry-picks #10028.

Plan:

  • take1: cherry-pick speed limiter. cdc: limit scan speed #10108
  • take2: set an upper bound on an internal channel used in CDC <-- we are here.
  • take3: add memory quota to CDC and kill connection if it uses too much memory.
  • take4: cherry-pick other bugfixes in cdc: flow control #9885
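Take3's memory quota can be sketched as a shared atomic counter that callers reserve against before buffering an event. `MemoryQuota`, `alloc`, and `free` are hypothetical names for illustration; how TiKV actually measures event size and closes a connection is not shown here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical memory quota shared by CDC connections (illustrative only).
pub struct MemoryQuota {
    capacity: usize,     // maximum bytes that may be buffered
    in_use: AtomicUsize, // bytes currently reserved
}

impl MemoryQuota {
    pub fn new(capacity: usize) -> Arc<Self> {
        Arc::new(MemoryQuota { capacity, in_use: AtomicUsize::new(0) })
    }

    /// Reserve `bytes` before buffering an event. Returns `false` if the
    /// reservation would exceed the quota; the caller could then close the
    /// offending connection instead of buffering more data.
    pub fn alloc(&self, bytes: usize) -> bool {
        let mut current = self.in_use.load(Ordering::Relaxed);
        loop {
            let next = match current.checked_add(bytes) {
                Some(n) if n <= self.capacity => n,
                _ => return false,
            };
            // Check and reserve in one atomic step so that concurrent
            // callers cannot both slip under the limit.
            match self.in_use.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                Err(actual) => current = actual,
            }
        }
    }

    /// Release `bytes` once the event has been sent downstream.
    /// Callers must only free what they previously reserved.
    pub fn free(&self, bytes: usize) {
        self.in_use.fetch_sub(bytes, Ordering::AcqRel);
    }

    pub fn in_use(&self) -> usize {
        self.in_use.load(Ordering::Relaxed)
    }
}
```

When `alloc` returns `false`, the connection that asked for the memory is the natural one to kill, in line with take3's goal of cutting off a connection that uses too much memory.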

See full changes: master...overvenus:cdc/flowcontrol/all

Cc #8168 #9996

Check List

Tests

  • Unit test
  • Integration test

Release note

  • Support applying back pressure to the CDC scan speed.
  • Fix interference between connections to the same region.

@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot ti-srebot mentioned this pull request May 11, 2021
@ti-srebot ti-srebot added component/CDC Component: Change Data Capture size/XXL type/bugfix Type: PR - Fix a bug type/cherry-pick Type: PR - Cherry pick labels May 11, 2021
@ti-srebot ti-srebot requested a review from 5kbpers May 11, 2021 22:04
@ti-srebot ti-srebot added this to the v5.0.1 milestone May 11, 2021
@ti-srebot
Contributor Author

@overvenus you're already a collaborator in bot's repo.

@overvenus overvenus modified the milestones: v5.0.1, v5.0.2 May 19, 2021
overvenus and others added 2 commits May 25, 2021 17:41
* cdc: limit max pending sent scan events

Signed-off-by: Neil Shen <overvenus@gmail.com>

* cdc/channel: add tests

Signed-off-by: Neil Shen <overvenus@gmail.com>

* cdc: add grpc flow control test

Signed-off-by: Neil Shen <overvenus@gmail.com>

* cdc: add panic message details

Signed-off-by: Neil Shen <overvenus@gmail.com>

* cdc: add send_all for Sink

Signed-off-by: Neil Shen <overvenus@gmail.com>

* clean up

Signed-off-by: Neil Shen <overvenus@gmail.com>

* cdc: handle sink error

Signed-off-by: Neil Shen <overvenus@gmail.com>

* cdc: correct comments

Signed-off-by: Neil Shen <overvenus@gmail.com>

* cdc: rename Congest to Congested

Signed-off-by: Neil Shen <overvenus@gmail.com>

* cdc: deregister region when resolver fails to build

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Signed-off-by: Neil Shen <overvenus@gmail.com>
Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Signed-off-by: Neil Shen <overvenus@gmail.com>
@zhouqiang-cl zhouqiang-cl added the cherry-pick-approved Cherry pick PR approved by release team. label May 26, 2021
@ti-chi-bot ti-chi-bot added the status/LGT1 Status: PR - There is already 1 approval label May 26, 2021
Signed-off-by: Neil Shen <overvenus@gmail.com>
@overvenus
Member

/run-test

test_region_heartbeat_report_approximate_size fails.

@overvenus
Member

/run-test

test_region_heartbeat_report_approximate_size fails

Signed-off-by: Neil Shen <overvenus@gmail.com>
Signed-off-by: Neil Shen <overvenus@gmail.com>
Signed-off-by: Neil Shen <overvenus@gmail.com>
Signed-off-by: Neil Shen <overvenus@gmail.com>
where
    F: Fn(&Downstream) -> Result<()>,
{
    // fn broadcast(&self, change_data_event: Event, normal_only: bool) {
Member

Suggested change
// fn broadcast(&self, change_data_event: Event, normal_only: bool) {

@ti-chi-bot
Member

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • 5kbpers
  • amyangfei

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by writing /lgtm in a comment.
Reviewer can cancel approval by writing /lgtm cancel in a comment.

@ti-chi-bot ti-chi-bot added status/LGT2 Status: PR - There are already 2 approvals and removed status/LGT1 Status: PR - There is already 1 approval labels May 31, 2021
Signed-off-by: Neil Shen <overvenus@gmail.com>
@overvenus
Member

/merge

@ti-chi-bot
Member

@overvenus: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 3070b48

@ti-chi-bot ti-chi-bot added the status/can-merge Status: Can merge to base branch label May 31, 2021
@ti-chi-bot
Member

@ti-srebot: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot merged commit 3d7c309 into tikv:release-5.0 May 31, 2021
@kennytm kennytm deleted the release-5.0-dfa14c5d23e8 branch October 6, 2021 14:27
Labels
cherry-pick-approved Cherry pick PR approved by release team. component/CDC Component: Change Data Capture size/XXL status/can-merge Status: Can merge to base branch status/LGT2 Status: PR - There are already 2 approvals type/bugfix Type: PR - Fix a bug type/cherry-pick Type: PR - Cherry pick

6 participants