TiKV OOM during CDC initial scan #16035
Comments
/assign @hicqu
/severity major
The OOM is caused by a surge of pending initial scan tasks. Each initial scan task takes roughly 5848 bytes, so 1649639 pending tasks take about 8.9 GB of memory.
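As a quick sanity check of the estimate above, the arithmetic can be reproduced with a small sketch. The helper names `pending_task_bytes` and `to_gib` are hypothetical and only illustrate the calculation; they are not part of TiKV.

```rust
/// Estimated memory held by queued initial scan tasks, in bytes.
/// (Hypothetical helper, for illustration only.)
fn pending_task_bytes(task_count: u64, bytes_per_task: u64) -> u64 {
    task_count * bytes_per_task
}

/// Convert a byte count to GiB (2^30 bytes).
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}
```

With the numbers from this report, `pending_task_bytes(1_649_639, 5_848)` is 9,647,088,872 bytes, which is just under 9 GiB and consistent with the "about 8.9 GB" figure if GB is read as GiB.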
tikv/components/cdc/src/endpoint.rs Lines 765 to 780 in 5165943
To fix the issue,
On TiCDC instances, there are lots of such logs:
After a gRPC stream is re-established, all subscribed regions are re-sent to the TiKV instances. That's why there are only 50K subscribed regions on a TiKV but 15000K pending region tasks. Limiting the total number of pending tasks is a good idea. However, we still need to resolve the stream-recreation case.
@overvenus pingcap/tiflow#8860 is expected to reduce changefeed initialization time, and it does work. So I suggest just limiting the total number of pending tasks on TiKVs.
close tikv#16035 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
close #16035 When TiCDC starts changefeed, it may send numerous requests leading to the creation of numerous scan tasks. However, the initial surge of scan tasks may cause OOM. This commit aims to resolve the issue by implementing a mechanism that allows TiKV to reject requests when the number of pending tasks reaches a certain limit. Signed-off-by: Neil Shen <overvenus@gmail.com>
ref #16035 return server_is_busy to cdc clients if necessary Signed-off-by: qupeng <qupeng@pingcap.com>
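The idea described in the two commits above (cap the number of pending scan tasks and reject new requests once the cap is reached) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from those commits: `PendingScanLimiter`, `ScanPermit`, and `try_acquire` are hypothetical names, and a real implementation would translate a rejection into a `server_is_busy` error for the CDC client.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical limiter that caps the number of pending initial scan tasks.
struct PendingScanLimiter {
    pending: AtomicUsize,
    limit: usize,
}

/// Held by a task while it is pending; releases its slot when dropped.
struct ScanPermit {
    limiter: Arc<PendingScanLimiter>,
}

impl PendingScanLimiter {
    fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { pending: AtomicUsize::new(0), limit })
    }

    /// Returns a permit if a slot is free. `None` means the caller should
    /// reject the request (for example, by reporting that the server is busy).
    fn try_acquire(self: &Arc<Self>) -> Option<ScanPermit> {
        let mut cur = self.pending.load(Ordering::Relaxed);
        loop {
            if cur >= self.limit {
                return None;
            }
            // Claim a slot only if no other thread changed the counter.
            match self.pending.compare_exchange_weak(
                cur,
                cur + 1,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(ScanPermit { limiter: Arc::clone(self) }),
                Err(actual) => cur = actual,
            }
        }
    }

    /// Number of tasks currently holding a permit.
    fn pending(&self) -> usize {
        self.pending.load(Ordering::Acquire)
    }
}

impl Drop for ScanPermit {
    fn drop(&mut self) {
        // Free the slot when the task finishes or is cancelled.
        self.limiter.pending.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Tying the slot to a permit that releases on drop means a cancelled or failed task cannot leak its slot. Bounding the count also bounds the memory held by the queue, at roughly the limit multiplied by the per-task size.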
Bug Report
What version of TiKV are you using?
/ # /tikv-server -V
TiKV
Release Version: 6.5.3
Edition: Enterprise
Git Commit Hash: 5165943
Git Commit Branch: heads/refs/tags/v6.5.3-20231116-5165943
What operating system and CPU are you using?
K8S
Steps to reproduce
What did you expect?
TiKV should not OOM
What happened?
TiKV OOM