TiKV OOM caused by reading raft conf changes during raft.hup #15770

Closed
Tracked by #16375
overvenus opened this issue Oct 16, 2023 · 0 comments · Fixed by #15806

Bug Report

Before starting an election, raft-rs has to check whether there is any unapplied
conf change entry. In the current implementation, this requires scanning the logs
in [unapplied_index, committed_index]. The scan loads the whole range at once, so
its memory usage is essentially unbounded when a raft peer has many unapplied logs.
To fix the issue, TiKV can paginate the raft log scan, which gives its memory usage
a fixed upper bound.
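
A minimal sketch of the paginated scan idea, for illustration only. `Entry`, `EntryKind`, `LogStore`, `MemLog`, and `has_unapplied_conf_change` are hypothetical names made up for this example; they are not the raft-rs or TiKV API, and the actual fix may differ. The point is that each call fetches at most roughly `page_bytes` of entries, so peak memory depends on the page size rather than on the number of unapplied logs.

```rust
/// Hypothetical, simplified entry kind (real raft entries are protobuf messages).
#[derive(Clone, Debug, PartialEq)]
pub enum EntryKind {
    Normal,
    ConfChange,
}

/// Hypothetical, simplified log entry.
#[derive(Clone, Debug)]
pub struct Entry {
    pub index: u64,
    pub kind: EntryKind,
    pub data: Vec<u8>,
}

/// Hypothetical storage interface: returns entries in [low, high), stopping
/// before the accumulated payload would exceed `max_bytes`. At least one
/// entry is returned if the range is non-empty, so a scan always progresses.
pub trait LogStore {
    fn entries(&self, low: u64, high: u64, max_bytes: usize) -> Vec<Entry>;
}

/// In-memory log used only to exercise the sketch.
pub struct MemLog {
    entries: Vec<Entry>,
}

impl MemLog {
    pub fn new(entries: Vec<Entry>) -> Self {
        MemLog { entries }
    }
}

impl LogStore for MemLog {
    fn entries(&self, low: u64, high: u64, max_bytes: usize) -> Vec<Entry> {
        let mut page = Vec::new();
        let mut size = 0usize;
        for e in self.entries.iter().filter(|e| e.index >= low && e.index < high) {
            // Stop once the byte budget is used up, but never return an empty page.
            if !page.is_empty() && size + e.data.len() > max_bytes {
                break;
            }
            size += e.data.len();
            page.push(e.clone());
        }
        page
    }
}

/// Scans [first_unapplied, committed] one page at a time and reports whether
/// any entry in that range is a conf change. Only one page is held in memory
/// at any moment.
pub fn has_unapplied_conf_change<S: LogStore>(
    store: &S,
    first_unapplied: u64,
    committed: u64,
    page_bytes: usize,
) -> bool {
    let mut next = first_unapplied;
    while next <= committed {
        let page = store.entries(next, committed + 1, page_bytes);
        let last_index = match page.last() {
            Some(last) => last.index,
            None => return false, // nothing left to read in this range
        };
        if page.iter().any(|e| e.kind == EntryKind::ConfChange) {
            return true;
        }
        // The previous page is dropped here before the next one is fetched.
        next = last_index + 1;
    }
    false
}
```

With this shape, a peer with a very large backlog of unapplied logs still allocates only one page per iteration, at the cost of more storage reads. The page size is a tuning assumption in this sketch, not a value taken from the fix.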

Results below:
TiKV 3 hit OOM immediately after restart and kept doing so until a rolling update to a TiKV build with paginated scan.

image

What version of TiKV are you using?

v7.4.0

Steps to reproduce

The case occurred during a sysbench bulk insert, but I'm not sure whether it can be reproduced reliably.

sysbench --db-driver=mysql --mysql-host=<HOST> --mysql-port=<PORT> --mysql-user=<USER> \
        --threads=100 --time=3600000 ./bulk_insert.lua (prepare|run|cleanup)

What did you expect?

No OOM

What happened?

OOM

overvenus added the labels type/bug, affects-5.0, affects-5.1, affects-5.2, affects-5.3, affects-5.4, affects-6.0, affects-6.1, affects-6.2, affects-6.3, affects-6.4, affects-6.5, affects-6.6, affects-7.0, affects-7.1, affects-7.2, affects-7.3, affects-7.4 and removed the label affects-4.0 on Oct 16, 2023
ti-chi-bot bot added a commit that referenced this issue Oct 23, 2023
…#15806)

close #15770

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this issue Oct 23, 2023
close tikv#15770

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Smityz pushed a commit to Smityz/tikv that referenced this issue Oct 23, 2023
…tikv#15806)

close tikv#15770

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot pushed a commit that referenced this issue Nov 17, 2023
…#15806) (#15814)

close #15770

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: Neil Shen <overvenus@gmail.com>
Co-authored-by: tonyxuqqi <tonyxuqi@outlook.com>
ti-chi-bot bot added a commit that referenced this issue Nov 17, 2023
…#15806) (#15812)

close #15770

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: Neil Shen <overvenus@gmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
overvenus added a commit to ti-chi-bot/tikv that referenced this issue Nov 17, 2023
…tikv#15806)

close tikv#15770

Signed-off-by: Neil Shen <overvenus@gmail.com>
ti-chi-bot bot added a commit that referenced this issue Nov 17, 2023
…#15806) (#15813)

close #15770

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: Neil Shen <overvenus@gmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>