the invisible mvcc versions are not purged by gc #12729

Closed
shaoxiqian opened this issue Jun 1, 2022 · 20 comments · May be fixed by #13352 or #14691
Labels
affects-5.0, affects-5.1, affects-5.2, affects-5.3, affects-5.4, affects-6.0, affects-6.1, affects-6.2, affects-6.3, affects-6.4, affects-6.5, affects-6.6, affects-7.0, affects-7.1, affects-7.5, severity/major, type/bug, user_report

Comments

@shaoxiqian

Bug Report

What version of TiKV are you using?

v5.4.0

What operating system and CPU are you using?

Steps to reproduce

This is a customer's read-only workload.

What did you expect?

TiKV should automatically clear invisible MVCC versions when there are many RocksDB tombstone keys, instead of waiting for the compaction filter GC.

What happened?

[image]
QPS is only 49.9.

[image]
IO usage on TiKV is high, averaging 571 MB/s, while CPU usage of TiDB and TiKV is low.

[image]
We can see that total_keys is almost equal to key_skipped_count, which means the coprocessor encounters a large number of versions during the scan.

[image]
The iterator calls next millions of times but processes only tens of keys, which means there are too many invisible MVCC versions in TiKV. The flamegraph shows the iterator busy spinning to find the next valid key.
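
To illustrate why the iterator does so much work for so few results, here is a minimal sketch of an MVCC forward scan. The `Version` and `WriteKind` types and the `scan` and `example` functions are hypothetical simplifications introduced for this illustration, not TiKV's actual scanner. Every entry costs one iterator step, but only the newest visible `Put` of each key is returned.

```rust
/// Hypothetical, simplified write kind (not TiKV's real type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum WriteKind {
    Put,
    Delete,
}

/// One MVCC version of a user key, committed at `commit_ts`.
#[derive(Clone, Debug)]
struct Version {
    key: u64,
    commit_ts: u64,
    kind: WriteKind,
}

/// Scans versions sorted by (key ascending, commit_ts descending) and returns
/// the keys visible at `read_ts` together with the number of iterator steps.
fn scan(versions: &[Version], read_ts: u64) -> (Vec<u64>, usize) {
    let mut visible = Vec::new();
    let mut steps = 0;
    let mut resolved_key: Option<u64> = None;
    for v in versions {
        steps += 1; // every entry costs one step, visible or not
        if resolved_key == Some(v.key) || v.commit_ts > read_ts {
            continue; // stale version of a resolved key, or too new to see
        }
        resolved_key = Some(v.key);
        if v.kind == WriteKind::Put {
            visible.push(v.key);
        }
    }
    (visible, steps)
}

/// Builds 1000 keys that were written and later deleted, plus one live key.
fn example() -> (Vec<u64>, usize) {
    let mut versions = Vec::new();
    for key in 0..1000u64 {
        versions.push(Version { key, commit_ts: 20, kind: WriteKind::Delete });
        versions.push(Version { key, commit_ts: 10, kind: WriteKind::Put });
    }
    versions.push(Version { key: 1000, commit_ts: 10, kind: WriteKind::Put });
    scan(&versions, 30)
}
```

In this toy data set, `example()` takes 2001 iterator steps to return a single key, which is the same shape as the total_keys versus processed keys gap reported above.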

[image]
TiKV metrics show that GC rarely happens in the collected time frame.

[image]
There are no modifications during the time frame, so no GC is executed.

All of the above shows that there are too many deleted versions that have not been compacted into tombstones, which degrades performance.

@hicqu
Contributor

hicqu commented Jun 3, 2022

@shaoxiqian thanks for your report! In the current implementation, MVCC deletions can only be handled at the bottommost level. #10545 may help in this case; we will continue working on it.
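
As a rough illustration of the bottommost-level restriction mentioned above, the sketch below shows the kind of decision a GC compaction filter could make for an MVCC delete mark. The `Decision` enum and `filter_delete_mark` function are hypothetical names and do not reproduce TiKV's code. The assumption is that older `Put` versions of the same key may still sit in lower levels, so removing the mark anywhere else could make them visible again.

```rust
/// Hypothetical outcome of filtering one MVCC delete mark.
#[derive(Debug, PartialEq)]
enum Decision {
    Keep,
    Remove,
}

/// Decides whether a delete mark committed at `commit_ts` can be dropped
/// during a compaction, given the GC `safe_point`.
fn filter_delete_mark(commit_ts: u64, safe_point: u64, is_bottommost_level: bool) -> Decision {
    if commit_ts > safe_point {
        // A reader at an older timestamp may still need to see this deletion.
        return Decision::Keep;
    }
    if !is_bottommost_level {
        // Older versions of the key may exist in lower levels; dropping the
        // mark here could resurrect them.
        return Decision::Keep;
    }
    Decision::Remove
}
```

Under these assumptions, a delete mark written to an upper level stays in place until some compaction carries it down to the bottommost level, which may never happen on a read-only workload.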

@dbsid
Contributor

dbsid commented Jun 4, 2022

/severity critical

@dbsid
Contributor

dbsid commented Jun 4, 2022

I think this is a critical bug for the default GC. From the user's perspective, it is devastating that old MVCC versions are not purged.

@tonyxuqqi
Contributor

@hicqu would you please move #10545 forward?

@dbsid
Contributor

dbsid commented Jul 19, 2022

/severity major

@Lily2025

/remove-severity critical

@shaoxiqian
Author

/remove-severity critical

@ti-chi-bot
Member

@shaoxiqian: These labels are not set on the issue: severity/critical.

In response to this:

/remove-severity critical

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@tonyxuqqi
Contributor

/cc jiayang-zheng

@BusyJay
Member

BusyJay commented Aug 4, 2022

I think an easy fix is to change the function

```rust
fn need_compact(
    num_entires: u64,
    num_versions: u64,
    tombstones_num_threshold: u64,
    tombstones_percent_threshold: u64,
) -> bool {
    if num_entires <= num_versions {
        return false;
    }
    // When the number of tombstones exceed threshold and ratio, this range need
    // compacting.
    let estimate_num_del = num_entires - num_versions;
    estimate_num_del >= tombstones_num_threshold
        && estimate_num_del * 100 >= tombstones_percent_threshold * num_entires
}
```

If there are too many versions, compaction should be triggered for the write CF; otherwise stale versions won't be deleted.
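
One possible reading of this suggestion is sketched below. It is not the change that was eventually merged: the `num_rows` parameter, the `CompactThreshold` struct, and the `redundant_rows_*` fields are hypothetical names introduced here, assuming the range statistics can also report the number of distinct rows. The idea is to trigger compaction either when RocksDB tombstones dominate, as before, or when MVCC versions greatly outnumber rows.

```rust
/// Illustrative thresholds; the `redundant_rows_*` fields are assumptions.
struct CompactThreshold {
    tombstones_num_threshold: u64,
    tombstones_percent_threshold: u64,
    redundant_rows_threshold: u64,
    redundant_rows_percent_threshold: u64,
}

fn need_compact(num_entries: u64, num_versions: u64, num_rows: u64, t: &CompactThreshold) -> bool {
    // RocksDB tombstones: entries that are not MVCC versions.
    let num_tombstones = num_entries.saturating_sub(num_versions);
    if num_tombstones >= t.tombstones_num_threshold
        && num_tombstones * 100 >= t.tombstones_percent_threshold * num_entries
    {
        return true;
    }
    // Redundant versions: MVCC versions beyond one per row.
    let num_redundant = num_versions.saturating_sub(num_rows);
    num_redundant >= t.redundant_rows_threshold
        && num_redundant * 100 >= t.redundant_rows_percent_threshold * num_versions
}
```

With this shape, a range that holds a million versions for a handful of rows qualifies for compaction even if it contains no RocksDB tombstones at all, which is the situation described in this report.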

@cfzjywxk
Collaborator

cfzjywxk commented Aug 9, 2022

We may need to think more about the compaction triggering strategy. Before that, we could abstract a simpler case to help investigate this issue.
/cc @you06

@tonyxuqqi
Contributor

/assign jiayang-zheng

@ti-chi-bot
Member

@tonyxuqqi: GitHub didn't allow me to assign the following users: jiayang-zheng.

Note that only tikv members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign jiayang-zheng

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cfzjywxk
Collaborator

@tonyxuqqi
@you06 is already working on this and doing related tests.

afeinberg added a commit to afeinberg/tikv that referenced this issue Oct 25, 2023
ref tikv#12729

work-in-progress: do not merge, needs unit tests.

adds the concept of a full compaction: a compaction that compacts
all column families, ranges, and levels. this has the effect of
deleting all of the tombstone markers.

if ``raftstore.full-compact-tick-interval`` is set, attempt running
full compaction at least that frequently.

if ``raftstore.full-compact-restrict-hours-local-tz`` is set, run full
compaction only during the hours specified.

the tikv config segment below will run compaction at 03:00 and 23:00
(3am and 11pm respectively) in the tikv nodes' local timezone.

```
[raftstore]
full-compact-tick-interval = "1h"
full-compact-restrict-hours-local-tz = [3, 23]

```
afeinberg added a commit to afeinberg/tikv that referenced this issue Oct 27, 2023
ref tikv#12729

adds the concept of a full compaction: a compaction that compacts
all column families, ranges, and levels. this has the effect of
deleting all of the tombstone markers.

if ``raftstore.full-compact-tick-interval`` is set, attempt running
full compaction at least that frequently.

if ``raftstore.full-compact-restrict-hours-local-tz`` is set, run full
compaction only during the hours specified.

the tikv config segment below will run compaction at 03:00 and 23:00
(3am and 11pm respectively) in the tikv nodes' local timezone.

```
[raftstore]
full-compact-tick-interval = "1h"
full-compact-restrict-hours-local-tz = [3, 23]

```

to address in follow-up PRs:
* integration tests.
* pausing/rate-limiting full compactions to avoid disrupting live
  traffic.

Signed-off-by: Alex Feinberg <alex@strlen.net>
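
The hour restriction described in this commit message can be sketched as follows. This is only an illustration under stated assumptions: `full_compact_allowed` and `hours_with_compaction` are hypothetical helpers, obtaining the node's local hour is left to the caller, and treating an empty list as "no restriction" is a guess rather than confirmed behavior.

```rust
/// Returns true if a full compaction may start at `local_hour` (0..=23).
/// `restrict_hours` mirrors `full-compact-restrict-hours-local-tz`; treating
/// an empty list as "no restriction" is an assumption of this sketch.
fn full_compact_allowed(restrict_hours: &[u32], local_hour: u32) -> bool {
    restrict_hours.is_empty() || restrict_hours.contains(&local_hour)
}

/// Simulates one tick per hour over a day and lists the hours in which a
/// full compaction would be attempted.
fn hours_with_compaction(restrict_hours: &[u32]) -> Vec<u32> {
    (0..24)
        .filter(|&hour| full_compact_allowed(restrict_hours, hour))
        .collect()
}
```

With a one-hour tick and `[3, 23]` as the restriction, `hours_with_compaction(&[3, 23])` yields only hours 3 and 23, matching the schedule in the commit message.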
ti-chi-bot bot pushed a commit that referenced this issue Oct 31, 2023
ref #12729

Signed-off-by: Alex Feinberg <alex@strlen.net>

Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this issue Oct 31, 2023
ref tikv#12729

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@afeinberg
Contributor

Another PR that addresses this has been merged: #15995

@tonyxuqqi tonyxuqqi added the user_report The issue is reported by real TiKV user from their environment. label Jan 24, 2024
@tonyxuqqi
Contributor

Since we already have the enhanced check_compact as well as periodic full compaction, I think the issue can be closed for now.
