pitr checkpoint ts lag reached more than 8h after inject network partition between one of tikv and pd leader #16469

Closed
Lily2025 opened this issue Jan 31, 2024 · 9 comments · Fixed by #16484 or #16624
Labels:
component/backup-restore (Component: backup, import, external_storage)
severity/critical
type/bug (Type: Issue - Confirmed a bug)

Comments

@Lily2025

Lily2025 commented Jan 31, 2024

Bug Report

What version of TiKV are you using?

./tikv-server -V
TiKV
Release Version: 8.0.0-alpha
Edition: Community
Git Commit Hash: 43d0e06
Git Commit Branch: heads/refs/tags/v8.0.0-alpha
UTC Build Time: 2024-01-26 11:47:08
Rust Version: rustc 1.77.0-nightly (89e2160c4 2023-12-27)
Enable Features: pprof-fp jemalloc mem-profiling portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine cloud-aws cloud-gcp cloud-azure trace-async-tasks openssl-vendored
Profile: dist_release

What operating system and CPU are you using?

8 CPU cores, 32 GB memory

Steps to reproduce

1. Start a PiTR (point-in-time recovery) log backup task.
2. Run the workload:
go-tpc tpcc run -D tpcc20000 --host tc-tidb.endless-ha-test-oltp-pitr-tps-6570032-1-702 -P4000 --warehouses 20000 -T 32 --ignore-error '2013,1213,1105,1205,8022,8028,9004,9007,1062' --user root --password '' --interval '10s'
3. Inject a network partition between one TiKV node and the PD leader.


What did you expect?

The PiTR checkpoint ts lag should drop below 10 minutes after the fault recovers.
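
For reference, the checkpoint ts lag is the gap between the current time and the physical time encoded in the checkpoint timestamp. Below is a minimal sketch of that arithmetic, assuming the usual TSO layout (physical Unix milliseconds in the high bits, 18 low bits for the logical counter). The function names and the threshold constant are illustrative and are not TiKV APIs.

```rust
use std::time::Duration;

/// Number of low bits a TSO timestamp reserves for the logical counter.
const TSO_LOGICAL_BITS: u32 = 18;

/// Threshold taken from the expectation above (10 minutes); the constant
/// itself is hypothetical and only used for illustration.
const EXPECTED_MAX_LAG: Duration = Duration::from_secs(10 * 60);

/// Extract the physical part (Unix milliseconds) of a TSO timestamp.
fn tso_physical_ms(ts: u64) -> u64 {
    ts >> TSO_LOGICAL_BITS
}

/// Lag between `now_ms` (Unix milliseconds) and the checkpoint timestamp.
/// Saturates at zero if the checkpoint is ahead of `now_ms`.
fn checkpoint_lag(now_ms: u64, checkpoint_ts: u64) -> Duration {
    Duration::from_millis(now_ms.saturating_sub(tso_physical_ms(checkpoint_ts)))
}

/// True when the lag is within the expected bound.
fn lag_is_healthy(lag: Duration) -> bool {
    lag < EXPECTED_MAX_LAG
}
```

With these helpers, a checkpoint that stopped advancing 8 hours ago yields a lag of 8 hours and fails `lag_is_healthy`, which is the symptom reported in this issue.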

What happened?

The PiTR checkpoint ts lag exceeded 8 hours after injecting a network partition between one TiKV node and the PD leader.

@Lily2025 Lily2025 added the type/bug Type: Issue - Confirmed a bug label Jan 31, 2024
@Lily2025
Author

/severity critical

@Lily2025
Author

/assign BornChanger

Contributor

ti-chi-bot bot commented Jan 31, 2024

@Lily2025: GitHub didn't allow me to assign the following users: BornChanger.

Note that only tikv members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign BornChanger


@BornChanger
Contributor

/component br

Contributor

ti-chi-bot bot commented Jan 31, 2024

@BornChanger: The label(s) component/br cannot be applied, because the repository doesn't have them.

In response to this:

/component br


@BornChanger
Contributor

/component backup-restore

@ti-chi-bot ti-chi-bot bot added the component/backup-restore Component: backup, import, external_storage label Jan 31, 2024
@Lily2025
Author

Lily2025 commented Feb 1, 2024

/assign YuJuncen

@Lily2025 Lily2025 changed the title from "pitr checkpoint ts lag reached more than 8h after tikv rolling restart" to "pitr checkpoint ts lag reached more than 8h after inject network partition between one of tikv and pd leader" Feb 1, 2024
@YuJuncen
Contributor

YuJuncen commented Feb 1, 2024

This looks like a regression from #16008, which added a staleness check to every Start command.

The timeline is:

Region R becomes leader -> StartObserve(R) sent -> Region epoch of R changed -> StartObserve(R) received but dropped

After that, the RefreshObserver is dropped as well, because no subscription record exists for the region.

The fix is to always add a phantom record to the subscription tracer when Start runs and no record exists yet.
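
The race and the proposed fix can be illustrated with a small model. This is only a sketch under stated assumptions: `Tracer`, `Subscription`, `start`, `refresh`, and the `insert_phantom` switch are hypothetical names invented for illustration, and the epoch is reduced to a single version number. It does not reproduce the actual TiKV code.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified subscription state for one region.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Subscription {
    /// Placeholder record: the region is known but not yet observed.
    Phantom,
    /// The region is being observed at the given epoch version.
    Active { epoch_version: u64 },
}

/// Hypothetical stand-in for the subscription tracer, keyed by region ID.
#[derive(Default)]
struct Tracer {
    records: HashMap<u64, Subscription>,
}

impl Tracer {
    /// Handle a Start command. `sent_version` is the epoch version when the
    /// command was sent, `current_version` the one seen on receipt.
    /// `insert_phantom` switches the modeled fix on or off.
    /// Returns true if observation actually started.
    fn start(
        &mut self,
        region_id: u64,
        sent_version: u64,
        current_version: u64,
        insert_phantom: bool,
    ) -> bool {
        if insert_phantom {
            // The fix: make sure some record exists before any early return.
            self.records.entry(region_id).or_insert(Subscription::Phantom);
        }
        if sent_version != current_version {
            // Staleness check: the epoch changed while the command was in flight.
            return false;
        }
        self.records.insert(
            region_id,
            Subscription::Active { epoch_version: current_version },
        );
        true
    }

    /// Handle a refresh. It only takes effect when a record already exists,
    /// which models why the refresh is dropped in the buggy timeline.
    fn refresh(&mut self, region_id: u64, current_version: u64) -> bool {
        match self.records.get_mut(&region_id) {
            Some(record) => {
                *record = Subscription::Active { epoch_version: current_version };
                true
            }
            None => false,
        }
    }

    /// Epoch version under observation, if the region is actively observed.
    fn observed_version(&self, region_id: u64) -> Option<u64> {
        match self.records.get(&region_id) {
            Some(Subscription::Active { epoch_version }) => Some(*epoch_version),
            _ => None,
        }
    }
}
```

In this model, calling `start(1, 5, 6, false)` followed by `refresh(1, 6)` leaves region 1 unobserved, because the stale Start left no record for the refresh to act on. With `insert_phantom` set to `true`, the same sequence ends with the region observed at version 6.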

@YuJuncen
Contributor

YuJuncen commented Feb 1, 2024

This only affects master, because #16008 hasn't been cherry-picked into any release version. Once we pick that PR, we can (and should) also pick this fix.

ti-chi-bot bot added a commit that referenced this issue Feb 26, 2024
close #16469

Now, `Start` will always put a phantom record in subscription tracer if there isn't one.

Signed-off-by: Yu Juncen <yu745514916@live.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
YuJuncen added a commit to ti-chi-bot/tikv that referenced this issue Mar 1, 2024
close tikv#16469

Now, `Start` will always put a phantom record in subscription tracer if there isn't one.

Signed-off-by: Yu Juncen <yu745514916@live.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: hillium <yujuncen@pingcap.com>
YuJuncen added a commit to ti-chi-bot/tikv that referenced this issue Mar 4, 2024
close tikv#16469

Now, `Start` will always put a phantom record in subscription tracer if there isn't one.

Signed-off-by: Yu Juncen <yu745514916@live.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: hillium <yujuncen@pingcap.com>
ti-chi-bot bot added a commit that referenced this issue Mar 12, 2024
close #16469, ref #16554

Signed-off-by: Yu Juncen <yu745514916@live.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this issue Mar 13, 2024
close tikv#16469, ref tikv#16554

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
dbsid pushed a commit to dbsid/tikv that referenced this issue Mar 24, 2024
close tikv#16469

Now, `Start` will always put a phantom record in subscription tracer if there isn't one.

Signed-off-by: Yu Juncen <yu745514916@live.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: dbsid <chenhuansheng@pingcap.com>
dbsid pushed a commit to dbsid/tikv that referenced this issue Mar 24, 2024
close tikv#16469, ref tikv#16554

Signed-off-by: Yu Juncen <yu745514916@live.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: dbsid <chenhuansheng@pingcap.com>
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this issue May 7, 2024
close tikv#16469, ref tikv#16554

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>