PD receives stale region info from TiKV #3868

Closed
Connor1996 opened this issue Dec 3, 2018 · 5 comments
type/bug The issue is confirmed as a bug.

Comments

@Connor1996
Member

Connor1996 commented Dec 3, 2018

Bug Report

What did you do?
I deployed one TiDB, one TiKV, and one PD on my local machine, all using the default config.
I then used random-merge-scheduler to merge regions randomly, and region 2 was merged.
After I restarted PD, the PD log showed:

2018/12/03 11:52:24.956 cluster_info.go:474: [info] [region 2] Insert new region {id:2 region_epoch:<conf_ver:1 version:1 > peers:<id:3 store_id:1 > } 

It seems that PD received a heartbeat carrying a stale state for region 2 (exactly the bootstrap region info, with start_key and end_key both empty).

The TiKV log afterwards:

2018/12/03 11:52:37.194 ERRO endpoint.rs:468: Region(message: "region is not found" region_not_found {region_id: 2})

The TiDB log afterwards:


2018/12/03 11:52:57.474 backoff.go:249: [warning] backoffer.maxSleep 20000ms is exceeded, errors:
message:"region is not found" region_not_found:<region_id:2 >  at 2018-12-03T11:52:56.466114+08:00
message:"region is not found" region_not_found:<region_id:2 >  at 2018-12-03T11:52:56.97086+08:00
message:"region is not found" region_not_found:<region_id:2 >  at 2018-12-03T11:52:57.474606+08:00
2018/12/03 11:52:58.312 backoff.go:249: [warning] backoffer.maxSleep 20000ms is exceeded, errors:
message:"region is not found" region_not_found:<region_id:2 >  at 2018-12-03T11:52:57.300941+08:00
message:"region is not found" region_not_found:<region_id:2 >  at 2018-12-03T11:52:57.807825+08:00
message:"region is not found" region_not_found:<region_id:2 >  at 2018-12-03T11:52:58.31217+08:00
2018/12/03 11:52:58.312 ddl_worker.go:141: [error] [ddl-worker 4, tp add index] handle DDL job err [tikv:9005]Region is unavailable[try again later]
@BusyJay
Member

BusyJay commented Dec 3, 2018

Is there any log that shows the heartbeat is from TiKV? Is it possible that the meta is inserted by PD itself?

@Connor1996
Member Author

The log line cluster_info.go:474: [info] [region 2] Insert new region can only be printed when PD receives a heartbeat. Are there other components that also send region heartbeats to PD in some cases?

@zhangjinpeng87
Member

Are there other components sending PD region heartbeat also in some case

I think no.

Connor1996 self-assigned this Dec 6, 2018
Connor1996 added the type/bug label Dec 7, 2018
@Connor1996
Member Author

After further investigation, I found that once TiKV reconnects to PD, it resends the first heartbeat it sent after booting. This is confirmed as a pd-client bug; I will create a PR to fix it.

@Connor1996
Member Author

Fixed on the PD side: stale region info is now ignored.
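
For readers who want a picture of what "ignoring stale region info" could look like, here is a minimal sketch in Go. It is not the actual PD code: the types Epoch and Region and the functions overlaps and isStale are hypothetical, and the rule it applies (reject a heartbeat whose epoch is older than that of a cached region with the same ID or an overlapping key range) is an assumption based on the region_epoch field in the log above. The epoch of the merged region in main is also assumed, since this thread does not show it.

```go
package main

import (
	"bytes"
	"fmt"
)

// Epoch mirrors the region_epoch field printed in the PD log line.
type Epoch struct {
	ConfVer uint64
	Version uint64
}

// Region is a simplified, hypothetical stand-in for PD's region metadata.
// An empty EndKey means "unbounded", as in the bootstrap region.
type Region struct {
	ID       uint64
	StartKey []byte
	EndKey   []byte
	Epoch    Epoch
}

// overlaps reports whether the key ranges [StartKey, EndKey) of a and b intersect.
func overlaps(a, b Region) bool {
	aStartsBeforeBEnds := len(b.EndKey) == 0 || bytes.Compare(a.StartKey, b.EndKey) < 0
	bStartsBeforeAEnds := len(a.EndKey) == 0 || bytes.Compare(b.StartKey, a.EndKey) < 0
	return aStartsBeforeBEnds && bStartsBeforeAEnds
}

// isStale reports whether an incoming heartbeat should be ignored because a
// cached region with the same ID, or with an overlapping range, is newer.
func isStale(incoming Region, cached []Region) bool {
	for _, r := range cached {
		if r.ID == incoming.ID {
			if incoming.Epoch.Version < r.Epoch.Version || incoming.Epoch.ConfVer < r.Epoch.ConfVer {
				return true
			}
			continue
		}
		if overlaps(incoming, r) && incoming.Epoch.Version < r.Epoch.Version {
			return true
		}
	}
	return false
}

func main() {
	// Assumed state after the merge: one region covering the whole key space
	// with a bumped version. The ID and epoch values are illustrative only.
	merged := Region{ID: 1, Epoch: Epoch{ConfVer: 1, Version: 3}}
	// The stale heartbeat from the log: bootstrap region 2 with empty keys.
	bootstrap := Region{ID: 2, Epoch: Epoch{ConfVer: 1, Version: 1}}

	fmt.Println(isStale(bootstrap, []Region{merged})) // prints: true
}
```

With a check of this shape, the bootstrap heartbeat for region 2 would be dropped instead of producing the "Insert new region" log line, so PD would not hand out a region that TiKV no longer has.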
