
refactor pd job mechanism #103

Closed · siddontang opened this issue May 13, 2016 · 5 comments

siddontang (Contributor) commented May 13, 2016

## Problems

  1. We may now have lots of jobs in PD, because we don't handle duplicated commands for a region.
  2. Even worse, suppose leader peer 1 of a region asks to add a peer: PD may first create a ConfChange job to add peer 2 but not execute it quickly; some time later peer 1 asks to add a peer again, and PD may create another ConfChange job to add peer 3, so we end up with two different jobs for the same region at the same time.
  3. If a job for a region fails because of a timeout or other error, PD retries it in a loop, which also blocks the following jobs of every other region.

## Principle

  1. One region can only do one thing at a time: if the leader of a region asks to add a peer, PD will skip any other commands for this region until the add-peer job is finished (success or failure are both fine).
  2. Expanding on 1, once PD decides what to do for a command, e.g. adding peer 3 to region a, it will keep executing that same job, no matter how many times it retries or how many times the leader asks.
  3. The execution of one region's job won't block the jobs of other regions.

## How to (Deprecated)

  1. Use the region ID as the job key, e.g. /job/region_id -> job, so we can discard duplicated jobs for a region and guarantee only one job per region at a time.
  2. PD can scan the jobs of multiple regions and execute them concurrently.
  3. Expanding on 2, the scanning above has a problem: we always get the jobs with the smaller region IDs first, e.g. if region 1 and region 2 both have a job and region 2's job is earlier, we still scan region 1's job first. So we should also keep a job list as before, which gives us two KV pairs: one maps region_id -> job_id and the other maps job_id -> job.
    So the job meta is: /job_region/region_id -> job_id and /job/job_id -> job (see the sketch after this list).
  4. Guarantee that all job executions are retryable and can't corrupt the Raft group, e.g. when adding peer 3 to region a, we must be sure that if adding peer 3 has already succeeded, a retried "add peer 3" fails instead of being applied again.
  5. There should be a cancel mechanism: if a job can't be executed successfully for a long time, PD may cancel it or notify the user to handle it manually.
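
A minimal sketch, in Go, of the deprecated two-key job layout above. The `Job` struct, the key helpers, and the map-backed store are illustrative assumptions (real PD would write both keys in one etcd transaction), not names from the PD source:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Job is a hypothetical job record, not the real PD job type.
type Job struct {
	JobID    uint64 `json:"job_id"`
	RegionID uint64 `json:"region_id"`
	Kind     string `json:"kind"` // e.g. "add_peer"
	PeerID   uint64 `json:"peer_id"`
}

func jobRegionKey(regionID uint64) string { return fmt.Sprintf("/job_region/%d", regionID) }
func jobKey(jobID uint64) string          { return fmt.Sprintf("/job/%d", jobID) }

// store stands in for etcd; both keys should be written atomically in practice.
type store map[string]string

// addJob registers a job only if the region has no pending job yet,
// which is how duplicated commands for the same region get discarded.
func (s store) addJob(j Job) error {
	idx := jobRegionKey(j.RegionID)
	if _, ok := s[idx]; ok {
		return fmt.Errorf("region %d already has a pending job", j.RegionID)
	}
	body, err := json.Marshal(j)
	if err != nil {
		return err
	}
	s[idx] = fmt.Sprint(j.JobID)        // /job_region/region_id -> job_id
	s[jobKey(j.JobID)] = string(body)   // /job/job_id -> job
	return nil
}

func main() {
	s := store{}
	fmt.Println(s.addJob(Job{JobID: 1, RegionID: 7, Kind: "add_peer", PeerID: 2})) // <nil>
	fmt.Println(s.addJob(Job{JobID: 2, RegionID: 7, Kind: "add_peer", PeerID: 3})) // duplicate is rejected
}
```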

/cc @ngaut @qiuyesuifeng @disksing @tiancaiamao


siddontang commented May 14, 2016

## How To

We have only one final state for every region.

  • PD caches all the region metadata in memory.
  • The leader peer of each region reports the region's status regularly.

### For conf change

Leader 1 of region a asks for a ChangePeer; PD adds peer 2 to region a directly, so the region meta in etcd now contains peers 1 and 2. The region in TiKV must eventually reach this final state.

If leader 1 asks for a ChangePeer again, PD still replies with "add peer 2" until it finds that region a already has peers 1 and 2; only then does it carry out the next conf change.
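
A minimal sketch, in Go with made-up types, of this "keep replying the same decision until the reported state matches" idea; `handleChangePeer` and the peer-ID allocation are illustrative assumptions, not PD's actual API:

```go
package main

import "fmt"

// Region is a hypothetical, simplified region descriptor.
type Region struct {
	ID    uint64
	Peers []uint64 // peer IDs
}

// pd keeps one decided "final state" per region: the peer it already chose to add.
type pd struct {
	pendingPeer map[uint64]uint64 // region ID -> peer ID that PD decided to add
	nextPeerID  uint64
}

// handleChangePeer always replies with the same decision until the reported
// region shows that the peer has actually been added.
func (p *pd) handleChangePeer(reported Region) (addPeer uint64) {
	if id, ok := p.pendingPeer[reported.ID]; ok {
		for _, peer := range reported.Peers {
			if peer == id {
				// The final state has been reached; a later request may
				// start the next conf change.
				delete(p.pendingPeer, reported.ID)
				return p.handleChangePeer(reported)
			}
		}
		return id // not applied yet: repeat the same decision
	}
	p.nextPeerID++
	p.pendingPeer[reported.ID] = p.nextPeerID
	return p.nextPeerID
}

func main() {
	p := &pd{pendingPeer: map[uint64]uint64{}, nextPeerID: 1}
	fmt.Println(p.handleChangePeer(Region{ID: 1, Peers: []uint64{1}}))    // decides to add peer 2
	fmt.Println(p.handleChangePeer(Region{ID: 1, Peers: []uint64{1}}))    // still replies peer 2
	fmt.Println(p.handleChangePeer(Region{ID: 1, Peers: []uint64{1, 2}})) // done, next decision is peer 3
}
```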

### For split

Leader 1 of region a [a-c) asks for a Split; PD only allocates the new region ID and peer IDs.
Region a [a-c) splits into a [a-b) + b [b-c).
The leader of region a [a-b) reports its region info to PD; PD finds it differs from the in-memory cache, so it persists the new meta first and then updates the cache.
The leader of region b [b-c) reports its region info to PD; PD finds it is neither in the cache nor in etcd, so it adds the region directly to etcd and the cache.

Problem: if region a reports its status first but region b's report is delayed, we may see a gap in the key range, because we don't know where to find the data in [b, c). This gap can be fixed later once region b reports.
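
A minimal sketch, again with hypothetical types, of how PD might reconcile a split report against its cache and persistent store as described above; the map-backed etcd stand-in and `onRegionReport` are assumptions for illustration:

```go
package main

import "fmt"

// Region is a simplified region descriptor used for illustration only.
type Region struct {
	ID       uint64
	StartKey string
	EndKey   string
}

// pd keeps a persistent copy (etcd stand-in) and an in-memory cache of region meta.
type pd struct {
	etcd  map[uint64]Region
	cache map[uint64]Region
}

// onRegionReport reconciles a leader's report: an unknown region is inserted,
// and a known region whose key range changed (e.g. after a split) is updated,
// persisting first and updating the cache afterwards.
func (p *pd) onRegionReport(r Region) {
	cached, ok := p.cache[r.ID]
	if !ok {
		p.etcd[r.ID] = r // new region: the right half of a split
		p.cache[r.ID] = r
		return
	}
	if cached.StartKey != r.StartKey || cached.EndKey != r.EndKey {
		p.etcd[r.ID] = r // existing region shrank after the split
		p.cache[r.ID] = r
	}
}

func main() {
	p := &pd{
		etcd:  map[uint64]Region{1: {ID: 1, StartKey: "a", EndKey: "c"}},
		cache: map[uint64]Region{1: {ID: 1, StartKey: "a", EndKey: "c"}},
	}
	p.onRegionReport(Region{ID: 1, StartKey: "a", EndKey: "b"}) // left half reports first
	p.onRegionReport(Region{ID: 2, StartKey: "b", EndKey: "c"}) // right half arrives later
	fmt.Println(p.cache[1], p.cache[2])
}
```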

qiuyesuifeng (Contributor) commented:

/cc @disksing It seems something should be done in ticlient.

tiancaiamao (Contributor) commented:

> One region can only do one thing at a time

Do you mean all commands, including read-only requests such as GetRegion?

> Expanding on 1, once PD decides what to do for a command, e.g. adding peer 3 to region a, it will keep executing that same job, no matter how many times it retries or how many times the leader asks.

This brings a risk of deadlock. We have two "states": one in the Raft node, the other in PD.

  1. The Raft node sends a request to PD.
  2. PD decides what to do and sends a job to the Raft node.
  3. The Raft node executes the job and reports its state to PD periodically.
  4. PD responds after it knows the state has been updated.

The whole process forms a ring; if Raft runs into trouble and can't finish the job (although I haven't come up with a specific case), a deadlock will occur.

tiancaiamao (Contributor) commented:

BTW, as long as we have GetRegion, those changes don't affect #102 much, i.e. if a peer finds itself inactive for a long time, it can ask PD whether it is still alive.

siddontang (Contributor, Author) commented:

PD now has no jobs, so there is no deadlock.
