Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

placement: support batch deletions, with insertions #2699

Merged
merged 14 commits into from Jul 30, 2020
Merged

placement: support batch deletions, with insertions #2699

merged 14 commits into from Jul 30, 2020

Conversation

xhebox
Copy link
Contributor

@xhebox xhebox commented Jul 29, 2020

What problem does this PR solve?

Related to #2680.

Now one can post an array of operations(insert/delete) to /config/rules/batch in one request. In case there is an error, all modifications are cancelled.

Two notable points:

  1. Operations should be independent. Their IDs should be different.
  2. I noticed that this PR and the previous PR is not completely safe. See https://github.com/xhebox/pd/blob/delete/server/schedule/placement/rule_manager.go#L294-L299.

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has HTTP API interfaces change

Release note

  • atomic POST api /config/rules/batcch for batch(insert/delete) rules in one request

@ti-srebot ti-srebot added the contribution Indicates that the PR was contributed by an external member. label Jul 29, 2020
server/schedule/placement/rule_manager.go Outdated Show resolved Hide resolved
server/schedule/placement/rule_manager.go Outdated Show resolved Hide resolved
server/schedule/placement/rule_manager.go Outdated Show resolved Hide resolved
server/schedule/placement/rule_manager.go Outdated Show resolved Hide resolved
if err != nil {
// TODO: it is not completely safe
// 1. in case that half of rules applied, error.. we have to cancel persisted rules
// but that may fail too, causing memory/disk inconsistency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is admirable for you to notice this issue! I think we can add some note in the API annotation.

server/schedule/placement/rule_manager.go Outdated Show resolved Hide resolved
server/schedule/placement/rule_manager.go Outdated Show resolved Hide resolved
server/schedule/placement/rule_manager.go Outdated Show resolved Hide resolved
server/schedule/placement/rule_manager.go Outdated Show resolved Hide resolved
Copy link
Contributor

@Yisaer Yisaer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems currently one batch request couldn't guarantee that the inserting and deleting rules can be executed atomically. I think we can leave an issue to solve how to provide atomically batch rules api (if needed).


// Batch is for batching placement rule actions. The action type is
// distinguished by the field `Action`.
type Batch struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to say Batch here is a bit of misleading as one Batch only contains one rule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any good idea? RuleOp?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuleOp is ok to me. one Batchcontains multiple RuleOp seems more reasonable.

@disksing
Copy link
Contributor

It seems currently one batch request couldn't guarantee that the inserting and deleting rules can be executed atomically. I think we can leave an issue to solve how to provide atomically batch rules api (if needed).

why cannot ensure atomic?

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 30, 2020
@Yisaer
Copy link
Contributor

Yisaer commented Jul 30, 2020

I think the contributor have explained on the following. @disksing

				// TODO: it is not completely safe
				// 1. in case that half of rules applied, error.. we have to cancel persisted rules
				// but that may fail too, causing memory/disk inconsistency
				// either rely a transaction API, or clients to request again until success
				// 2. in case that PD is suddenly down in the loop, inconsistency again
				// now we can only rely clients to request again

Copy link
Contributor

@Yisaer Yisaer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +293 to +300
if err != nil {
// TODO: it is not completely safe
// 1. in case that half of rules applied, error.. we have to cancel persisted rules
// but that may fail too, causing memory/disk inconsistency
// either rely a transaction API, or clients to request again until success
// 2. in case that PD is suddenly down in the loop, inconsistency again
// now we can only rely clients to request again
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a transaction API is needed, please create issue to track this (ignore me if already existed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unexpected. Anyway, I think it is enough to use my old issue to track it.

@ti-srebot
Copy link
Contributor

@Yisaer,Thanks for your review. However, LGTM is restricted to Reviewers or higher roles.See the corresponding SIG page for more information. Related SIGs: scheduling(slack).

1 similar comment
@ti-srebot
Copy link
Contributor

@Yisaer,Thanks for your review. However, LGTM is restricted to Reviewers or higher roles.See the corresponding SIG page for more information. Related SIGs: scheduling(slack).

@nolouch
Copy link
Contributor

nolouch commented Jul 30, 2020

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Jul 30, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Jul 30, 2020
@nolouch
Copy link
Contributor

nolouch commented Jul 30, 2020

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 30, 2020
@ti-srebot
Copy link
Contributor

/run-all-tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution Indicates that the PR was contributed by an external member. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants