
Cannot balance data after a balance attempt failed on an offline storage host #2997

Closed
wey-gu opened this issue Sep 30, 2021 · 5 comments

@wey-gu
Contributor

wey-gu commented Sep 30, 2021

Describe the bug (must be provided)

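After a BALANCE DATA attempt fails while a storage host is offline, running BALANCE DATA again does not start a new job. Instead, it returns the ID of the old, failed job.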

Your Environments (must be provided)

NebulaGraph 2.5.0

How To Reproduce (must be provided)

Steps to reproduce the behavior:

  1. Take a storage host offline.

     At this point, the host is still listed in SHOW HOSTS.

  2. Start a data balance (BALANCE DATA).
  3. The data balance fails.
  4. The storage host is no longer shown in SHOW HOSTS.

     The host has now disappeared from SHOW HOSTS.

  5. Run BALANCE DATA again.

     The ID of the old, failed job is returned.

Expected behavior

It should start a new balance data job.

Additional context

https://nebulagraph.slack.com/archives/CJB79PQG2/p1632827375002600

@wey-gu wey-gu added the type/bug Type: something is unexpected label Sep 30, 2021
@learn2Pro

Any solution? I still cannot balance data.

@Sophie-Xie Sophie-Xie added this to the v2.6.0 milestone Oct 9, 2021
@critical27 critical27 modified the milestones: v2.6.0, v2.7.0 Oct 11, 2021
@critical27
Contributor

critical27 commented Oct 11, 2021

I can't open the Slack link. A few questions:

  1. After the first step (taking a storage host offline), is the offline host shown as ONLINE or OFFLINE in SHOW HOSTS?
  2. In the third step, why did the balance fail? The reason should be in the metad log.
  3. In the fourth step, it would take more than one week for the storage host to disappear from SHOW HOSTS. Are you sure about that?

@critical27
Contributor

critical27 commented Oct 11, 2021

PS: There is a command, BALANCE DATA RESET PLAN, which removes the last failed balance plan. Once that command succeeds, a new balance can be triggered.
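The workaround above can be scripted. Below is a minimal sketch that assumes the nebula2-python client for NebulaGraph 2.x, whose Session.execute() returns a result exposing is_succeeded() and error_msg(). The helper names reset_and_rebalance() and connect(), as well as the host, port, and credentials, are hypothetical placeholders that do not appear in this thread. The script runs BALANCE DATA RESET PLAN, then BALANCE DATA, and stops at the first error.

```python
"""Recover from a failed BALANCE DATA run (sketch for NebulaGraph 2.5.x).

Assumption: `session` behaves like a nebula2-python Session, i.e.
execute(stmt) returns a result with is_succeeded() and error_msg().
"""
from typing import List, Protocol


class Result(Protocol):
    def is_succeeded(self) -> bool: ...
    def error_msg(self) -> str: ...


class Session(Protocol):
    def execute(self, stmt: str) -> Result: ...


# Order matters: per this issue, BALANCE DATA keeps returning the old
# failed plan's ID until that plan has been reset.
RECOVERY_STATEMENTS: List[str] = [
    "BALANCE DATA RESET PLAN",  # drop the last failed balance plan
    "BALANCE DATA",             # then start a fresh balance plan
]


def reset_and_rebalance(session: Session) -> List[str]:
    """Hypothetical helper: run the statements in order, stop at the first failure."""
    executed: List[str] = []
    for stmt in RECOVERY_STATEMENTS:
        result = session.execute(stmt)
        if not result.is_succeeded():
            raise RuntimeError(f"{stmt!r} failed: {result.error_msg()}")
        executed.append(stmt)
    return executed


def connect(host: str = "127.0.0.1", port: int = 9669,
            user: str = "root", password: str = "nebula"):
    """Hypothetical helper: open a session via nebula2-python (imported lazily).

    Host, port, and credentials are placeholders; adjust for your cluster.
    """
    from nebula2.Config import Config
    from nebula2.gclient.net import ConnectionPool

    pool = ConnectionPool()
    if not pool.init([(host, port)], Config()):
        raise RuntimeError("could not initialise the connection pool")
    return pool, pool.get_session(user, password)
```

For example, call `pool, session = connect()`, then `reset_and_rebalance(session)`, and finish with `session.release()` and `pool.close()`. The client import lives inside connect() so that reset_and_rebalance() can be exercised with any object providing the same execute() interface.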

@wey-gu
Contributor Author

wey-gu commented Oct 12, 2021


I updated the original description to add detailed information for steps 1 and 4.
@learn2Pro, please add more information :)


As I recall, @learn2Pro only tried BALANCE DATA STOP. I think BALANCE DATA RESET PLAN is the command needed here; can you try it? Since BALANCE DATA RESET PLAN exists, I think this issue can be closed.

I will create a docs PR to include BALANCE DATA RESET PLAN in https://docs.nebula-graph.com.cn/2.5.1/3.ngql-guide/18.operation-and-maintenance-statements/2.balance-syntax/ rather than only in https://docs.nebula-graph.com.cn/2.5.1/8.service-tuning/load-balance/#_4.

@wey-gu wey-gu removed the type/bug Type: something is unexpected label Oct 12, 2021
@Sophie-Xie Sophie-Xie modified the milestones: v2.7.0, v3.0.0 Oct 15, 2021
@learn2Pro

Solved.


5 participants