cannot balance data in zone - "The cluster is balanced" #3153

Closed
HarrisChu opened this issue Oct 19, 2021 · 4 comments
Labels: need to discuss (Solution: issue or PR without a clear conclusion on whether to handle it), type/bug (Type: something is unexpected)

@HarrisChu
Contributor

Steps:

  1. Create a 3-replica space across three zones.
  2. Run balance data and make sure the space is balanced.
  3. Add a new host to one of the zones.
  4. Run balance data again.

Expected result:

  1. The data is rebalanced.

Actual result:

  1. [ERROR (-1005)]: The cluster is balanced!

Log:

I1019 07:46:08.502847 652715 Balancer.cpp:706] Zone z1 have the host "192.168.8.152":16779
I1019 07:46:08.502851 652715 Balancer.cpp:706] Zone z1 have the host "192.168.8.152":16779
I1019 07:46:08.502856 652715 Balancer.cpp:706] Zone z1 have the host "192.168.8.152":16779
I1019 07:46:08.502858 652715 Balancer.cpp:690] Host: "192.168.8.152":14779
I1019 07:46:08.502862 652715 Balancer.cpp:706] Zone z4 have the host "192.168.8.152":14779
I1019 07:46:08.502866 652715 Balancer.cpp:690] Host: "192.168.8.152":12779
I1019 07:46:08.502871 652715 Balancer.cpp:706] Zone z2 have the host "192.168.8.152":12779
I1019 07:46:08.502874 652715 Balancer.cpp:690] Host: "192.168.8.152":11779
I1019 07:46:08.502877 652715 Balancer.cpp:706] Zone z1 have the host "192.168.8.152":11779
I1019 07:46:08.502882 652715 Balancer.cpp:706] Zone z1 have the host "192.168.8.152":11779
I1019 07:46:08.502885 652715 Balancer.cpp:706] Zone z1 have the host "192.168.8.152":11779
I1019 07:46:08.502888 652715 Balancer.cpp:690] Host: "192.168.8.152":15779
I1019 07:46:08.502892 652715 Balancer.cpp:706] Zone z1 have the host "192.168.8.152":15779
I1019 07:46:08.502895 652715 Balancer.cpp:706] Zone z1 have the host "192.168.8.152":15779
I1019 07:46:08.502898 652715 Balancer.cpp:706] Zone z1 have the host "192.168.8.152":15779
I1019 07:46:08.502902 652715 Balancer.cpp:690] Host: "192.168.8.152":13779
I1019 07:46:08.502905 652715 Balancer.cpp:706] Zone z3 have the host "192.168.8.152":13779
I1019 07:46:08.502947 652715 Balancer.cpp:371] Found new host "192.168.8.152":16779
I1019 07:46:08.502959 652715 Balancer.cpp:281] Now, try to balance the confirmedHostParts
I1019 07:46:08.502961 652715 Balancer.cpp:758] Host "192.168.8.152":16779 parts 0
I1019 07:46:08.502965 652715 Balancer.cpp:758] Host "192.168.8.152":14779 parts 5
I1019 07:46:08.502969 652715 Balancer.cpp:758] Host "192.168.8.152":12779 parts 6
I1019 07:46:08.502972 652715 Balancer.cpp:758] Host "192.168.8.152":11779 parts 4
I1019 07:46:08.502975 652715 Balancer.cpp:758] Host "192.168.8.152":15779 parts 4
I1019 07:46:08.502979 652715 Balancer.cpp:758] Host "192.168.8.152":13779 parts 4
I1019 07:46:08.502985 652715 Balancer.cpp:442] "192.168.8.152":12779:6 -> "192.168.8.152":16779:0
I1019 07:46:08.502990 652715 Balancer.cpp:452] partsFrom size 6 partsTo size 0 minLoad 4 maxLoad 4
I1019 07:46:08.502992 652715 Balancer.cpp:462] [space:9, part:2] "192.168.8.152":12779->"192.168.8.152":16779
I1019 07:46:08.502997 652715 Balancer.cpp:1228] z2 : z1
I1019 07:46:08.503000 652715 Balancer.cpp:477] sourceHost "192.168.8.152":12779 targetHost "192.168.8.152":16779 not same zone
I1019 07:46:08.503005 652715 Balancer.cpp:483] Part 2 have existed
I1019 07:46:08.503007 652715 Balancer.cpp:452] partsFrom size 6 partsTo size 0 minLoad 4 maxLoad 4
I1019 07:46:08.503010 652715 Balancer.cpp:462] [space:9, part:3] "192.168.8.152":12779->"192.168.8.152":16779
I1019 07:46:08.503018 652715 Balancer.cpp:1228] z2 : z1
I1019 07:46:08.503032 652715 Balancer.cpp:477] sourceHost "192.168.8.152":12779 targetHost "192.168.8.152":16779 not same zone
I1019 07:46:08.503036 652715 Balancer.cpp:483] Part 3 have existed
I1019 07:46:08.503038 652715 Balancer.cpp:452] partsFrom size 6 partsTo size 0 minLoad 4 maxLoad 4
I1019 07:46:08.503041 652715 Balancer.cpp:462] [space:9, part:4] "192.168.8.152":12779->"192.168.8.152":16779
I1019 07:46:08.503044 652715 Balancer.cpp:1228] z2 : z1
I1019 07:46:08.503047 652715 Balancer.cpp:477] sourceHost "192.168.8.152":12779 targetHost "192.168.8.152":16779 not same zone
I1019 07:46:08.503051 652715 Balancer.cpp:483] Part 4 have existed
I1019 07:46:08.503053 652715 Balancer.cpp:452] partsFrom size 6 partsTo size 0 minLoad 4 maxLoad 4
I1019 07:46:08.503055 652715 Balancer.cpp:462] [space:9, part:6] "192.168.8.152":12779->"192.168.8.152":16779
I1019 07:46:08.503059 652715 Balancer.cpp:1228] z2 : z1
I1019 07:46:08.503062 652715 Balancer.cpp:477] sourceHost "192.168.8.152":12779 targetHost "192.168.8.152":16779 not same zone
I1019 07:46:08.503065 652715 Balancer.cpp:483] Part 6 have existed
I1019 07:46:08.503068 652715 Balancer.cpp:452] partsFrom size 6 partsTo size 0 minLoad 4 maxLoad 4
I1019 07:46:08.503070 652715 Balancer.cpp:462] [space:9, part:7] "192.168.8.152":12779->"192.168.8.152":16779
I1019 07:46:08.503074 652715 Balancer.cpp:1228] z2 : z1
I1019 07:46:08.503077 652715 Balancer.cpp:477] sourceHost "192.168.8.152":12779 targetHost "192.168.8.152":16779 not same zone
I1019 07:46:08.503080 652715 Balancer.cpp:483] Part 7 have existed
I1019 07:46:08.503082 652715 Balancer.cpp:452] partsFrom size 6 partsTo size 0 minLoad 4 maxLoad 4
I1019 07:46:08.503085 652715 Balancer.cpp:462] [space:9, part:8] "192.168.8.152":12779->"192.168.8.152":16779
I1019 07:46:08.503089 652715 Balancer.cpp:1228] z2 : z1
I1019 07:46:08.503091 652715 Balancer.cpp:477] sourceHost "192.168.8.152":12779 targetHost "192.168.8.152":16779 not same zone
I1019 07:46:08.503095 652715 Balancer.cpp:483] Part 8 have existed
I1019 07:46:08.503098 652715 Balancer.cpp:515] Here is no action
I1019 07:46:08.503100 652715 Balancer.cpp:552] Balance tasks num: 0
E1019 07:46:08.503104 652715 Balancer.cpp:42] Create balance plan failed
E1019 07:46:08.503119 652715 BalanceProcessor.cpp:118] Balance Failed: E_BALANCED
@HarrisChu HarrisChu added the type/bug Type: something is unexpected label Oct 19, 2021
@HarrisChu HarrisChu added this to the v2.6.0 milestone Oct 19, 2021
@darionyaphet darionyaphet added the need to discuss Solution: issue or PR without a clear conclusion on whether to handle it label Oct 19, 2021
@darionyaphet
Contributor

I tried to reproduce the exception, but it didn't appear.

(root@nebula) [(none)]> add zone z1 "192.168.8.215":55510
Execution succeeded (time spent 4219/5084 us)

Wed, 20 Oct 2021 06:51:43 CST

(root@nebula) [(none)]> add zone z2 "192.168.8.215":55520
Execution succeeded (time spent 3208/3916 us)

Wed, 20 Oct 2021 06:51:50 CST

(root@nebula) [(none)]> add zone z3 "192.168.8.215":55530
Execution succeeded (time spent 3099/3869 us)

Wed, 20 Oct 2021 06:51:55 CST

(root@nebula) [(none)]> add GROUP g1 z1,z2,z3
Execution succeeded (time spent 3362/3925 us)

Wed, 20 Oct 2021 06:52:08 CST

(root@nebula) [(none)]> create SPACE t(replica_factor=3, vid_type=int, partition_num=4)
Execution succeeded (time spent 3227/4057 us)

Wed, 20 Oct 2021 06:53:22 CST

(root@nebula) [(none)]> BALANCE DATA 1634684219
+---------------------------------------------------------------+-------------+
| balanceId, spaceId:partId, src->dst                           | status      |
+---------------------------------------------------------------+-------------+
| "[1634684219, 6:1, 192.168.8.215:55510->192.168.8.215:55540]" | "SUCCEEDED" |
+---------------------------------------------------------------+-------------+
| "[1634684219, 6:2, 192.168.8.215:55510->192.168.8.215:55540]" | "SUCCEEDED" |
+---------------------------------------------------------------+-------------+
| "Total:2, Succeeded:2, Failed:0, In Progress:0, Invalid:0"    | 100.0       |
+---------------------------------------------------------------+-------------+
Got 3 rows (time spent 1949/2768 us)

Wed, 20 Oct 2021 06:57:29 CST

(root@nebula) [(none)]> show hosts
+-----------------+-------+----------+--------------+----------------------+------------------------+
| Host            | Port  | Status   | Leader count | Leader distribution  | Partition distribution |
+-----------------+-------+----------+--------------+----------------------+------------------------+
| "192.168.8.215" | 55510 | "ONLINE" | 0            | "No valid partition" | "t:2"                  |
+-----------------+-------+----------+--------------+----------------------+------------------------+
| "192.168.8.215" | 55520 | "ONLINE" | 3            | "t:3"                | "t:4"                  |
+-----------------+-------+----------+--------------+----------------------+------------------------+
| "192.168.8.215" | 55530 | "ONLINE" | 1            | "t:1"                | "t:4"                  |
+-----------------+-------+----------+--------------+----------------------+------------------------+
| "192.168.8.215" | 55540 | "ONLINE" | 0            | "No valid partition" | "t:2"                  |
+-----------------+-------+----------+--------------+----------------------+------------------------+
| "Total"         |       |          | 4            | "t:4"                | "t:12"                 |
+-----------------+-------+----------+--------------+----------------------+------------------------+
Got 5 rows (time spent 2325/3465 us)

Wed, 20 Oct 2021 06:57:31 CST

@HarrisChu
Contributor Author

Yes, whether this issue appears depends on the balance plan.

In my case, the plan moves parts from z2 to the new host "192.168.8.152":16779 in z1. Since every one of those parts already has a peer in z1, no balance task is generated.
Because we balance data per host rather than per zone, this can be triggered from time to time.
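
To make the failure mode concrete, here is a minimal Python sketch of a per-host planner rejecting every move. It is a simplified model, not the actual Balancer.cpp logic: the function `plan_moves`, the rejection rule, and the replica placement in `replicas` are assumptions for illustration. Only the zone membership, the source and target hosts, and the part IDs are taken from the log above.

```python
# Simplified model of the rejected plan. This is NOT the real Balancer.cpp
# logic; the rule and the replica placement are assumptions for illustration.

def plan_moves(source, target, source_parts, zone_of, replicas):
    """Return (part, source, target) tasks that the assumed rule allows."""
    tasks = []
    for part in source_parts:
        if zone_of[source] != zone_of[target]:
            # Zones that hold a replica of this part, not counting the source.
            other_zones = {zone_of[h] for h in replicas[part] if h != source}
            if zone_of[target] in other_zones:
                # The target zone already has a peer of this part
                # (compare "Part N have existed" in the log), so skip it.
                continue
        tasks.append((part, source, target))
    return tasks


# Zone membership as reported in the log (hosts identified by port).
zone_of = {16779: "z1", 11779: "z1", 15779: "z1",
           12779: "z2", 13779: "z3", 14779: "z4"}

# Hypothetical replica placement: each part on the source host has one
# peer in z1 and one peer in another zone.
replicas = {
    2: [12779, 11779, 13779],
    3: [12779, 15779, 14779],
    4: [12779, 11779, 14779],
    6: [12779, 15779, 13779],
    7: [12779, 11779, 13779],
    8: [12779, 15779, 14779],
}

tasks = plan_moves(12779, 16779, [2, 3, 4, 6, 7, 8], zone_of, replicas)
print(len(tasks))  # 0: every candidate move is rejected
```

Under these assumptions the planner produces no tasks even though the new host is empty, which is the situation reported as E_BALANCED.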

It's OK for me in this version, but for the next version, how about we change it as follows:

  1. First, balance peers between zones.
  2. Then, balance peers within each zone.
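
The two steps above could be sketched roughly as follows. This is only an illustration of the proposed ordering, not the design that was later implemented: the function names, the "difference greater than 1" threshold, and the host names in the example are all hypothetical.

```python
def balance_between(zones, parts_on):
    """Phase 1: even out total part counts across zones.

    A part is only moved into a zone that does not already hold a replica of it.
    """
    moves = []

    def load(zone):
        return sum(len(parts_on[h]) for h in zones[zone])

    while True:
        src_zone = max(zones, key=load)
        dst_zone = min(zones, key=load)
        if load(src_zone) - load(dst_zone) <= 1:
            return moves
        present = set().union(*(parts_on[h] for h in zones[dst_zone]))
        candidates = sorted((p, h) for h in zones[src_zone]
                            for p in parts_on[h] if p not in present)
        if not candidates:
            return moves
        part, src = candidates[0]
        dst = min(zones[dst_zone], key=lambda h: len(parts_on[h]))
        parts_on[src].remove(part)
        parts_on[dst].add(part)
        moves.append((part, src, dst))


def balance_within(hosts, parts_on):
    """Phase 2: even out part counts among the hosts of a single zone."""
    moves = []
    while True:
        src = max(hosts, key=lambda h: len(parts_on[h]))
        dst = min(hosts, key=lambda h: len(parts_on[h]))
        if len(parts_on[src]) - len(parts_on[dst]) <= 1:
            return moves
        part = min(parts_on[src] - parts_on[dst])  # deterministic pick
        parts_on[src].remove(part)
        parts_on[dst].add(part)
        moves.append((part, src, dst))


def two_phase_balance(zones, parts_on):
    """Run phase 1 across zones, then phase 2 inside each zone."""
    moves = balance_between(zones, parts_on)
    for hosts in zones.values():
        moves += balance_within(hosts, parts_on)
    return moves


# Hypothetical cluster: 4 parts, 3 replicas, new empty host h4 added to z1.
zones = {"z1": ["h1", "h4"], "z2": ["h2"], "z3": ["h3"]}
parts_on = {"h1": {1, 2, 3, 4}, "h2": {1, 2, 3, 4},
            "h3": {1, 2, 3, 4}, "h4": set()}
for move in two_phase_balance(zones, parts_on):
    print(move)
```

In this example the zones already carry equal load, so phase 1 does nothing and phase 2 moves two parts from h1 to the new host h4 inside z1. Because both hosts are in the same zone, the move never conflicts with a peer that already exists in the target zone.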

@darionyaphet
Contributor

Thanks. The next version will redesign load balancing.

@HarrisChu
Contributor Author

The next version supports both balancing within a zone and balancing between zones.
Closing the issue.
