Balance wrong for additional zone #3028

Closed
HarrisChu opened this issue Oct 11, 2021 · 3 comments
Labels: need to discuss, priority/med-pri, type/bug

@HarrisChu

Steps to reproduce:

  1. z1 has 3 hosts; z2 and z3 have 1 host each.
  2. Add group g1 with z1, z2, z3, then create a space with 3 replicas on it.
  3. Add zone z4 to the group.
  4. Run balance data, then balance leader.

Expected result:

  1. Partition leaders are spread across all 4 zones.

Actual result:

  1. No partition leaders are placed in the added zone z4.

(root@nebula) [(none)]> show zones
+------+------------------------------------------------------------------------+------+
| Name | Host                                                                   | Port |
+------+------------------------------------------------------------------------+------+
| "z1" | "cbrpnm-storaged-0.cbrpnm-storaged-headless.default.svc.cluster.local" | 9779 |
+------+------------------------------------------------------------------------+------+
| "z1" | "cbrpnm-storaged-1.cbrpnm-storaged-headless.default.svc.cluster.local" | 9779 |
+------+------------------------------------------------------------------------+------+
| "z1" | "cbrpnm-storaged-2.cbrpnm-storaged-headless.default.svc.cluster.local" | 9779 |
+------+------------------------------------------------------------------------+------+
| "z2" | "cbrpnm-storaged-3.cbrpnm-storaged-headless.default.svc.cluster.local" | 9779 |
+------+------------------------------------------------------------------------+------+
| "z3" | "cbrpnm-storaged-4.cbrpnm-storaged-headless.default.svc.cluster.local" | 9779 |
+------+------------------------------------------------------------------------+------+
| "z4" | "cbrpnm-storaged-5.cbrpnm-storaged-headless.default.svc.cluster.local" | 9779 |
+------+------------------------------------------------------------------------+------+
Got 6 rows (time spent 676/6640 us)

Mon, 11 Oct 2021 10:38:38 CST

(root@nebula) [(none)]> add group g1 z1,z2,z3
Execution succeeded (time spent 1137/3903 us)

Mon, 11 Oct 2021 10:38:52 CST

(root@nebula) [(none)]> create space s1(replica_factor=3, vid_type=int, partition_num=4) on g1
Execution succeeded (time spent 912/3652 us)

Mon, 11 Oct 2021 10:39:56 CST
(root@nebula) [s1]> add zone z4 into group g1
Execution succeeded (time spent 838/4079 us)

Mon, 11 Oct 2021 10:40:33 CST

(root@nebula) [s1]> balance data
+------------+
| ID         |
+------------+
| 1633920055 |
+------------+
Got 1 rows (time spent 1234/4614 us)

Mon, 11 Oct 2021 10:40:52 CST

(root@nebula) [s1]> balance data 1633920055
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| balanceId, spaceId:partId, src->dst                                                                                                                                       | status        |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| "[1633920055, 7:1, cbrpnm-storaged-3.cbrpnm-storaged-headless.default.svc.cluster.local:9779->cbrpnm-storaged-1.cbrpnm-storaged-headless.default.svc.cluster.local:9779]" | "IN_PROGRESS" |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| "[1633920055, 7:1, cbrpnm-storaged-4.cbrpnm-storaged-headless.default.svc.cluster.local:9779->cbrpnm-storaged-5.cbrpnm-storaged-headless.default.svc.cluster.local:9779]" | "IN_PROGRESS" |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| "[1633920055, 7:2, cbrpnm-storaged-3.cbrpnm-storaged-headless.default.svc.cluster.local:9779->cbrpnm-storaged-2.cbrpnm-storaged-headless.default.svc.cluster.local:9779]" | "IN_PROGRESS" |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| "[1633920055, 7:2, cbrpnm-storaged-4.cbrpnm-storaged-headless.default.svc.cluster.local:9779->cbrpnm-storaged-5.cbrpnm-storaged-headless.default.svc.cluster.local:9779]" | "IN_PROGRESS" |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| "Total:4, Succeeded:0, Failed:0, In Progress:4, Invalid:0"                                                                                                                | 0.0           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
Got 5 rows (time spent 657/4243 us)

Mon, 11 Oct 2021 10:40:55 CST

(root@nebula) [s1]> balance data 1633920055
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| balanceId, spaceId:partId, src->dst                                                                                                                                       | status      |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| "[1633920055, 7:1, cbrpnm-storaged-3.cbrpnm-storaged-headless.default.svc.cluster.local:9779->cbrpnm-storaged-1.cbrpnm-storaged-headless.default.svc.cluster.local:9779]" | "SUCCEEDED" |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| "[1633920055, 7:1, cbrpnm-storaged-4.cbrpnm-storaged-headless.default.svc.cluster.local:9779->cbrpnm-storaged-5.cbrpnm-storaged-headless.default.svc.cluster.local:9779]" | "SUCCEEDED" |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| "[1633920055, 7:2, cbrpnm-storaged-3.cbrpnm-storaged-headless.default.svc.cluster.local:9779->cbrpnm-storaged-2.cbrpnm-storaged-headless.default.svc.cluster.local:9779]" | "SUCCEEDED" |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| "[1633920055, 7:2, cbrpnm-storaged-4.cbrpnm-storaged-headless.default.svc.cluster.local:9779->cbrpnm-storaged-5.cbrpnm-storaged-headless.default.svc.cluster.local:9779]" | "SUCCEEDED" |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| "Total:4, Succeeded:4, Failed:0, In Progress:0, Invalid:0"                                                                                                                | 100.0       |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
Got 5 rows (time spent 448/6128 us)

Mon, 11 Oct 2021 10:42:00 CST
(root@nebula) [s1]> balance leader
Execution succeeded (time spent 2279/9206 us)

Mon, 11 Oct 2021 10:42:30 CST

(root@nebula) [s1]> show parts
+--------------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| Partition ID | Leader                                                                      | Peers                                                                                                                                                                                                                             | Losts |
+--------------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| 1            | "cbrpnm-storaged-0.cbrpnm-storaged-headless.default.svc.cluster.local:9779" | "cbrpnm-storaged-0.cbrpnm-storaged-headless.default.svc.cluster.local:9779, cbrpnm-storaged-1.cbrpnm-storaged-headless.default.svc.cluster.local:9779, cbrpnm-storaged-5.cbrpnm-storaged-headless.default.svc.cluster.local:9779" | ""    |
+--------------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| 2            | "cbrpnm-storaged-1.cbrpnm-storaged-headless.default.svc.cluster.local:9779" | "cbrpnm-storaged-1.cbrpnm-storaged-headless.default.svc.cluster.local:9779, cbrpnm-storaged-2.cbrpnm-storaged-headless.default.svc.cluster.local:9779, cbrpnm-storaged-5.cbrpnm-storaged-headless.default.svc.cluster.local:9779" | ""    |
+--------------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| 3            | "cbrpnm-storaged-4.cbrpnm-storaged-headless.default.svc.cluster.local:9779" | "cbrpnm-storaged-4.cbrpnm-storaged-headless.default.svc.cluster.local:9779, cbrpnm-storaged-2.cbrpnm-storaged-headless.default.svc.cluster.local:9779, cbrpnm-storaged-3.cbrpnm-storaged-headless.default.svc.cluster.local:9779" | ""    |
+--------------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| 4            | "cbrpnm-storaged-3.cbrpnm-storaged-headless.default.svc.cluster.local:9779" | "cbrpnm-storaged-4.cbrpnm-storaged-headless.default.svc.cluster.local:9779, cbrpnm-storaged-0.cbrpnm-storaged-headless.default.svc.cluster.local:9779, cbrpnm-storaged-3.cbrpnm-storaged-headless.default.svc.cluster.local:9779" | ""    |
+--------------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
Got 4 rows (time spent 558/11249 us)

Mon, 11 Oct 2021 10:42:32 CST
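
To make the skew in the show parts output above easier to see, the leaders can be tallied per host and per zone. This is a minimal Python sketch, not NebulaGraph code: hostnames are shortened to storaged-N, and the HOST_ZONE and LEADERS tables (names chosen here for illustration) are transcribed by hand from the show zones and show parts output in this issue.

```python
from collections import Counter

# Host -> zone, transcribed from the `show zones` output (hostnames shortened).
HOST_ZONE = {
    "storaged-0": "z1",
    "storaged-1": "z1",
    "storaged-2": "z1",
    "storaged-3": "z2",
    "storaged-4": "z3",
    "storaged-5": "z4",
}

# Partition ID -> leader host, transcribed from the `show parts` output.
LEADERS = {1: "storaged-0", 2: "storaged-1", 3: "storaged-4", 4: "storaged-3"}


def leaders_per_host(leaders, host_zone):
    """Count leaders on every known host, including hosts with none."""
    counts = Counter(leaders.values())
    return {host: counts.get(host, 0) for host in host_zone}


def leaders_per_zone(leaders, host_zone):
    """Count leaders in every known zone, including zones with none."""
    counts = {zone: 0 for zone in host_zone.values()}
    for host in leaders.values():
        counts[host_zone[host]] += 1
    return counts
```

With this data every host holds 0 or 1 leader, while zones z1 to z4 hold 2, 1, 1 and 0 leaders respectively.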
HarrisChu added the type/bug and need to discuss labels on Oct 11, 2021
@HarrisChu

HarrisChu commented Oct 11, 2021

It seems hasUnbalancedHost only checks partSize per host and does not check partSize per zone.

if (!hasUnbalancedHost || taskCount == 0) {

This needs discussion.
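
The gap between a per-host check and a per-zone check can be illustrated with the leader counts from this issue. The sketch below is Python for illustration only and does not reproduce the actual hasUnbalancedHost logic. It assumes "balanced" means the largest and smallest counts differ by at most 1, and is_balanced and group_by_zone are hypothetical helper names.

```python
def is_balanced(counts):
    """Treat counts as balanced when max and min differ by at most 1.

    The threshold is an assumption made for this illustration.
    """
    values = list(counts.values())
    return max(values) - min(values) <= 1


def group_by_zone(per_host, host_zone):
    """Sum per-host counts into per-zone counts."""
    per_zone = {}
    for host, count in per_host.items():
        zone = host_zone[host]
        per_zone[zone] = per_zone.get(zone, 0) + count
    return per_zone


# Leader counts per host after `balance leader` in this issue (hostnames shortened).
per_host = {"s0": 1, "s1": 1, "s2": 0, "s3": 1, "s4": 1, "s5": 0}
host_zone = {"s0": "z1", "s1": "z1", "s2": "z1", "s3": "z2", "s4": "z3", "s5": "z4"}
per_zone = group_by_zone(per_host, host_zone)

print(is_balanced(per_host))  # per-host view
print(is_balanced(per_zone))  # per-zone view
```

Under that assumption the per-host view passes (counts are all 0 or 1) while the per-zone view does not (z1 holds 2 leaders, z4 holds none), which matches the behavior reported here.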

HarrisChu added this to the v2.6.0 milestone on Oct 11, 2021
HarrisChu added the priority/med-pri label on Oct 11, 2021
@critical27

critical27 commented Oct 12, 2021

By design, balance leader only considers balance across hosts. We could take zones into consideration later by providing more strategies. WDYT?
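
As a rough idea of what a zone-aware strategy could look like, the Python sketch below greedily picks, for each partition, the peer whose zone currently holds the fewest leaders. It is purely hypothetical and not a NebulaGraph implementation: pick_leaders_zone_aware is an invented name, the peer lists are transcribed from the show parts output in this issue with hostnames shortened, and a greedy pass is not guaranteed to find the best assignment in general.

```python
# Host -> zone and partition -> peers, transcribed from this issue (hostnames shortened).
HOST_ZONE = {"s0": "z1", "s1": "z1", "s2": "z1", "s3": "z2", "s4": "z3", "s5": "z4"}
PEERS = {
    1: ["s0", "s1", "s5"],
    2: ["s1", "s2", "s5"],
    3: ["s4", "s2", "s3"],
    4: ["s4", "s0", "s3"],
}


def pick_leaders_zone_aware(peers, host_zone):
    """Greedily pick one leader per partition, preferring the least-loaded zone.

    Ties are broken by the host's own leader count, then by host name.
    """
    zone_load = {zone: 0 for zone in host_zone.values()}
    host_load = {host: 0 for host in host_zone}
    leaders = {}
    for part_id in sorted(peers):
        best = min(
            peers[part_id],
            key=lambda h: (zone_load[host_zone[h]], host_load[h], h),
        )
        leaders[part_id] = best
        zone_load[host_zone[best]] += 1
        host_load[best] += 1
    return leaders, zone_load


leaders, zone_load = pick_leaders_zone_aware(PEERS, HOST_ZONE)
```

On this issue's data the sketch ends with one leader in each of the four zones, including z4.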

@HarrisChu

OK for me, closing the issue.
