
[Platform] Incorrect masters selection leads to universe creation failures #9391

Open
SergeyPotachev opened this issue Jul 20, 2021 · 1 comment
Labels
area/platform Yugabyte Platform kind/bug This issue is a bug priority/high High Priority
SergeyPotachev commented Jul 20, 2021

The problem is in PlacementInfoUtil.selectMasters(). It selects a number of masters to make the universe balanced (not under-replicated). For example, we can request 3 masters for a universe with 9 nodes and RF=3 and get 4 or even 5 nodes marked as masters as a result. A JUnit test reproducing the problem:

  @Test
  public void testSelectMasters_9nodes3regionsMixed() {
    List<NodeDetails> nodes = new ArrayList<NodeDetails>();
    nodes.add(ApiUtils.getDummyNodeDetails(1, NodeDetails.NodeState.ToBeAdded, false, true,
        "onprem", "31df", "us-2a", null));
    nodes.add(ApiUtils.getDummyNodeDetails(2, NodeDetails.NodeState.ToBeAdded, true, true, "onprem",
        "a2c5", "ap-1a", null));
    nodes.add(ApiUtils.getDummyNodeDetails(3, NodeDetails.NodeState.ToBeAdded, true, true, "onprem",
        "55ce", "eu-1a", null));
    nodes.add(ApiUtils.getDummyNodeDetails(4, NodeDetails.NodeState.ToBeAdded, true, true, "onprem",
        "31df", "us-2a", null));
    nodes.add(ApiUtils.getDummyNodeDetails(5, NodeDetails.NodeState.ToBeAdded, false, true,
        "onprem", "a2c5", "ap-1a", null));
    nodes.add(ApiUtils.getDummyNodeDetails(6, NodeDetails.NodeState.ToBeAdded, false, true,
        "onprem", "31df", "us-2a", null));
    nodes.add(ApiUtils.getDummyNodeDetails(7, NodeDetails.NodeState.ToBeAdded, false, true,
        "onprem", "55ce", "eu-1a", null));
    nodes.add(ApiUtils.getDummyNodeDetails(8, NodeDetails.NodeState.ToBeAdded, false, true,
        "onprem", "55ce", "eu-1a", null));
    nodes.add(ApiUtils.getDummyNodeDetails(9, NodeDetails.NodeState.ToBeAdded, false, true,
        "onprem", "a2c5", "ap-1a", null));

    PlacementInfoUtil.selectMasters(nodes, 3);
    List<NodeDetails> masters = nodes.stream().filter(node -> node.isMaster)
        .collect(Collectors.toList());
    assertEquals(3, masters.size());
  }

We also need to restore the logging of the selected masters (removed from this function earlier).

cc @Arnav15

@SergeyPotachev SergeyPotachev added kind/bug This issue is a bug area/platform Yugabyte Platform labels Jul 20, 2021
@hsu880 hsu880 added this to Backlog in Platform Jul 20, 2021
@hsu880 hsu880 added this to the 2.7.x milestone Jul 20, 2021
@SergeyPotachev SergeyPotachev added the priority/high High Priority label Aug 11, 2021

SergeyPotachev commented Aug 11, 2021

Marking this issue as priority/high, as it can affect customers with universes that have a significant number of nodes and can leave such universes in an inoperable state after an Edit Universe operation.
cc @hsiaosu-yb

@SergeyPotachev SergeyPotachev self-assigned this Sep 27, 2021
SergeyPotachev added a commit that referenced this issue Oct 22, 2021
…erse creation failures

Summary:
1. PLAT-364: Previously, in some rare cases, selectMasters() could place more masters than required (> RF). This led to `EditUniverse` operation failures; noticed on one of a customer's universes.
2. PLAT-1825: "[Platform] Platform should balance master placement according to node count in each zone (#9620)". The issue is that the platform doesn't re-allocate masters when the number of nodes changes in some zones.

This diff introduces new logic in the `selectMasters` function and also adds more logic inside EditUniverse itself.
Some details about the new logic:

1. Each region should have at least one master;
2. Even if some other zones have more nodes, we still prefer to have at least one master in each zone (but not more than RF in total);
3. Once rules 1 and 2 are satisfied and some masters are still unallocated, the remainder are allocated in proportion to the total node count in each zone;
4. The master leader is always preserved by the function. The function doesn't handle the case where the master leader changes while the function is running, but this corner case causes further problems (in EditUniverse) only in very rare cases, so the probability of such problems is "Rare^2".

More details:
1. The function doesn't require the number of zones to be less than or equal to RF (the number of AZs can be larger than RF; in that case, only the smallest AZs stay without a master);
2. Only active nodes are processed (see NodeDetails::isActive() - states Live, ToBeAdded, etc., but not ToBeRemoved, for example);
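The zone-level allocation rules above can be sketched as follows. This is a hypothetical, simplified illustration (the class and method names are my own, the region rule is folded into the per-zone rule, and node-level details like leader preservation are omitted), not the actual `PlacementInfoUtil` implementation:

```java
import java.util.*;

// Hypothetical sketch of the per-zone master allocation described above;
// not the actual PlacementInfoUtil code.
public class MasterAllocationSketch {

  // zoneNodes: zone name -> active node count. Returns zone -> master count.
  // Assumes the total node count is at least rf.
  static Map<String, Integer> allocateMasters(Map<String, Integer> zoneNodes, int rf) {
    Map<String, Integer> masters = new LinkedHashMap<>();
    // Rule: one master per zone, largest zones first, never more than RF in total.
    List<String> zones = new ArrayList<>(zoneNodes.keySet());
    zones.sort((a, b) -> zoneNodes.get(b) - zoneNodes.get(a));
    int placed = 0;
    for (String z : zones) {
      masters.put(z, placed < rf ? 1 : 0);
      if (placed < rf) placed++;
    }
    // Rule: distribute the remaining masters in proportion to zone node counts,
    // never exceeding the number of nodes available in a zone.
    while (placed < rf) {
      String best = null;
      double bestRatio = -1;
      for (String z : zones) {
        int m = masters.get(z);
        if (m >= zoneNodes.get(z)) continue; // no free node left in this zone
        double ratio = (double) zoneNodes.get(z) / (m + 1);
        if (ratio > bestRatio) { bestRatio = ratio; best = z; }
      }
      if (best == null) break; // not enough nodes to place all masters
      masters.put(best, masters.get(best) + 1);
      placed++;
    }
    return masters;
  }

  public static void main(String[] args) {
    // The RF=5, 6-3-6 scenario from the test plan below.
    Map<String, Integer> zones = new LinkedHashMap<>();
    zones.put("az1", 6);
    zones.put("az2", 3);
    zones.put("az3", 6);
    System.out.println(allocateMasters(zones, 5)); // 2 masters in az1 and az3, 1 in az2
  }
}
```

For the 6-3-6 universe with RF=5, every zone first receives one master, and the two remaining masters go to az1 and az3, matching the proportional rule.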

Test Plan:
Test scenarios should cover different cases of universe expansion, for example:
- **Check that masters stay the same when the node count increases evenly in each zone:** RF=3, create a universe with 1 node in each of 3 zones (az1 = 1 node, az2 = 1 node, az3 = 1 node); edit the universe - add one node in each zone; check that the updated universe has the same masters as after creation;
- **Check that masters are correctly redistributed:** RF=5, create a universe 1-3-1; edit the universe - add 5 nodes to az1 and 5 nodes to az3, so the universe becomes 6-3-6; check that az1 and az3 have 2 masters each and az2 only one; check that one of the masters in each of az1 and az3 is the same as after creation.
- **Check that a master is stopped on an existing node and moved to a new node:** RF=3, universe 2-2-1 -> 2-2-4;
- **Check that a master is reassigned from one existing node to another:** RF=5, universe 2-2-4 -> 4-2-4.

Check some other scenarios, like RF=5, 1-2-3 -> 3-2-1; RF=5, 6-2-1 -> 2-1-6; RF=7, 2-2-3 -> 6-6-2; etc.
Each test scenario should have one additional step at the end - for each node, check which masters are written in the processes' configuration files:
   - Select the node in the platform UI (universe -> Nodes tab), Actions -> Connect (copy);
   - In a console/terminal, connect to the `yugaware` host/container and paste the command to connect to the node;
   - List the running processes: `ps aux | grep yugabyte`;
     You'll see something like:
```
  yugabyte 11375  0.3  0.5 677228 78052 ?        Sl   00:12   2:54 /home/yugabyte/master/bin/yb-master --flagfile **/home/yugabyte/master/conf/server.conf**
  yugabyte 11454  3.7  0.4 1643016 74844 ?       Sl   00:12  27:56 /home/yugabyte/tserver/bin/yb-tserver --flagfile **/home/yugabyte/tserver/conf/server.conf**

```
   - Check that nodes without a master have no master process running; for the other nodes, check that both processes are running;
   - List the content of the highlighted configuration files:
```
[yugabyte@yb-15-rahul-xcluster-producer-n1 ~]$ cat /home/yugabyte/master/conf/server.conf
--placement_cloud=gcp
--placement_region=us-west1
--placement_zone=us-west1-a
--max_log_size=256
--server_broadcast_addresses=
--fs_data_dirs=/mnt/d0
>>> --master_addresses=10.150.4.238:7100,10.150.4.239:7100,10.150.4.242:7100
--rpc_bind_addresses=10.150.4.239:7100
--webserver_port=7000
--webserver_interface=10.150.4.239
--placement_uuid=b9fa87b7-01aa-4889-82e1-f2b0a6ac9c88
--replication_factor=3
--cql_proxy_bind_address=10.150.4.239:9042
--callhome_collection_level=medium
--enable_ysql=true
--use_cassandra_authentication=false
--metric_node_name=yb-15-rahul-xcluster-producer-n1
--ysql_enable_auth=false
--cluster_uuid=4aa0687b-6996-467b-afc6-4ef201a4a9a6
--pgsql_proxy_bind_address=10.150.4.239:5433
--undefok=enable_ysql
--txn_table_wait_min_ts_count=3
--start_cql_proxy=true
```
Check that the correct masters are present in the marked (`>>>`) line.
Do the same for the other process:
```
[yugabyte@yb-15-rahul-xcluster-producer-n1 ~]$ cat /home/yugabyte/tserver/conf/server.conf
--placement_cloud=gcp
--placement_region=us-west1
--placement_zone=us-west1-a
--max_log_size=256
--server_broadcast_addresses=
--fs_data_dirs=/mnt/d0
--rpc_bind_addresses=10.150.4.239:9100
>>> --tserver_master_addrs=10.150.4.238:7100,10.150.4.239:7100,10.150.4.242:7100
--webserver_port=9000
--webserver_interface=10.150.4.239
--cql_proxy_bind_address=10.150.4.239:9042
--redis_proxy_bind_address=10.150.4.239:6379
--placement_uuid=b9fa87b7-01aa-4889-82e1-f2b0a6ac9c88
--replication_factor=3
--callhome_collection_level=medium
--enable_ysql=true
--use_cassandra_authentication=false
--metric_node_name=yb-15-rahul-xcluster-producer-n1
--ysql_enable_auth=false
--cluster_uuid=4aa0687b-6996-467b-afc6-4ef201a4a9a6
--pgsql_proxy_bind_address=10.150.4.239:5433
--undefok=enable_ysql
--txn_table_wait_min_ts_count=3
--start_cql_proxy=true
--pgsql_proxy_webserver_port=13000
--start_redis_proxy=false
--cql_proxy_webserver_port=12000

```

Reviewers: amalyshev, sanketh

Reviewed By: amalyshev, sanketh

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D13236