Implement `HA mode` #21

yaroslav-gwit · 2023-05-17T17:55:41Z

Some ideas for the `HA` mode implementation (subject to change):

Node self-monitoring using REST API and ha-watchdog process
Raft-like Consensus Algorithm, where no matter the cluster size, there are always 3 candidate nodes that control the whole cluster, and among these 3 candidate nodes there is 1 manager that makes failover decisions
All 3 candidate nodes must be specified manually, in the ha_config.json
All worker nodes are dynamically added to and removed from the cluster
In case of a node failure:
- ha-watchdog process will reboot the node it's running on, which serves as a simple fencing mechanism
- based on the failover strategy, the manager will start VMs from the failed node on other nodes in the cluster, prioritising nodes with the freshest VM snapshot
Notify cluster admins about the outage, include the list of VMs that were failed over and/or were ignored
Keep a log of things that happen overtime in plain text and JSON formats for later representation by the Hoster REST API and/or WebUI

The CLI flags to use (subject to change):

To start using HA, execute this: hoster api start --ha-mode, and make sure you are running the latest dev release.

--ha-mode - start the REST API server, and activate the HA mode
--ha-debug - only log actions, and do not actually perform them - useful for the initial cluster setup and troubleshooting

The text was updated successfully, but these errors were encountered:

yaroslav-gwit · 2023-09-16T00:22:39Z

Additional list of requirements:

vm migrate to automatically stop the VM, replicate it to a new endpoint, and start it there (can only be done on the candidate nodes)
running api stop on a worker node, will trigger a graceful exit from the cluster for such a node (which means that it can be rebooted, or shutdown for maintenance without the cluster manager moving over it's VMs to other members)
implement the SSL encryption to protect the communications between nodes (it will be optional, and mostly manual due to the fact that it will rely on the organisational standards that are already in place, eg manual vs automatic SSL Certificate management, internal CAs, etc)
create the documentation on how to use an internal CA to sign the HTTP certificates and use them in the api to activate the SSL encryption mode between HA nodes
restapi_config.json - Implement REST API config file to support multi-tenancy: restapi_config.json #58

Done:

api stop to automatically detect the HA mode, if the node where this command is executed belongs to a candidate group, then it will notify other cluster members to stop the HA mode, otherwise the command will fail asking user to run it on one of the candidate nodes
api status to return production or debug HA mode within the status information
restapi_config.json has been implemented
vm migrate option has been dropped for now, because we will migrating from SSH to REST API for the most functions and it will slow down that project migration

Still to do:

SSL/HTTPs integration/implementation
Docs for the SSL/HTTPs
worker's clean disconnect

yaroslav-gwit self-assigned this May 17, 2023

yaroslav-gwit added feature request New feature request new feature Label to apply to new features development labels May 17, 2023

yaroslav-gwit changed the title ~~Create new ha_watchdog_service binary~~ Create new hoster cluster command May 18, 2023

yaroslav-gwit changed the title ~~Create new hoster cluster command~~ Implement HA mode Sep 1, 2023

yaroslav-gwit added the in-progress label Sep 17, 2023

yaroslav-gwit added sponsored Someone has sponsored this feature, privately or publicly and removed in-progress new feature Label to apply to new features development labels Jan 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `HA mode` #21

Implement `HA mode` #21

yaroslav-gwit commented May 17, 2023 •

edited

Loading

yaroslav-gwit commented Sep 16, 2023 •

edited

Loading

Implement HA mode #21

Implement HA mode #21

Comments

yaroslav-gwit commented May 17, 2023 • edited Loading

Some ideas for the HA mode implementation (subject to change):

The CLI flags to use (subject to change):

yaroslav-gwit commented Sep 16, 2023 • edited Loading

Implement `HA mode` #21

Implement `HA mode` #21

yaroslav-gwit commented May 17, 2023 •

edited

Loading

Some ideas for the `HA` mode implementation (subject to change):

yaroslav-gwit commented Sep 16, 2023 •

edited

Loading