Maintenance Guide
Sheng Yang edited this page May 9, 2020 · 6 revisions
For Longhorn v1.0.0
Currently, it's recommended to shut down the workloads that use Longhorn volumes before performing node maintenance. Otherwise, replicas may fail unnecessarily while the node is down.
If shutting down the workloads is not possible, the user can follow the steps below to minimize the impact of node maintenance:
- Set `Replica Concurrent Rebuild Limit` to 0 in the settings to stop any new replica from rebuilding.
- Cordon the node.
  - Longhorn will automatically disable node scheduling when a Kubernetes node is cordoned.
- Drain the node to move the workload elsewhere.
  - You will need to use the `--ignore-daemonsets` and `--force` options to drain the node.
    - Replica processes on the node will be stopped at this stage. Since rebuilding is not allowed, no new replica will be created or rebuilt.
      - [Upcoming feature] Once `Replica eviction` is supported, the user will be able to evict the replicas on the node gracefully.
    - Engine processes on the node will be migrated with the Pod to other nodes.
    - After the `drain` completes, there should be no engine or replica process running on the node. Two instance managers will still be running on the node, but they're stateless and won't interrupt the existing workload.
- Perform the necessary maintenance, including shutting down or rebooting the node.
- Uncordon the node.
  - Longhorn will automatically re-enable node scheduling.
- Set `Replica Concurrent Rebuild Limit` back to the desired number, e.g. `10`.
  - [Upcoming feature] Once `Reuse existing replica data for rebuild` is supported, replica rebuilds will be faster and take less space.
If maintenance is performed on multiple nodes, we suggest keeping `Replica Concurrent Rebuild Limit` at 0 until all the maintenance work is done.
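The Kubernetes side of the steps above can be sketched as a pair of shell helpers. This is a minimal sketch, not an official Longhorn script: the function names `prepare_node` and `restore_node` and the `KUBECTL` override variable are hypothetical, and the node name is a placeholder. Changing `Replica Concurrent Rebuild Limit` is not covered here and is assumed to be done in the Longhorn settings before and after.

```shell
#!/bin/sh
# Sketch only: helper names and the KUBECTL variable are illustrative assumptions.
# KUBECTL can be overridden (for example with "echo") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Cordon the node, then drain it with the options the guide calls for.
prepare_node() {
    node="$1"
    "$KUBECTL" cordon "$node" &&
    "$KUBECTL" drain "$node" --ignore-daemonsets --force
}

# Make the node schedulable again once maintenance is finished.
restore_node() {
    "$KUBECTL" uncordon "$1"
}
```

A typical session would call `prepare_node <node-name>`, perform the maintenance, and then call `restore_node <node-name>`. Setting `KUBECTL=echo` first prints the commands instead of running them, which is a cheap way to check what would be executed.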
The user can follow Rancher's Kubernetes upgrade guide to upgrade Kubernetes.
The only caveat is that we recommend not using `drain` on the node if possible.
To remove a disk:
- Disable the disk scheduling.
- Delete all the replicas on the disk.
  - It's recommended to delete them one by one, since this step will trigger the replica rebuild.
  - [Upcoming feature] The replica eviction feature can also help here.
- Once all the replicas are deleted, delete the disk.
To remove a node:
- Disable the disk scheduling.
- Delete all the replicas on the node.
  - It's recommended to delete them one by one, since this step will trigger the replica rebuild.
  - [Upcoming feature] The replica eviction feature can also help here.
- Once all the replicas are deleted, remove the node from Kubernetes using `kubectl delete node <node-name>`.
- Once the node is removed from Kubernetes, delete the node in Longhorn.
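The Kubernetes step of the node removal above could be wrapped in a small helper that refuses to run without a node name. This is a rough sketch under assumptions: the function name `remove_k8s_node` and the `KUBECTL` override variable are hypothetical, and the helper only runs the `kubectl delete node` command from the list. Deleting the replicas beforehand and deleting the node in Longhorn afterwards are still done separately.

```shell
#!/bin/sh
# Sketch only: the helper name and KUBECTL variable are illustrative assumptions.
KUBECTL="${KUBECTL:-kubectl}"

# Remove a node from Kubernetes; call this only after all its replicas are deleted.
remove_k8s_node() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: remove_k8s_node <node-name>" >&2
        return 1
    fi
    "$KUBECTL" delete node "$node"
}
```

Calling `remove_k8s_node <node-name>` runs the same command as the step above, while an empty argument returns an error instead of reaching `kubectl`.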