Maintenance Guide

For Longhorn v1.0.0

Update the node OS or container runtime

Currently, it's recommended to shut down the workloads that use Longhorn volumes before performing node maintenance. Otherwise, replicas may fail unnecessarily while the node is down.

If shutting down the workloads is not possible, the user can follow the steps below to minimize the impact of node maintenance:

1. Set Replica Concurrent Rebuild Limit to 0 in the settings to stop any new replica from rebuilding.
2. Cordon the node.
   1. Longhorn automatically disables node scheduling when a Kubernetes node is cordoned.
3. Drain the node to move the workload elsewhere.
   1. You will need to use the --ignore-daemonsets and --force options to drain the node.
   2. Replica processes on the node are stopped at this stage. Because rebuilding is disabled, no new replica will be created or rebuilt.
      1. [Upcoming feature] Once replica eviction is supported, the user will be able to evict the replicas on the node gracefully.
   3. Engine processes on the node are migrated with the Pod to other nodes.
   4. After the drain completes, there should be no engine or replica process running on the node. Two instance managers will still be running on the node, but they're stateless and won't interrupt the existing workload.
4. Perform the necessary maintenance, including shutting down or rebooting the node.
5. Uncordon the node.
   1. Longhorn automatically re-enables node scheduling.
6. Set Replica Concurrent Rebuild Limit back to the desired number, e.g. 10.
   1. [Upcoming feature] Once reusing existing replica data for rebuild is supported, replica rebuilds will be faster and take less space.

If maintenance is performed on multiple nodes, we suggest keeping Replica Concurrent Rebuild Limit at 0 until all the maintenance work is done.
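
The Kubernetes side of the steps above (cordon, drain, uncordon) can be sketched as a small shell helper. This is a minimal sketch, not an official Longhorn script: the names `run`, `prepare_node`, `restore_node`, the `DRY_RUN` variable, and the node name `worker-1` are hypothetical and used only for illustration. Changing Replica Concurrent Rebuild Limit is not covered here and is still done in the Longhorn settings. By default the helper only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: cordon and drain a node before maintenance, uncordon it afterwards.
# DRY_RUN defaults to 1, so commands are only printed until you opt in
# with DRY_RUN=0.

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Cordon the node, then drain it with the options named in the guide.
prepare_node() {
    run kubectl cordon "$1" &&
    run kubectl drain "$1" --ignore-daemonsets --force
}

# Make the node schedulable again once maintenance is finished.
restore_node() {
    run kubectl uncordon "$1"
}
```

A possible usage would be `DRY_RUN=0 prepare_node worker-1` before the maintenance and `DRY_RUN=0 restore_node worker-1` afterwards. For multiple nodes, the same two calls can be repeated per node while the rebuild limit stays at 0.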

Update Kubernetes

The user can follow Rancher's Kubernetes upgrade guide to upgrade Kubernetes.

The only caveat is that we recommend not draining the nodes, if possible.

Remove a disk

To remove a disk:

1. Disable the disk scheduling.
2. Delete all the replicas on the disk.
   1. It's recommended to delete them one by one, since each deletion triggers a replica rebuild.
   2. [Upcoming feature] The replica eviction feature can also help here.
3. Once all the replicas are deleted, delete the disk.

Remove a node

To remove a node:

1. Disable the disk scheduling.
2. Delete all the replicas on the node.
   1. It's recommended to delete them one by one, since each deletion triggers a replica rebuild.
   2. [Upcoming feature] The replica eviction feature can also help here.
3. Once all the replicas are deleted, remove the node from Kubernetes, using kubectl delete node <node-name>.
4. Once the node is removed from Kubernetes, delete the node in Longhorn.
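
The Kubernetes part of the node removal can be sketched as follows. This is a hedged illustration under stated assumptions: it assumes Longhorn is installed in the `longhorn-system` namespace and that its replica custom resources (`replicas.longhorn.io`) record their node in `.spec.nodeID`, so verify both against your installation before relying on it. The names `run`, `list_replicas`, `remove_node`, `DRY_RUN`, `LONGHORN_NS`, and `worker-1` are hypothetical. By default the helper only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: review replica placement, then remove a node from Kubernetes.
# DRY_RUN defaults to 1, so commands are only printed until you opt in
# with DRY_RUN=0.
# Assumptions: Longhorn runs in the longhorn-system namespace and replica
# custom resources expose their node as .spec.nodeID.

LONGHORN_NS="${LONGHORN_NS:-longhorn-system}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List every replica together with the node it is placed on, so you can
# confirm that none remain on the node you are about to remove.
list_replicas() {
    run kubectl -n "$LONGHORN_NS" get replicas.longhorn.io \
        -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeID
}

# Remove the node object from Kubernetes.
remove_node() {
    run kubectl delete node "$1"
}
```

A possible usage would be `DRY_RUN=0 list_replicas` to check that no replica is still listed for the node, followed by `DRY_RUN=0 remove_node worker-1`. Deleting the node in Longhorn afterwards remains a separate step.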
