
Best practices for deploying Longhorn in production

Sheng Yang edited this page May 8, 2020 · 12 revisions

Hardware

  1. At least 3 nodes.
  2. 4 vCPUs or more per node.
  3. 4 GiB of memory or more per node.
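The minimums above are easy to check before installing. Below is a minimal sketch in Python. It assumes you have already collected each node's vCPU count and memory; how you gather them (for example from `kubectl get nodes`) is left out. The `Node` type and the `check_cluster` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Minimums taken from the Hardware list above.
MIN_NODES = 3
MIN_VCPUS = 4
MIN_MEMORY_GIB = 4


@dataclass
class Node:
    """Hypothetical record of one node's resources."""
    name: str
    vcpus: int
    memory_gib: float


def check_cluster(nodes: List[Node]) -> List[str]:
    """Return human-readable problems; an empty list means the minimums are met."""
    problems = []
    if len(nodes) < MIN_NODES:
        problems.append(f"cluster has {len(nodes)} node(s), need at least {MIN_NODES}")
    for node in nodes:
        if node.vcpus < MIN_VCPUS:
            problems.append(f"{node.name}: {node.vcpus} vCPUs, need at least {MIN_VCPUS}")
        if node.memory_gib < MIN_MEMORY_GIB:
            problems.append(
                f"{node.name}: {node.memory_gib} GiB memory, need at least {MIN_MEMORY_GIB}"
            )
    return problems


nodes = [Node("n1", 4, 8), Node("n2", 8, 16), Node("n3", 2, 4)]
print(check_cluster(nodes))
# → ['n3: 2 vCPUs, need at least 4']
```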

Software

OS on the node

  1. Ubuntu 18.04
  2. CentOS 7/8

Node and disk setup

  1. For production, it's recommended to dedicate a disk to Longhorn storage instead of using the root disk.
    1. If you must use the root disk, keep the minimal available storage percentage setting at its default of 25%, and set the overprovisioning percentage to 200% to minimize the chance of DiskPressure.
    2. If you're using a dedicated disk for Longhorn, you can lower the minimal available storage percentage setting to 10%.
    3. The right overprovisioning percentage depends on how much of their size your volumes use on average. For example, if your workloads typically use only half of the volume size, you can set the overprovisioning percentage to 200, which means Longhorn will treat the disk's schedulable size as twice its full size minus the reserved space.
  2. Since Longhorn doesn't currently support sharding a volume across multiple disks, it's recommended to use LVM to aggregate all the disks for Longhorn into a single partition, so it can easily be extended in the future.
  3. Any extra disks must be listed in /etc/fstab so they are mounted automatically after the machine reboots.
  4. Don't use symbolic links for the extra disks. Use mount --bind instead of ln -s, and make sure the bind mount is also in /etc/fstab. See here for details.
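The overprovisioning and minimal available storage percentage settings work together when Longhorn decides whether a disk can take a new replica. Below is a simplified sketch in Python. It assumes two rules. First, as described above, the total size promised to replicas must stay within (disk size − reserved space) × overprovisioning percentage ÷ 100. Second, the actually free space left after placing the replica must stay above the minimal available percentage of the disk. Longhorn's real scheduler has more checks than this. The function names and parameters are hypothetical, and all sizes are in GiB.

```python
def schedulable_size(disk_size: int, reserved: int, over_provisioning_pct: int) -> int:
    """Total size Longhorn may promise to replicas on this disk (simplified)."""
    return (disk_size - reserved) * over_provisioning_pct // 100


def can_schedule(replica_size: int, already_scheduled: int, available: int,
                 disk_size: int, reserved: int,
                 over_provisioning_pct: int, minimal_available_pct: int) -> bool:
    """Simplified check of whether one more replica fits on a disk."""
    # Rule 1: promised (thin-provisioned) size stays within the overprovisioned limit.
    within_limit = (already_scheduled + replica_size
                    <= schedulable_size(disk_size, reserved, over_provisioning_pct))
    # Rule 2: real free space after placing the replica stays above the minimum.
    keeps_headroom = available - replica_size > disk_size * minimal_available_pct // 100
    return within_limit and keeps_headroom


# A 100 GiB disk with 10 GiB reserved and overprovisioning at 200%.
print(schedulable_size(100, 10, 200))                 # → 180
# Dedicated disk, minimal available percentage lowered to 10%.
print(can_schedule(50, 120, 70, 100, 10, 200, 10))    # → True
# Same disk with the root-disk default of 25%: only 20 GiB would remain free.
print(can_schedule(50, 120, 70, 100, 10, 200, 25))    # → False
```

The last two calls show why a dedicated disk can safely use a lower minimal available percentage. The same free space passes the 10% threshold but fails the 25% one.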

Pre-installation configuration

  1. To use a directory other than the default /var/lib/longhorn for storage, change the Default Data Path setting before installing Longhorn.
    1. You can also use the default node/disk configuration feature to customize the default disk after installation. If you use it, remember to enable the Create default disk only on labeled node setting.

Deploying workload

Pod automatic restart using liveness check

If you're using ext4 as the filesystem of the volume, adding a liveness check to the workload helps it recover automatically from interruptions caused by network problems or a node reboot/Docker restart. See https://longhorn.io/docs/0.8.0/users-guide/recover-volume/ for details.
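A liveness check of this kind can be sketched as a Kubernetes Pod manifest built in Python. The probe runs `ls` on the `lost+found` directory that ext4 creates at the root of the filesystem. The idea is that if the volume becomes unreadable, the probe fails and Kubernetes restarts the container. The probe command and timings are assumptions to check against the linked page. The pod name, image, mount path, and PVC name are hypothetical placeholders. Only the Kubernetes field names (`livenessProbe`, `exec`, `initialDelaySeconds`, `periodSeconds`, and so on) are the standard Pod API.

```python
import json


def pod_with_liveness(pvc_name: str, mount_path: str = "/data") -> dict:
    """Build a Pod manifest whose liveness probe checks the ext4 volume is readable."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "volume-test"},  # hypothetical name
        "spec": {
            "containers": [{
                "name": "app",
                "image": "nginx:stable-alpine",  # hypothetical image
                "volumeMounts": [{"name": "vol", "mountPath": mount_path}],
                "livenessProbe": {
                    # Fails if lost+found on the ext4 volume can no longer be listed.
                    "exec": {"command": ["ls", f"{mount_path}/lost+found"]},
                    "initialDelaySeconds": 5,
                    "periodSeconds": 5,
                },
            }],
            # The Longhorn volume is attached through a PersistentVolumeClaim.
            "volumes": [{"name": "vol", "persistentVolumeClaim": {"claimName": pvc_name}}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(pod_with_liveness("longhorn-example-pvc"), indent=2))
```

`kubectl apply -f -` accepts JSON on standard input, so you can pipe the script's output into it to create the pod.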

Volume maintenance

  1. We highly recommend using the built-in backup feature of Longhorn.
  2. For each volume, schedule at least one recurring backup. If you must run Longhorn in production without a backupstore, schedule at least one recurring snapshot for each volume instead.
  3. Longhorn creates snapshots automatically when rebuilding a replica. Recurring snapshot or backup jobs also clean up these system-generated snapshots automatically.
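The rule in item 2 can be audited with a small policy check. Below is a minimal sketch in Python. It assumes you have already collected each volume's recurring jobs; how you gather them (for example from the Longhorn UI or the volume resources) is out of scope. The job dictionaries mirror the name/task/cron/retain fields of Longhorn recurring jobs, but treat that shape as an assumption. `check_recurring_jobs` and the volume names are hypothetical.

```python
from typing import Dict, List


def check_recurring_jobs(jobs_by_volume: Dict[str, List[dict]],
                         has_backupstore: bool) -> List[str]:
    """Flag volumes that lack the recurring job the maintenance policy requires."""
    # With a backupstore every volume needs a recurring backup;
    # without one, backups cannot run, so require a recurring snapshot instead.
    required = "backup" if has_backupstore else "snapshot"
    problems = []
    for volume, jobs in sorted(jobs_by_volume.items()):
        tasks = {job["task"] for job in jobs}
        if required not in tasks:
            problems.append(f"{volume}: no recurring {required}")
    return problems


jobs = {
    "pvc-db": [{"name": "backup", "task": "backup", "cron": "0 2 * * *", "retain": 7}],
    "pvc-cache": [{"name": "snap", "task": "snapshot", "cron": "0 * * * *", "retain": 24}],
}
print(check_recurring_jobs(jobs, has_backupstore=True))
# → ['pvc-cache: no recurring backup']
print(check_recurring_jobs(jobs, has_backupstore=False))
# → ['pvc-db: no recurring snapshot']
```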