Description
Is your feature request related to a problem? Please describe.
The GP3 volume type is cost-effective, but relatively slow. If the driver could provision multiple GP3 volumes, attach them to a single node, and combine them into a striped LVM volume, we should gain throughput and IOPS at the same GB-month price.
Describe the solution you'd like in detail
Let's say I create this StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: multi-ebs-gp3-xfs
provisioner: ebs.csi.aws.com
allowVolumeExpansion: true # I suggest this to be unsupported in the initial LVM-supported version
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
mountOptions:
  - noatime
  - nodiratime
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: xfs
  encrypted: "true"
  throughput: "1000" # per volume
  iops: "16000" # per volume
  # New parameter:
  lvm_striped_volumes: "10"
Then, when a PVC requests, say, 1000 GB, the CSI driver would provision 10 GP3 volumes (100 GB each), bringing theoretical IOPS to 160,000 and theoretical throughput to 10,000 MiB/s.
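The arithmetic above can be sketched as follows. This is only an illustration: the function and field names are made up for this issue, and rounding the per-volume size up when the request does not divide evenly is my assumption, not existing driver behaviour. The totals are the same theoretical upper bounds as in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripePlan:
    """Hypothetical result type: how a single PVC request maps to N volumes."""
    volume_count: int
    size_per_volume_gb: int
    total_iops: int
    total_throughput_mibps: int


def plan_stripes(requested_gb, volume_count, iops_per_volume, throughput_per_volume_mibps):
    """Split a PVC request evenly across `volume_count` volumes.

    Assumption: if the request is not evenly divisible, each volume is
    rounded up so the striped volume is never smaller than requested.
    """
    if volume_count < 1:
        raise ValueError("volume_count must be at least 1")
    # Integer ceiling division, avoids floating point rounding.
    size_per_volume = -(-requested_gb // volume_count)
    return StripePlan(
        volume_count=volume_count,
        size_per_volume_gb=size_per_volume,
        total_iops=iops_per_volume * volume_count,
        total_throughput_mibps=throughput_per_volume_mibps * volume_count,
    )


if __name__ == "__main__":
    # Values taken from the StorageClass example in this issue.
    print(plan_stripes(1000, 10, 16000, 1000))
```

With the values from the StorageClass above, this yields 10 volumes of 100 GB each, 160,000 IOPS and 10,000 MiB/s in total.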
Under the hood, the driver would attach the volumes to the node, create a striped LVM logical volume on top of them, format it with XFS, and mount it into the pod. If the pod is later scheduled on another node, the driver would attach the same 10 GP3 volumes to that node in the same order and repeat the attach-and-mount process.
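To make the node-side steps more concrete, here is a rough sketch of the LVM command sequence I have in mind. It only builds the command lines and does not execute anything. The device paths, volume group name, logical volume name and mount point are placeholders, and the two helper functions are hypothetical, not part of the driver. The LVM flags used (`lvcreate --type striped --stripes N --extents 100%FREE`, `vgchange --activate y`) are standard LVM2 options.

```python
def first_time_setup_commands(devices, vg_name, lv_name, mount_point,
                              mount_options=("noatime", "nodiratime")):
    """Commands to stripe freshly attached devices and mount the result.

    Hypothetical helper: returns argument lists, it does not run them.
    """
    if len(devices) < 2:
        raise ValueError("striping needs at least two devices")
    lv_path = f"/dev/{vg_name}/{lv_name}"
    return [
        ["pvcreate", *devices],                      # initialise each device as an LVM PV
        ["vgcreate", vg_name, *devices],             # group all PVs into one volume group
        ["lvcreate", "--type", "striped",
         "--stripes", str(len(devices)),             # one stripe per EBS volume
         "--extents", "100%FREE",                    # use the whole volume group
         "--name", lv_name, vg_name],
        ["mkfs.xfs", lv_path],                       # format once, on first use only
        ["mount", "-o", ",".join(mount_options), lv_path, mount_point],
    ]


def reattach_commands(vg_name, lv_name, mount_point,
                      mount_options=("noatime", "nodiratime")):
    """Commands for a node that receives already-initialised volumes."""
    lv_path = f"/dev/{vg_name}/{lv_name}"
    return [
        ["vgchange", "--activate", "y", vg_name],    # activate the existing volume group
        ["mount", "-o", ",".join(mount_options), lv_path, mount_point],
    ]


if __name__ == "__main__":
    # Placeholder device names; real names depend on the instance type.
    devs = [f"/dev/nvme{i}n1" for i in range(1, 11)]
    for cmd in first_time_setup_commands(devs, "vg-pvc-example", "data", "/mnt/data"):
        print(" ".join(cmd))
```

Note that LVM records physical volume UUIDs in its on-disk metadata, so activation on a new node should not depend on the device names the volumes happen to get there. I have not verified this against the driver's attach logic, so treat it as an assumption.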
Once we have a Persistent Volume created in k8s, I would expect the volume IDs, as well as their mount order, to be stored in the annotations of that PV so that the volumes can be mounted on any new node as needed.
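As a rough idea of what storing the IDs could look like, the sketch below encodes an ordered list of volume IDs into a single annotation value and reads it back. The annotation key and the volume IDs are invented for illustration; nothing like this exists in the driver today, and a JSON list is just one possible encoding that keeps the order.

```python
import json

# Hypothetical annotation key, not defined by the EBS CSI driver.
ANNOTATION_KEY = "ebs.csi.aws.com/lvm-striped-volume-ids"


def encode_stripe_annotation(volume_ids):
    """Return PV annotations holding the volume IDs in stripe order."""
    return {ANNOTATION_KEY: json.dumps(list(volume_ids))}


def decode_stripe_annotation(annotations):
    """Read the ordered volume IDs back from PV annotations."""
    return json.loads(annotations[ANNOTATION_KEY])


if __name__ == "__main__":
    # Placeholder IDs, not real EBS volumes.
    ids = ["vol-aaaa", "vol-bbbb", "vol-cccc"]
    annotations = encode_stripe_annotation(ids)
    print(annotations)
    print(decode_stripe_annotation(annotations))
```

A JSON array preserves element order, so the list position doubles as the mount order.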
Describe alternatives you've considered
There exists a CSI driver with LVM support here, but it only supports locally attached disks. While it should be possible to provision EC2 nodes and mount EBS volumes to them using an init script, there is no option to migrate these EBS volumes to another node.
Additional context
Other LVM volume types such as mirror or linear are not needed IMO. EBS is already replicated, so a mirrored volume doesn't add any value. A linear volume simply appends one volume to the other, which doesn't improve I/O performance the way a striped volume does.