Support full system encryption / encrypted storage #463

Open
fetzerms opened this issue Dec 8, 2020 · 43 comments

Comments

@fetzerms

fetzerms commented Dec 8, 2020

Vision:
In sensitive environments where encryption for all data is required, it would be very handy to have a full-disk-encrypted xcp-ng install, preferably with remote-unlocking capabilities.

This idea requires two kinds of encryption:

  • Encrypted local storage for VMs
  • Encrypted system storage

Current state:

  • Currently, I encrypt local disks with cryptsetup and set up LVM storage on top of cryptsetup. This works very smoothly.
  • For sensitive data, such as logs, I create another partition on encrypted local storage and mount it to the system.

Desired state:

  • Encrypted storage should be supported out of the box, preferably with the ability to unlock via XOA and/or XCP-ng Center. Or even better: Automatically unlocked after unlocking the system itself.
  • As it is unclear where sensitive data ends up, a fully encrypted xcp-ng install might be handy.
  • The encrypted system storage should be unlockable via SSH.

Ideas:

What do you guys think?

@stormi
Member

stormi commented Dec 11, 2020

I don't know all the technical details, but I think this wouldn't be trivial to do.

I think an encrypted storage repository would be something doable, probably at a significant performance cost. What additional security would it bring over encrypting VM disks themselves?

About system encryption, I'm not entirely sure what sensitive data dom0 would contain, since all it manages is VMs and resources. Actual data is in the VMs.

@fetzerms
Author

You might be right about dom0. I was thinking about access logs, bash history and other things that might contain sensitive info. Often an "everything is encrypted to be safe" approach is preferred.

About encryption of storage: This would allow encrypting VMs whose OS does not offer any kind of encryption, be it some old DOS VM or some custom OS. Furthermore, the storage repository only needs to be unlocked once, and all the VMs can automatically boot/reboot without worrying about encryption anymore. So it's completely transparent for the guest OS.

The performance penalty is there for sure. I am using cryptsetup to do this and it works pretty nicely.

@stormi
Member

stormi commented Dec 11, 2020

As long as you can create an encrypted filesystem, you should be able to use it as a storage repository using the file SR type, or any more appropriate SR type if one exists (such as zfs for... ZFS obviously). I don't know if it's been tested already, or what the performance would be.

If you're using a shared storage such as NFS, I suppose you could very well encrypt everything directly on the file server, too.

@nagilum99

@stormi: The costs should be minimal these days. It adds a bit of latency, but throughput exceeds that of all common storage devices.
You just need to make sure the CPU has AES-NI or similar support from Intel/AMD - such CPUs en-/decrypt several GB/s without tying up too many CPU resources.
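
One way to check the AES-NI point on a given host is to look for the "aes" CPU flag. The snippet below is a minimal sketch and not part of the original comment: the helper name has_aes_flag is made up for illustration, and it assumes an x86 host whose /proc/cpuinfo contains a "flags" line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the cpuinfo file lists the "aes" flag.
# The optional argument only exists so the check can be tried against a
# copy of the file; it defaults to the real /proc/cpuinfo.
has_aes_flag() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "flags" line, then match "aes" as a whole word so
    # that other flags such as "vaes" are not counted.
    grep -m1 '^flags' "$cpuinfo" | grep -qw aes
}

# Usage on dom0:
#   has_aes_flag && echo "AES-NI available" || echo "no AES-NI flag found"
```

If the flag is missing, dm-crypt falls back to software AES, which is where a noticeable throughput penalty would be expected.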

@stormi
Member

stormi commented Dec 11, 2020

Thanks for the insight.

Update: However I always more or less expect to find out something we hadn't foreseen when used in the context of virtualization :)

@fetzerms
Author

fetzerms commented Dec 12, 2020

As long as you can create an encrypted filesystem, you should be able to use it as a storage repository using the file SR type, or any more appropriate SR type if one exists (such as zfs for... ZFS obviously). I don't know if it's been tested already, or what the performance would be.

If you're using a shared storage such as NFS, I suppose you could very well encrypt everything directly on the file server, too.

Currently, we can already do this by hand. I encrypt my local drive with cryptsetup and set up an LVM storage repo on top of it. The performance looks fine to me, but I did not do any benchmarks. But as this is not supported, I fear that one day it might stop working.

It would be handy to be able to set up and manage encrypted SRs directly with xcp-ng (and through xoa / xcp-ng center). One step further would be to have some sort of KMS support, like VMware does. But that is something for the future.

About system encryption: I am not sure what needs to be done to "transform" a CentOS install into xcp-ng. But having a fully encrypted CentOS install is quite straightforward. In the area of xcp-ng, it gets more complicated with the update ISO etc. For yum-style upgrading it shouldn't be too complicated.

@olivierlambert
Member

A good recap: it might be easy to set up once, but then it's really hard to manage everything around it (keys, ISO upgrade etc.)

If I wanted to go that route, I would:

  1. Modify the installer to allow encrypted XCP-ng during install
  2. Modify the upgrader and enter the key to be able to make the upgrade
  3. Key management in XAPI exposed in XO
  4. How to deal with decrypt on boot?

That's a lot of work, but fortunately it's a community project, contributions are really welcome!

@rjt

rjt commented Dec 12, 2020 via email

@olivierlambert
Member

I can't see how it's connected to our questions here, but maybe I missed something?

@fetzerms
Author

Thank you @olivierlambert for summarizing what needs to be done. I think encrypted storage could be added relatively easily, as I am currently doing it by hand, using cryptsetup. Encrypted storage does not really interfere with updates/upgrades etc. But it (currently) needs to be unlocked remotely via ssh.

@olivierlambert
Member

@fetzerms so at each boot, you need to connect to your host, unlock the SR, and "reconnect" it (since it couldn't be mounted without the passwd). Is that right? Otherwise, feel free to explain your current process 👍

@nagilum99

Ideally you would work with some auth service on the hosts that could unlock a server as it comes up, as long as it belongs to the pool.

@fetzerms
Author

@fetzerms so at each boot, you need to connect to your host, unlock the SR, and "reconnect" it (since it couldn't be mounted without the passwd). Is that right? Otherwise, feel free to explain your current process

Yes exactly. Actually my setup is as follows:

  • On my "key server"* I have a script that periodically logs into my xcp-ng instances and checks if the luks device is unlocked.
  • If not, it unlocks the device and restarts the xe toolstack (this is brute force, I guess I could also reconnect the device).
  • Upon server-reboot this happens automatically from the key server-box.

The steps from the key server could also be done manually.

*) just some server which stores the keys and has ssh keys to connect to the xcp-ng instances.
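
The key-server loop described above could look roughly like the sketch below. It is not the author's actual script: the host names, device path, mapping name and key file location are placeholders, and the REMOTE variable is an invented hook that makes it possible to dry-run the logic without touching a real host.

```shell
#!/usr/bin/env bash
# All values below are placeholders (assumptions), not from a real setup.
HOSTS="${HOSTS:-xcp1.example.org xcp2.example.org}"
LUKS_DEVICE="${LUKS_DEVICE:-/dev/your/local/disk}"
MAPPING="${MAPPING:-data}"
KEY_FILE="${KEY_FILE:-/root/keys/sr.key}"
# Command used to reach a host; override it to dry-run the logic.
REMOTE="${REMOTE:-ssh -o BatchMode=yes}"

unlock_host() {
    local host="$1"
    # An open LUKS mapping shows up as /dev/mapper/<name> on the host.
    if $REMOTE "root@$host" "test -e /dev/mapper/$MAPPING"; then
        echo "$host: already unlocked"
        return 0
    fi
    # Feed the key through stdin so it is never written to the host's disk.
    $REMOTE "root@$host" "cryptsetup luksOpen --key-file=- $LUKS_DEVICE $MAPPING" \
        < "$KEY_FILE" || return 1
    # Same brute-force step as described above: restart the toolstack.
    $REMOTE "root@$host" "xe-toolstack-restart" || return 1
    echo "$host: unlocked"
}

unlock_all() {
    local host
    for host in $HOSTS; do
        unlock_host "$host" || echo "$host: unlock failed" >&2
    done
}

# Usage, e.g. from cron on the key server:
#   unlock_all
```

Checking for the mapper device first keeps the script safe to run periodically, since hosts that are already unlocked are left alone and the toolstack is only restarted after an actual unlock.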

@DSJ2

DSJ2 commented Dec 14, 2020

Have you looked at clevis and tang?

@fetzerms
Author

@DSJ2 thanks, that's a very good idea. I had actually never heard of clevis and tang before.

@olivierlambert
Member

@fetzerms feel free to share the results of your experiments! If your work can be streamlined/automated/integrated in XCP-ng, we'll be happy to assist 👍

@TylerDurden2019

@fetzerms Do you have step by step instructions to encrypt local disks with cryptsetup and set up LVM storage on top of cryptsetup? I'm interested in trying this on XCP-ng 8.2. Thanks.

@fetzerms
Author

fetzerms commented Feb 4, 2021

@TylerDurden2019: Sorry for my somewhat late response. I intended to do a proper write-up, but I am currently really lacking time and/or motivation. Hence, the following steps give a brief walkthrough, but do not explain anything in depth.

First time setup:

1. Make sure that your local drive is not hosting an SR. Deactivate and delete the SR from xcp-ng. I suggest also using wipefs on the drive.
2. yum install cryptsetup # I think it's now pre-installed, I just checked my old scripts...
3. cryptsetup luksFormat /dev/your/local/disk
4. cryptsetup luksOpen /dev/your/local/disk data
5. xe sr-create host-uuid=<uuid> content-type=user device-config:device=/dev/mapper/data name-label="Encrypted_SR" shared=false type=lvm

Then you are done and should see the SR in xcp-ng.

After reboot:

1. cryptsetup luksOpen /dev/your/local/disk data
2. pvscan && lvscan && vgscan && vgchange -ay --config 'global{metadata_read_only=0}'
3. xe-toolstack-restart

In addition to this, I also create an LV after creating the SR and mount it as /var/log after rebooting.

@TylerDurden2019

@fetzerms
Thanks for writing that up. It's helpful, appreciated.

@DSJ2 Have you used clevis and tang with XCP-ng? Would you have any step by step instructions to set that up?
I'm currently following the instructions here https://wiki.dev0.sh/books/homelab/page/encrypted-sr which use a USB key to automatically unlock the LUKS volume, but that defeats the purpose of encrypting it when the key is easily accessible, so I want to try clevis and tang or other methods.

@TylerDurden2019

TylerDurden2019 commented Feb 10, 2021

I've tested using Clevis and a Tang server to automatically unlock the encrypted local SR on XCP-ng 8.2.
Following on from @fetzerms's post above, here is what I did.

Tang Server Setup

Install more than one Tang server on multiple VMs for redundancy if needed.

Install Ubuntu and get latest updates

sudo apt update
sudo apt upgrade

Install Tang

sudo apt install tang

Set Tang to auto start on boot

sudo systemctl enable tangd.socket --now
NOTE: The service doesn't start automatically due to a bug that's supposed to be fixed in tang-7-5. The workaround is to comment out the lines that start with "After=" in /usr/lib/systemd/system/tangd.socket as suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1745177

Show the Tang server keys

tang-show-keys

Show Tang logs on Ubuntu

tail -f /var/log/syslog

Clevis Client Setup on XCP-ng 8.2

Install Clevis

yum --enablerepo=base install clevis-dracut

Enable Clevis to automatically unlock a non-root crypttab partition at boot time using a Tang server.

systemctl enable clevis-luks-askpass.path

Get the luks UUID for the encrypted device

cryptsetup luksUUID /dev/your/local/disk
xxxxxxx-0fa-4fba-a274-XXXXXXXXXXX

Edit and add to /etc/crypttab

vi /etc/crypttab
crypt0 UUID=xxxxxxx-0fa-4fba-a274-XXXXXXXXXXX none _netdev

Example 1

Add the Tang servers using SSS for the LUKS device with a threshold of 1, which means at least one of the listed Tang servers must be online to unlock the volume.
clevis luks bind -d /dev/your/local/disk sss '{"t": 1, "pins": {"tang": [{"url": "http://10.0.1.2"}, {"url": "http://10.0.1.3"}]}}'

Example 2

Add the Tang servers using SSS for the device in the specific LUKS slot 2 with a threshold of 2, which means two of the listed Tang servers (here, both) must be online to unlock the volume.
clevis luks bind -s 2 -d /dev/your/local/disk sss '{"t": 2, "pins": {"tang": [{"url": "http://10.0.1.2"}, {"url": "http://10.0.1.3"}]}}'
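
The two SSS policies above differ only in the threshold and the server list, so the JSON can be generated instead of typed by hand. The following is a small sketch under that assumption; the function name tang_sss_config is invented for illustration and is not part of Clevis.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an SSS policy for "clevis luks bind".
# Arguments: threshold, followed by one or more Tang server URLs.
tang_sss_config() {
    local threshold="$1"; shift
    # A threshold above the number of servers could never be satisfied.
    if [ "$threshold" -lt 1 ] || [ "$threshold" -gt "$#" ]; then
        echo "threshold must be between 1 and $# (number of servers)" >&2
        return 1
    fi
    local pins="" url
    for url in "$@"; do
        # Join the per-server objects with ", ".
        pins="${pins:+$pins, }{\"url\": \"$url\"}"
    done
    printf '{"t": %s, "pins": {"tang": [%s]}}\n' "$threshold" "$pins"
}

# Usage, equivalent to Example 1:
#   clevis luks bind -d /dev/your/local/disk sss \
#       "$(tang_sss_config 1 http://10.0.1.2 http://10.0.1.3)"
```

Rejecting a threshold larger than the server count up front avoids binding a slot that no combination of servers could ever unlock.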

Other useful commands:

Check Luks metadata and information

luksmeta show -d /dev/your/local/disk
cryptsetup luksDump /dev/your/local/disk

Remove Luks metadata in slot 2

cryptsetup luksKillSlot /dev/your/local/disk 2
luksmeta wipe -d /dev/your/local/disk -s 2

@sonoracomm

No criticism of any sort is being implied here. I promise. I'm just stating a desire.

While all the information above is really helpful, and I may test @TylerDurden2019's howto, I really think this situation needs to be implemented from above...built into XCP-ng...for enterprise reliability.

As a user, interested in reliability first, I am truly not interested in customizing XCP-ng. It just doesn't sound like a good idea to me.

Is there a 'bounty' for this possible new feature in XCP-ng? I can't afford much, but I would pledge a few bucks.

For systems with shared storage, I doubt this is much of an issue.

However, small shops with few XCP-ng servers and no shared storage could REALLY benefit from this functionality. I need this for a couple of SMB clients who have Windows Server VMs and a regulatory burden requiring encrypted storage.

Microsoft is not overly helpful in this situation either.

I feel there is a definite use case for local, encrypted VM storage.

I think that some sort of network unlock would be very important. If the XCP-ng server gets stolen, we need to make sure the data is unreadable, so (as previously mentioned) a USB key or floppy or VHD is not sufficient.

Thank you so much to Olivier, Vates and all contributors for this fantastic XCP-ng project!

G

@olivierlambert
Member

Hello @sonoracomm and thanks for your feedback 👍

This sounds reasonable for a new driver on top of SMAPIv3. However, this really must come after SMAPIv3, because anything implemented on the legacy storage stack will never be merged upstream anyway.

So SMAPIv3 + one encryption driver represents something like 5 man-years, so we are easily around half a million euros. It's not that I want to refuse any money, but this is only possible with companies/industries pushing for it. We'll do SMAPIv3 anyway, but the pace will also depend on commercial success and priorities (as we are fully independent from big vendors).

@sonoracomm

Thanks much for the status update and explanation, @olivierlambert.

I did not understand the complications or scope!

G

@olivierlambert
Member

Note that SMAPIv3 is a big priority for us this year. If we succeed (at least with partial features), we might try to make an encrypted driver (but we probably won't have snapshots, backup, live migration and so on).

@fefe79

fefe79 commented Apr 4, 2022

Does anyone know of, or has anyone already succeeded with, a tang/clevis setup that provides not just the key but also the detached header file in some way, or perhaps provides the detached header from a local workstation over ssh?

I tried the following in both zsh and bash, passing the detached header via process substitution ("<(cat) ..."):

# ssh -t my.xcpng.localdomain cryptsetup luksOpen /dev/nvme0n1p7 Encrypted_SR --header=<(cat) SR_luks_header.img < ~/SR_luks_header.img

However, cryptsetup doesn't like it, as it expects either a device or a file, so for now it just gives an error:

Device /dev/fd/11 doesn't exist or access denied.
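
One likely reason for this error is that "<(cat)" is expanded by the local shell before ssh runs, so the resulting /dev/fd path only exists on the workstation, not on the host. A possible workaround, sketched below under stated assumptions, is to copy the header to RAM-backed storage on the host, open the device against that copy, and delete it afterwards. The function name, the /dev/shm location and the SCP/SSH override variables are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: open a LUKS device on a remote host using a detached
# header that is kept on the local workstation.
# Arguments: host, device on the host, mapping name, local header file.
open_with_remote_header() {
    local host="$1" device="$2" name="$3" header="$4"
    # Assumed tmpfs path on the host; adjust if /dev/shm is not available.
    local remote_header="/dev/shm/luks_header.$$"
    # 1. Copy the header into RAM-backed storage on the host.
    ${SCP:-scp} "$header" "$host:$remote_header" || return 1
    # 2. Open the device against that copy, then remove the copy whether or
    #    not the unlock succeeded, and pass on cryptsetup's exit status.
    ${SSH:-ssh -t} "$host" \
        "cryptsetup luksOpen --header $remote_header $device $name; rc=\$?; rm -f $remote_header; exit \$rc"
}

# Usage with the names from the command above:
#   open_with_remote_header my.xcpng.localdomain /dev/nvme0n1p7 Encrypted_SR ~/SR_luks_header.img
```

The header still exists on the host for the duration of the unlock, so this only keeps it off persistent storage; it does not protect against someone with access to the running host.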

@dngray

dngray commented Jun 17, 2022

SMAPIv3 is a big priority for us this year, if we succeed (at least with partial features) we might try to make an encrypted driver (but we'll probably won't have snapshots, backup, live migration and so on).

ZFS is a big part of what I use currently. From what I can tell the only way to have encryption is to use the file storage method on a zfs dataset. From what I can tell not using the zfs storage driver has drawbacks.

We already provided zfs packages in our repositories before, but there was no dedicated SR driver. Users would use the file driver, which has a major drawback: if the zpool is not active, that driver may believe that the SR suddenly became empty, and drop all VDI metadata.

I subscribed to this issue and dropped a few bucks in bug bounty. For now I'll probably stay with ProxMox. IT-Gateway mentioned a few ZFS things that I am likely to use.

ProxMox natively supports clone, destroy, snapshot and replicate features of ZFS. It can be installed on top of ZFS pool, which makes it easy to roll back a bad update or missconfig, etc. On contrary XCP-Ng has only started it’s ZFS journey and it has a lot of rough edges. No ability to install it on top of ZFS, no support for encryption, nor there are snapshot/destroy/replicate features included. It treats it as a regular file system, and not CoW file system ZFS is.

– Storage encryption. None of the projects natively support REST encrypted volumes. ProxMox is a little better, because you can use encrypted ZFS datasets, but only on a secondary zpool due to compatibility issues with GRUB.

I seem to remember this being brought up in the Lawrence Systems videos.

I'll be keeping an eye on XCP-ng though, I really like the interface of Xen-Orchestra. I especially like that it can run in a VM and be decoupled from the node.

@olivierlambert
Member

@MatiasVara started to work on a ZFS driver for SMAPIv3, as a good way to explore it and push its limits :)

@pebenito

pebenito commented Dec 16, 2022

I definitely hope for progress on this. I have a KVM system with mdadm raid -> luks -> lvm that I'd like to migrate to XCP-ng, though I don't need the network unlock.

@rjt

rjt commented Dec 17, 2022 via email

@0x1F680

0x1F680 commented Mar 30, 2024

Is it possible to encrypt the root drive with cryptsetup (full disk encryption)? Currently not worried about network unlock.

@ydirson
Contributor

ydirson commented Apr 2, 2024

Is it possible to encrypt the root drive with cryptsetup (full disk encryption)?

No, that's not supported. Since we're essentially targeting server setups, unlock would really be something to be solved first.

@0x1F680

0x1F680 commented Apr 13, 2024

How about loading a couple of systemd hooks and network driver modules into initramfs (with mkinitcpio) with the necessary configuration files to allow for remote/autonomous luks-cryptdev unlocking?

@fefe79

fefe79 commented Sep 24, 2024

Is it possible to encrypt the root drive with cryptsetup (full disk encryption)?

No, that's not supported. Since we're essentially targeting server setups, unlock would really be something to be solved first.

Are you referring to Tang here, or dracut? On server setups, isn't that the de facto standard? I don't get how server setups are managing data at rest and secure boot if XCP-ng does not support it. What kind of physical security is that?

How about this (I am just brainstorming here):
Red Hat's NBDE with a Tang/Clevis setup, or optionally something similarly simple offered as an option at install time, using LUKS, perhaps set up automatically, with dracut or similar and a local Tang server, or a remote one if the router/firewall is set up to forward things properly over VPN? Wouldn't this be, as you put it, a "targeting server setups" scenario?

@rwjack

rwjack commented Sep 24, 2024

Encryption at rest is a literal must for compliance. It's quite odd how there hasn't been much news regarding this issue in XCP-ng.

@olivierlambert
Member

Hi,

We are interested in getting that, but so far no customer has pushed for it, despite having very sensitive installations. It's also probably because you can have your shared storage encrypted at rest or do encryption in your VM directly.

Anyway, I'm not against it at all, it's just not a top priority for now, but this can change depending on the demand and our progress on other more urgent requests.

@rwjack

rwjack commented Sep 25, 2024

but so far no customer pushed for it

Understandable.

have your shared storage encrypted at rest

This is really not a proper solution. Take a TrueNAS VM for example, yes the storage is encrypted at rest, but the VM itself with keys to that storage is not, making the whole encryption effectively useless.

or do encryption in your VM directly.

This is basically a workaround, which is not scalable at all, one or two VMs, sure, but more than that, becomes impossible to manage.

Anyway, I'm not against it at all, it's just not a top priority for now, but this can change depending on the demand and our progress on other more urgent requests.

Got it, thanks for clarifying. Hope we get some updates on this soon!

@olivierlambert
Member

This is really not a proper solution. Take a TrueNAS VM for example, yes the storage is encrypted at rest, but the VM itself with keys to that storage is not, making the whole encryption effectively useless.

I'm not sure I understand. First, using a VM as a shared SR isn't good practice outside a home lab, so you'll never see this in production. Also, the first goal is to avoid getting the drives physically stolen (and only the drives). For example, if you unlock with the local TPM and the entire machine is stolen, the thief can access the data. Sensitive installations are air-gapped, so we cannot rely on using an external resource to automatically unlock the drive. Having a password prompt on boot is also not acceptable for a server device.

This is basically a workaround, which is not scalable at all, one or two VMs, sure, but more than that, becomes impossible to manage.

It's a viable solution, e.g. you have Packer/Terraform or similar solutions (IaC) to generate your VMs automatically and have your templates with the right configuration. We've seen that for users/customers at scale.

Got it, thanks for clarifying. Hope we get some updates on this soon!

As you can see, there's not only one solution for this, and it mostly depends on the use case (air gap in sensitive context, home lab, size of the infrastructure). That's why it's not "odd" because it's not a simple/single thing to solve.

@rwjack

rwjack commented Sep 25, 2024

using a VM as a shared SR isn't a good practice

I know, I'm not using it as a shared SR, it's used as NAS. The disks are passed through directly to the VM, so they are encrypted by the VM, but nothing encrypts the VM which holds the encryption keys for the disks.


Also, the first goal is to avoid getting the drives physically stolen (and only the drives).

Exactly, so what happens when someone takes the drive with VMs on it? Not good, since they're not encrypted.


Having a pwd on boot is also not acceptable for a server device.

Not acceptable through a console, I agree, but:

  • over the network
  • with an integrated TPM
  • with a pluggable "TPM" (Yubikey or a USB drive for example)

are all valid options for admins to choose from, and they cover most use cases, whether air-gapped/sensitive or just compliance-driven.


or do encryption in your VM directly.

This is basically a workaround, which is not scalable at all, one or two VMs, sure, but more than that, becomes impossible to manage.

It's a viable solution, eg you have Packer/Terraform or similar solutions (IaC) to generate your VMs automatically and have your templates with the right configuration. We've seen that for users/customers at scale.

Right, but we're back to square one. Someone takes the entire host or even just the VM disk (in my single host non-shared-SR case). That opens a pathway to the encrypted storage disks. And even in the case of Shared SR storage, when someone takes the disk of the Shared SR storage master, then they have access to the Shared SR storage master decryption keys.


So what we really need is one master/top level encryption lock to rule them all. If you can imagine an infrastructure pyramid regarding proper encryption configuration, XCP-ng would be at the top there.

A blunt example, you usually lock your main door when you're at home, not just your work room and bedroom, right?
Locking the main door would be complete VM disk, or even XCP-ng encryption, locking just the work room and bedroom would be per VM encryption.

@olivierlambert
Member

I see your point and since XCP-ng is fully open source, we'll be happy to assist on merging code providing a solution, knowing the constraints (maintenance, updates, dealing with pools where you can lose the master, backups and so on).

@fefe79

fefe79 commented Sep 26, 2024

I see your point and since XCP-ng is fully open source, we'll be happy to assist on merging code providing a solution, knowing the constraints (maintenance, updates, dealing with pools where you can lose the master, backups and so on).

What do you guys think:

  • For starters, an option to set up LUKS at installation should be added (this should already be achievable by the OS, I believe, so I guess it is just a matter of changing the installer to allow it optionally).
  • Then go for adding Secure Boot (I believe this is also supported by redhat/centos/debian/ubuntu).
  • Then add NBDE (RedHat definitely supports it) using Tang/Clevis or any other dracut-supported automatic unlock, with password fallback and remote unlock by ssh into the boot-level dracut in case automatic NBDE fails because there is no network or the network is down. However, this behaviour should be editable once the system has fully booted.

@olivierlambert
Member

Hi,

Sadly it's far from being that simple. Host secure boot needs some work done in Xen (or around Xen). There are people working on it as we speak, but this first requirement is not finished yet (and then you need to integrate it in XCP-ng and make it stable/updated etc.)

LUKS on install: so you will encrypt system disks right? And then password on boot? Aren't we looking for VM encryption? Because that is done in a different place, in tapdisk (and this kills the performance probably).

@earzur

earzur commented Sep 27, 2024

what we do here for encrypted storage (Debian 12):

  • prepare a preseed image that will automatically install FDE using the default partition scheme using a random password
  • the preseed's d-i/late_command encrypts the password using a gpg pubkey baked into the image, pushes it into initramfs and sets up auto-unlock (cryptsetup luksAddKey...) using a random "boot.key" that is also pushed into initramfs (install dropbear-initramfs and make it a requirement to have proper network, else the lack of copy-pasting in the console will kill you)
  • we then make a template from that image
  • when a VM from that template is booted (1st boot)
    • disk is re-keyed (cryptsetup reencrypt...). This is important, else all child VMs of that template share the same root LUKS key. It's not what you want (and that's the reason for auto-unlock using boot.key)
    • disk is bound using clevis to a set of 3 tang servers (sss + tang pins using clevis). Depending on the threshold you set, if you lose your tang servers, you lose all the VMs bound to them! So backup, backup, backup (and backup, at least the tang keys!). You could also use the vTPM for auto unlock, but I'm not sure how snapshots handle the vTPM disk (xcp-ng 8.3 only, and I find vTPM useless for that, as the goal of a TPM LUKS pin is to tie your disk to the hardware it's connected to, while tang pins tie it to the network environment)
    • boot disk password is changed, boot.key LUKS slot is removed
  • provided you have network at boot time, you can happily use those FDE VMs with xcp-ng.

Issues:

  • the lack of support for passing kernel parameters (ip=) is the major PITA for us. We have to deploy "router VMs" with dnsmasq on the subnet every time we want to use that. It is too late when we are at the cloud-init stage. We don't get to that if the VM can't reach the tang server or isn't set to auto-unlock.
  • emergency recovery of the VMs is pretty weak, I would say non-existent. Lack of copy/paste prevents us from decrypting the boot disk password in a shell with access to the private key. dropbear can help, but see above: no support for static IPs

So far, this is as secure as we could go, but yet, the lack of proper emergency recovery procedure prevents making it to production...

@earzur

earzur commented Sep 27, 2024

finally, before deciding, make sure to check cryptsetup benchmark on your dom0... we're seeing 4x difference on our older Xeon systems compared to modern CPUs (even laptops)
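
To compare hosts on this point, the relevant row of the benchmark can be pulled out with a small filter. This is a sketch only: the helper name parse_benchmark is invented, and it assumes the usual table layout printed by cryptsetup benchmark (algorithm, key size, encryption and decryption throughput in MiB/s), which may differ between cryptsetup versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "cryptsetup benchmark" output on stdin and print
# "<encryption> <decryption>" throughput for one algorithm and key size.
# Assumed row layout:
#   aes-xts        512b      1234.5 MiB/s      1234.5 MiB/s
parse_benchmark() {
    awk -v alg="$1" -v key="$2" '$1 == alg && $2 == key { print $3, $5 }'
}

# Usage on dom0 (numbers are in MiB/s and measured in memory only):
#   cryptsetup benchmark | parse_benchmark aes-xts 512b
```

Since the benchmark runs in memory without storage I/O, it gives an upper bound for the cipher on that CPU rather than the throughput an encrypted SR will actually reach.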
