ZFS Encryption #5769

tcaputi · 2017-02-09T20:20:10Z

Native encryption in zfsonlinux (See issue #494)

The change incorporates 2 major pieces:

The first feature is a keystore that manages wrapping and encryption keys for encrypted datasets. The commands are similar to those of Solaris but with a few key enhancements to make them more predictable, more consistent, and less dependent on manual maintenance. It is fully integrated with the existing zfs create functions and zfs clone functions. It also exposes a new set of commands via zfs key for managing the keystore. For more info on the issues with the Solaris implementation see my comments here and here. The keystore operates on a few rules.

All wrapping keys are 32 bytes (256 bits), even for 128 and 192 bit encryption types.
Encryption must be specified at dataset creation time.
Specifying a keylocation or keyformat while creating a dataset causes the dataset to become the root of an encryption tree.
All members of an encryption tree share the same wrapping key.
Each dataset can have up to 1 keychain (if it is encrypted) that is not shared with anybody.

The second feature is the actual data and metadata encryption. All user data in an encrypted dataset is stored encrypted on-disk. User-provided metadata is also encrypted, but metadata structures have been left plain so that scrubbing and resilvering still work without the keys loaded. The design was originally inspired by this article but has been changed fairly significantly since.

Implementation details that should be looked at

Encrypting data going to disk requires creating a key_mapping_t during dsl_dataset_tryown(). I added a flag to this function for code that wishes to own the dataset, but that does not require encrypted data, such as the scrub functions. I did my best to confirm that all owners set this flag correctly, but someone should confirm them, just to be sure.
zfs send and zfs recv do not currently do anything special with regards to encryption. The format of the send file has not changed and zfs send requires the keys to be loaded in order to work. At some point there should probably be a way to do raw sends.
I altered the prototypes of lzc_create() and lzc_clone() to support hidden arguments. I understand that the purpose of libzfs_core is to have a stable API for interacting with the ZFS ioctls. However, these functions need to accept wrapping keys separately from the rest of their parameters because they need to use the (new) hidden_args framework to support hiding arguments from the logs. Without this, the wrapping keys would get printed to the zpool history.

EDIT 5/4/16: Updated to reflect the current state of the PR
EDIT 1/3/17: Updated to reflect the current state of the PR
EDIT 2/9/17: Reopened under a new PR due to long GitHub load times (previous PR was #4329)

mention-bot · 2017-02-09T20:20:16Z

@tcaputi, thanks for your PR! By analyzing the history of the files in this pull request, we identified @behlendorf, @grwilson and @ahrens to be potential reviewers.

tcaputi · 2017-02-09T20:37:03Z

Several updates have been made to the code since it was moved (thanks to @ahrens for the review thus far). These are the highlights:

the keysource property has been split into keylocation and keyformat
the keylocation property can now be set with zfs set (with some restrictions)
the zfs key sub-command has been split into zfs load-key, zfs unload-key and zfs change-key.
zfs load-key and zfs unload-key now support -r and -a for recursive key loading / unloading
zfs load-key now supports -n (noop) for checking that a key is correct without loading it
zfs load-key now supports -L (location) for loading a key from an alternate keylocation
zfs change-key now supports -i (inherit) for making a dataset inherit its parent's keys
the default number of pbkdf2 iterations has been bumped up to 350000 (minimum is now 100000)
the man pages have been updated and some clarifications have been made
the encryption feature now uses the ZFEATURE_FLAG_PER_DATASET framework for refcounting
various smaller code cleanups and fixes

These changes do constitute an on-disk format change, so anyone updating from an earlier version of the PR will have to recreate their pool.

grahamperrin · 2017-02-10T04:51:42Z

Thanks to everyone so far 👍

… User-provided metadata is also encrypted, but metadata structures have been left plain …

For clarity (for any newcomer to this PR), from https://github.com/tcaputi/zfs/blob/ae18e6d0a8c1b32991094594d1b751646bc5b4ca/man/man8/zfs.8 with added emphasis:

… will encrypt all user data including file and zvol data, file attributes, ACLs, permission bits, directory listings, and FUID mappings.

… will not encrypt metadata related to the pool structure, including dataset names, dataset hierarchy, file size, file holes, and dedup tables. …

(The first – a definition of user data – includes some things that might otherwise be viewed as metadata.)

Also from the opening of this PR:

Specifying a keysource …

– could be updated to reflect the split that is highlighted in the first (human) comment.

grahamperrin

Documentation: first pass …

grahamperrin · 2017-02-10T07:22:28Z

man/man5/zpool-features.5

-accessing the data allow for it. Deduplication with encryption will leak
-information about which blocks are equivalent in a dataset and will incur an
-extra CPU cost per block written.
+This feature enables the creation and management of natively encrypted datasets.


As details of the feature are found more in the manual page for zfs(8) than in the page for zpool(8), the SEE ALSO part of this zpool-features(5) page should be expanded to include zfs(8).

grahamperrin · 2017-02-10T07:30:20Z

man/man8/zfs.8

+Selecting \fBencryption\fR=\fBon\fR when creating a dataset indicates that the
+default encryption suite will be selected, which is currently \fBaes-256-ccm\fR.
+In order to provide consistent data protection, encryption must be specified at
+dataset creation time and it cannot be changed afterwards.
 .sp
 For more details and caveats about encryption see \fBzpool-features\fR.


The SEE ALSO part of this zfs(8) manual page should be expanded to include zpool-features(5).

grahamperrin · 2017-02-10T07:51:09Z

man/man5/zpool-features.5

-accessing the data allow for it. Deduplication with encryption will leak
-information about which blocks are equivalent in a dataset and will incur an
-extra CPU cost per block written.
+This feature enables the creation and management of natively encrypted datasets.


Basics.

Can creation of a pool comprise a single encrypted dataset?

Yes. This functions the same way as any other zfs property (compression, deduplication, etc.).

niieani · 2017-02-11T21:18:47Z

Fantastic work 👍 . I'm curious, how far would you say are you from publishing the changes from this PR? A rough estimate would do, say: weeks, months, Q2, Q3?

tcaputi · 2017-02-11T21:48:35Z

@niieani It's tough to say. That largely depends on how the review goes. At this point @ahrens has reviewed the DSL changes and I've made adjustments (which still need a bit of stabilizing) to the code to account for his notes, but that is all that has happened so far.

ahrens · 2017-02-12T04:05:27Z

include/sys/dsl_dataset.h

@@ -245,26 +247,35 @@ dsl_dataset_phys(dsl_dataset_t *ds)
 #define	DS_UNIQUE_IS_ACCURATE(ds)	\
 	((dsl_dataset_phys(ds)->ds_flags & DS_FLAG_UNIQUE_ACCURATE) != 0)

+/* flags for holding the dataset */
+#define	DS_HOLD_FLAG_DECRYPT	(1 << 0) /* needs access encrypted data */


Let's use an enum for this, so that it's more clear what values the function argument can have. (Unfortunately there's a bad example set by some other flags - cleanup has been on my to-do list for a while now.)

good to know. will fix
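
For illustration, the enum suggested above might look roughly like the following sketch. The type name `ds_hold_flags_t`, the `DS_HOLD_FLAG_NONE` value, and the helper `ds_hold_wants_decrypt()` are assumptions made for this example and are not taken from the patch.

```c
#include <stdbool.h>

/* Flags for holding a dataset (type name is hypothetical). */
typedef enum ds_hold_flags {
	DS_HOLD_FLAG_NONE	= 0,		/* hypothetical "no flags" value */
	DS_HOLD_FLAG_DECRYPT	= 1 << 0	/* needs access to encrypted data */
} ds_hold_flags_t;

/* Hypothetical helper: does this holder need the keys to be mapped? */
static bool
ds_hold_wants_decrypt(ds_hold_flags_t flags)
{
	return ((flags & DS_HOLD_FLAG_DECRYPT) != 0);
}
```

With a dedicated type in the function prototype, a reader can see which values are legal for the argument, which is the readability gain the review comment is asking for.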

ahrens · 2017-02-12T04:11:35Z

module/zfs/dsl_dir.c

 	if (dd == NULL) {
 		dsl_dir_t *winner;

 		dd = kmem_zalloc(sizeof (dsl_dir_t), KM_SLEEP);
 		dd->dd_object = ddobj;
 		dd->dd_dbuf = dbuf;
 		dd->dd_pool = dp;
+
+		if (doi.doi_type == DMU_OTN_ZAP_METADATA &&


I think you can use dsl_dir_is_zapified(), instead. (at a cost of maybe doing dmu_object_info twice - shouldn't be a big deal)

ahrens · 2017-02-12T04:16:53Z

module/zfs/dsl_dir.c

+
+			/*
+			 * encrypted datasets can only be moved if they are
+			 * an encryption root (locally set keyformat).


Why don't we allow that? Seems like a common use case would be "encrypt all datasets below this the same way". "whoops, named a child dataset wrong, rename it". As long as the new location is under the same encryption root, I think we can allow the rename. Would that be hard to implement?

I thought there was a restriction that an unencrypted dataset couldn't be under an encrypted dataset. Doesn't that need to be enforced here?

I didn't see either of these restrictions in the manpages. Seems like they should at least be in the zfs rename section, and maybe also the Encryption section.

Why don't we allow that? Seems like a common use case would be "encrypt all datasets below this the same way". "whoops, named a child dataset wrong, rename it"

To clarify, renaming an encryption child is ok. Moving to a new parent is not currently.

As long as the new location is under the same encryption root, I think we can allow the rename. Would that be hard to implement?

I think I can implement it like that. Shouldn't be a big deal.

I thought there was a restriction that an unencrypted dataset couldn't be under an encrypted dataset. Doesn't that need to be enforced here?

Whoops. It looks like I must've rebased that away at some point. I will put it back.

I didn't see either of these restrictions in the manpages. Seems like they should at least be in the zfs rename section, and maybe also the Encryption section.

I will make sure this is documented in both places (I think there is a sentence about it in the encryption section).
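
The rename rules discussed in this exchange can be summarized in a small sketch. This is a toy model only: `toy_ds_t`, its fields, and `toy_rename_allowed()` are hypothetical names, and the real check would operate on DSL structures rather than a flat struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a dataset; all names here are hypothetical. */
typedef struct toy_ds {
	bool		encrypted;
	bool		is_encryption_root;	/* keyformat set locally */
	uint64_t	encryption_root_id;	/* root whose key protects it */
} toy_ds_t;

/* May ds be moved so that new_parent becomes its parent? */
static bool
toy_rename_allowed(const toy_ds_t *ds, const toy_ds_t *new_parent)
{
	/* An unencrypted dataset may not live under an encrypted parent. */
	if (!ds->encrypted)
		return (!new_parent->encrypted);

	/* An encryption root carries its own wrapping key, so it can move. */
	if (ds->is_encryption_root)
		return (true);

	/* A non-root child must stay under the same encryption root. */
	return (new_parent->encrypted &&
	    new_parent->encryption_root_id == ds->encryption_root_id);
}
```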

ahrens · 2017-02-12T04:19:34Z

man/man8/zfs.8

+locally, the dataset is an encryption root. Encryption roots share their keys
+with all datasets that inherit this property. This means that when a key is
+loaded for the encryption root, all children that inherit the \fBkeylocation\fR
+property are automatically loaded as well.


how about the key for all children ... is automatically loaded as well

ahrens · 2017-02-12T04:21:21Z

man/man8/zfs.8

+.sp
+Even though the encryption suite cannot be changed after dataset creation, the
+keylocation can be with \fBzfs change-key\fR. If the dataset is an encryption 
+root this property may also be changed with \fBzfs set\fR. If \fBprompt\fR is


If it's not an encryption root, can the key be changed with zfs change-key? The current wording implies it can, by calling out the restriction only for zfs set

That is correct. Non-encryption roots can use zfs change-key, which will make them an encryption root. I will add a sentence to the man page to reflect this.

ahrens · 2017-02-12T05:42:13Z

module/zfs/zfs_ioctl.c

+	if (ret != 0)
+		goto error;
+
+	return (0);


seems like this (success case) would work right if it fell through to the error label.

I can change it. I personally like separate error labels that don't share code.
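
As a sketch of the fall-through style being suggested, the success path can leave the return value at 0 and share the cleanup label with the failure path. The function `toy_do_work()` and its buffer are hypothetical and stand in for the ioctl handler under review.

```c
#include <stdlib.h>

/* Hypothetical operation with one resource to clean up. */
static int
toy_do_work(int fail)
{
	int ret = 0;
	char *buf = malloc(16);

	if (buf == NULL)
		return (-1);

	if (fail) {
		ret = 5;	/* arbitrary nonzero error code */
		goto out;
	}

	/* Success: ret stays 0 and control falls through to the label. */
out:
	free(buf);
	return (ret);
}
```

The trade-off is the one raised in the reply: a shared label removes a duplicate return, while separate labels keep the success and error paths visually distinct.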

ahrens · 2017-02-12T05:44:23Z

module/zfs/zil.c

+	 * the keys may not be loaded.
+	 */
+	if (wbuf == NULL && BP_IS_ENCRYPTED(bp))
+		zio_flags |= ZIO_FLAG_RAW;


could we change this to pass RAW even if not encrypted?

I don't see why not. I just didn't want to impact existing functionality. I'll run some tests to make sure that works and make the change.

ahrens · 2017-02-12T05:52:55Z

module/zfs/zio.c

+		BP_SET_ENCRYPTED(bp, B_TRUE);
+	}
+
+	/* Perform the encryption. This should not fail */


Can we use a more precise word than "should"? You're gracefully handling failure just below, so presumably it does fail in some cases... Maybe This will fail for block types that are partially encrypted (dnode, zil) if no encryption is needed.

I will change the "should". I am not gracefully handling errors. The only acceptable return codes here are 0 and ZIO_NO_ENCRYPTION_NEEDED (see the assertion below).

OK - I wasn't sure if ZIO_NO_ENCRYPTION_NEEDED was considered an error.

No, it's a special return code to indicate that the block didn't need to be encrypted for some reason (for instance a dnode block with no encryptable bonus buffers).
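
A minimal sketch of the convention described here, in which only success and the "nothing to encrypt" code are acceptable results. `TOY_NO_ENCRYPTION_NEEDED`, its value, and both functions are placeholders for illustration, not the names or values used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder value; the real constant lives in the ZFS headers. */
#define	TOY_NO_ENCRYPTION_NEEDED	(-1)

/* Hypothetical stand-in for the encryption step. */
static int
toy_encrypt_block(bool has_encryptable_data)
{
	if (!has_encryptable_data)
		return (TOY_NO_ENCRYPTION_NEEDED);
	return (0);
}

/* Returns true if the block ends up marked as encrypted. */
static bool
toy_write_block(bool has_encryptable_data)
{
	int ret = toy_encrypt_block(has_encryptable_data);

	/* Any other return code is treated as a programming error. */
	assert(ret == 0 || ret == TOY_NO_ENCRYPTION_NEEDED);
	return (ret == 0);
}
```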

ahrens · 2017-02-12T05:55:35Z

module/zfs/zio_checksum.c

+			 * evenly across the bits of the checksum. Therefore,
+			 * when truncating a weak checksum we XOR the first
+			 * 2 words with the last 2 so that we don't "lose" any
+			 * entropy unnecessarily.


Is there any downside to doing this for all checksum types?

I'm not aware of any, but I know all secure checksums support being truncated without issue whereas there may be weaknesses for XOR folding.
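
The XOR folding described in the quoted comment can be sketched as follows. The function name is hypothetical, and pairing word 0 with word 2 and word 1 with word 3 is an assumption about how "the first 2 words with the last 2" is meant.

```c
#include <stdint.h>

/*
 * Fold a 256-bit checksum (four 64-bit words) down to 128 bits by
 * XORing the upper half into the lower half, so that every input bit
 * still influences the truncated result.
 */
static void
toy_fold_checksum(const uint64_t cksum[4], uint64_t out[2])
{
	out[0] = cksum[0] ^ cksum[2];
	out[1] = cksum[1] ^ cksum[3];
}
```

For a cryptographically secure checksum, plain truncation (keeping only the first two words) is normally sufficient, which is why the folding is reserved for weak checksums in the code under review.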

ahrens · 2017-02-12T06:05:02Z

module/zfs/zio_crypt.c

+	uint32_t *iv2 = (uint32_t *)(iv + sizeof (uint64_t));
+
+	ASSERT(BP_IS_ENCRYPTED(bp));
+	bp->blk_dva[2].dva_word[0] = LE_64(*((uint64_t *)salt));


This trick doesn't work on processors which don't support unaligned memory access. You'll need to do it byte at a time (if it's unaligned, at least - but if you implement it both ways, make sure we can test hitting both code paths).

Same goes for the following functions.

See zap_leaf_array_read() for an example (the la_array is not aligned).
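
A minimal sketch of the byte-at-a-time approach, assuming the goal is to interpret eight bytes at an arbitrary (possibly unaligned) address as a little-endian 64-bit value. The helper name `toy_load_le64()` is hypothetical.

```c
#include <stdint.h>

/*
 * Assemble a 64-bit value from 8 bytes stored in little-endian order.
 * Only single-byte loads are used, so the pointer needs no alignment
 * and the result is the same on big- and little-endian hosts.
 */
static uint64_t
toy_load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return (v);
}
```

Unlike casting the byte pointer to `uint64_t *`, this never issues a wide load at an odd address, so it is safe on processors that trap on unaligned access.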

sempervictus · 2017-02-13T10:20:30Z

Thanks for updating the PR, will build out a new test mule this week.
I do have a potential concern/question however regarding operating logic in terms of resource consumption:
If users are allowed to attempt their own key loads, and use high iteration PBKDF derivations, what's to prevent them from exhausting resources on the host system by just pummeling it with passphrase load attempts until it's blue in the face? Do the internal implementation or interface provide any load request/expensive operation throttling capability?

tcaputi · 2017-02-13T15:18:55Z

If users are allowed to attempt their own key loads, and use high iteration PBKDF derivations, what's to prevent them from exhausting resources on the host system by just pummeling it with passphrase load attempts until it's blue in the face? Do the internal implementation or interface provide any load request/expensive operation throttling capability?

There are no restrictions, but I would argue that there is also nothing that would prevent them from running a while(1); program as well. The pbkdf2 iterations are done in userspace so they are killable.

ahrens · 2017-02-22T23:29:52Z

module/zfs/arc.c

+ *
+ * The L1ARC has a slightly different system for storing encrypted data.
+ * Raw (encrypted + possibly compressed) data has a few subtle differences from
+ * data that is just compressed. The biggest difference is that it is not always


I think you can delete always, because it is not possible to decrypt ... if keys aren't loaded (right?)

ahrens · 2017-02-22T23:37:09Z

module/zfs/arc.c

+ * may have both an encrypted version and a decrypted version of its data at
+ * once. When a caller needs a raw arc_buf_t, it is allocated and the data is
+ * copied out of this header. To avoid complications with b_pabd, raw buffers
+ * cannot be shared.


Not sure where is the best place to explain this in the code, but at least for my own edification:

When would we have both encrypted and unencrypted data in the ARC? If we scrub when the keys are not loaded, we'll have encrypted dnode blocks, I think? And once we have encrypted send, that could cache encrypted user data in the ARC? And if we later load the keys and do an unencrypted read, it would decrypt from b_rabd to b_abd.

Once we have both b_rabd and b_abd, do we ever evict one of them, to reduce memory usage? Or are we stuck with 2x memory usage until we happen to evict the whole hdr?

When would we have both encrypted and unencrypted data in the ARC? If we scrub when the keys are not loaded, we'll have encrypted dnode blocks, I think? And once we have encrypted send, that could cache encrypted user data in the ARC? And if we later load the keys and do an unencrypted read, it would decrypt from b_rabd to b_abd.

All correct.

Once we have both b_rabd and b_abd, do we ever evict one of them, to reduce memory usage? Or are we stuck with 2x memory usage until we happen to evict the whole hdr?

If we have both, we evict the encrypted one as soon as all encrypted buffers are destroyed.

Not sure where is the best place to explain this in the code...

There is actually already a comment for this in arc_buf_destroy_impl() (where the header freeing takes place).

OK cool, I haven't got to reading that part of the code yet, it's a lot to fit in my brain at once :-)

ahrens · 2017-02-22T23:38:26Z

module/zfs/arc.c

@@ -817,6 +835,8 @@ static arc_state_t	*arc_l2c_only;
 #define	arc_need_free	ARCSTAT(arcstat_need_free) /* bytes to be freed */
 #define	arc_sys_free	ARCSTAT(arcstat_sys_free) /* target system free bytes */

+/* encrypted + compressed size of entire arc */
+#define	arc_raw_size	ARCSTAT(arcstat_raw_size)


Encryption doesn't change the size of the data, so how is this different from arc_compressed_size?

This just gives an extra stat for how much raw encrypted data is stored in the ARC. I thought it would be useful seeing as how this memory operates separately from compressed memory.

So it's the sum of the size of the b_rabds? Could you update the comment to say that?

ahrens · 2017-02-22T23:44:47Z

include/sys/arc.h

@@ -112,20 +123,22 @@ typedef enum arc_flags
 	ARC_FLAG_L2_WRITING		= 1 << 11,	/* write in progress */
 	ARC_FLAG_L2_EVICTED		= 1 << 12,	/* evicted during I/O */
 	ARC_FLAG_L2_WRITE_HEAD		= 1 << 13,	/* head of write list */
+	/* encrypted on disk (may or may not be encrypted in memory) */
+	ARC_FLAG_ENCRYPT		= 1 << 14,


How about ARC_FLAG_ENCRYPTED (and HDR_ENCRYPTED()), since this is not being used as a verb.

sure. will fix.

ahrens · 2017-02-22T23:56:29Z

include/sys/arc_impl.h

+	/* encryption parameters */
+	uint8_t			b_salt[DATA_SALT_LEN];
+	uint8_t			b_iv[DATA_IV_LEN];
+	uint8_t			b_mac[DATA_MAC_LEN];


b_mac seems different from the other encryption parameters, because it's used not when writing to L2ARC, but rather when decrypting an encrypted hdr. In that case I think the caller could pass in the BP and we could get the mac there. It might be worth explaining that we keep the mac here for convenience in this case, so the caller doesn't have to pass in the BP again.

Interesting point. It looks like the only place this is really used is in arc_hdr_decrypt(). I will add that comment. I could probably also assert that the mac in l2arc_apply_transforms() matches the one in the header.

ahrens · 2017-02-23T00:12:21Z

module/zfs/arc.c

-	 * pool. When this is the case, we must first compress it if it is
-	 * compressed on the main pool before we can validate the checksum.
-	 */
-	if (!HDR_COMPRESSION_ENABLED(hdr) && compress != ZIO_COMPRESS_OFF) {


Is this code moved somewhere else?

No, see comment below.

Which comment specifically?

I think this code is now assuming that we write blocks to the L2ARC exactly as they are in the main pool, including compressed, unencrypted blocks, even if the b_pabd is uncompressed?

Oh. I'm sorry I thought this was a different piece of code. You are correct, this change allows the L2ARC to write out data as it is in the main pool, removing the need for this step.

ahrens · 2017-02-23T00:17:46Z

module/zfs/arc.c

@@ -1652,15 +1701,14 @@ arc_hdr_set_compress(arc_buf_hdr_t *hdr, enum zio_compress cmp)
 	 */
 	if (!zfs_compressed_arc_enabled || HDR_GET_PSIZE(hdr) == 0) {
 		arc_hdr_clear_flags(hdr, ARC_FLAG_COMPRESSED_ARC);
-		HDR_SET_COMPRESS(hdr, ZIO_COMPRESS_OFF);


Is this an intentional change to now set the hdr's compression algorithm even if it is not compressed?

Yes. Previously, disabled arc compression worked by simply not storing the compression at all. This change allows us to keep track of the compression algorithm for raw blocks even if arc compression is disabled.

ahrens · 2017-02-23T00:19:21Z

module/zfs/arc.c

+ * This function will take a header that only has raw encrypted data in
+ * b_crypt_hdr.b_rabd and decrypts it into a new buffer which is stored in
+ * b_l1hdr.b_pabd. If designated on the header, this function will also
+ * decompress the data as well.


remove as well (already has also)

ahrens · 2017-02-23T00:29:53Z

module/zfs/arc.c

+	}
+
+	if (HDR_GET_COMPRESS(hdr) != ZIO_COMPRESS_OFF &&
+	    !HDR_COMPRESSION_ENABLED(hdr)) {


This could use a comment explaining this case. I think it's saying that the data on disk is compressed, and therefore the raw data in b_rabd is compressed, but we don't want this hdr to be stored compressed (i.e. b_pabd should not be compressed), so we need to decompress it.

That is the correct interpretation. I will add a comment.

ahrens · 2017-02-23T01:12:50Z

module/zfs/arc.c

+		tmp = abd_borrow_buf(cabd, arc_hdr_size(hdr));
+
+		ret = zio_decompress_data(HDR_GET_COMPRESS(hdr),
+		    hdr->b_l1hdr.b_pabd, tmp, HDR_GET_PSIZE(hdr),


If it's possible this is used in a performance-sensitive code path, it would be good to avoid the copy from b_pabd to linearize it for decompression. This could be avoided by allocating b_pabd as linear, before decrypting into it. (And then replacing b_pabd with the decompressed scatter ABD, as you're already doing here.)

My only issue with that is that then we will not be honoring zfs_abd_scatter_enabled. I do not think this is an incredibly performance-sensitive path either, since this will realistically only be run in 2 circumstances: decrypting dnode blocks and reading data that is already in the ARC due to a scrub.

Let me know what you think and I can change it if you still feel it is appropriate.

We could still honor zfs_abd_scatter_enabled because the data stored in b_pabd would be scatter (or not). It's only the intermediary, short-lived buffer (decrypted but not yet uncompressed) that would be linear.

But it does seem like not a super common code path so I could go either way.

I see what you mean now... This might be a bit messy to do since arc_hdr_alloc_abd() cannot currently decide how it would like its abd to look. Unfortunately the code for that is a few functions up the stack. I'm inclined to leave it for now unless it becomes a problem.

lundman · 2017-03-04T00:31:47Z

Oh I should mention, in the IllumOS patches, before I included your commits yesterday, when we did a zfs send -c | zfs recv -vu we would trigger:

Mar  1 19:46:39 omnios ^Mpanic[cpu1]/thread=ffffff0261a72800:
Mar  1 19:46:39 omnios genunix: [ID 403854 kern.notice] assertion failed: (encrypted) implies (compressed), file: ../../common/fs/zfs/arc.c, line: 2675
Mar  1 19:46:39 omnios unix: [ID 100000 kern.notice]
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d675a0 genunix:process_type+17f230 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67630 zfs:arc_buf_alloc_impl+2ea ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d676c0 zfs:arc_alloc_compressed_buf+eb ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67700 zfs:arc_loan_compressed_buf+38 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d677a0 zfs:receive_read_record+e0 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67990 zfs:dmu_recv_stream+399 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67bd0 zfs:zfs_ioc_recv+20e ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67c70 zfs:zfsdev_ioctl+50f ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67cb0 genunix:cdev_ioctl+39 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67d00 specfs:spec_ioctl+60 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67d90 genunix:fop_ioctl+55 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67eb0 genunix:ioctl+9b ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67f00 unix:brand_sys_sysenter+2b7 ()

It could be an old bug by now, but if it is trivial for you to set lz4 and test zfs send -c, that would rule it out...

rottegift · 2017-03-04T02:56:41Z

Further to @lundman 's comment, we have openzfsonosx/zfs#557

tcaputi · 2017-03-04T03:51:28Z

@lundman I don't know what version you were on before, but the problem seems to be fixed now.

lundman · 2017-03-04T04:17:14Z

Woo \o/

tcaputi · 2018-03-28T19:55:40Z

@sempervictus
ZFS currently has a generic implementation, an accelerated x86-64 implementation, and an even further accelerated AESNI implementation. Normally even AMD chips would use the x86-64 accelerated version of the code (which still seems to be compatible). It is only the generic implementation that is causing issues which should only matter for other architectures and (as I believe @teknoman117 is doing here) for testing purposes.

tcaputi · 2018-03-29T20:29:59Z

@teknoman117 The issue does not seem to reproduce on Illumos, so we must have changed something at some point. I hope that I can get to the bottom of it tomorrow.

tcaputi · 2018-03-30T20:12:44Z

@teknoman117 I figured it out. The problem is actually just with your patch. You need this in the patch as well:

@@ -109,12 +111,14 @@ static int intel_aes_instructions_present(void);
 #define	rijndael_key_setup_enc_raw	rijndael_key_setup_enc
 #endif	/* __amd64 */
 
-#if defined(_LITTLE_ENDIAN) && !defined(__amd64)
+//#if defined(_LITTLE_ENDIAN) && !defined(__amd64)
+#if 1
 #define	AES_BYTE_SWAP
 #endif

Let me know if I'm missing something or if you need anything else.

teknoman117 · 2018-04-01T00:21:25Z

@tcaputi

My assumption to leave that guard in was based on this comment in aes_impl.c
/*
 * For _LITTLE_ENDIAN machines (except AMD64), reverse every
 * 4 bytes in the key. On _BIG_ENDIAN and AMD64, copy the key
 * without reversing bytes.
 * For AMD64, do not byte swap for aes_setupkeys().
 *
 * SPARCv8/v9 uses a key schedule array with 64-bit elements.
 * X86/AMD64 uses a key schedule array with 32-bit elements.
 */

That being said, this does indeed seem to let the generic C implementation load a dataset encrypted using the aesni or x86_64 optimized method. However, removing the guard completely breaks the accelerated methods. I suppose this means that for x86_64 the byte swap is only needed for the generic implementation.
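
For readers unfamiliar with the swap in question, "reverse every 4 bytes in the key" can be sketched as below. This is an illustration only: `toy_reverse_key_words()` is a hypothetical name, and the actual code in aes_impl.c may perform the swap while loading the key schedule rather than in place.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Reverse the byte order inside each 4-byte group of a key, in place.
 * Trailing bytes that do not fill a whole group are left untouched.
 */
static void
toy_reverse_key_words(uint8_t *key, size_t len)
{
	for (size_t i = 0; i + 4 <= len; i += 4) {
		uint8_t t;

		t = key[i];
		key[i] = key[i + 3];
		key[i + 3] = t;

		t = key[i + 1];
		key[i + 1] = key[i + 2];
		key[i + 2] = t;
	}
}
```

If one implementation applies this swap and another does not, they derive different key schedules from the same key bytes, which is consistent with the generic and accelerated code paths failing to read each other's data.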

jbrodriguez · 2018-04-01T00:36:18Z

My question is .. why is this not stable yet ? I need to run zfs-linux-git on Arch Linux ... sort of unofficially .. is it testing what's keeping this from stable ?

grantwwu · 2018-04-01T00:47:42Z

@jbrodriguez It's not stable yet because it's a Hard Problem, and because the developers have run into some issues, most notably #6845.

I don't think you really ought to complain unless you're paying the OpenZFS developers... and if you are you probably have internal channels for that :)

jbrodriguez · 2018-04-01T00:51:50Z

Thanks, I don't know how much you officially speak for @grantwwu, I was just asking a question. Nevertheless, thanks for pointing out the issue, I'll subscribe to it.

tcaputi · 2018-04-01T02:11:19Z

@teknoman117

I suppose this means that for x86_64 the byte swap is only needed for the generic implementation.

Yeah. I think that comment was meant to be taken in the context of the entire file. In other words, I think the #define was written assuming that the accelerated implementations wouldn't be forcefully removed from under it.

@jbrodriguez

My question is .. why is this not stable yet ? I need to run zfs-linux-git on Arch Linux ... sort of unofficially .. is it testing what's keeping this from stable ?

The 3 big reasons:

As @grantwwu alluded, this patch was 26000 lines of code (~15000 of which were completely new and not ported from elsewhere) and required modifications to virtually all parts of ZFS (except possibly the vdev and metaslab layers). We have hit issues with this patch since it was merged, most notably problems with raw sends, which is to be expected for a patch this large. One of these problems even required an on-disk format update to fix. That said, I have not heard of any new problems in the past 3 weeks or so, so I'm hoping that's a good sign.
We haven't released a major version of ZFS (0.8.0) since encryption was merged. That is still pending on a few other features which are slated to be released in the new major version. ZFS encryption will be included in the 0.8.0 release.
I have another patch coming (hopefully as a PR early next week if I can resolve the one last issue) which will add support for zfs recv -o / -x with encryption properties, which is sorely needed to allow people to convert their existing pools easily. Right now you can't do a zfs send -R | zfs recv to convert your pool to encrypted because zfs send -R includes all properties and one of them happens to be encryption=off. Since you can't tell zfs to ignore encryption properties at the moment, you can't use this to transition your datasets.

jbrodriguez · 2018-04-01T15:36:16Z

Thanks @tcaputi, the major version comment is spot on.

I'm looking to start from scratch, so I may have overlooked some considerations.

Can't praise enough the work you've done here, kudos.

This change incorporates three major pieces:

The first change is a keystore that manages wrapping and encryption keys for encrypted datasets. These commands mostly involve manipulating the new DSL Crypto Key ZAP Objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key" which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to allow dataset creation and to make mounting and unmounting more convenient.

The second piece of this patch provides the ability to encrypt, decrypt, and authenticate protected datasets. Each object set maintains a Merkle tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data.

The last addition is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes openzfs#494
Closes openzfs#5769

taoeffect · 2019-05-30T21:04:34Z

This looks awesome, congrats! Is there a guide / tutorial anywhere on how to set this up?

Can it be integrated with an existing LUKS encrypted partition setup? (How?) Thanks for helping out a newb!

sempervictus · 2019-05-30T21:16:10Z

Manpages show usage, luks is just the header for most dm-crypt setups, and dm-crypt CAN be used under it to protect its unencrypted metadata at the overhead of a whole encrypt/decrypt round under the zfs impl.
@tcaputi: am I correct in my recollection that dm-crypt is using coproc/simd accelerated crypto implementations in kernel while zfs does not?

taoeffect · 2019-05-30T21:21:04Z

Which manpage, for LUKS or ZFS? (And if ZFS which command?).

I might not have been super clear. It's my understanding that "ZFS on root" (i.e. ZFS managing the entire drive) doesn't work well on Linux, so what people do is they create a ZFS partition.

That's what this guide does and that's what I've been following.

In that guide a partition is created for /boot, the main OS under / (LUKS encrypted), and then a third partition is created for ZFS to manage, which is also LUKS encrypted. The key for the third partition is stored on the main / partition.

But now that zfsonlinux has encryption, what's the best thing to do, and how?

benley · 2019-05-30T21:27:50Z

I'm not sure if this is the best thing to do, but on my laptop I've been using a small fat32 partition for /boot and the UEFI ESP, and / is an encrypted ZFS dataset without any LUKS header.
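For readers asking how a dataset like that gets created and unlocked, here is a minimal sketch of the commands involved. It is written as Python functions that only build argument lists, so nothing is executed. The dataset and snapshot names are hypothetical, the helper names are made up for illustration, and the property values shown (`encryption=on`, `keyformat=passphrase`, `keylocation=prompt`) are one possible choice; check the zfs man page for your version before relying on any of it.

```python
import shlex


def create_encrypted_dataset(dataset, keyformat="passphrase", keylocation="prompt"):
    """argv for creating a dataset that becomes the root of an encryption tree."""
    return [
        "zfs", "create",
        "-o", "encryption=on",
        "-o", f"keyformat={keyformat}",
        "-o", f"keylocation={keylocation}",
        dataset,
    ]


def load_key(dataset):
    """argv for loading the wrapping key so the dataset can be mounted."""
    return ["zfs", "load-key", dataset]


def unload_key(dataset):
    """argv for unloading the key again (the dataset must be unmounted first)."""
    return ["zfs", "unload-key", dataset]


def change_key(dataset):
    """argv for changing the user key without re-encrypting the data."""
    return ["zfs", "change-key", dataset]


def raw_send(snapshot):
    """argv for a raw (still encrypted) send stream of a snapshot."""
    return ["zfs", "send", "--raw", snapshot]


def render(argv):
    """Shell-quoted form of an argv list, for printing or review."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    dataset = "rpool/example"  # hypothetical name
    for argv in (
        create_encrypted_dataset(dataset),
        load_key(dataset),
        change_key(dataset),
        unload_key(dataset),
        raw_send(dataset + "@snap1"),
    ):
        print(render(argv))
```

Building the argument lists separately from running them makes it easy to review what would be executed; passing a list to `subprocess.run` would be the obvious next step on a machine where the pool actually exists.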

rlaager · 2019-05-30T21:47:59Z

I have an experimental version of the HOWTO for Debian that builds 0.8.0 from experimental:
https://github.com/zfsonlinux/zfs/wiki/Debian-Buster-Encrypted-Root-on-ZFS

I haven't updated that since the rc4 package. The final 0.8.0 will likely be in experimental soon, at which point I'll update that again. I'm not sure how much I want to publicize this, though, as a hand-built backport isn't necessarily the best idea.

sempervictus · 2019-05-30T22:04:42Z

LUKS is a Linux component, zfs is not, they share zero code, and one is a header for full block encryption of a device while the other is data encryption inside a filesystem. They're not related by any direct linkage.
Zfs on root works fine, on the 0.7.x branches anyway and 0.6.x when we used those. That goes with or without dm-crypt. The zfs crypto bits won't play with grub and I am not aware of efi bootloaders supporting it either. Though an efi-loaded kernel with built-in zfs having crypto might be worth trying.
The efi partition is always fat, nothing we can do about that unfortunately. Lowest common denominators or something

taoeffect · 2019-05-31T02:35:25Z

@rlaager wrote:

I have an experimental version of the HOWTO for Debian that builds 0.8.0 from experimental:
https://github.com/zfsonlinux/zfs/wiki/Debian-Buster-Encrypted-Root-on-ZFS

Wow, thanks for that. It seems like it could be useful as a reference, but also seems quite complicated.

I'm wondering if there's a simpler way? Currently exploring the following approach (with Fedora) for a VPS setup:

Create two partitions (/boot and a 15GB /) as XFS, install via GUI installer
Boot the machine and install ZFS
Create a 3rd partition in remaining space and tell ZFS to manage it
Use rsync to copy all data in / to the new ZFS partition
Somehow make the new ZFS partition the new root partition and boot from it
Format the old root partition and add its device to the root pool to use up all the space available

Is that doable? Would it be simpler? (anyone know of any guides for that?)

taoeffect · 2019-05-31T02:47:58Z

I have an experimental version of the HOWTO for Debian that builds 0.8.0 from experimental: https://github.com/zfsonlinux/zfs/wiki/Debian-Buster-Encrypted-Root-on-ZFS

I should have mentioned in my previous comments, one of the reasons I can't follow that approach is that, as mentioned in that link, you don't always have the required RAM available to use a Live CD (as with a VPS), but also my web host does not offer a Fedora Live CD, so even if I had the RAM, I'd be forced to use the "anaconda" troubleshooting environment which doesn't even have a working package manager and is a nightmare to use.

tcaputi · 2019-05-31T18:31:28Z

@tcaputi: am I correct in my recollection that dm-crypt is using coproc/simd accelerated crypto implementations in kernel while zfs does not?

We both use accelerated encryption implementations, but due to licensing restrictions they are separate implementations.

sempervictus · 2019-05-31T21:23:00Z

Is this still true now that SIMD support is cut upstream due to GPL-only export of FPU functions? 4.14+ have that commit applied nowadays since it got back ported to 4.14.120

teknoman117 · 2019-06-01T07:27:55Z

@taoeffect I run ZFS as the root file system on my Linux machine. There's a small fat32 partition at the beginning which uses an efistub kernel and the zfs "pool" is the second partition which contains datasets for /, /home, and some others.

tcaputi · 2019-06-01T16:33:33Z

@sempervictus

Is this still true now that SIMD support is cut upstream due to GPL-only export of FPU functions? 4.14+ have that commit applied nowadays since it got back ported to 4.14.120

If you have that commit then yes, SIMD support is disabled. We are still working on figuring out a workaround there.

sempervictus · 2019-06-01T17:26:08Z

We build kernels in-house with zfs as a built-in to the main kernel image (avoids certain instrumentation passes for modules at runtime), which permits us the opportunity to simply revert that commit. Most of our clients cannot do this, and folks are talking about moving to 0.8 this year already due to crypto being available. I'm a bit worried that more will stop using zfs for being too slow - we've had a few drop in the last year or so because modern storage hw is much faster than the zfs io pipeline, kpti smashed even more bs into the execution of the "slow" fs code paths, and with no SIMD this is just going to get worse and worse.
It looks to me from the exchange @kpande had with Greg that the Linux team is actively trying to cripple us in order to favor filesystems under their preferred license model. SPL wrappers to re-expose FPU functions seem to me the most reasonable way to do this. What's Linus going to do, sue for infringement? VMware is full of such things and worse, and so far, they're winning their case

pashford · 2019-06-01T19:13:01Z

@sempervictus,

the Linux team is actively trying to cripple us in order to favor filesystems under their preferred license model

Based on what little I saw, I must, unfortunately, agree. You wouldn't happen to have a link to the issue in question, would you?

At this time, it looks to me like the Linux Kernel project is effectively controlled by zealots. I see this as the first step in a "whack-a-mole" process (with aggressive back-porting) to cripple ZoL in favor of the broken BTRFS. As with any religious war, the only thing that matters is the religion; facts are irrelevant. I don't believe this is a war that ZoL can win.

With that in mind, I'm in the process of creating a message to all my ZoL customers, suggesting that they consider migrating their file servers from Linux/ZoL to FreeBSD/OpenZFS.

jbrodriguez · 2019-06-01T21:32:14Z

If need be, as the five and dime shop we are, we'll figure out how to push our own kernel build (+patches) into our archlinux setup.

We used to use Oracle Solaris 11 11/11 for its ZFS encryption capabilities.

Switched to dm-crypt and jumped in blindfolded into linux zfs encryption as soon as tom caputi and team made it available.

We're not going back. Only forward.

taoeffect · 2019-06-01T23:19:38Z

@teknoman117 wrote:

@taoeffect I run ZFS as the root file system on my Linux machine. There's a small fat32 partition at the beginning which uses an efistub kernel and the zfs "pool" is the second partition which contains datasets for /, /home, and some others.

You and @benley both seem to be doing this. However, every tutorial I have seen for ZFS on root is insanely complicated, and also does not seem to always work on VPS setups with only 2GB of RAM because of the LiveCD requirements.

I wish deeply that ZFS were the default on Linux, and that it were simple to set up. That unfortunately is not the reality.

Also, I have not seen any comparisons between the latest ZFS encryption stuff and ZFS on LUKS. It would be useful to know what the tradeoffs are in terms of ease-of-use, security, and performance.

tcaputi · 2019-06-02T00:03:37Z

With that in mind, I'm in the process of creating a message to all my ZoL customers, suggesting that they consider migrating their file servers from Linux/ZoL to FreeBSD/OpenZFS.

I'm not really one to get into the politics of this whole situation, but I will say that despite the statement from some of the Linux maintainers, nothing has really changed in this regard. The Linux kernel has always had a set of functions and symbols that it exports in one release and stops exporting in the next and its internal APIs are always changing. As a "third party" module, we have always had to adapt our software to these changes and we must always write code that can support past, present, and future versions as best as possible. Every time a new kernel comes out, our testing builders fail and within a few days / weeks someone takes a look and fixes any new incompatibilities that have arisen. This particular issue is a bit trickier to work around, but I don't believe we are fundamentally incapable of dealing with it as we have in the past.

sempervictus · 2019-06-02T00:07:50Z

@pashford: I wouldn't send that email out yet, likely not to work. ZoL has the accounting feature which, iirc, prevents use on non-Linux systems. This is why we started making all new pools with that feature disabled, then patched zfs to make disabled the default.

@taoeffect: livecd is squashfs mapped to ram, not much to do with zfs. Far as dm-crypt vs ANY filesystem crypto, you probably want to get a firm understanding of what block data and file data are relative to each other, because you keep asking questions indicative of a thought process which does not account for the differences in the logical tiers of storage required to provide a user with access to a file (encrypted or otherwise). Far as performance goes, see above, Linux is actively trying to degrade our performance by making the relevant interface GPL-only. So performance where? On 4.14.119 or today? What are you trying to use zfs for from a business-logic perspective, and why not pay the hosting company for the proper amount of resources?

taoeffect · 2019-06-02T00:30:42Z

Per @kpande's request, I've opened up two discussion threads on the mailing list:

(BTW, this is amazing mailing list software! Never seen it before!)

But to quickly answer your questions @sempervictus:

So performance where? On 4.14.119 or today?

I mean Linux in general. Whatever the latest kernel is. I doubt people will stay on old kernels for long due to the constant security problems that are being created, found, fixed, and re-created.

What are you trying to use zfs for from a business-logic perspective, and why not pay the hosting company for the proper amount of resources?

A 2GB VPS seems like a reasonable target for ZFS, no?

Feel free to post any replies via the mailing list!

sempervictus · 2019-06-02T01:19:52Z

So far as 2G goes, really depends on the workload. ARC usually takes half your ram, so if the kernel and userspace are under a gig, you're OK till you run software. If you have a large pool with a bunch of dedup blocks, your DDTs will take up even more. Swap and ARC/DDTs don't mix well...
For your benchmark question, the test setup would be ZFS atop a dm-crypt volume vs zfs on disk using its own crypto. On the newest kernel, zfs doesn't get SIMD, so even if the algos are the same, it'll be slower. They're not the same implementations: as Tom said, ZFS can't touch Linux crypto for license reasons and it wouldn't be portable. So if you're looking for performance, and have a decent chip backing the vps, the native Linux crypto will run faster pound for pound as it's using the "fastpath" of SIMD instructions. It also protects all of the metadata, so more coverage, but less flexibility and granularity.
Lastly, this entire FPU thing is casting a shadow of instability on ZoL again as they may GPL-only anything and break ZoL entirely at their whim. There's no guarantee that Linux will "permit" ZoL's continued existence. There's always grsec... :)
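The memory arithmetic in the 2G paragraph above can be written out as a quick back-of-the-envelope sketch. It takes the "ARC usually takes half your ram" rule of thumb from that comment as an assumption (the actual cap is tunable and the ARC is designed to shrink under memory pressure), ignores dedup tables entirely, and uses a hypothetical helper name.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def headroom_after_arc(total_ram, baseline_usage, arc_fraction=0.5):
    """Bytes left for applications once the ARC has grown to its assumed cap.

    total_ram      -- physical memory in bytes
    baseline_usage -- kernel plus idle userspace, in bytes
    arc_fraction   -- assumed share of RAM the ARC may grow to (rule of thumb)
    """
    arc_cap = int(total_ram * arc_fraction)
    return total_ram - arc_cap - baseline_usage


if __name__ == "__main__":
    for baseline in (512 * MIB, 768 * MIB, 1 * GIB):
        left = headroom_after_arc(2 * GIB, baseline)
        print(f"baseline {baseline // MIB} MiB: {left // MIB} MiB left for applications")
```

On a 2 GiB machine this leaves nothing once the kernel and userspace reach a full gigabyte, which is the "you're OK till you run software" situation described above.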

tcaputi mentioned this pull request Feb 9, 2017

ZFS Encryption #4329

Closed

tcaputi force-pushed the master branch 2 times, most recently from 6cf566f to ae18e6d Compare February 9, 2017 23:36

grahamperrin reviewed Feb 10, 2017

tcaputi force-pushed the master branch 2 times, most recently from c4daa03 to 8594e08 Compare February 11, 2017 10:19

ahrens reviewed Feb 12, 2017

tcaputi force-pushed the master branch 4 times, most recently from afee592 to 2284451 Compare February 19, 2017 16:24

tcaputi force-pushed the master branch 2 times, most recently from 84eeb12 to 5b21838 Compare February 23, 2017 00:17

ahrens reviewed Feb 23, 2017

tcaputi force-pushed the master branch from 5b21838 to 96bce02 Compare February 23, 2017 07:05

crawfxrd mentioned this pull request Feb 24, 2017

Fix checksumflags assignment in cksummer #5830

Merged

11 tasks

tcaputi mentioned this pull request Mar 2, 2017

8727 Native data and metadata encryption for zfs openzfs/openzfs#124

Closed

wli5 mentioned this pull request Mar 2, 2017

GZIP compression offloading with QAT accelerator #5846

Merged

11 tasks

tcaputi mentioned this pull request Mar 2, 2017

DRR_WRITE_EMBEDDED records need to include their own byteswap info #5854

Open
