Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZFS Encryption #5769

Merged
merged 1 commit into from Aug 14, 2017

Conversation

Projects
None yet
@tcaputi
Copy link
Contributor

tcaputi commented Feb 9, 2017

Native encryption in zfsonlinux (See issue #494)

The change incorporates 2 major pieces:

The first feature is a keystore that manages wrapping and encryption keys for encrypted datasets. The commands are similar to that of Solaris but with a few key enhancements to make it more predictable, more consistent, and require less manual maintenance. It is fully integrated with the existing zfs create functions and zfs clone functions. It also exposes a new set of commands via zfs key for managing the keystore. For more info on the issues with the Solaris implementation see my comments here and here. The keystore operates on a few rules.

  • All wrapping keys are 32 bytes (256 bits), even for 128 and 192 bit encryption types.
  • Encryption must be specified at dataset creation time.
  • Specifying a keylocation or keyformat while creating a dataset causes the dataset to become the root of an encryption tree.
  • All members of an encryption tree share the same wrapping key.
  • Each dataset can have up to 1 keychain (if it is encrypted) that is not shared with anybody.

The second feature is the actual data and metadata encryption. All user data in an encrypted dataset is stored encrypted on-disk. User-provided metadata is also encrypted, but metadata structures have been left plain so that scrubbing and resilvering still works without the keys loaded. The design was originallly inspired by this article but has been changed fairly significantly since.

Implementation details that should be looked at

  • Encrypting data going to disk requires creating a key_mapping_t during dsl_dataset_tryown(). I added a flag to this function for code that wishes to own the dataset, but that does not require encrypted data, such as the scrub functions. I did my best to confirm that all owners set this flag correctly, but someone should confirm them, just to be sure.
  • zfs send and zfs recv do not currently do anything special with regards to encryption. The format of the send file has not changed and zfs send requires the keys to be loaded in order to work. At some point there should probably be a way to do raw sends.
  • I altered the prototype of lzc_create() and lzc_clone() to support hidden arguments. I understand that the purpose of libzfs_core is to have a stable api interacting with the ZFS ioctls. However, these functions need to accept wrapping keys separately from the rest of their parameters because they need to use the (new) hidden_args framework to support hiding arguments from the logs. Without this, the wrapping keys would get printed to the zpool history.

EDIT 5/4/16: Updated to reflect the current state of the PR
EDIT 1/3/17: Updated to reflect the current state of the PR
EDIT 2/9/17: reopened under a new PR due to long Github load times (previous PR was #4329)

@mention-bot

This comment has been minimized.

Copy link

mention-bot commented Feb 9, 2017

@tcaputi, thanks for your PR! By analyzing the history of the files in this pull request, we identified @behlendorf, @grwilson and @ahrens to be potential reviewers.

@tcaputi tcaputi referenced this pull request Feb 9, 2017

Closed

ZFS Encryption #4329

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Feb 9, 2017

Several updates have been made to the code since it was moved (thanks to @ahrens for the review thus far). These are the highlights:

  • the keysource property has been split into keylocation and keyformat
  • the keylocation property can now be set with zfs set (with some restrictions)
  • the zfs key sub-command has been split into zfs load-key, zfs unload-key and zfs change-key.
  • zfs load-key and zfs unload-key now support -r and -a for recursive key loading / unloading
  • zfs load-key now supports -n (noop) for checking that a key is correct without loading it
  • zfs load-key now supports -L (location) for loading a key from an alternate keylocation
  • zfs change-key now supports -i (inherit) for making a dataset inherit its parent's keys
  • the default number of pbkdf2 iterations has been bumped up to 350000 (minimum is now 100000)
  • the man pages have been updated and some clarifications have been made
  • the encryption feature now uses the ZFEATURE_FLAG_PER_DATASET framework for refcounting
  • various smaller code cleanups and fixes

These changes do constitute an on-disk format change, so anyone updating from an earlier version of the PR will have to recreate their pool.

@tcaputi tcaputi force-pushed the tcaputi:master branch 2 times, most recently from 6cf566f to ae18e6d Feb 9, 2017

@grahamperrin

This comment has been minimized.

Copy link

grahamperrin commented Feb 10, 2017

Thanks to everyone so far 👍


… User-provided metadata is also encrypted, but metadata structures have been left plain …

For clarity (for any newcomer to this PR), from https://github.com/tcaputi/zfs/blob/ae18e6d0a8c1b32991094594d1b751646bc5b4ca/man/man8/zfs.8 with added emphasis:

will encrypt all user data including file and zvol data, file attributes, ACLs, permission bits, directory listings, and FUID mappings.

will not encrypt metadata related to the pool structure, including dataset names, dataset hierarchy, file size, file holes, and dedup tables. …

(The first – a definition of user data – includes some things that might otherwise be viewed as metadata.)


Also from the opening of this PR:

Specifying a keysource …

– could be updated to reflect the split that is highlighted in the first (human) comment.

@grahamperrin
Copy link

grahamperrin left a comment

Documentation: first pass …

accessing the data allow for it. Deduplication with encryption will leak
information about which blocks are equivalent in a dataset and will incur an
extra CPU cost per block written.
This feature enables the creation and management of natively encrypted datasets.

This comment has been minimized.

@grahamperrin

grahamperrin Feb 10, 2017

As details of the feature are more in the manual page for zfs(8) than in the page for zpool(8), so the SEE ALSO part of this zpool-features(5) page should be expanded to include zfs(8).

Selecting \fBencryption\fR=\fBon\fR when creating a dataset indicates that the
default encryption suite will be selected, which is currently \fBaes-256-ccm\fR.
In order to provide consistent data protection, encryption must be specified at
dataset creation time and it cannot be changed afterwards.
.sp
For more details and caveats about encryption see \fBzpool-features\fR.

This comment has been minimized.

@grahamperrin

grahamperrin Feb 10, 2017

The SEE ALSO part of this zfs(8) manual page should be expanded to include zpool-features(5).

accessing the data allow for it. Deduplication with encryption will leak
information about which blocks are equivalent in a dataset and will incur an
extra CPU cost per block written.
This feature enables the creation and management of natively encrypted datasets.

This comment has been minimized.

@grahamperrin

grahamperrin Feb 10, 2017

Basics.

Can creation of a pool comprise a single encrypted dataset?

This comment has been minimized.

@tcaputi

tcaputi Feb 10, 2017

Author Contributor

Yes. This functions the same way as any other zfs property (compression, deduplication, etc)

@tcaputi tcaputi force-pushed the tcaputi:master branch 2 times, most recently from c4daa03 to 8594e08 Feb 10, 2017

@niieani

This comment has been minimized.

Copy link

niieani commented Feb 11, 2017

Fantastic work 👍 . I'm curious, how far would you say are you from publishing the changes from this PR? A rough estimate would do, say: weeks, months, Q2, Q3?

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Feb 11, 2017

@niieani It's tough to say. That largely depends on how the review goes. At this point @ahrens has reviewed the DSL changes and I've made adjustments (which still need a bit of stabilizing) to the code to account for his notes, but that is all that has happened so far.

@@ -245,26 +247,35 @@ dsl_dataset_phys(dsl_dataset_t *ds)
#define DS_UNIQUE_IS_ACCURATE(ds) \
((dsl_dataset_phys(ds)->ds_flags & DS_FLAG_UNIQUE_ACCURATE) != 0)

/* flags for holding the dataset */
#define DS_HOLD_FLAG_DECRYPT (1 << 0) /* needs access encrypted data */

This comment has been minimized.

@ahrens

ahrens Feb 12, 2017

Contributor

Lets use an enum for this, so that it's more clear what values the function argument can have. (Unfortunately there's a bad example set by some other flags - cleanup has been on my to-do list for a while now.)

This comment has been minimized.

@tcaputi

tcaputi Feb 12, 2017

Author Contributor

good to know. will fix

if (dd == NULL) {
dsl_dir_t *winner;

dd = kmem_zalloc(sizeof (dsl_dir_t), KM_SLEEP);
dd->dd_object = ddobj;
dd->dd_dbuf = dbuf;
dd->dd_pool = dp;

if (doi.doi_type == DMU_OTN_ZAP_METADATA &&

This comment has been minimized.

@ahrens

ahrens Feb 12, 2017

Contributor

I think you can use dsl_dir_is_zapified(), instead. (at a cost of maybe doing dmu_object_info twice - shouldn't be a big deal)

This comment has been minimized.

@tcaputi

tcaputi Feb 12, 2017

Author Contributor

will fix


/*
* encrypted datasets can only be moved if they are
* an encryption root (locally set keyformat).

This comment has been minimized.

@ahrens

ahrens Feb 12, 2017

Contributor

Why don't we allow that? Seems like a common use case would be "encrypt all datasets below this the same way". "whoops, named a child dataset wrong, rename it". As long as the new location is under the same encryption root, I think we can allow the rename. Would that be hard to implement?

I thought there was a restriction that an unencrypted dataset couldn't be under an encrypted dataset. Doesn't that need to be enforced here?

I didn't see either of these restrictions in the manpages. Seems like they should at least be in the zfs rename section, and maybe also the Encryption section.

This comment has been minimized.

@tcaputi

tcaputi Feb 12, 2017

Author Contributor

Why don't we allow that? Seems like a common use case would be "encrypt all datasets below this the same way". "whoops, named a child dataset wrong, rename it"

To clarify, renaming an encryption child is ok. Moving to a new parent is not currently.

As long as the new location is under the same encryption root, I think we can allow the rename. Would that be hard to implement?

I think I can implement it like that. Shouldn't be a big deal.

I thought there was a restriction that an unencrypted dataset couldn't be under an encrypted dataset. Doesn't that need to be enforced here?

Whoops. It looks like I must've rebased that away at some point. I will put it back.

I didn't see either of these restrictions in the manpages. Seems like they should at least be in the zfs rename section, and maybe also the Encryption section.

I will make sure this is documented in both places (I think there is a sentence about it in the encryption section).

locally, the dataset is an encryption root. Encryption roots share their keys
with all datasets that inherit this property. This means that when a key is
loaded for the encryption root, all children that inherit the \fBkeylocation\fR
property are automatically loaded as well.

This comment has been minimized.

@ahrens

ahrens Feb 12, 2017

Contributor

how about the key for all children ... is automatically loaded as well

This comment has been minimized.

@tcaputi

tcaputi Feb 12, 2017

Author Contributor

will fix

.sp
Even though the encryption suite cannot be changed after dataset creation, the
keylocation can be with \fBzfs change-key\fR. If the dataset is an encryption
root this property may also be changed with \fBzfs set\fR. If \fBprompt\fR is

This comment has been minimized.

@ahrens

ahrens Feb 12, 2017

Contributor

If it's not an encryption root, can the key be changed with zfs change-key? The current wording implies it can, by calling out the restriction only for zfs set

This comment has been minimized.

@tcaputi

tcaputi Feb 12, 2017

Author Contributor

That is correct. non-encryption roots can use zfs change-key, which will make them an encryption root. I will add a sentence to the man page to reflect this.

if (ret != 0)
goto error;

return (0);

This comment has been minimized.

@ahrens

ahrens Feb 12, 2017

Contributor

seems like this (success case) would work right if it fell through to the error label.

This comment has been minimized.

@tcaputi

tcaputi Feb 12, 2017

Author Contributor

I can change it. I personally like separate error labels that don't share code.

* the keys may not be loaded.
*/
if (wbuf == NULL && BP_IS_ENCRYPTED(bp))
zio_flags |= ZIO_FLAG_RAW;

This comment has been minimized.

@ahrens

ahrens Feb 12, 2017

Contributor

could we change this to pass RAW even if not encrypted?

This comment has been minimized.

@tcaputi

tcaputi Feb 12, 2017

Author Contributor

I don't see why not. I just didn't want to impact existing functionality. I'll run some tests to make sure that works and make the change.

BP_SET_ENCRYPTED(bp, B_TRUE);
}

/* Perform the encryption. This should not fail */

This comment has been minimized.

@ahrens

ahrens Feb 12, 2017

Contributor

Can we use a more precise word than "should"? You're gracefully handling failure just below, so presumably it does fail in some cases... Maybe This will fail for block types that are partially encrypted (dnode, zil) if no encryption is needed.

This comment has been minimized.

@tcaputi

tcaputi Feb 12, 2017

Author Contributor

I will change the "should". I am not gracefully handling errors. The only acceptable return codes here are 0 and ZIO_NO_ENCRYPTION_NEEDED (see the assertion below).

This comment has been minimized.

@ahrens

ahrens Feb 12, 2017

Contributor

OK - I wasn't sure if ZIO_NO_ENCRYPTION_NEEDED was considered an error.

This comment has been minimized.

@tcaputi

tcaputi Feb 13, 2017

Author Contributor

No, its a special return code to indicate that the block didn't need to be encrypted for some reason (for instance a dnode block with no encryptable bonus buffers).

* evenly across the bits of the checksum. Therefore,
* when truncating a weak checksum we XOR the first
* 2 words with the last 2 so that we don't "lose" any
* entropy unnecessarily.

This comment has been minimized.

@ahrens

ahrens Feb 12, 2017

Contributor

Is there any downside to doing this for all checksum types?

This comment has been minimized.

@tcaputi

tcaputi Feb 12, 2017

Author Contributor

I'm not aware of any, but I know all secure checksums support being truncated without issue whereas there may be weaknesses for XOR folding.

uint32_t *iv2 = (uint32_t *)(iv + sizeof (uint64_t));

ASSERT(BP_IS_ENCRYPTED(bp));
bp->blk_dva[2].dva_word[0] = LE_64(*((uint64_t *)salt));

This comment has been minimized.

@ahrens

ahrens Feb 12, 2017

Contributor

This trick doesn't work on processors which don't support unaligned memory access. You'll need to do it byte at a time (if it's unaligned, at least - but if you implement it both ways, make sure we can test hitting both code paths).

Same goes for the following functions.

See zap_leaf_array_read() for an example (the la_array is not aligned).

This comment has been minimized.

@tcaputi

tcaputi Feb 12, 2017

Author Contributor

will fix.

@sempervictus

This comment has been minimized.

Copy link
Contributor

sempervictus commented Feb 13, 2017

Thanks for updating the PR, will build out a new test mule this week.
I do have a potential concern/question however regarding operating logic in terms of resource consumption:
If users are allowed to attempt their own key loads, and use high iteration PBKDF derivations, whats to prevent them from exhausting resources on the host system by just pummeling it with passphrase load attempts until its blue in the face? Do the internal implementation or interface provide any load request/expensive operation throttling capability?

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Feb 13, 2017

If users are allowed to attempt their own key loads, and use high iteration PBKDF derivations, whats to prevent them from exhausting resources on the host system by just pummeling it with passphrase load attempts until its blue in the face? Do the internal implementation or interface provide any load request/expensive operation throttling capability?

There are no restrictions, but I would argue that there is also nothing that would prevent them from running a while(1); program as well. The pbkdf2 iterations are done in userspace so they are killable.

@tcaputi tcaputi force-pushed the tcaputi:master branch 4 times, most recently from afee592 to 2284451 Feb 13, 2017

@tcaputi tcaputi force-pushed the tcaputi:master branch 2 times, most recently from 84eeb12 to 5b21838 Feb 22, 2017

*
* The L1ARC has a slightly different system for storing encrypted data.
* Raw (encrypted + possibly compressed) data has a few subtle differences from
* data that is just compressed. The biggest difference is that it is not always

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

I think you can delete always, because it is not possible to decrypt ... if keys aren't loaded (right?)

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

will fix

* may have both an encrypted version and a decrypted version of its data at
* once. When a caller needs a raw arc_buf_t, it is allocated and the data is
* copied out of this header. To avoid complications with b_pabd, raw buffers
* cannot be shared.

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

Not sure where is the best place to explain this in the code, but at least for my own edification:

When would we have both encrypted and unencrypted data in the ARC? If we scrub when the keys are not loaded, we'll have encrypted dnode blocks, I think? And once we have encrypted send, that could cache encrypted user data in the ARC? And if we later load the keys and do an unencrypted read, it would decrypt from b_rabd to b_abd.

Once we have both b_rabd and b_abd, do we ever evict one of them, to reduce memory usage? Or are we stuck with 2x memory usage until we happen to evict the whole hdr?

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

When would we have both encrypted and unencrypted data in the ARC? If we scrub when the keys are not loaded, we'll have encrypted dnode blocks, I think? And once we have encrypted send, that could cache encrypted user data in the ARC? And if we later load the keys and do an unencrypted read, it would decrypt from b_rabd to b_abd.

All correct.

Once we have both b_rabd and b_abd, do we ever evict one of them, to reduce memory usage? Or are we stuck with 2x memory usage until we happen to evict the whole hdr?

If we have both we evict the encrypted one as soon as all encrypted buffers are destroyed

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

Not sure where is the best place to explain this in the code...

There is actually already a comment for this in arc_buf_destroy_impl() (where the header freeing takes place).

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

OK cool, I haven't got to reading that part of the code yet, it's a lot to fit in my brain at once :-)

@@ -817,6 +835,8 @@ static arc_state_t *arc_l2c_only;
#define arc_need_free ARCSTAT(arcstat_need_free) /* bytes to be freed */
#define arc_sys_free ARCSTAT(arcstat_sys_free) /* target system free bytes */

/* encrypted + compressed size of entire arc */
#define arc_raw_size ARCSTAT(arcstat_raw_size)

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

Encryption doesn't change the size of the data, so how is this different from arc_compressed_size?

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

this just gives an extra stat for how much raw encrypted data is stored in the ARC. I thought it would be useful seeing as how this memory operates separately from compressed memory.

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

So it's the sum of the size of the b_rabds? Could you update the comment to say that?

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

will fix

@@ -112,20 +123,22 @@ typedef enum arc_flags
ARC_FLAG_L2_WRITING = 1 << 11, /* write in progress */
ARC_FLAG_L2_EVICTED = 1 << 12, /* evicted during I/O */
ARC_FLAG_L2_WRITE_HEAD = 1 << 13, /* head of write list */
/* encrypted on disk (may or may not be encrypted in memory) */
ARC_FLAG_ENCRYPT = 1 << 14,

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

How about ARC_FLAG_ENCRYPTED (and HDR_ENCRYPTED()), since this is not being used as a verb.

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

sure. will fix.

/* encryption parameters */
uint8_t b_salt[DATA_SALT_LEN];
uint8_t b_iv[DATA_IV_LEN];
uint8_t b_mac[DATA_MAC_LEN];

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

b_mac seems different from the other encryption parameters, because it's used not when writing to L2ARC, but rather when decrypting an encrypted hdr. In that case I think the caller could pass in the BP and we could get the mac there. It might be worth explaining that we keep the mac here for convenience in this sake, so the caller doesn't have to pass in the BP again.

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

Interesting point. it looks like the only place this is really used is in arc_hdr_decrypt(). I will add that comment. I could probably also assert that the mac in l2arc_apply_transforms() matches the one in the header.

* pool. When this is the case, we must first compress it if it is
* compressed on the main pool before we can validate the checksum.
*/
if (!HDR_COMPRESSION_ENABLED(hdr) && compress != ZIO_COMPRESS_OFF) {

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

Is this code moved somewhere else?

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

No, see comment below.

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

Which comment specifically?

I think this code is now assuming that we write blocks to the L2ARC exactly as they are in the main pool, including compressed, unencrypted blocks, even if the b_pabd is uncompressed?

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

Oh. I'm sorry I thought this was a different piece of code. You are correct, this change allows the L2ARC to write out data as it is in the main pool, removing the need for this step.

@@ -1652,15 +1701,14 @@ arc_hdr_set_compress(arc_buf_hdr_t *hdr, enum zio_compress cmp)
*/
if (!zfs_compressed_arc_enabled || HDR_GET_PSIZE(hdr) == 0) {
arc_hdr_clear_flags(hdr, ARC_FLAG_COMPRESSED_ARC);
HDR_SET_COMPRESS(hdr, ZIO_COMPRESS_OFF);

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

Is this an intentional change to now set the hdr's compression algorithm even if it is not compressed?

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

Yes. Previously, disabled arc compression worked by simply not storing the compression at all. This change allows us to keep track of the compression algorithm for raw blocks even if arc compression is disabled.

* This function will take a header that only has raw encrypted data in
* b_crypt_hdr.b_rabd and decrypts it into a new buffer which is stored in
* b_l1hdr.b_pabd. If designated on the header, this function will also
* decompress the data as well.

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

remove as well (already has also)

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

will fix

}

if (HDR_GET_COMPRESS(hdr) != ZIO_COMPRESS_OFF &&
!HDR_COMPRESSION_ENABLED(hdr)) {

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

This could use a comment explaining this case. I think it's saying that the data on disk is compressed, and therefore the raw data in b_rabd is compressed, but we don't want this hdr to be stored compressed (i.e. b_pabd should not be compressed), so we need to decompress it.

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

That is the correct interpretation. I will add a comment.

tmp = abd_borrow_buf(cabd, arc_hdr_size(hdr));

ret = zio_decompress_data(HDR_GET_COMPRESS(hdr),
hdr->b_l1hdr.b_pabd, tmp, HDR_GET_PSIZE(hdr),

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

If it's possible this is used in a performance-sensitive code path, it would be good to avoid the copy from b_pabd to linearize it for decompression. This could be avoided by allocating b_pabd as linear, before decrypting into it. (And then replacing b_pabd with the decompressed scatter ABD, as you're already doing here.)

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

My only issue with that is that then we will not be honoring zfs_abd_scatter_enabled. I do not think this is an incredibly performance-sensitive path either, since this will realistically only be run in 2 circumstances: decrypting dnode blocks and reading data that is already in the ARC due to a scrub.

Let me know what you think and I can change it if you still feel it is appropriate.

This comment has been minimized.

@ahrens

ahrens Feb 23, 2017

Contributor

We could still honor zfs_abd_scatter_enabled because the data stored in b_pabd would be scatter (or not). It's only the intermediary, short-lived buffer (decrypted but not yet uncompressed) that would be linear.

But it does seem like not a super common code path so I could go either way.

This comment has been minimized.

@tcaputi

tcaputi Feb 23, 2017

Author Contributor

I see what you mean now... This might be a bit messy to do since arc_hdr_alloc_abd() cannot currently decide how it would like its abd to look. Unfortunately the code for that is a few functions up the stack. I'm inclined to leave it for now unless it becomes a problem.

@tcaputi tcaputi force-pushed the tcaputi:master branch from 5b21838 to 96bce02 Feb 23, 2017

crawfxrd added a commit to crawfxrd/zfs that referenced this pull request Feb 24, 2017

Fix a copy-paste error for checksum flags
This will matter when flags are added for encrypted receives as part of
the ZFS encryption work (zfsonlinux#5769).

Signed-off-by: Tim Crawford <tcrawford@datto.com>

crawfxrd added a commit to crawfxrd/zfs that referenced this pull request Feb 24, 2017

Fix checksumflags assignment in cksummer
This will matter when flags are added for encrypted receives as part of
the ZFS encryption work (zfsonlinux#5769).

Signed-off-by: Tim Crawford <tcrawford@datto.com>

@crawfxrd crawfxrd referenced this pull request Feb 24, 2017

Merged

Fix checksumflags assignment in cksummer #5830

2 of 11 tasks complete

@wli5 wli5 referenced this pull request Mar 2, 2017

Merged

GZIP compression offloading with QAT accelerator #5846

4 of 11 tasks complete
@lundman

This comment has been minimized.

Copy link
Contributor

lundman commented Mar 4, 2017

Oh I should mention, in the IllumOS patches, before I included your commits yesterday, when we did a zfs send -c | zfs recv -vu we would trigger:

Mar  1 19:46:39 omnios ^Mpanic[cpu1]/thread=ffffff0261a72800:
Mar  1 19:46:39 omnios genunix: [ID 403854 kern.notice] assertion failed: (encrypted) implies (compressed), file: ../../common/fs/zfs/arc.c, line: 2675
Mar  1 19:46:39 omnios unix: [ID 100000 kern.notice]
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d675a0 genunix:process_type+17f230 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67630 zfs:arc_buf_alloc_impl+2ea ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d676c0 zfs:arc_alloc_compressed_buf+eb ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67700 zfs:arc_loan_compressed_buf+38 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d677a0 zfs:receive_read_record+e0 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67990 zfs:dmu_recv_stream+399 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67bd0 zfs:zfs_ioc_recv+20e ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67c70 zfs:zfsdev_ioctl+50f ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67cb0 genunix:cdev_ioctl+39 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67d00 specfs:spec_ioctl+60 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67d90 genunix:fop_ioctl+55 ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67eb0 genunix:ioctl+9b ()
Mar  1 19:46:39 omnios genunix: [ID 655072 kern.notice] ffffff0008d67f00 unix:brand_sys_sysenter+2b7 ()

But it could be an old bug now, but if it is trivial for you to set lz4, and test zfs send -c to rule it out...

@rottegift

This comment has been minimized.

Copy link

rottegift commented Mar 4, 2017

Further to @lundman 's comment, we have openzfsonosx/zfs#557

@pashford

This comment has been minimized.

Copy link
Contributor

pashford commented Jan 12, 2018

@tcaputi

Since ZFS is CDDL and the Linux kernel is GPL, we can't use the Linux kernel's hardware interaction functions and would have to reimplement them ourselves, probably by porting them from Illumos.

This is incorrect. ZoL calls a large number of Linux routines, sometimes directly and sometimes through the SPL, to perform specific functions (device access, memory request, etc.), This is no different.

you can pass the keys on stdin.

That's not all that good of a workaround. If there are 3 Encryption Roots on a system, they'll need to be individually mounted, each with it's own key generation/retrieval script. It will work, but it's cumbersome.

@ptx0

This comment has been minimized.

Copy link

ptx0 commented Jan 12, 2018

@pashford well, certain functions in the Linux kernel are explicitly marked GPL only. the crypto code is indeed, one such example...

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Jan 12, 2018

@pashford

This is incorrect. ZoL calls a large number of Linux routines, sometimes directly and sometimes through the SPL, to perform specific functions (device access, memory request, etc.), This is no different.

The TPM code and crypto code are exported with the EXPORT_SYMBOL_GPL() macro so we cannot use them without changing ZFS's declared license. The SPL technically uses a GPL license, but using these functions is still very much a legal gray area, so the SPL doesn't call any functions exported with this macro.

That's not all that good of a workaround. If there are 3 Encryption Roots on a system, they'll need to be individually mounted, each with it's own key generation/retrieval script. It will work, but it's cumbersome.

I know it's not ideal but it is just something you can do in the meantime before it gets implemented.

@teknoman117

This comment has been minimized.

Copy link
Contributor

teknoman117 commented Jan 31, 2018

Are there any benchmarks or guides for performance for encryption? I built both the master branches of both spl and zfs and with the default encryption options (just encryption=on) and I'm getting ~1350 MB/s on a 16 core CPU (AMD Threadripper 1950x) (from a ramdisk). top shows that the whole CPU is loaded. I can get ~2700 MB/s with a single core via cryptsetup for aes-xts 512b so I'm curious if I've done something wrong.

update: some info:
Gentoo Linux w/ Linux Kernel 4.14.7
aes-128-ccm: ~1600 MiB/s
aes-128-gcm: ~380 MiB/s

Do the hardware AES instructions get used on AMD cpus? AMD cpus have have AES-NI since bulldozer.

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Jan 31, 2018

@teknoman117

This comment has been minimized.

Copy link
Contributor

teknoman117 commented Jan 31, 2018

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Jan 31, 2018

@fmikker

This comment has been minimized.

Copy link

fmikker commented Mar 16, 2018

@tcaputi Any news on detecting AES-NI on AMD Threadripper?

@rincebrain

This comment has been minimized.

Copy link
Contributor

rincebrain commented Mar 16, 2018

@fmikker I think you and #7103 should be friends.

@kmatt

This comment has been minimized.

Copy link

kmatt commented Mar 19, 2018

Is this feature included in 0.7.6, or scheduled for a later release?

v0.7.6 on CentOS from the yum repo does not seem to have the encryption feature enabled.

@kpande

This comment has been minimized.

Copy link
Member

kpande commented Mar 19, 2018

no.. see the release notes. scheduled for 0.8.0.

@teknoman117

This comment has been minimized.

Copy link
Contributor

teknoman117 commented Mar 21, 2018

@tcaputi regarding #7103, does the generic C implementation work on x86-64? When I forced it to enabled on x86-64 (removed the ifdef's) I couldn't load a dataset I made with the amd64 implementation.

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Mar 21, 2018

@teknoman117 It definitely should. All implementations were tested against the NIST test vectors before they were merged, and even more concerning all are taken directly from the Illumos KCF. Can you provide a diff for testing this?

@kpande

This comment has been minimized.

Copy link
Member

kpande commented Mar 21, 2018

@tcaputi the ubuntu VM I gave you access to is running with host-passthrough on Threadripper, you can use that if needed.

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Mar 21, 2018

@kpande That will still use the accelerated x86-64 accelerated code (but not the AES-NI code yet). I should be able to run this locally on my own VMs, but I'd like to use @teknoman117 's diff for consistency.

@teknoman117

This comment has been minimized.

Copy link
Contributor

teknoman117 commented Mar 21, 2018

@tcaputi patch against master - https://gist.github.com/Teknoman117/01eefe4497b74baff81634a8c990e90e

branch on my fork - https://github.com/Teknoman117/zfs/tree/cimpl

Tested on an AMD Threadripper 1950X. If you create a pool with an encrypted datasize with zfs built from zfsonlinux/zfs master, export it, swap all the modules for zfs built without the aes_amd64.S implementation, 'zfs load-key' seems to fail when provided the correct passphrase. I don't have any Intel systems available to check this on. This is also with gcc 7.3.0 and linux kernel 4.15.9.

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Mar 21, 2018

Ok. I'll try to take a look tomorrow.

@teknoman117

This comment has been minimized.

Copy link
Contributor

teknoman117 commented Mar 28, 2018

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Mar 28, 2018

I'm sorry. I must have deleteed this off my todo list on accident. I will look at it today.

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Mar 28, 2018

@teknoman117 I can reproduce the issue locally here. I asked someone on Illumos to try it over there to see if this is a problem with the port or the original implementation. I hope to have an answer on that by the end of today (it takes a while to rebuild Illumos).

@sempervictus

This comment has been minimized.

Copy link
Contributor

sempervictus commented Mar 28, 2018

Ouch. Question I have is if you can force key load thru the asm, but actual decryption without it, is your data readable? Might be an issue for portability now that everyone wants an AMD system, or did till the last mess.

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Mar 28, 2018

@sempervictus
ZFS currently has a generic implementation, an accelerated x86-64 implementation, and an even further accelerated AESNI implementation. Normally even AMD chips would use the x86-64 accelerated version of the code (which still seems to be compatible). It is only the generic implementation that is causing issues which should only matter for other architectures and (as I believe @teknoman117 is doing here) for testing purposes.

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Mar 29, 2018

@teknoman117 The issue does not seem to reproduce on Illumos, so we must have changed something at some point. I hope that I can get to the bottom of it tomorrow.

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Mar 30, 2018

@teknoman117 I figured it out. The problem is actually just with your patch. You need this in the patch as well:

@@ -109,12 +111,14 @@ static int intel_aes_instructions_present(void);
 #define	rijndael_key_setup_enc_raw	rijndael_key_setup_enc
 #endif	/* __amd64 */
 
-#if defined(_LITTLE_ENDIAN) && !defined(__amd64)
+//#if defined(_LITTLE_ENDIAN) && !defined(__amd64)
+#if 1
 #define	AES_BYTE_SWAP
 #endif
 

Let me know if I'm missing something or if you need anything else.

@teknoman117

This comment has been minimized.

Copy link
Contributor

teknoman117 commented Apr 1, 2018

@tcaputi

My assumption to leave that guard in was based on this comment in aes_impl.c
/* * For _LITTLE_ENDIAN machines (except AMD64), reverse every * 4 bytes in the key. On _BIG_ENDIAN and AMD64, copy the key * without reversing bytes. * For AMD64, do not byte swap for aes_setupkeys(). * * SPARCv8/v9 uses a key schedule array with 64-bit elements. * X86/AMD64 uses a key schedule array with 32-bit elements. */

That being said, this does indeed seem to let the generic c implementation load a dataset encrypted using the the aesni or x86_64 optimized method. However, removing the guard completely breaks the accelerated methods. I suppose this means that for x86_64 the byte swap only is only needed for the generic implementation.

@jbrodriguez

This comment has been minimized.

Copy link

jbrodriguez commented Apr 1, 2018

My question is .. why is this not stable yet ? I need to run zfs-linux-git on Arch Linux ... sort of unofficially .. is it testing what's keeping this from stable ?

@grantwwu

This comment has been minimized.

Copy link

grantwwu commented Apr 1, 2018

@jbrodriguez It's not stable yet because it's a Hard Problem, and because the developers have run into some issues, most notably #6845.

I don't think you really ought to complain unless you're paying the OpenZFS developers... and if you are you probably have internal channels for that :)

@jbrodriguez

This comment has been minimized.

Copy link

jbrodriguez commented Apr 1, 2018

Thanks, I don't know how much you officially speak for @grantwwu, I was just asking a question. Nevertheless, thanks for pointing out the issue, I'll subscribe to it.

@tcaputi

This comment has been minimized.

Copy link
Contributor Author

tcaputi commented Apr 1, 2018

@teknoman117

I suppose this means that for x86_64 the byte swap only is only needed for the generic implementation.

Yeah. I think that comment was meant to be taken in the context of the entire file. In other words, I think the #define was written assuming that the accelerated implementations wouldn't be forcefully removed from under it.

@jbrodriguez

My question is .. why is this not stable yet ? I need to run zfs-linux-git on Arch Linux ... sort of unofficially .. is it testing what's keeping this from stable ?

The 3 big reasons:

  1. As @grantwwu alluded, this patch was 26000 lines of code (~15000 of which were completely new and not ported from elsewhere) and required modifications to virtually all parts of ZFS (except possibly the vdev and metaslab layers). We have hit issues with this patch since it was merged, most notably problems with raw sends, which is to be expected for a patch this large. One of these problems even required an on-disk format update to fix. That said, I have not heard of any new problems in the past 3 weeks or so, so I'm hoping that's a good sign.
  2. We haven't released a major version of ZFS (0.8.0) since encryption was merged. That is still pending on a few other features which are slated to be released in the new major version. ZFS encryption will be included in the 0.8.0 release.
  3. I have another patch coming (hopefully as a PR early next week if I can resolve the one last issue) which will add support for zfs recv -o / -x with encryption properties, which is sorely needed to allow people to convert their existing pools easily. Right now you can't do a zfs send -R | zfs recv to convert your pool to encrypted because zfs send -R includes all properties and one of them happens to be encryption=off. Since you can't tell zfs to ignore encryption properties at the moment, you can't use this to transition your datasets.
@jbrodriguez

This comment has been minimized.

Copy link

jbrodriguez commented Apr 1, 2018

Thanks @tcaputi, the major version comment is spot on.

I'm looking to start from scratch, so I may have overlooked some considerations.

Can't praise enough the work you've done here, kudos.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.