Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SHA256 QAT acceleration #7295

Merged
merged 1 commit into from
Mar 15, 2018
Merged

SHA256 QAT acceleration #7295

merged 1 commit into from
Mar 15, 2018

Conversation

tcaputi
Copy link
Contributor

@tcaputi tcaputi commented Mar 12, 2018

This patch enables acceleration of SHA256 and SHA512
checksums using Intel Quick Assist Technology.

Signed-off-by: Tom Caputi tcaputi@datto.com

How Has This Been Tested?

Tested on an Ubuntu 16.04 machine with a DH895XCC QAT add-in card. I will have performance numbers soon.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logistically I can see why you opted to put the qat_checksum() functionality in to qat_crypt.c since it makes it easy to share the instance handles. However, it's frankly not very intuitive.

It looks like it would be relatively straight forward to move the common instance management code in to qat.c and then provide a helper function to claim an instance. Then you could split the new checksum functionality in to qat_checksum.c. Or did you already try this and run in to complications?

module/zfs/qat.c Outdated
@@ -99,4 +102,7 @@ qat_fini(void)
qat_dc_fini();
}

module_param(zfs_qat_disable, int, 0644);
MODULE_PARM_DESC(zfs_qat_disable, "Disable QAT acceleration");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any value in making this more granual (i.e discard checksum, crypt, or compress)? Or can this already be effectively controlled with the QAT config file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done via the config files, but I agree it would be nice to be able to disable each one (for instance if you had another application that you wanted to use the compression units). I will make the change.

if (status != CPA_STATUS_SUCCESS) {
/* don't count CCM as a failure since it's not supported */
if (zio_crypt_table[crypt].ci_crypt_type == ZC_TYPE_GCM)
QAT_STAT_BUMP(crypt_fails);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's either put this little fix in a seperate PR, or at least call it out in the commit comment.

SHA2_CTX ctx;
zio_cksum_t tmp;

if (qat_crypt_use_accel(size)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a qat_checksum_use_accel wrapper to make this interface less aukward.

@@ -192,7 +193,49 @@ init_cy_session_ctx(qat_encrypt_dir_t dir, CpaInstanceHandle inst_handle,
}

static CpaStatus
init_cy_buffer_lists(CpaInstanceHandle inst_handle, uint32_t nr_bufs,
qat_init_cksum_session_ctx(CpaInstanceHandle inst_handle,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: qat_init_cksum_session_ctx -> qat_init_checksum_session_ctx would be consistent with the other function names.

@tcaputi
Copy link
Contributor Author

tcaputi commented Mar 13, 2018

It looks like it would be relatively straight forward to move the common instance management code in to qat.c and then provide a helper function to claim an instance. Then you could split the new checksum functionality in to qat_checksum.c. Or did you already try this and run in to complications?

The only thing about this was that it felt awkward that more than half of the functions in qat.c would only pertain to qat_checksum.c and qat_crypt.c, but not qat_compress.c. If I was trying to put this into OOP terms, the "base class" qat.c would contain functionality that could not be used for all of its "inheriting classes", if that makes any sense.

If you still think its appropriate, I can split it out as you requested.

@tcaputi
Copy link
Contributor Author

tcaputi commented Mar 13, 2018

@wli5 has informed me that the SHA512 implementation needs to be removed, since technically we actually use SHA512/256, which uses a different IV than standard SHA512 and is not supported by QAT. I will remove support for it tomorrow and do a complete test to verify compatibility.

@rlaager
Copy link
Member

rlaager commented Mar 13, 2018

Is there a QAT card in a machine running as part of the test suite? If not, has anyone asked Intel if they'd be willing to donate a card for that?

@tcaputi
Copy link
Contributor Author

tcaputi commented Mar 13, 2018

@rlaager I think my company would be happy to purchase / donate a card to the cause. The only issue would be that I believe all the testing infrastructure runs in AWS, and I don't think we can just pop the card into the cloud :). I definitely plan to completely verify this patch and the performance benefits before it gets merged. Perhaps I should have put a WIP tag on it when I posted it. I Any thoughts on testing going forward @behlendorf?

@tcaputi
Copy link
Contributor Author

tcaputi commented Mar 13, 2018

Some performance testing on my machine (single spinning disk pool running yes | pv > /pool/test/yes.txt) showed CPU utilization reduction of 28% when using QAT checksumming. Disk bandwidth also went from 191MB/s to 204 MB/s.

@behlendorf
Copy link
Contributor

The only thing about this was that it felt awkward that more than half of the functions in qat.c would only pertain to qat_checksum.c and qat_crypt.c, but not qat_compress.c. If I was trying to put this into OOP terms, the "base class" qat.c would contain functionality that could not be used for all of its "inheriting classes", if that makes any sense.

I can understand that argument. Or you could view the qat.c code as a library of which you're only using a subset of its functionality. Alternately, I'd be OK with the proposed organization if we renamed qat_crypt.c to something like qat_inst.c to broaden its scope. Otherwise, it feels to me like we're hiding this accelerted checksum code in the encryption code.

As for testing @tcaputi is right, since all the automated testing is all in AWS it's a bit tricky to add the QAT hardware for testing. We could probably add/modify a builder to verify that everything builds when the QAT source is enabled, but actually testing with real hardware is a bit more difficult. Also since we don't expect this code to change much once all this functionality is working it's probably not critical and we can rely on manual testing for now.

@codecov
Copy link

codecov bot commented Mar 14, 2018

Codecov Report

Merging #7295 into master will increase coverage by 0.03%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7295      +/-   ##
==========================================
+ Coverage   76.36%    76.4%   +0.03%     
==========================================
  Files         328      328              
  Lines      104023   104008      -15     
==========================================
+ Hits        79439    79465      +26     
+ Misses      24584    24543      -41
Flag Coverage Δ
#kernel 76.44% <ø> (+0.38%) ⬆️
#user 65.67% <ø> (+0.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de4f8d5...d3cebb2. Read the comment docs.

ret = qat_checksum(ZIO_CHECKSUM_SHA256, buf, size, &tmp);
abd_return_buf(abd, buf, size);
if (ret == CPA_STATUS_SUCCESS)
goto bswap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for SHA512, we should just fall back to software, instead of jumping to bswap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean here. The goto bswap here is meant to handle a strange implementation detail that ZFS always stores SHA256 checksums in big endian order. For SHA512 we don't do this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we jump to bswap the software implementation will be skipped for SHA512.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only do the jump in the event that QAT checksumming is enabled and if the QAT implementation worked. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im also a little confused with what you mean by "software implementation of SHA512" I removed the QAT implementation of SHA512 since ZFS actually uses SHA512/256, as you pointed out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for confusion.. I just relook at the code, it is good.
I have thought SHA512 shared the same function, I just noticed there is another sha512 function which is not changed and will be always in "software" mode.
Just ignore this comment. thanks!

flat_src_buf = flat_src_buf_array;
while (bytes_left > 0) {
in_pages[page_num] = qat_mem_to_page(data);
flat_src_buf->pData = kmap(in_pages[page_num]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem here when the src buffer is in 4K page boundary, the flat_src_buf->pData need to add the offset of the src address in the page frame. same problem is also in compression code, I will submit a separate PR to fix it for compression, can you please fix it for checksum and encryption code in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or does it make sense if you can fix the compression code also in this PR? It will be great if we can do it together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add it to this PR. Can you give me an example of how to do this so I'm sure I'm doing it right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the example:
flat_buf_src->pData = kmap(in_page) + ((long)data & ~PAGE_MASK);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would an adjustment need to be made to flat_src_buf->dataLenInBytes as well? I'm assuming this applies to encryption, checksumming, and compression?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reminder, yes, the dataLenInBytes need to be adjusted as well, and it applies to encryption, checksum and compression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix for this has been pushed. I will make a separate PR for fixing compression later today.

This patch enables acceleration of SHA256 checksums using Intel
Quick Assist Technology. This patch also fixes up and refactors
some of the code from QAT encryption to make the behavior
consistent.

Signed-off-by: Chengfeix Zhu <chengfeix.zhu@intel.com>
Signed-off-by: Weigang Li <weigang.li@intel.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
@tcaputi tcaputi changed the title SHA256 / SHA512 QAT acceleration SHA256 QAT acceleration Mar 14, 2018
@behlendorf behlendorf merged commit 3874220 into openzfs:master Mar 15, 2018
@wli5
Copy link
Contributor

wli5 commented Mar 19, 2018

@behlendorf @tcaputi @rlaager regarding to the auto test for QAT, we plan to setup a daily test environment in our lab on a machine with QAT plug-in, hopefully this way we can catch any issue related to QAT timingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants