36% loss of performance with AES #1088
Comments
Thanks @PassMark, Yeah, this became a problem recently. GCC and Clang began removing code when the buffers were the same. I think it was due to alias violations. I don't think it was due to C++ language violations. Also see Issue 1010.
It is what it is. The compilers changed so we had to change with them. The other alternative - broken encryption and decryption due to GCC removing code - was even worse. We could not leave the broken behavior and add a README note because no one reads them. If interested, this change happened because of incorrect behavior for CFB, OFB and CTR modes. They were the modes producing incorrect results on occasion.
Thanks again @PassMark, I updated the library documentation at Commit 9dbb3c47aa8b. For runtime, I don't think failing or throwing an exception is a good idea. That turns code that works into code that does not work. I can put an assertion in
It lights up some of the self tests, like DES and SHARK:
What do you think?
Thanks for the follow up. Assertion and documentation should be OK. Just having this issue indexed by Google should help a lot. We spent a couple of hours checking if it was a known problem before starting to look at the code.
Done, see Commit e546fb74d711. Thanks for the report.
In retrospect I should have put the other comment in this thread; see #1103. This assert is triggering with AES running in GCM with the following code, and similarly with CFB_CipherTemplate.
Given that the duplicate buffer is being created by cryptopp code, this seems like an actual bug rather than something which should just be flagged by an assert.
This is the callstack
@PassMark, @dgm3333, @clementmartin971 I checked in some code to avoid the extra buffer and memcpy's. It is sitting on the The code tested OK for me on x86_64. That's where the HIGHT block cipher had problems. ARMv7 had problems on Linux, and it also tested OK. Can you guys test it on your platforms, please? You can fetch the code with:
We announced the testing branch on the mailing list at strciphr.cpp updates. If I don't get any complaints over the next week or so, I will merge the changes. Thanks in advance.
I merged the changes into master last night. You should test master now.
It turns out we went down a rabbit hole when we added the volatile cast gyrations in an attempt to change the compiler behavior. We are seeing the same failures from AES, Rabbit, HIGHT, HC-128 and HC-256 with and without the gyrations. We were able to work out the problems with Rabbit, HIGHT, HC-128 and HC-256. See GH #1231 and GH #1234. We are also not able to successfully cut in Cryptogams AES on ARMv7, so it is now disabled. See GH #1236. Since the volatile casts were not a solution, we are backing them out along with the associated comments.
Going from V7 to V8.6 resulted in a 36% loss of performance with AES.
With an Intel Core i7-8700K, running 12 threads, performance was:
This function was used:
CFB_Mode<AES>::Encryption cfbEncryption(AESKey, AESKey.size(), AESiv);
cfbEncryption.ProcessData(AESData, AESData, AESData.size());
We eventually tracked the performance change back to this code change:
71a812e#diff-2867583d2009b9c826fbac3a19ae0f72a110684052acd1800ab505e0d75b47ad
The difference was not so noticeable with 1 thread, so the built-in benchmarks don't highlight the problem very well. But with multiple threads the performance difference is dramatic. We presume this is because the code change above added a lot of memory-copy overhead when the input and output buffers are the same. We even observed negative scaling as more threads were added: 8 threads were significantly faster than 12 threads on a CPU with 12 virtual cores.
The solution is fairly easy: don't use the same input and output buffer (AESData). But I think using the same buffer was suggested practice in the past, and lots of web sites still suggest it. So now the problem is that there are dozens of public code examples where the buffers are the same, and I suspect the performance flaw is now in hundreds of apps.
Once different input and output buffers are used, performance was:
V8.6 of Crypto++: 5055 MB/sec, or a 64% performance improvement.
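To make the presumed mechanism concrete, here is a minimal sketch using only the standard library. It is not Crypto++ code: `ToyStreamCipher`, `encrypt_in_place` and `encrypt_to` are hypothetical names, and the XOR transform is a placeholder with no security value. It assumes, as we presumed above, that the in-place path takes a temporary copy of the input, and it counts the bytes copied so the two call styles can be compared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a stream-mode cipher. Hypothetical, NOT Crypto++ and NOT secure.
struct ToyStreamCipher {
    std::uint8_t key;
    std::size_t bytes_copied = 0;  // extra copy traffic caused by in-place calls

    explicit ToyStreamCipher(std::uint8_t k) : key(k) {}

    // Same argument order as the call in the report: (out, in, length).
    void process_data(std::uint8_t* out, const std::uint8_t* in, std::size_t n) {
        if (n == 0) return;
        if (in == out) {
            // Assumed behavior: snapshot the input so reads and writes never alias.
            std::vector<std::uint8_t> tmp(in, in + n);
            bytes_copied += n;
            transform(out, tmp.data(), n);
        } else {
            transform(out, in, n);  // distinct buffers: no extra copy
        }
    }

private:
    // Placeholder keystream: XOR with the key and the low byte of the position.
    void transform(std::uint8_t* out, const std::uint8_t* in, std::size_t n) const {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = static_cast<std::uint8_t>(in[i] ^ key ^ static_cast<std::uint8_t>(i));
    }
};

// One buffer used as both input and output; returns the bytes copied.
inline std::size_t encrypt_in_place(std::vector<std::uint8_t>& data, std::uint8_t key) {
    ToyStreamCipher c(key);
    c.process_data(data.data(), data.data(), data.size());
    return c.bytes_copied;
}

// Separate source and destination buffers; returns the bytes copied.
inline std::size_t encrypt_to(std::vector<std::uint8_t>& dst,
                              const std::vector<std::uint8_t>& src,
                              std::uint8_t key) {
    ToyStreamCipher c(key);
    dst.resize(src.size());
    c.process_data(dst.data(), src.data(), src.size());
    return c.bytes_copied;
}
```

Both helpers produce the same ciphertext, but only `encrypt_in_place` pays for the temporary copy. With many threads each allocating and copying a full buffer per call, that extra memory traffic is the kind of cost that would explain the scaling we saw.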
It might have been better to fail immediately in ProcessData() if the parameters are the same; then the user would be more aware of the problem.
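A fail-fast check of that kind could look roughly like the sketch below. It is only an illustration under stated assumptions: `buffers_overlap` and `checked_process` are hypothetical helpers, not part of Crypto++, and the byte inversion stands in for a real cipher. The maintainers went with a debug assertion plus documentation rather than a runtime failure, so the throwing variant here is just the stricter option described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>

// True when [a, a+n) and [b, b+n) share at least one byte.
inline bool buffers_overlap(const std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    if (n == 0 || a == nullptr || b == nullptr) return false;
    // std::less gives a total order even for pointers into unrelated objects.
    std::less<const std::uint8_t*> lt;
    return lt(a, b + n) && lt(b, a + n);
}

// Hypothetical strict wrapper: refuse overlapping buffers up front instead of
// silently taking a slower path.
inline void checked_process(std::uint8_t* out, const std::uint8_t* in, std::size_t n) {
    if (buffers_overlap(in, out, n))
        throw std::invalid_argument("input and output buffers overlap");
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<std::uint8_t>(in[i] ^ 0xFF);  // placeholder transform
}
```

Throwing turns previously working in-place callers into failing ones, which is the objection raised in the thread. Putting the same `buffers_overlap` test in a debug-only assertion flags the pattern during development without changing release behavior.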