Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

36% loss of performance with AES #1088

Closed
PassMark opened this issue Nov 22, 2021 · 7 comments
Closed

36% loss of performance with AES #1088

PassMark opened this issue Nov 22, 2021 · 7 comments

Comments

@PassMark
Copy link
Contributor

PassMark commented Nov 22, 2021

Going from V7 to V8.6 resulted in a 36% loss of performance with AES.

With an Intel Core i7-8700K, running 12 threads, performance was:

  • V7 of Crypto++: 4830 MB/sec
  • V8.6 of Crypto++: 3078 MB/sec

This function was used:

CFB_Mode<AES>::Encryption cfbEncryption(AESKey, AESKey.size(), AESiv);
cfbEncryption.ProcessData(AESData, AESData, AESData.size());

We eventually tracked back the performance change to this code change
71a812e#diff-2867583d2009b9c826fbac3a19ae0f72a110684052acd1800ab505e0d75b47ad

The difference was not so noticeable with 1 thread. So the built in benchmarks don't highlight the problem very well. But with multiple threads the performance difference is dramatic. We presume this is because the code change above added a lot of overhead for memory buffer copying when the input and output buffer are the same. We even observed negative performance as more threads were added. (So 8 threads was significantly faster than 12 threads on a CPU with 12 virtual cores).

The solution is fairly easy. Don't use the same input and output buffers (AESData). But I think using the same value, was suggested practice in the past. Lots of web sites still suggest this. So now the problem is that there are dozens of public code examples where the buffers are the same and I suspect the performance flaw is now in 100s of apps.

Once different input and output buffers are used, performance was,
V8.6 of Crypto++: 5055 MB/sec, or a 64% performance improvement.

Might have been better just to immediately fail in ProcessData() if parameters are the same, then the user would be more aware of the problem.

@noloader
Copy link
Collaborator

noloader commented Nov 28, 2021

Thanks @PassMark,

Yeah, this became a problem recently. GCC and Clang began removing code when the buffers were the same. I think it was due to alias violations. I don't think it was due to C++ language violations. Also see Issue 1010.

The solution is fairly easy. Don't use the same input and output buffers (AESData). But I think using the same value, was suggested practice in the past. Lots of web sites still suggest this. So now the problem is that there are dozens of public code examples where the buffers are the same and I suspect the performance flaw is now in 100s of apps.

It is what it is. The compilers changed so we had to change with them.

The other alternative - broken encryption and decryption due to GCC removing code - was even worse. We could not leave the broken behavior and add a README note because no one reads them.

If interested, this change happened because of incorrect behavior for CFB, OFB and CTR modes. They were the modes producing incorrect results on occasion.

@noloader
Copy link
Collaborator

noloader commented Nov 28, 2021

Thanks again @PassMark,

I updated the library documentation at Commit 9dbb3c47aa8b. For runtime, I don't think failing or throwing an exception is a good idea. That turns code that works into code that does not work.

I can put an assertion in ProcessData to warn when the buffers are the same:

$ git diff strciphr.cpp
diff --git a/strciphr.cpp b/strciphr.cpp
index f73a05e1..d2e5feb3 100644
--- a/strciphr.cpp
+++ b/strciphr.cpp
@@ -81,6 +81,10 @@ void AdditiveCipherTemplate<S>::ProcessData(byte *outString, const byte *inStrin
        CRYPTOPP_ASSERT(outString); CRYPTOPP_ASSERT(inString);
        CRYPTOPP_ASSERT(length % this->MandatoryBlockSize() == 0);
 
+       // If this assert fires, then outString == inString. You could experience a^M
+       // performance hit. Also see https://github.com/weidai11/cryptopp/issues/1088'^M
+       CRYPTOPP_ASSERT(outString != inString);^M
+^M
        PolicyInterface &policy = this->AccessPolicy();
        unsigned int bytesPerIteration = policy.GetBytesPerIteration();
 
@@ -225,6 +229,10 @@ void CFB_CipherTemplate<BASE>::ProcessData(byte *outString, const byte *inString
        CRYPTOPP_ASSERT(outString); CRYPTOPP_ASSERT(inString);
        CRYPTOPP_ASSERT(length % this->MandatoryBlockSize() == 0);
 
+       // If this assert fires, then outString == inString. You could experience a^M
+       // performance hit. Also see https://github.com/weidai11/cryptopp/issues/1088'^M
+       CRYPTOPP_ASSERT(outString != inString);^M
+^M
        PolicyInterface &policy = this->AccessPolicy();
        unsigned int bytesPerIteration = policy.GetBytesPerIteration();
        byte *reg = policy.GetRegisterBegin();

It lights up some of the self tests, like DES and SHARK:

SHARK validation suite running...

passed:  Algorithm key lengths
Assertion failed: strciphr.cpp(234): ProcessData
Assertion failed: strciphr.cpp(234): ProcessData
passed   00000000000000000000000000000000   0000000000000000   214BCF4E7716420A
Assertion failed: strciphr.cpp(234): ProcessData
Assertion failed: strciphr.cpp(234): ProcessData
passed   000102030405060708090A0B0C0D0E0F   0000000000000000   C76C696289898137
Assertion failed: strciphr.cpp(234): ProcessData
Assertion failed: strciphr.cpp(234): ProcessData
passed   000102030405060708090A0B0C0D0E0F   C76C696289898137   077A4A59FAEEEA4D
Assertion failed: strciphr.cpp(234): ProcessData
Assertion failed: strciphr.cpp(234): ProcessData
passed   915F4619BE41B2516355A50110A9CE91   21A5DBEE154B8F6D   6FF33B98F448E95A
Assertion failed: strciphr.cpp(234): ProcessData
Assertion failed: strciphr.cpp(234): ProcessData
passed   783348E75AEB0F2FD7B169BB8DC16787   F7C013AC5B2B8952   E5E554ABE9CED2D2
Assertion failed: strciphr.cpp(234): ProcessData
Assertion failed: strciphr.cpp(234): ProcessData
passed   DC49DB1375A5584F6485B413B5F12BAF   2F42B3B70369FC92   9AE068313F343A7A
Assertion failed: strciphr.cpp(234): ProcessData
Assertion failed: strciphr.cpp(234): ProcessData
passed   5269F149D41BA0152497574D7F153125   65C178B284D197CC   D3F111A282F17F29

What do you think?

noloader added a commit that referenced this issue Nov 28, 2021
@PassMark
Copy link
Contributor Author

Thanks for the follow up.

Assertion and documentation should be OK.

Just having this issue indexed by Google should help a lot. We spend a couple of hours checking if it was a known problem before starting to look at the code.

@noloader
Copy link
Collaborator

@PassMark,

Assertion and documentation should be OK.

Done, see Commit e546fb74d711.

Thanks for the report.

@dgm3333
Copy link

dgm3333 commented Feb 2, 2022

In retrospect I should have put the other comment in this thread. ##1103

This assert is triggering with AES running in GCM with the following code, and similarly with CFB_CipherTemplate.
However in the latter case I can't actually see how this assert makes any sense to be present as it would appear from the call stack that the call to CFB_CipherTemplate in cryptlib.h will guarantee triggering it (although I'm very new to cryptopp so feel free to correct me if I'm reading it wrongly)

		{ProcessData(inoutString, inoutString, length);}

Given the duplicate buffer is being created by crytpopp code this seems like an actual bug rather than something which should be just flagged by an assert.

int encryptFileGCM(std::string_view keyStr, std::string_view fileToRead, std::string_view fileToWrite) {

    const int TAG_SIZE = 12;

    std::ifstream in{ fileToRead.data(), std::ios::binary };
    std::ofstream out{ fileToWrite.data(), std::ios::binary };

    CryptoPP::byte key[32];
    for (int c = 0; c < 32; c++)
        key[c] = keyStr[c];

    AutoSeededRandomPool prng;
    CryptoPP::byte iv[AES::BLOCKSIZE];                   // each file should have a unique initialization vector to decrease risk of attackers inferring data structure
    prng.GenerateBlock(iv, sizeof(iv));                   // This doesn't need to be secret, so can be saved with each file

    out.write((char*)iv, AES::BLOCKSIZE);          //write the iv to the start of the file

    CryptoPP::GCM<CryptoPP::AES>::Encryption cipher{};
    cipher.SetKeyWithIV(key, sizeof(key), iv);

    CryptoPP::FileSource{ in, true,
        new CryptoPP::AuthenticatedEncryptionFilter(cipher,
            new CryptoPP::FileSink(out), false, TAG_SIZE
        )
    };

    return 0;
}

This is the callstack

 	KernelBase.dll!00007ffeea2090f2()	Unknown
>	ABCInsights.exe!CryptoPP::AdditiveCipherTemplate<CryptoPP::AbstractPolicyHolder<CryptoPP::AdditiveCipherAbstractPolicy,CryptoPP::CTR_ModePolicy>>::ProcessData(unsigned char * outString, const unsigned char * inString, unsigned __int64 length) Line 98	C++
 	ABCInsights.exe!CryptoPP::AuthenticatedSymmetricCipherBase::ProcessData(unsigned char * outString, const unsigned char * inString, unsigned __int64 length) Line 134	C++
 	ABCInsights.exe!CryptoPP::StreamTransformation::ProcessString(unsigned char * inoutString, unsigned __int64 length) Line 1061	C++
 	ABCInsights.exe!CryptoPP::StreamTransformationFilter::NextPutModifiable(unsigned char * inString, unsigned __int64 length) Line 696	C++
 	ABCInsights.exe!CryptoPP::FilterWithBufferedInput::NextPutMaybeModifiable(unsigned char * inString, unsigned __int64 length, bool modifiable) Line 422	C++
 	ABCInsights.exe!CryptoPP::FilterWithBufferedInput::PutMaybeModifiable(unsigned char * inString, unsigned __int64 length, int messageEnd, bool blocking, bool modifiable) Line 389	C++
 	ABCInsights.exe!CryptoPP::FilterWithBufferedInput::PutModifiable2(unsigned char * inString, unsigned __int64 length, int messageEnd, bool blocking) Line 367	C++
 	ABCInsights.exe!CryptoPP::BufferedTransformation::ChannelPutModifiable2(const std::string & channel, unsigned char * inString, unsigned __int64 length, int messageEnd, bool blocking) Line 479	C++
 	ABCInsights.exe!CryptoPP::FileStore::TransferTo2(CryptoPP::BufferedTransformation & target, unsigned __int64 & transferBytes, const std::string & channel, bool blocking) Line 141	C++
 	ABCInsights.exe!CryptoPP::BufferedTransformation::TransferMessagesTo2(CryptoPP::BufferedTransformation & target, unsigned int & messageCount, const std::string & channel, bool blocking) Line 656	C++
 	ABCInsights.exe!CryptoPP::BufferedTransformation::TransferAllTo2(CryptoPP::BufferedTransformation & target, const std::string & channel, bool blocking) Line 702	C++
 	ABCInsights.exe!CryptoPP::SourceTemplate<CryptoPP::FileStore>::PumpAll2(bool blocking) Line 1444	C++
 	ABCInsights.exe!CryptoPP::Source::PumpAll() Line 1383	C++
 	ABCInsights.exe!CryptoPP::Source::SourceInitialize(bool pumpAll, const CryptoPP::NameValuePairs & parameters) Line 1421	C++
 	ABCInsights.exe!CryptoPP::FileSource::FileSource(std::basic_istream<char,std::char_traits<char>> & in, bool pumpAll, CryptoPP::BufferedTransformation * attachment) Line 102	C++
 	ABCInsights.exe!encryptFileGCM(std::basic_string_view<char,std::char_traits<char>> keyStr, std::basic_string_view<char,std::char_traits<char>> fileToRead, std::basic_string_view<char,std::char_traits<char>> fileToWrite) Line 197	C++
 	ABCInsights.exe!drawS1SRMonitoringWindow(globalSettings & gS, bool * p_open) Line 778	C++
 	ABCInsights.exe!main(int __formal, char * * __formal) Line 363	C++
 	[External Code]	

@noloader
Copy link
Collaborator

noloader commented Feb 8, 2022

@PassMark, @dgm3333, @clementmartin971

I checked in some code to avoid the extra buffer and memcpy's. It is sitting on the strciphr branch.

The code tested Ok for me on x86_64. That's where HIGHT bock cipher had problems. ARMv7 had problems on Linux, and I it also tested Ok. Can you guys test it on your platforms, please?

You can fetch the code with:

$ git fetch origin
$ git checkout strciphr -f && git pull

We announced the testing branch on the mailing list at strciphr.cpp updates. If I don't get any complaints over the next week or so, I will merge the changes.

Thanks in advance.

noloader added a commit that referenced this issue Feb 14, 2022
This commit attempts to restore performance while taming the optimizer.

Also see GH #683, GH #1010, GH #1088, GH #1103.
@noloader
Copy link
Collaborator

I merged the changes into master last night. You should test master now.

noloader referenced this issue Sep 30, 2023
It turns out we went down a rabbit hole when we added the volatile cast gyrations in an attempt to change the compiler behavior. We are seeing the same failures from AES, Rabbit, HIGHT, HC-128 and HC-256 with and without the gyrations.
We were able to work out the problems with Rabbit, HIGHT, HC-128 and HC-256. See GH #1231 and GH #1234.
We are also not able to successfully cut-in Cryptogams AES on ARMv7, so it is now disabled. See GH #1236.
Since the volatile casts were not a solution, we are backing it out along with associated comments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants