Add ARM AES asm implementation from Cryptogams #683

Closed
noloader opened this issue Jul 11, 2018 · 2 comments

Comments

@noloader
Collaborator

noloader commented Jul 11, 2018

Add ARM AES asm implementation from Cryptogams.

Cryptogams is Andy Polyakov's project for creating high-speed crypto algorithms and sharing them with other developers. Cryptogams is dual licensed. The first license is the OpenSSL license, because Andy contributes to OpenSSL. The second is a BSD license, for those who want more permissive terms.

The integration instructions are documented at Cryptogams AES on the OpenSSL wiki.

noloader referenced this issue Jul 11, 2018
What a surprise... Clang pretends to be GCC with __GNUC__ but fails to consume the source file
@noloader
Collaborator Author

Cleared at Commit 3ff7d7f0286a.

noloader added a commit that referenced this issue Mar 17, 2021
…1010, PR #1019)

We found we can avoid the memcpy in the previous workaround by using a volatile pointer. The pointer appears to tame the optimizer so the compiler does not short-circuit some calls when outString == inString.
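
As a rough illustration of the idea in this commit message, the sketch below reads the input through a volatile-qualified pointer so the compiler has to perform each load as written, even when `out == in`. The helper name `XorBlock` and its signature are hypothetical and are not taken from the Crypto++ sources; a later comment in this thread reports that the volatile casts did not turn out to be a solution.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only; not the Crypto++ implementation.
// Stores in[i] ^ mask[i] into out[i]. `out` is allowed to equal `in`.
inline void XorBlock(std::uint8_t* out, const std::uint8_t* in,
                     const std::uint8_t* mask, std::size_t len)
{
    // Reads through a volatile-qualified pointer must be performed as written,
    // so the optimizer cannot drop or merge these loads.
    const volatile std::uint8_t* vin = in;
    for (std::size_t i = 0; i < len; ++i)
        out[i] = static_cast<std::uint8_t>(vin[i] ^ mask[i]);
}
```

Calling `XorBlock(buf, buf, mask, n)` exercises the in-place case that the commit message is concerned with.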
noloader added a commit that referenced this issue Mar 17, 2021
We think this is another instance of the problem that surfaced under GH #683 when inString==outString. It violates aliasing rules and the compiler begins removing code.

The ultimate workaround was to add a member variable m_tempOutString as scratch space when inString==outString. We did not lose much in the way of performance for some reason. It looks like AES/CTR lost about 0.03-0.05 cpb.

When combined with the updated xorbuf from GH #1020, the net result was a speedup of 0.1-0.6 cpb. In fact, some ciphers, like RC6, gained almost 5 cpb.
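
The scratch-space workaround described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Crypto++ code: the class `XorStream`, its `ProcessData` and `Transform` methods, and the single-byte key are all hypothetical. Only the member name `m_tempOutString` is borrowed from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a stream-style transform; not Crypto++ code.
class XorStream {
public:
    explicit XorStream(std::uint8_t key) : m_key(key) {}

    // Writes in[i] ^ key to out[i]. Handles the in-place case out == in.
    void ProcessData(std::uint8_t* out, const std::uint8_t* in, std::size_t len) {
        if (len == 0) return;
        if (out == in) {
            // In-place call: produce the result in member scratch space,
            // then copy it back, so the transform never sees aliased buffers.
            m_tempOutString.resize(len);
            Transform(m_tempOutString.data(), in, len);
            std::memcpy(out, m_tempOutString.data(), len);
        } else {
            Transform(out, in, len);
        }
    }

private:
    void Transform(std::uint8_t* out, const std::uint8_t* in, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i)
            out[i] = static_cast<std::uint8_t>(in[i] ^ m_key);
    }

    std::uint8_t m_key;
    std::vector<std::uint8_t> m_tempOutString;  // scratch space, name from the commit message
};
```

Keeping the scratch buffer as a member rather than a local means it is allocated once and reused across calls, which may be part of why the measured cost was small.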
EAddario pushed a commit to EAddario/cryptopp that referenced this issue Apr 10, 2021
…, GH weidai11#1010, PR weidai11#1019)

We found we can avoid the memcpy in the previous workaround by using a volatile pointer. The pointer appears to tame the optimizer so the compiler does not short-circuit some calls when outString == inString.
EAddario pushed a commit to EAddario/cryptopp that referenced this issue Apr 10, 2021
We think this is another instance of the problem that surfaced under GH weidai11#683 when inString==outString. It violates aliasing rules and the compiler begins removing code.

The ultimate workaround was to add a member variable m_tempOutString as scratch space when inString==outString. We did not lose much in the way of performance for some reason. It looks like AES/CTR lost about 0.03-0.05 cpb.

When combined with the updated xorbuf from GH weidai11#1020, the net result was a speedup of 0.1-0.6 cpb. In fact, some ciphers, like RC6, gained almost 5 cpb.
noloader added a commit that referenced this issue Feb 14, 2022
This commit attempts to restore performance while taming the optimizer.

Also see GH #683, GH #1010, GH #1088, GH #1103.
@noloader
Collaborator Author

I merged the changes into master last night. You should test master now.

noloader referenced this issue Sep 30, 2023
It turns out we went down a rabbit hole when we added the volatile cast gyrations in an attempt to change the compiler behavior. We are seeing the same failures from AES, Rabbit, HIGHT, HC-128 and HC-256 with and without the gyrations.
We were able to work out the problems with Rabbit, HIGHT, HC-128 and HC-256. See GH #1231 and GH #1234.
However, we were not able to successfully cut in Cryptogams AES on ARMv7, so it is now disabled. See GH #1236.
Since the volatile casts were not a solution, we are backing them out along with the associated comments.