Skip to content

AES-GCM x86_64 MSVC ASM: XMM6-15 are non-volatile#6617

Merged
JacobBarthelmeh merged 1 commit intowolfSSL:masterfrom
SparkiDev:aes_gcm_win_asm_fix
Jul 31, 2023
Merged

AES-GCM x86_64 MSVC ASM: XMM6-15 are non-volatile#6617
JacobBarthelmeh merged 1 commit intowolfSSL:masterfrom
SparkiDev:aes_gcm_win_asm_fix

Conversation

@SparkiDev
Copy link
Copy Markdown
Contributor

Description

Put XMM6-15, when used, on the stack at start of function and restore at end of function.

Fixes #6608

Testing

Standard

Checklist

  • added tests
  • updated/added doxygen
  • updated appropriate READMEs
  • Updated manual and documentation

Copy link
Copy Markdown

@ilka1999 ilka1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Common mistake is the wrong offset for parameters which passed by stack.
These parameters are loaded before stack reservation for local purposes, so their offset should be the same, as it was before stack usage increase.

Comment thread wolfcrypt/src/aes_gcm_asm.asm Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why +256 ?
it should be +96

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use 20 64-bit words of stack in the function for temporary storage.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sean,

It is loading parameters from the stack

Let's calculate offset for 5-th parameter:

  1. first four parameters = 32 bytes
  2. return address = 8 bytes
  3. seven non-volatile registers saving = 56 bytes

totally 96 bytes
so, 5-th parameter has offset 96 from RSP

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I confused myself and treated the xmm registers to save as parameters.
Fix up now.

Comment thread wolfcrypt/src/aes_gcm_asm.asm Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again why +264 ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, stack used for temporary storage.

Comment thread wolfcrypt/src/aes_gcm_asm.asm Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why AVX version instead of SSE (movdqu) ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.
Generated code and it didn't know to produce SSE2 only code.
Fixed this.

Comment thread wolfcrypt/src/aes_gcm_asm.asm Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about xmm8-14 ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed generating code.

@SparkiDev SparkiDev force-pushed the aes_gcm_win_asm_fix branch from 74398d5 to a2aafe0 Compare July 19, 2023 21:49
Copy link
Copy Markdown

@ilka1999 ilka1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least you need

  1. change commands for all XMM saves from vmovdqu to movdqu
  2. check the parameter offsets, they should not be changed unless there are no new stack modifications between the start of the function and the loading of the parameters.

be careful about addressing parameters from the middle of the code, it should be increased if such addressing occurs after new stack modifications

Comment thread wolfcrypt/src/aes_gcm_asm.asm Outdated
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sean,

It is loading parameters from the stack

Let's calculate offset for 5-th parameter:

  1. first four parameters = 32 bytes
  2. return address = 8 bytes
  3. seven non-volatile registers saving = 56 bytes

totally 96 bytes
so, 5-th parameter has offset 96 from RSP

@SparkiDev SparkiDev force-pushed the aes_gcm_win_asm_fix branch from a2aafe0 to 4dc3924 Compare July 20, 2023 21:54
@SparkiDev
Copy link
Copy Markdown
Contributor Author

Fixed generation to put on stack and take off stack in the right place.

@ilka1999
Copy link
Copy Markdown

Ok

Last version is working with my friend's tests

But, ASM code is not perfect:

  1. there are some oddish lines
  2. no proc frames

@SparkiDev
Copy link
Copy Markdown
Contributor Author

Hi @ilka1999

  1. Let me know what is odds in the lines. Specifics or generally.
  2. Are you suggesting adding 'FRAMES' to the PROC lines?

Thanks,
Sean

@ilka1999
Copy link
Copy Markdown

IMHO:

  1. the code still uses vmovdqu instead of movdqu
  2. savings of XMM registers which does not changed
  3. allocates stack for temporary local variables in the middle of body

I did not check the stack alignment of the code, with alignment you can use ALIGNED loading/store instead of UNALIGNED

Put XMM6-15, when used, on the stack at start of function and restore at
end of function.
@SparkiDev SparkiDev force-pushed the aes_gcm_win_asm_fix branch from 4dc3924 to cfac603 Compare July 25, 2023 22:23
@SparkiDev
Copy link
Copy Markdown
Contributor Author

SparkiDev commented Jul 25, 2023

Hi @ilka1999,

Thanks for the feedback!

  1. I've pushed an update to fix this.
  2. This has also been fixed.
  3. The temporary stack allocations are not something I intend to change right now.

Note that the stack is not always aligned as I would like. On newer processors, unaligned moves are the same speed as the aligned moves. I won't be making changes for this but thank you for bringing it up.

Sean

@ilka1999
Copy link
Copy Markdown

Tested successfully

@JacobBarthelmeh JacobBarthelmeh merged commit 4b80dcf into wolfSSL:master Jul 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: aes_gcm_asm.asm doesn't support Windows x86-64 calling convention

3 participants