AES-GCM x86_64 MSVC ASM: XMM6-15 are non-volatile #6617
JacobBarthelmeh merged 1 commit into wolfSSL:master
ilka1999
left a comment
A common mistake is using the wrong offset for parameters that are passed on the stack.
These parameters are loaded before stack space is reserved for local use, so their offsets should be the same as they were before the stack usage increased.
I use 20 64-bit words of stack in the function for temporary storage.
Sean,
This code is loading parameters from the stack.
Let's calculate the offset of the 5th parameter:
- first four parameters = 32 bytes
- return address = 8 bytes
- seven saved non-volatile registers = 56 bytes
Total: 96 bytes.
So the 5th parameter is at offset 96 from RSP.
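The arithmetic above can be sketched in C. This is an illustration only, not code from the PR: the function name `win64_stack_param_offset` and its parameters are hypothetical, and it assumes the Microsoft x64 calling convention, where the return address sits at the top of the stack on entry and 32 bytes of home space for the first four parameters sit directly above it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Returns the offset from the current RSP of stack parameter
 * `param_index` (1-based, must be 5 or higher), given:
 *   pushed_gprs - number of 8-byte register pushes done so far
 *   local_bytes - bytes reserved so far with `sub rsp, N`
 * Assumes the Microsoft x64 calling convention. */
size_t win64_stack_param_offset(unsigned param_index,
                                unsigned pushed_gprs,
                                size_t local_bytes)
{
    const size_t return_addr = 8;   /* pushed by CALL */
    const size_t home_space  = 32;  /* slots for parameters 1 to 4 */

    assert(param_index >= 5);       /* parameters 1 to 4 are in registers */

    return local_bytes
         + (size_t)pushed_gprs * 8
         + return_addr
         + home_space
         + (size_t)(param_index - 5) * 8;
}
```

With seven pushed registers and no local reservation, `win64_stack_param_offset(5, 7, 0)` gives 96, matching the calculation in the comment.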
Looks like I confused myself and treated the XMM registers being saved as parameters.
Fixed up now.
As above, the stack is used for temporary storage.
Why the AVX version (vmovdqu) instead of SSE (movdqu)?
Good point.
The code is generated, and the generator didn't know to produce SSE2-only code.
Fixed this.
Fixed the generating code.
74398d5 to a2aafe0
ilka1999
left a comment
At the least, you need to:
- change the instructions for all XMM saves from vmovdqu to movdqu
- check the parameter offsets: they should not change unless there are new stack modifications between the start of the function and the loading of the parameters
- be careful when addressing parameters from the middle of the code: the offset must be increased if the access occurs after new stack modifications
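The offset rule in the last two points can be sketched as a small bookkeeping model in C. Everything here is hypothetical and for illustration only (the `stack_tracker` type and its functions are not from the PR): it simply tracks how far RSP has moved below its value at function entry, assuming the Microsoft x64 layout of return address followed by 32 bytes of home space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: how many bytes RSP currently sits below
 * the value it had at function entry. */
typedef struct {
    size_t bytes_below_entry;
} stack_tracker;

/* Models one `push reg` of a 64-bit register. */
void tracker_push(stack_tracker *t)
{
    assert(t != NULL);
    t->bytes_below_entry += 8;
}

/* Models `sub rsp, bytes` for temporary storage. */
void tracker_reserve(stack_tracker *t, size_t bytes)
{
    assert(t != NULL);
    t->bytes_below_entry += bytes;
}

/* Offset of the 5th parameter from the current RSP.
 * At entry it is at [rsp + 40]: 8 bytes of return address
 * plus 32 bytes of home space for parameters 1 to 4. */
size_t tracker_param5_offset(const stack_tracker *t)
{
    assert(t != NULL);
    return t->bytes_below_entry + 8 + 32;
}
```

In this model, after seven pushes the 5th parameter is at offset 96. If the parameter is instead read after a further reservation of twenty 64-bit words (160 bytes), the same parameter is at offset 256, which is why a load placed after new stack modifications needs a larger displacement.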
a2aafe0 to 4dc3924
Fixed the generation to put the registers on the stack and take them off the stack in the right place.
OK, the last version works with my friend's tests. But the ASM code is not perfect:
Hi @ilka1999,
Thanks,
IMHO:
I did not check the stack alignment of the code. If the stack is aligned, you can use ALIGNED loads/stores instead of UNALIGNED ones.
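As a rough illustration of the alignment question, the C sketch below checks whether an address meets the 16-byte alignment that aligned SSE moves such as movdqa require, and whether RSP would be 16-byte aligned after a given prologue. The function names are hypothetical, and the prologue check assumes the caller kept RSP 16-byte aligned before the CALL, as the Microsoft x64 convention expects; that assumption may not hold for every caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero when addr is a multiple of 16, the alignment that
 * aligned SSE loads/stores (e.g. movdqa) require. */
int is_aligned16(uintptr_t addr)
{
    return (addr & 15u) == 0;
}

/* Rounds a byte count up to the next multiple of 16, e.g. when
 * sizing a save area for 16-byte XMM registers. */
size_t round_up16(size_t n)
{
    assert(n <= SIZE_MAX - 15);     /* avoid wrap-around */
    return (n + 15) & ~(size_t)15;
}

/* Nonzero when RSP is 16-byte aligned after the prologue.
 * Assumption: RSP was 16-byte aligned just before CALL, so on
 * entry it is 8 bytes past a 16-byte boundary (return address). */
int rsp_aligned16_after_prologue(unsigned pushed_gprs, size_t local_bytes)
{
    return (8 + (size_t)pushed_gprs * 8 + local_bytes) % 16 == 0;
}
```

Under that assumption, seven pushes alone leave RSP aligned, and so do seven pushes followed by a 160-byte reservation, while six pushes do not.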
Put XMM6-15, when used, on the stack at start of function and restore at end of function.
4dc3924 to cfac603
Hi @ilka1999,
Thanks for the feedback!
Note that the stack is not always aligned as I would like. On newer processors, unaligned moves are the same speed as aligned moves. I won't be making changes for this, but thank you for bringing it up.
Sean
Tested successfully
Description
Put XMM6-15, when used, on the stack at start of function and restore at end of function.
Fixes #6608
Testing
Standard