Description
Having gotten hold of a box with a Zhaoxin KX-6580 CPU (Zhaoxin is a Chinese x86 CPU vendor, formed as a joint venture of VIA and Shanghai; their designs are mostly a continuation of the VIA C3/C7/Nano series cores, mainly for the Chinese market, but they have started showing up elsewhere), I decided to do a whole bunch of testing on its PadLock functionality - and in doing so, I've made a number of findings about various undocumented and underdocumented features. The ones most relevant for disassembly tools like, say, Zydis, so far appear to be:
- The `rep montmul` instruction takes, much to my surprise, a mandatory `67h` address size prefix in 64-bit mode (!!). This is observed by the sequence `f3 0f a6 c0` consistently producing an #UD exception, while something like `f3 67 0f a6 c0` does not. The issue appears to be that `rep montmul` takes a pointer in rSI to a data structure that contains 5 pointers to various buffers needed by this instruction - this data structure does not appear to have ever been updated to work with 64-bit pointers, and so the `67h` prefix is needed to force 32-bit addressing for the instruction. This makes the instruction fairly inconvenient to set up, since it becomes necessary to make sure that this structure and all its buffers reside in the bottom 4GB of virtual address space, but once that is done, the instruction variant with the `67h` prefix (but not without) will execute a Montgomery multiply just fine.
- The instruction encoding `f3 0f a6 e0` is a seemingly undocumented instruction to accelerate SHA-512 hashing. In my testing, it appears to take the following arguments:
  - rCX = number of 128-byte blocks to hash
  - ES:rSI = pointer to source data
  - ES:rDI = pointer to a 64-byte digest to update

  I haven't been able to find this instruction documented anywhere, but OpenSSL clearly knows about it (see https://github.com/openssl/openssl/blob/master/engines/asm/e_padlock-x86.pl , line 597), referring to it as `rep xsha512`. The instruction encoding `f3 0f a6 d8` also appears to be an alias of this instruction.
- The instruction encoding `f3 0f a6 e8` is a Zhaoxin-specific "GMI" instruction: `ccs_hash`. This instruction is documented ( https://github.com/ZXOpenSource/OpenSSL-ZX-GMI/blob/master/GMI%20User%20Manual%20V1.0.pdf - in Chinese, but gets pretty readable after a trip through Google Translate) to provide support for the Chinese SM3 hashing algorithm - in my testing, it also provides undocumented support for SHA-1/256/512 that can be obtained by setting rBX to values in the range 0x10 to 0x15.
- The instruction encoding `f3 0f a7 f0` is another Zhaoxin-specific "GMI" instruction: `ccs_encrypt`. This instruction is documented to provide support for the Chinese SM4 encryption algorithm - it also provides undocumented support for AES-128/192/256 that can be obtained by setting rAX to values in the range 0x10 to 0x15.
- The instruction encodings `f3 0f a6 f0` and `f3 0f a6 f8` are undocumented and I haven't been able to figure out what they might do. They produce a #GP exception for every set of arguments I've tried passing them, suggesting that they either expect a really odd input data format or are privileged instructions.
- At least on this specific CPU, the `xstore` instruction accepts the `repne` prefix, and treats it as a synonym for `rep` - `f2 0f a7 c0` produces the same output as I would expect from `rep xstore` (`f3 0f a7 c0`). None of the other PadLock instructions accept this prefix (#UD). The instruction encoding `f3 0f a7 f8` appears to be an alias of `rep xstore`; however, it doesn't accept `repne`.
- From what I can find, all of the instructions in the PadLock space (`0f a6 c0-ff` and `0f a7 c0-ff`) exhibit partial decode, where the bottom 3 bits of the last byte of the instruction are ignored - e.g. `f3 0f a7 f7` is accepted as a valid instruction and behaves identically to `f3 0f a7 f0`.