Amd64

PeterJohnson edited this page Jun 24, 2011 · 1 revision

AMD64 Architecture

The AMD64 architecture is a new 64-bit architecture developed by AMD, based on the 32-bit x86 architecture. It extends the original x86 architecture by doubling the number of general purpose and SIMD registers and by extending the arithmetic operations and address space to 64 bits, among other features.

Intel has introduced an essentially identical version of AMD64, originally called EM64T but now called Intel 64.

Yasm AMD64 Support

Yasm extends the base NASM syntax to support AMD64 as follows. To enable assembly of instructions for the 64-bit mode of AMD64 processors, use the directive BITS 64. As with NASM's BITS directive, this does not change the format of the output object file to 64 bits; it only changes the assembler mode to assume that the instructions being assembled will be run in 64-bit mode. To specify an AMD64 object file, use "-m amd64" on the Yasm command line, or explicitly target a 64-bit object format such as "-f win64" or "-f elf64".

Note: the following is a copy of the documentation in the yasm_arch(7) man page.

Register Changes

The additional 64-bit general purpose registers are named r8-r15. There are also 8-bit (rXb), 16-bit (rXw), and 32-bit (rXd) subregisters that map to the least significant 8, 16, or 32 bits of the 64-bit register. The original 8 general purpose registers have also been extended to 64 bits: eax, edx, ecx, ebx, esi, edi, esp, and ebp have new 64-bit versions called rax, rdx, rcx, rbx, rsi, rdi, rsp, and rbp respectively. The old 32-bit registers map to the least significant 32 bits of the new 64-bit registers.

New 8-bit registers are also available that map to the 8 least significant bits of rsi, rdi, rsp, and rbp. These are called sil, dil, spl, and bpl respectively. Unfortunately, due to the way instructions are encoded, these new 8-bit registers are encoded the same as the old 8-bit registers ah, dh, ch, and bh. The processor distinguishes the two sets by the presence of the new REX prefix, which is also used to specify the other extended registers. This means it is illegal to use ah, dh, ch, or bh in an instruction that requires the REX prefix for other reasons. For instance:

add ah, [r10]

(NASM syntax) is not a legal instruction because the use of r10 requires a REX prefix, making it impossible to use ah.
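
The rule above can be sketched as a small Python check. This is only an illustrative model, not Yasm's implementation: the names `requires_rex` and `can_encode` are hypothetical, and the model only looks at register names (it ignores the REX prefix that a 64-bit operand size would also require).

```python
# Registers that can only be encoded WITHOUT a REX prefix.
LEGACY_HIGH_BYTE = {"ah", "bh", "ch", "dh"}

# Registers that can only be encoded WITH a REX prefix.
NEW_LOW_BYTE = {"sil", "dil", "spl", "bpl"}


def requires_rex(reg: str) -> bool:
    """Return True if naming this register forces a REX prefix (simplified)."""
    reg = reg.lower()
    if reg in NEW_LOW_BYTE:
        return True
    if reg.startswith("xmm"):
        # xmm8-xmm15 are only reachable through REX.
        return reg[3:].isdigit() and 8 <= int(reg[3:]) <= 15
    if reg.startswith("r"):
        # r8-r15 and their rXb / rXw / rXd subregisters.
        digits = reg[1:].rstrip("bwd")
        return digits.isdigit() and 8 <= int(digits) <= 15
    return False


def can_encode(registers) -> bool:
    """An instruction cannot mix ah/bh/ch/dh with a REX-only register."""
    regs = [r.lower() for r in registers]
    uses_high_byte = any(r in LEGACY_HIGH_BYTE for r in regs)
    uses_rex = any(requires_rex(r) for r in regs)
    return not (uses_high_byte and uses_rex)


# add ah, [r10] from the text: r10 needs REX, so ah cannot be used.
illegal_example = can_encode(["ah", "r10"])   # False
legal_example = can_encode(["ah", "rax"])     # True
```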

In 64-bit mode, an additional 8 SSE2 registers are also available. These are named xmm8-xmm15.

64 Bit Instructions

By default, most operations in 64-bit mode remain 32-bit; operations that are 64-bit usually require a REX prefix (one bit in the REX prefix determines whether an operation is 64-bit or 32-bit). Thus, essentially all 32-bit instructions have a 64-bit version, and the 64-bit versions of instructions can use extended registers "for free" (as the REX prefix is already present). Examples in NASM syntax:

mov eax, 1  ; 32-bit instruction
mov rcx, 1  ; 64-bit instruction

Instructions that modify the stack (push, pop, call, ret, enter, and leave) are implicitly 64-bit. Their 32-bit counterparts are not available, but their 16-bit counterparts are. Examples in NASM syntax:

push eax  ; illegal instruction
push rbx  ; 1-byte instruction
push r11  ; 2-byte instruction with REX prefix
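
To make the 1-byte versus 2-byte difference concrete, here is a minimal Python sketch of how the REX prefix byte is laid out (binary 0100WRXB) and how `push` of a 64-bit register is encoded. The helper names `rex_prefix` and `encode_push` are hypothetical and exist only for illustration; the sketch covers only the register form of `push`.

```python
def rex_prefix(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix byte: 0100WRXB.

    W selects 64-bit operand size; R, X and B each supply the fourth
    bit of a register number (ModRM.reg, SIB.index, ModRM.rm/base).
    """
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


# Register numbers 0-15 in hardware encoding order.
GPR64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    f"r{n}" for n in range(8, 16)
]


def encode_push(reg: str) -> bytes:
    """Encode `push reg64` (opcode 50h + low 3 bits of the register number)."""
    number = GPR64.index(reg)
    opcode = 0x50 + (number & 7)
    if number >= 8:
        # r8-r15 need REX.B to carry the fourth register bit.
        return bytes([rex_prefix(b=1), opcode])
    # push is implicitly 64-bit, so no REX.W is needed.
    return bytes([opcode])


push_rbx = encode_push("rbx")   # 1 byte
push_r11 = encode_push("r11")   # 2 bytes, starts with a REX prefix
```

A 64-bit arithmetic or mov instruction would instead set the W bit, which is why extended registers come "for free" once that prefix is already present.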

Implicit Zero Extension

Results of 32-bit operations are implicitly zero-extended into the upper 32 bits of the corresponding 64-bit register. 16-bit and 8-bit operations, on the other hand, do not affect the upper bits of the register (just as in 32-bit and 16-bit modes). This can be used to generate smaller code in some instances. Examples in NASM syntax:

mov ecx, 1  ; shorter than mov rcx, 1 (no REX prefix needed)
and edx, 3  ; equivalent to and rdx, 3
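
The register-write behavior described above can be modeled in a few lines of Python. This is an illustrative sketch only; `write_reg` is a hypothetical helper, and the 8-bit case models the low byte (for example al), not ah.

```python
MASK64 = (1 << 64) - 1


def write_reg(old: int, value: int, bits: int) -> int:
    """Return the new 64-bit register value after a write of `bits` width."""
    mask = (1 << bits) - 1
    if bits == 64:
        return value & MASK64
    if bits == 32:
        # 32-bit results are zero-extended: the upper 32 bits are cleared.
        return value & mask
    if bits in (8, 16):
        # 8-bit and 16-bit writes leave all upper bits untouched.
        return (old & ~mask & MASK64) | (value & mask)
    raise ValueError("bits must be 8, 16, 32 or 64")


old = 0x1122334455667788
after_mov_ecx = write_reg(old, 1, 32)   # 0x0000000000000001
after_mov_cx = write_reg(old, 1, 16)    # 0x1122334455660001
after_mov_cl = write_reg(old, 1, 8)     # 0x1122334455667701

# and edx, 3 gives the same register contents as and rdx, 3, because
# the mask 3 already clears every upper bit.
and_edx = write_reg(old, (old & 0xFFFFFFFF) & 3, 32)
and_rdx = write_reg(old, old & 3, 64)
```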

Immediates

For most instructions in 64-bit mode, immediate values remain 32 bits; the value is sign-extended to 64 bits before being used. The exception is the mov instruction, which can take a 64-bit immediate when the destination is a 64-bit register. Examples in NASM syntax:

add rax, 1                  ; optimized down to signed 8-bit immediate
add rax, dword 1            ; force immediate size to 32-bit
add rax, 0xffffffff         ; sign-extended 32-bit immediate
add rax, -1                 ; same as above
add rax, 0xffffffffffffffff ; warning (>32 bit), truncated to same as above
mov eax, 1                  ; 5 byte instruction
mov rax, 1                  ; 7 byte instruction (optimized to signed 32-bit immediate)
mov rax, qword 1            ; 10 byte instruction (force 64-bit immediate)
mov rbx, 0x1234567890abcdef ; 10 byte instruction
mov rcx, 0xffffffff         ; 10 byte instruction (does not fit in signed 32-bit size)
mov ecx, -1                 ; 5 byte instruction equivalent to above
mov rcx, sym                ; 7 byte - yasm defaults to 32-bit size for symbols
mov rcx, qword sym          ; 10 byte - override default size

The handling of mov reg64 with an unsized immediate differs between Yasm and NASM 2.x; Yasm follows the behavior above, while NASM 2.x does the following:

add rax, 0xffffffff         ; sign-extended 32-bit immediate
add rax, -1                 ; same as above
add rax, 0xffffffffffffffff ; warning (>32 bit), truncated to same as above
add rax, sym                ; sign-extended 32-bit immediate
mov eax, 1                  ; 5 byte instruction (32-bit immediate)
mov rax, 1                  ; 10 byte instruction (64-bit immediate)
mov rbx, 0x1234567890abcdef ; 10 byte instruction
mov rcx, 0xffffffff         ; 10 byte instruction
mov ecx, -1                 ; 5 byte instruction equivalent to above
mov ecx, sym                ; 5 byte instruction (32-bit immediate)
mov rcx, sym                ; 10 byte instruction
mov rcx, qword sym          ; 10 byte instruction
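
The sign-extension rule for 32-bit immediates in this section can be modeled with a short Python sketch. The function names are hypothetical; the sketch shows why `add rax, 0xffffffff` behaves like `add rax, -1`, and why `mov rcx, 0xffffffff` cannot use the signed 32-bit form.

```python
MASK64 = (1 << 64) - 1


def sign_extend32(imm: int) -> int:
    """Sign-extend the low 32 bits of `imm` to an unsigned 64-bit pattern."""
    imm &= 0xFFFFFFFF
    if imm & 0x80000000:
        imm |= 0xFFFFFFFF00000000
    return imm


def fits_signed32(value: int) -> bool:
    """True if `value` survives a round trip through a signed 32-bit immediate."""
    return -(1 << 31) <= value < (1 << 31)


def add64(reg: int, imm32: int) -> int:
    """Model `add reg64, imm32`: the immediate is sign-extended first."""
    return (reg + sign_extend32(imm32)) & MASK64


# add rax, 0xffffffff adds -1, not 4294967295.
result = add64(5, 0xFFFFFFFF)            # 4

# mov rcx, 0xffffffff needs the 64-bit immediate form, because the
# value does not fit in a signed 32-bit immediate; 1 and -1 do fit.
needs_imm64 = not fits_signed32(0xFFFFFFFF)   # True
```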

Displacements

Just like immediates, displacements for the most part remain 32 bits and are sign-extended prior to use. Again, the exception is one restricted form of the mov instruction: between the al/ax/eax/rax register and a 64-bit absolute address (no registers allowed in the effective address). In NASM syntax, use of the 64-bit absolute form requires [qword]. Examples in NASM syntax:

mov eax, [1]    ; 32 bit, with sign extension
mov al, [rax-1] ; 32 bit, with sign extension
mov al, [qword 0x1122334455667788] ; 64-bit absolute
mov al, [0x1122334455667788] ; truncated to 32-bit (warning)

RIP Relative Addressing

In 64-bit mode, a new form of effective addressing is available to make it easier to write position-independent code. Any memory reference may be made RIP-relative (RIP is the instruction pointer register, which contains the address of the location immediately following the current instruction).

In NASM syntax, there are two ways to specify RIP-relative addressing:

mov dword [rip+10], 1

stores the value 1 ten bytes after the end of the instruction. "10" can also be a symbolic constant, and will be treated the same way. On the other hand,

mov dword [symb wrt rip], 1

stores the value 1 into the address of symbol "symb". This is distinctly different from the behavior of:

mov dword [symb+rip], 1

which takes the address of the end of the instruction, adds the address of "symb" to it, then stores the value 1 there. If symb is a variable, this will not store the value 1 into the symb variable!
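
The difference between the three forms comes down to simple address arithmetic, sketched here in Python. The addresses are made-up values chosen for illustration, and the helper names are hypothetical; the point is only what displacement ends up in the instruction and which address is finally written.

```python
MASK64 = (1 << 64) - 1


def rip_effective_address(next_rip: int, disp32: int) -> int:
    """Effective address of a RIP-relative operand.

    `next_rip` is the address immediately following the instruction.
    """
    return (next_rip + disp32) & MASK64


def disp_for_symbol(symbol_addr: int, next_rip: int) -> int:
    """Displacement chosen for [symb wrt rip] so that the operand hits symb."""
    return symbol_addr - next_rip


# Hypothetical layout: the instruction ends at 0x401010, symb is at 0x402000.
next_rip = 0x401010
symb = 0x402000

# mov dword [rip+10], 1  -> ten bytes past the end of the instruction.
ea_rip_plus_10 = rip_effective_address(next_rip, 10)

# mov dword [symb wrt rip], 1  -> displacement is symb - next_rip.
ea_wrt_rip = rip_effective_address(next_rip, disp_for_symbol(symb, next_rip))

# mov dword [symb+rip], 1  -> the address of symb itself is the displacement.
ea_symb_plus_rip = rip_effective_address(next_rip, symb)
```

With these values, only the `wrt rip` form lands on `symb`; the `symb+rip` form lands at `next_rip + symb`, which is an unrelated address.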

Yasm also supports the following syntax for RIP-relative addressing from NASM 2.x:

mov [rel sym], rax         ; RIP-relative
mov [abs sym], rax         ; not RIP-relative

The behavior of

mov [sym], rax

depends on a mode set by the DEFAULT directive, as follows. The default mode is always "abs"; in "rel" mode, the use of registers, an fs or gs segment override, or an explicit "abs" override results in a non-RIP-relative effective address.

default rel
mov [sym], rbx             ; RIP-relative
mov [abs sym], rbx         ; not RIP-relative (explicit override)
mov [rbx+1], rbx           ; not RIP-relative (register use)
mov [fs:sym], rbx          ; not RIP-relative (fs or gs use)
mov [ds:sym], rbx          ; RIP-relative (segment override, but not fs or gs)
mov [rel sym], rbx         ; RIP-relative (redundant override)

default abs
mov [sym], rbx             ; not RIP-relative
mov [abs sym], rbx         ; not RIP-relative
mov [rbx+1], rbx           ; not RIP-relative
mov [fs:sym], rbx          ; not RIP-relative
mov [ds:sym], rbx          ; not RIP-relative
mov [rel sym], rbx         ; RIP-relative (explicit override)
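
The two listings above follow a small set of rules that can be written down as a decision function. The Python sketch below is a hypothetical model of those rules, not Yasm's code; combinations that the listings do not show (for example an explicit "rel" together with an fs override) are not covered by the source, and the result the function returns for them is an assumption.

```python
def is_rip_relative(default_mode, *, override=None, uses_register=False,
                    segment=None) -> bool:
    """Decide whether a memory operand is assembled as RIP-relative.

    default_mode:  "abs" or "rel" (the DEFAULT directive; "abs" if unset)
    override:      None, "abs" or "rel" (explicit keyword in the operand)
    uses_register: True if the effective address contains a register
    segment:       None or a segment override such as "fs", "gs", "ds"
    """
    if uses_register:
        return False            # e.g. [rbx+1]
    if override == "abs":
        return False            # explicit override always wins
    if override == "rel":
        return True             # explicit override always wins
    if default_mode != "rel":
        return False            # default abs: plain [sym] is absolute
    if segment in ("fs", "gs"):
        return False            # fs/gs disable the rel default
    return True                 # default rel, including [ds:sym]


# A few rows from the "default rel" listing:
plain_sym = is_rip_relative("rel")                   # True
fs_sym = is_rip_relative("rel", segment="fs")        # False
ds_sym = is_rip_relative("rel", segment="ds")        # True
```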

Memory References

Usually the size of a memory reference can be deduced from the registers involved; for example, "mov [rax],ecx" is a 32-bit move, because ecx is 32 bits. When no register fixes the size (as when storing an immediate to memory), Yasm currently gives the non-obvious "invalid combination of opcode and operands" error. The fix in this case is to add a memory size specifier: qword, dword, word, or byte.

Here's a 64-bit memory move, which sets 8 bytes starting at rax:

mov qword [rax], 1

Here's a 32-bit memory move, which sets 4 bytes:

mov dword [rax], 1

Here's a 16-bit memory move, which sets 2 bytes:

mov word [rax], 1

Here's an 8-bit memory move, which sets 1 byte:

mov byte [rax], 1
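
As a rough illustration of what the four size specifiers mean for memory, the Python sketch below writes the value 1 into a byte buffer at each width. The `store` helper and the buffer are hypothetical stand-ins for the memory at [rax]; x86 stores are little-endian, which the sketch reproduces.

```python
# Number of bytes written for each NASM/Yasm size specifier.
SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}


def store(memory: bytearray, offset: int, size: str, value: int) -> None:
    """Model `mov <size> [offset], value` on a little-endian byte buffer."""
    width = SIZES[size]
    truncated = value & ((1 << (8 * width)) - 1)
    memory[offset:offset + width] = truncated.to_bytes(width, "little")


# Start from a buffer of 0xff bytes so the written span is easy to see.
results = {}
for size in SIZES:
    memory = bytearray(b"\xff" * 8)
    store(memory, 0, size, 1)
    results[size] = bytes(memory)
```

After the loop, `results["dword"]` holds four bytes of the stored value followed by four untouched 0xff bytes, while `results["qword"]` has all eight bytes overwritten.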
