ATT: Missing parenthesis for absolute memory operands #454

Closed
fljmc opened this issue Sep 14, 2023 · 8 comments
Labels
A-formatter Area: Formatter C-enhancement Category: Enhancement of existing features

Comments

@fljmc

fljmc commented Sep 14, 2023

The following hex instructions:

201168: 48 8b 05 19 10 00 00 movq 0x1019(%rip), %rax
20116f: 8b 0d 1b 10 00 00 movl 0x101b(%rip), %ecx

are disassembled incorrectly:
mov 0x0000000000202188, %rax
mov 0x0000000000202190, %ecx

So in this output the first mov, for example, does not appear to load the value stored at address 0x202188, but the address value itself.

@fljmc fljmc changed the title from "Wrong disassembly for movq instruction (x86_64)." to "Wrong disassembly for movq/movl instructions (x86_64)." Sep 14, 2023
@flobernd
Member

flobernd commented Sep 14, 2023

Hi there!

I don't see a problem here. These instructions are RIP-relative, which basically means that the effective address will be:

RIP + displacement

RIP = the address of the next instruction after your mov instruction, i.e. the address of the mov plus its length.
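
As a quick check of that arithmetic against the two instructions from this issue, here is a minimal sketch in C. The helper name `riprel_target` is made up for illustration and is not part of Zydis; it only assumes the usual x86-64 rule that the displacement is added to the address of the next instruction.

```c
#include <stdint.h>

/* Hypothetical helper (not a Zydis API): absolute target of a
 * RIP-relative memory operand.
 *   instr_address: address of the instruction itself
 *   instr_length:  encoded length of the instruction in bytes
 *   disp:          signed 32-bit displacement from the encoding */
uint64_t riprel_target(uint64_t instr_address, uint8_t instr_length, int32_t disp)
{
    /* RIP already points at the next instruction when the operand is evaluated. */
    uint64_t next_ip = instr_address + instr_length;
    /* Sign-extend the displacement before adding it. */
    return next_ip + (uint64_t)(int64_t)disp;
}
```

With the values from the report, `riprel_target(0x201168, 7, 0x1019)` gives 0x202188 and `riprel_target(0x20116f, 6, 0x101b)` gives 0x202190, which are the absolute addresses the formatter prints.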

@fljmc
Author

fljmc commented Sep 14, 2023

Hi!

Yes, so why isn't "48 8b 05 19 10 00 00" disassembled as "movq 0x1019(%rip), %rax" instead of "mov 0x0000000000202188, %rax"?

@flobernd
Member

The address is interesting during static analysis and you don't want to always calculate it yourself 🙂

However, this is just our default. You can override this behavior by setting the ZYDIS_FORMATTER_PROP_FORCE_RELATIVE_RIPREL flag in your formatter instance:

ZYDIS_FORMATTER_PROP_FORCE_RELATIVE_RIPREL,
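
For readers who want to try this, the following is a rough sketch of how the property might be set. It assumes the Zydis 4.x C API (`ZydisDecoderDecodeFull`, `ZydisFormatterFormatInstruction` with an explicit operand array) and a build linked against Zydis; the bytes and runtime address are taken from the report above, and error handling is kept minimal.

```c
#include <stdio.h>
#include <Zydis/Zydis.h>

int main(void)
{
    /* movq 0x1019(%rip), %rax, as reported at address 0x201168 */
    const ZyanU8 bytes[] = { 0x48, 0x8B, 0x05, 0x19, 0x10, 0x00, 0x00 };
    const ZyanU64 runtime_address = 0x201168;

    ZydisDecoder decoder;
    ZydisDecoderInit(&decoder, ZYDIS_MACHINE_MODE_LONG_64, ZYDIS_STACK_WIDTH_64);

    ZydisDecodedInstruction instruction;
    ZydisDecodedOperand operands[ZYDIS_MAX_OPERAND_COUNT];
    if (!ZYAN_SUCCESS(ZydisDecoderDecodeFull(&decoder, bytes, sizeof(bytes),
                                             &instruction, operands)))
    {
        return 1;
    }

    ZydisFormatter formatter;
    ZydisFormatterInit(&formatter, ZYDIS_FORMATTER_STYLE_ATT);
    /* Keep RIP-relative operands in their relative form instead of
     * resolving them to an absolute address. */
    ZydisFormatterSetProperty(&formatter,
                              ZYDIS_FORMATTER_PROP_FORCE_RELATIVE_RIPREL,
                              ZYAN_TRUE);

    char buffer[256];
    if (!ZYAN_SUCCESS(ZydisFormatterFormatInstruction(
            &formatter, &instruction, operands,
            instruction.operand_count_visible,
            buffer, sizeof(buffer), runtime_address, ZYAN_NULL)))
    {
        return 1;
    }
    puts(buffer);
    return 0;
}
```

Leaving the property at its default should make the formatter print the resolved absolute address instead.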

@fljmc
Author

fljmc commented Sep 14, 2023

Here is a sample dump with more details. It was built from the following C++ code:

const char* Str = "abcde";

int Tmp = 0xaabbccdd;

int main() {
    return Str[Tmp - 0xaabbccda];
}

./main: file format elf64-x86-64

Disassembly of section .rodata:

0000000000200158 <.rodata>:
200158: 61
200159: 62 63 64 65 00

Disassembly of section .text:

0000000000201160 :
201160: c7 44 24 fc 00 00 00 00 movl $0x0, -0x4(%rsp)
201168: 48 8b 05 19 10 00 00 movq 0x1019(%rip), %rax # 0x202188
20116f: 8b 0d 1b 10 00 00 movl 0x101b(%rip), %ecx # 0x202190
201175: 81 e9 da cc bb aa subl $0xaabbccda, %ecx # imm = 0xAABBCCDA
20117b: 89 c9 movl %ecx, %ecx
20117d: 0f be 04 08 movsbl (%rax,%rcx), %eax
201181: c3 retq

Disassembly of section .data:

0000000000202188 :
202188: 58 popq %rax
202189: 01 20 addl %esp, (%rax)
20218b: 00 00 addb %al, (%rax)
20218d: 00 00 addb %al, (%rax)
20218f: 00 dd addb %bl, %ch

0000000000202190 :
202190: dd cc
202192: bb
202193: aa stosb %al, %es:(%rdi)

So, as you can see, "201168: 48 8b 05 19 10 00 00 movq 0x1019(%rip), %rax"
should place 0x200158 in rax, i.e. the address stored at location 0x202188. So
when the output is just "mov 0x0000000000202188, %rax", I believe it is incorrect. It also doesn't match the llvm-objdump output.
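
To make the dereference concrete, here is a small sketch in C that decodes the raw bytes from the .data dump above as little-endian values. The helper `read_le` and the array names are hypothetical and exist only for this illustration; the byte values are copied from the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian unsigned value of
 * `size` bytes (1 to 8) from `bytes`. */
uint64_t read_le(const uint8_t *bytes, size_t size)
{
    uint64_t value = 0;
    /* Walk from the most significant (last) byte down to the first. */
    for (size_t i = size; i > 0; --i)
        value = (value << 8) | bytes[i - 1];
    return value;
}

/* Bytes at 0x202188..0x20218f in the .data dump (the 8-byte pointer). */
static const uint8_t data_202188[8] = { 0x58, 0x01, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00 };
/* Bytes at 0x202190..0x202193 in the .data dump (the 4-byte int). */
static const uint8_t data_202190[4] = { 0xdd, 0xcc, 0xbb, 0xaa };
```

`read_le(data_202188, 8)` yields 0x200158, the start of the "abcde" bytes in .rodata, and `read_le(data_202190, 4)` yields 0xaabbccdd, the initializer from the source. Those are the values a memory load from 0x202188 and 0x202190 would put into %rax and %ecx.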

@fljmc
Author

fljmc commented Sep 14, 2023

> The address is interesting during static analysis and you don't want to always calculate it yourself 🙂
>
> However, this is just our default. You can override this behavior by setting the ZYDIS_FORMATTER_PROP_FORCE_RELATIVE_RIPREL flag in your formatter instance:
>
> ZYDIS_FORMATTER_PROP_FORCE_RELATIVE_RIPREL,

Oh, I see. I'll take a look. Thanks for the info!

@fljmc
Author

fljmc commented Sep 14, 2023

I'll probably close this, since the behavior is intentional and therefore not the issue I assumed it was. Thanks again for the help!

@fljmc fljmc closed this as completed Sep 14, 2023
@flobernd
Member

flobernd commented Sep 14, 2023

I might have misunderstood you here.

The RIP form should be correct, but you are saying that the absolute form is missing the parentheses that mark a pointer/address dereference, right?

Technically you are correct. Let's reopen this issue and I'll try to remember why Zydis prints the absolute address without () in ATT syntax. It's definitely confusing, I have to admit.

In Intel syntax it seems correct:

== [      ATT ] ============================================================================================
   ABSOLUTE: mov 0x0000000000001020, %rax
   RELATIVE: mov 0x1019(%rip), %rax

== [    INTEL ] ============================================================================================
   ABSOLUTE: mov rax, qword ptr ds:[0x0000000000001020]
   RELATIVE: mov rax, qword ptr ds:[rip+0x1019]

@flobernd flobernd reopened this Sep 14, 2023
@flobernd flobernd changed the title from "Wrong disassembly for movq/movl instructions (x86_64)." to "ATT: Missing parenthesis for absolute memory operands" Sep 15, 2023
@flobernd flobernd added C-enhancement Category: Enhancement of existing features A-formatter Area: Formatter labels Sep 15, 2023
@flobernd
Member

flobernd commented Oct 5, 2023

Hi @fljmc, I checked this again and came to the conclusion that this is not a bug.

Literal values in AT&T syntax require the $ prefix, which allows us to clearly distinguish an absolute address from a numeric literal.

AT&T syntax is also a bit special in that there is no single source of truth. Every assembler/disassembler seems to implement the syntax slightly differently. For example, during my investigation I've seen these forms:

  • mov 0x0000000000001020, %rax (what Zydis uses)
  • mov 0x0000000000001020(,1), %rax
  • mov (0x0000000000001020), %rax

cc @athre0z
