MacOS Classic - Analyze dataflow and preserved registers #573

Closed
gbody opened this issue Feb 13, 2018 · 4 comments


@gbody
Contributor

gbody commented Feb 13, 2018

Looking at how some procedures are defined after Analyze Dataflow, it appears that some of the registers pushed at the beginning of a procedure and popped at the end are being included in the procedure signature. Shouldn't registers that are pushed at the beginning and popped again before exiting be listed as preserved registers for the procedure once Analyze Dataflow has completed?

Pre Analyze Dataflow

Entry code pushing registers to stack
void _DATAINIT()
{
_DATAINIT_entry:
l0010FC02:
a7 = fp
a5 = a5world
a7 = a7 - 0x04
Mem0[a7:word32] = a4
a7 = a7 - 0x04
Mem0[a7:word32] = a3
a7 = a7 - 0x04
Mem0[a7:word32] = a2
a7 = a7 - 0x04
Mem0[a7:word32] = a1
a7 = a7 - 0x04
Mem0[a7:word32] = a0
a7 = a7 - 0x04
Mem0[a7:word32] = d7
a7 = a7 - 0x04
Mem0[a7:word32] = d6
a7 = a7 - 0x04
Mem0[a7:word32] = d5
a7 = a7 - 0x04
Mem0[a7:word32] = d4
a7 = a7 - 0x04
Mem0[a7:word32] = d3
a7 = a7 - 0x04
Mem0[a7:word32] = d2
a7 = a7 - 0x04
Mem0[a7:word32] = d1

=> Procedure processing

Exit code pulling registers from stack
l0010FC48:
d1 = Mem0[a7:word32]
a7 = a7 + 0x04
d2 = Mem0[a7:word32]
a7 = a7 + 0x04
d3 = Mem0[a7:word32]
a7 = a7 + 0x04
d4 = Mem0[a7:word32]
a7 = a7 + 0x04
d5 = Mem0[a7:word32]
a7 = a7 + 0x04
d6 = Mem0[a7:word32]
a7 = a7 + 0x04
d7 = Mem0[a7:word32]
a7 = a7 + 0x04
a0 = Mem0[a7:word32]
a7 = a7 + 0x04
a1 = Mem0[a7:word32]
a7 = a7 + 0x04
a2 = Mem0[a7:word32]
a7 = a7 + 0x04
a3 = Mem0[a7:word32]
a7 = a7 + 0x04
a4 = Mem0[a7:word32]
a7 = a7 + 0x04
return
_DATAINIT_exit:
}
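The save/restore pattern in the listing above can be checked mechanically. The following Python sketch is an illustration only, not how Reko's data flow analysis works. It treats a register as preserved when the exit code pops registers in exactly the reverse order of the entry pushes. The regular expressions assume RTL text shaped like the listing above, and the function name `saved_registers` is hypothetical.

```python
import re

# Illustrative patterns for RTL text shaped like the listing above (assumed format).
PUSH = re.compile(r"Mem0\[a7:word32\] = (\w+)$")
POP = re.compile(r"(\w+) = Mem0\[a7:word32\]$")

def saved_registers(entry_lines, exit_lines):
    """Registers pushed on entry and popped in reverse order on exit.

    Returns an empty set if the pops do not mirror the pushes exactly.
    This only matches text; it does not prove that a7 is balanced."""
    pushed = [m.group(1) for line in entry_lines
              if (m := PUSH.match(line.strip()))]
    popped = [m.group(1) for line in exit_lines
              if (m := POP.match(line.strip()))]
    if popped != pushed[::-1]:
        return set()
    return set(pushed)

# Push order taken from the _DATAINIT prologue above.
regs = ["a4", "a3", "a2", "a1", "a0", "d7", "d6", "d5", "d4", "d3", "d2", "d1"]
entry = [f"Mem0[a7:word32] = {r}" for r in regs]
exit_ = [f"{r} = Mem0[a7:word32]" for r in reversed(regs)]
print(sorted(saved_registers(entry, exit_)))
```

Textual matching is not enough on its own: a real analysis must also show that a7 has the same value on every path into the exit block before it can call these registers preserved.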

Post Analyze Dataflow
word32 _DATAINIT(word32 d0, word32 d1, word32 a4, ptr32 & d1Out, ptr32 & d7Out, ptr32 & a3Out, ptr32 & a4Out)
{
_DATAINIT_entry:
l0010FC02:
word32 d0_121
word32 a7_120 = fp - 0x0030
branch Mem0[0x0010FDB4:word16] == 0x01 l0010FC16
l0010FC12:
d0_121 = -0x01
goto l0010FC48
l0010FC16:
word32 a3_88 = a5world - Mem0[0x0010FDB0:word32]
ZEROBUFFER(d1, dwLoc3C, Mem0[0x0010FDB0:word32], a3_88)
Mem101[fp - 0x3C:word32] = 0x0010FDB0 + Mem0[0x0010FDB8:word32]
Mem104[fp - 0x40:word32] = a3_88
uncompress_world(dwArg00, dwArg04)
Mem111[fp - 0x3C:word32] = 0x0010FDB0 + Mem104[0x0010FDBC:word32]
Mem114[fp - 0x40:word32] = a3_88
Mem117[fp - 0x44:word32] = a5world
relocate_world(dwArg00, dwArg04, dwArg08)
a7_120 = fp - 0x38
d0_121 = 0x00
l0010FC48:
word32 a7_56 = a7_120 + 0x04
word32 d1_55
*d1Out = Mem0[a7_120:word32]
word32 d7_67
*d7Out = Mem0[a7_56 + 0x0014:word32]
word32 a3_75
*a3Out = Mem0[a7_56 + 0x0024:word32]
word32 a4_77
*a4Out = Mem0[a7_56 + 0x0028:word32]
return d0_121
_DATAINIT_exit:
}
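The output parameters above can be decoded by hand. Assuming a7 == fp on entry and 4-byte pushes in the order shown before Analyze Dataflow, each saved register sits at a fixed fp-relative offset. This hypothetical Python sketch maps the displacements in the post-dataflow listing back to those slots, using a7_120 = fp - 0x0030 from the error path.

```python
# Hypothetical sketch: map fp-relative offsets to the registers saved in
# _DATAINIT's prologue. Assumes a7 == fp on entry and 4-byte pushes.
PUSH_ORDER = ["a4", "a3", "a2", "a1", "a0",
              "d7", "d6", "d5", "d4", "d3", "d2", "d1"]

def slot_map(push_order, word=4):
    """The i-th push (0-based) stores its register at fp - (i + 1) * word."""
    return {-(i + 1) * word: reg for i, reg in enumerate(push_order)}

def register_at(slots, base, disp):
    """Name of the saved register read by Mem0[fp + base + disp], or None
    if that address lies outside the save area."""
    return slots.get(base + disp)

slots = slot_map(PUSH_ORDER)
a7_120 = -0x30          # error path: a7_120 = fp - 0x0030
a7_56 = a7_120 + 0x04   # a7_56 = a7_120 + 0x04
for name, base, disp in [("d1Out", a7_120, 0x00), ("d7Out", a7_56, 0x14),
                         ("a3Out", a7_56, 0x24), ("a4Out", a7_56, 0x28)]:
    print(name, "<-", register_at(slots, base, disp))
```

Under that assumption, `*d1Out`, `*d7Out`, `*a3Out` and `*a4Out` read back exactly the values of d1, d7, a3 and a4 saved on entry, so the registers are being restored rather than produced. On the other path a7_120 is fp - 0x38, which lies 8 bytes below the save area.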

@uxmal
Owner

uxmal commented Feb 15, 2018

Not having had time to look at this closely yet, my first suspicion is that Reko is not correctly handling the sequence:

relocate_world(dwArg00, dwArg04, dwArg08)
a7_120 = fp - 0x38

causing it to consider the stack to be unbalanced, which in turn forces it to treat the register assignments as generating output values, rather than as restoring the registers to the values they had on entry to the procedure.
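One way to picture the imbalance: the exit block l0010FC48 has two predecessors, and the listing gives a different a7_120 on each (fp - 0x0030 after l0010FC12, fp - 0x38 after the relocate_world call). The Python sketch below is a simplified model, not Reko's algorithm. It merges fp-relative stack heights at a join point and reports failure when they disagree.

```python
from typing import Dict, Optional

def merge_stack_heights(heights: Dict[str, int]) -> Optional[int]:
    """Common fp-relative a7 offset at a join point, or None when the
    predecessors disagree (the stack looks unbalanced)."""
    distinct = set(heights.values())
    return next(iter(distinct)) if len(distinct) == 1 else None

def imbalance(heights: Dict[str, int]) -> int:
    """Spread, in bytes, between the highest and lowest incoming a7."""
    return max(heights.values()) - min(heights.values())

# Incoming a7 values for l0010FC48, read off the post-dataflow listing.
incoming = {"l0010FC12": -0x30, "l0010FC16": -0x38}
print(merge_stack_heights(incoming), imbalance(incoming))  # None 8
```

When the merge fails, the exit-time loads cannot be tied to the saved slots on every path, which fits the explanation above of why the restores show up as outputs.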

@gbody
Contributor Author

gbody commented Feb 15, 2018

@uxmal looking at it further, it appears to me that Reko might be having problems with the call to ZEROBUFFER. The caller computes a5 minus the word32 buffer size stored at (a4) and pushes that address, then pushes the word32 buffer size at (a4), and then calls ZEROBUFFER. ZEROBUFFER is interesting in that it pops the return address into a0, then pops the size/length of the buffer, then the address of the buffer. It then clears the buffer and returns via jmp.l (a0). The RTL code generated for this is a call to a0 followed by a return.
Is this handling of popping the return address and returning via jmp.l (a0) what is throwing the stack off?
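The calling sequence described above can be simulated to see its net effect on a7. This Python sketch is a toy model, not Reko code. `Stack`, `zerobuffer` and `call_zerobuffer` are hypothetical names, the buffer address and size are made up, the return address 0x0010FC22 is taken from the assembly listing, and the body of ZEROBUFFER follows only the description above (pop return address into a0, pop size, pop address, clear, jmp (a0)).

```python
class Stack:
    """Toy 68k stack: a7 is a byte offset that grows downward."""
    def __init__(self):
        self.a7 = 0
        self.mem = {}

    def push(self, value):          # move.l x,-(a7)
        self.a7 -= 4
        self.mem[self.a7] = value

    def pop(self):                  # move.l (a7)+,x
        value = self.mem[self.a7]
        self.a7 += 4
        return value

def zerobuffer(stack, memory):
    """ZEROBUFFER as described in the thread (register details assumed)."""
    a0 = stack.pop()                # return address
    size = stack.pop()              # buffer length
    addr = stack.pop()              # buffer start
    for i in range(size):
        memory[addr + i] = 0
    return a0                       # jmp.l (a0): continue at the popped address

def call_zerobuffer(stack, memory, addr, size):
    before = stack.a7
    stack.push(addr)                # move.l a3,-(a7)
    stack.push(size)                # move.l (a4),-(a7)
    stack.push(0x0010FC22)          # bsr ZEROBUFFER pushes the return address
    target = zerobuffer(stack, memory)
    return target, stack.a7 - before

s = Stack()
print(call_zerobuffer(s, {}, 0x2000, 16))
```

Because ZEROBUFFER removes its own 8 bytes of arguments, a7 ends where it started. A model that assumes the callee only removes the 4-byte return address (the retsize: 4 in the RTL) would instead leave a7 8 bytes lower, the same size as the fp - 0x30 versus fp - 0x38 discrepancy in the post-dataflow listing. That is consistent with, but does not prove, the ZEROBUFFER diagnosis.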

This should probably be a separate question (issue). In the RTL code that sets up the parameters before the call to ZEROBUFFER, there is a reference to a5, which holds the A5World pointer address, whereas previously you talked about a structure that points to A5World. The actual A5World pointer should be an offset from the start of the A5World memory region, pointing at the A5World zero-offset address. That pointer would then be the base for all A5World-relative offsets being referenced.
Is this how it works, or is there something hidden under the hood to make it work?
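For the A5World question, the _DATAINIT assembly quoted below computes a3 = a5 - (a4) and then zeroes (a4) bytes starting at a3, so the zeroed region ends exactly at a5 and the globals sit at negative offsets from it. The Python sketch below models only that reading of the listing, not Reko's internal representation. The a5 value and region size are made-up numbers, and the helper names are hypothetical.

```python
def globals_region(a5: int, size: int) -> tuple:
    """Half-open range [a5 - size, a5) cleared by ZEROBUFFER in _DATAINIT
    (a3 = a5 - (a4); length = (a4))."""
    return a5 - size, a5

def effective_address(a5: int, disp: int) -> int:
    """Address of an A5-relative operand such as disp(a5)."""
    return a5 + disp

def in_globals(a5: int, size: int, disp: int) -> bool:
    start, end = globals_region(a5, size)
    return start <= effective_address(a5, disp) < end

A5 = 0x00020000   # hypothetical A5World value
SIZE = 0x0100     # hypothetical value stored at (a4)
print(hex(globals_region(A5, SIZE)[0]), in_globals(A5, SIZE, -0x10))  # 0x1ff00 True
```

In this model a single register holding the zero-offset address is enough: every global in the cleared region is reached as a negative displacement from a5.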

Assembly section from _DATAINIT
0010FC16 26 4D movea.l a5,a3
0010FC18 97 D4 suba.l (a4),a3
0010FC1A 2F 0B move.l a3,-(a7)
0010FC1C 2F 14 move.l (a4),-(a7)
0010FC1E 61 00 01 4C bsr ZEROBUFFER
0010FC22 20 2C 00 08 move.l $0008(a4),d0
0010FC26 48 74 08 00 pea (a4,d0)
0010FC2A 2F 0B move.l a3,-(a7)
0010FC2C 61 00 00 2E bsr uncompress_world

RTL code for the above assembly
l0010FC16:
a3 = a5
a3 = a3 - Mem0[a4:word32]
CVZNX = cond(a3)
a7 = a7 - 0x04
v22 = a3
Mem0[a7:word32] = v22
CVZN = cond(v22)
v23 = Mem0[a4:word32]
a7 = a7 - 0x04
v24 = v23
Mem0[a7:word32] = v24
CVZN = cond(v24)
call ZEROBUFFER (retsize: 4;)
d0 = Mem0[a4 + 0x08:word32]
CVZN = cond(d0)
a7 = a7 - 0x04
Mem0[a7:word32] = a4 + d0
a7 = a7 - 0x04
v25 = a3
Mem0[a7:word32] = v25
CVZN = cond(v25)
call uncompress_world (retsize: 4;)
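The stack heights implied by the RTL above can be traced with a few lines of Python. This sketch is an illustration under stated assumptions: each `call` pushes a 4-byte return address and the callee removes `retsize` bytes, and `callee_pops` is a hypothetical table of extra argument bytes a callee cleans up itself (8 for ZEROBUFFER, per the description above).

```python
import re

STEP = re.compile(r"a7 = a7 ([+-]) (0x[0-9A-Fa-f]+)$")
CALL = re.compile(r"call (\w+) \(retsize: (\d+);\)$")

def trace(rtl, callee_pops=None):
    """a7 offset (relative to the start of the listing) after each call.

    callee_pops maps a callee name to argument bytes it removes itself."""
    callee_pops = callee_pops or {}
    a7, after_calls = 0, {}
    for line in rtl:
        line = line.strip()
        if m := STEP.match(line):
            n = int(m.group(2), 16)
            a7 += n if m.group(1) == "+" else -n
        elif m := CALL.match(line):
            # The call pushes a return address; the callee pops retsize bytes.
            a7 += -4 + int(m.group(2))
            a7 += callee_pops.get(m.group(1), 0)
            after_calls[m.group(1)] = a7
    return after_calls

# Stack-relevant lines from the l0010FC16 listing above.
RTL = """a7 = a7 - 0x04
Mem0[a7:word32] = v22
a7 = a7 - 0x04
Mem0[a7:word32] = v24
call ZEROBUFFER (retsize: 4;)
a7 = a7 - 0x04
Mem0[a7:word32] = a4 + d0
a7 = a7 - 0x04
Mem0[a7:word32] = v25
call uncompress_world (retsize: 4;)""".splitlines()

print(trace(RTL))
print(trace(RTL, {"ZEROBUFFER": 8}))
```

Without the table entry, the trace leaves a7 8 bytes lower after ZEROBUFFER than the callee-pops behaviour described above would.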

@uxmal
Owner

uxmal commented Apr 20, 2018

Indeed, the jmp.l (a0) is throwing Reko for a loop. What's going on here is that the return address is being "reified" and treated very similarly to how RISC machines use their link registers. Currently Reko doesn't explicitly manage the return address (the continuation, in computer science speak). I'm going to have to think about how to make continuations explicit in the codebase.
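A minimal way to make the continuation explicit is to seed the stack slot holding the return address with a symbolic marker and check where control actually leaves. The Python sketch below does that for a toy instruction list. `RET`, `classify_exit` and the instruction tuples are hypothetical, and the registers used for ZEROBUFFER's size and address (d0 and a1 here) are assumptions, since the thread only names a0.

```python
RET = object()  # symbolic marker for the caller's continuation

def classify_exit(instrs, stack):
    """Tiny symbolic evaluator. `stack` is a list whose last element is
    the top of stack; on entry it holds RET above the arguments."""
    regs = {}
    for op, *operands in instrs:
        if op == "pop":                      # move.l (a7)+,reg
            regs[operands[0]] = stack.pop()
        elif op == "jmp":                    # jmp (reg)
            return "return" if regs.get(operands[0]) is RET else "indirect jump"
        elif op == "rts":
            return "return" if stack.pop() is RET else "indirect jump"
    return "falls through"

# ZEROBUFFER as described in the thread; d0/a1 are assumed registers.
ZEROBUFFER = [("pop", "a0"), ("pop", "d0"), ("pop", "a1"), ("jmp", "a0")]
entry_stack = ["buffer address", "buffer size", RET]
print(classify_exit(ZEROBUFFER, entry_stack), len(entry_stack))
```

With this, the jmp.l (a0) in ZEROBUFFER is recognised as a return that has also consumed its two arguments, rather than as an unknown indirect jump.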

@uxmal
Owner

uxmal commented Nov 10, 2019

I'm going to have to take a closer look at this tomorrow. It's curious that this has regressed, since none of the logic in the data flow analysis "should" affect symbol generation, which happens much earlier in the decompilation process.

uxmal closed this as completed in 69513c7 on Apr 18, 2023