Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird bug in ARM emulation(investigating but need some help about code behaviour) #287

Closed
JCYang opened this issue Dec 2, 2015 · 22 comments

Comments

@JCYang
Copy link
Contributor

JCYang commented Dec 2, 2015

I emulated some ARMv7(thumb and ARM mixed) code and tried to exam the memory status at some point.
In order to emulate a known imported function, I hooked the entry of the lazy link PLT block(uc_hook_add + UC_HOOK_CODE) of the function and patched it to BX LR(I write r0 in my emulation codes), and it worked.
But a bug found, emulator seems to halt at a certain point later, yet the CPU usage indicate it is still live and running. In order to investigate what's wrong, I tried adding global instruction hook which simply print the instruction ran, in the callback of my patched hook. And weird thing happen, this very bug got "fixed", the engine continue to run to the "until" instruction I specified.

Some code snippet might better tell the story:

def GlobalHook(uc_emu, address, code_size, usr_data):
pass # even place holder helps!!!

def EmulateFunc(uc_emu, address, code_size, usr_data):
r0 = uc_emu.reg_read(UC_ARM_REG_R0)
# some calculation here
uc_emu.reg_write(UC_ARM_REG_R0, result)
uc_emu.hook_add(UC_HOOK_CODE, GlobalHook) # without this hook, the engine will halt at a certain point later

I've tried to read and modify Unicorn to help diagnose this bug, so I know it halt at a certain point.
And I guess, the hook callback codes make a different to the internal states of the engine, but I'm not familiar with Unicorn codebase, and the code generation engine it used, when I try to look for gen_helper_uc_tracecode from the source, only one invoke location and the definition is nowhere.

I'm continue investigating this bug and would like to fix it, but I really need some help from you, thank you.

@aquynh
Copy link
Member

aquynh commented Dec 2, 2015

Can you please put together a sample code exposing this issue, and send in
a pull request to put it under tests/regress/ or tests/unit/ ?

Thanks.

@JCYang
Copy link
Contributor Author

JCYang commented Dec 2, 2015

I'm sorry that I can't provide the codes which trigger this bug for some reasons, and I don't know how to properly construct a minimal test case. So I asked for help, try to find and fix it locally. Of course I will send a pull request once I identify and fix it, if I can't fix it by myself, I will also share my discover with you here.

@aquynh
Copy link
Member

aquynh commented Dec 3, 2015

if you can narrow down the input to just some bytes triggering the issue, i think you can show it without disclosing anything.

@JCYang
Copy link
Contributor Author

JCYang commented Dec 3, 2015

@aquynh Once I know the range of bytes trigger this issue, I will share it.

Now some new discovery, I found the point probably should be responsible for this issue or somehow relate. I attach to the thread continue occupy the CPU(one core full load), and the stack trace is like this:

(gdb) bt
#0 0x051cd70b in static_code_gen_buffer ()
from C:\Python27\lib\site-packages\unicorn\unicorn.dll
#1 0x00000015 in ?? ()
#2 0x02b0b83e in save_globals_arm (s=0x0, allocated_regs=27)
at E:/MyProjects/unicorn/qemu/tcg/tcg.c:1950
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I run the same debug session in a linux build and the backtrace is also similar, gdb complain stack corrupt.

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f2c68b03700 (LWP 6064))]
#0 0x00007f2c6d86ed7e in static_code_gen_buffer () from /usr/lib/libunicorn.so
(gdb) bt
#0 0x00007f2c6d86ed7e in static_code_gen_buffer () from /usr/lib/libunicorn.so
#1 0x00007f2c6a70b0f0 in ?? ()
#2 0x000000000000001c in ?? ()
#3 0x0000000010000000 in ?? ()
#4 0x00007f2c6a70b0f0 in ?? ()
#5 0x00007f2c68b02970 in ?? ()
#6 0x00007f2c6af38819 in tcg_temp_free_internal_arm (
s=<error reading variable: Cannot access memory at address 0xffffffffffffffe8>,
idx=<error reading variable: Cannot access memory at address 0xffffffffffffffe4>) at /home/jcyang/development/unicorn/qemu/tcg/tcg.c:628
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

And both point that static_code_gen_buffer() as the top of the stack.

@aquynh
Copy link
Member

aquynh commented Dec 4, 2015

you can find out which instruction triggers the crash by hooking every instructions, and print out its bytes before the instruction is executed.

this is possible by having a callback like this: https://github.com/unicorn-engine/unicorn/blob/master/samples/sample_x86.c#L41. then you have to register the callback before emulation, with UC_HOOK_CODE hook, like this https://github.com/unicorn-engine/unicorn/blob/master/samples/sample_x86.c#L205

@JCYang
Copy link
Contributor Author

JCYang commented Dec 5, 2015

@aquynh Not crash, my first post has detail description of this issue.
And now I approach the root by several more steps,
it is tcg_qemu_tb_exec() in static tcg_target_ulong cpu_tb_exec(CPUState *cpu, uint8_t *tb_ptr) continue running and not return, cpu is full loaded, seems like wrong code generated result in infinite loop in the TB? I would like to print the codes in the TB for further analyze but add codes to cpu-exec.c is really difficult, how can I tell unicorn to build my .c and link to it at last?

@JCYang
Copy link
Contributor Author

JCYang commented Dec 5, 2015

hmm... gdb can help me better... I think I don't need to modify the code~

@JCYang
Copy link
Contributor Author

JCYang commented Dec 19, 2015

Finally solved... a bug hidden very deep.
I think I need to elaborate it here before I send a pull request.(hell... I don't know how to send a git pull request yet...will learn very soon>_<)

It's a bug introduced by Unicorn modification to TCG(that's why it's not in qemu, which will surely be encountered and fixed already), in the tcg_liveness_analysis.
Unicorn prevent all temps liveness status to refresh for brcond op, for the reason in the comment, but this solution will make tcg global sometimes not synced properly(in tcg_reg_alloc_op(), especially probably the mostly hurt are the four flags register, ZF, CF, VF, NF. Nearly every ARM code can be executed conditionally, which will introduced such kind of scenario.
Consider the following ARM code(not THUMB):
...
CMP R0,#1
MOV R1,R1,LSR#4
SUBGE R2,R2,#4
BGE somewhere
...
which will be translated to TCG ops like this(after tcg_optimize()):

...
set_label $0x4
movi_i32 tmp8,$0x1
mov_i32 tmp9,r0
sub_i32 NF,r0,tmp8 // Point C
mov_i32 ZF,NF
setcond_i32 CF,r0,tmp8,geu
xor_i32 VF,NF,r0
xor_i32 tmp10,r0,tmp8
and_i32 VF,VF,tmp10
mov_i32 tmp9,NF
mov_i32 tmp8,r1
movi_i32 tmp9,$0x4
shr_i32 tmp8,r1,tmp9
mov_i32 r1,tmp8
xor_i32 tmp8,VF,NF
movi_i32 tmp9,$0x0
brcond_i32 tmp8,tmp9,lt,$0x5 // Point B
movi_i32 tmp8,$0x4
mov_i32 tmp9,r2
sub_i32 NF,r2,tmp8 // Point A
mov_i32 ZF,NF
setcond_i32 CF,r2,tmp8,geu
xor_i32 VF,NF,r2
xor_i32 tmp10,r2,tmp8
and_i32 VF,VF,tmp10
mov_i32 tmp9,NF
mov_i32 r2,ZF
set_label $0x5
xor_i32 tmp8,VF,NF
movi_i32 tmp9,$0x0
brcond_i32 tmp8,tmp9,lt,$0x6
goto_tb $0x0
movi_i32 pc,somewhere
exit_tb $0x181141a0
...

Since liveness analysis is backward, first in Point A, NF as output operand, is sentenced to death and got the corresponding In-Memory bit clear, previous op will not sync result to it if it's used as output operand again, without In-Memory state refresh, that is, Point C will failed to remember NF(in the generated host code, discard), and then in label $0x5, condition check will be wrong if the subtraction is skipped. In Qemu, the brcond in Point B comes to rescue, but Unicorn prevent it altogether. IIUC, we should prevent only dead_temps refresh for brcond, and keep refresh TCG globals In-Memory states for brcond op.

At least after the change I described above, my codes work properly, the bug is gone. I will learn to send a pull request ASAP. Any comments are welcome.

@JCYang
Copy link
Contributor Author

JCYang commented Dec 19, 2015

This bug is really deep and very subtle, in the process of tracing it, it teach me many things about TCG... hooray for the solution...

@aquynh
Copy link
Member

aquynh commented Dec 19, 2015

looks interesting, but with your fix, does everything else (especially those under tests/unit/ & tests/regress/) still works as before?

here is what you should do to make a pull request. (i suppose you are using commandline on Mac or Linux)

you can repeat all the steps above for each pull request, but with different branch name each time

@JCYang
Copy link
Contributor Author

JCYang commented Dec 19, 2015

Can I do the unit test in Windows? the default build configure of windows does not contain unit test code built, is it possible to build the unit test in Windows?
Though I can try it inside a VM...

@aquynh
Copy link
Member

aquynh commented Dec 19, 2015

yes you can try on Windows, but everything is much easier on Unix.

@JCYang
Copy link
Contributor Author

JCYang commented Dec 21, 2015

It's not as fluid as it should be. I try to build the unit test(tests/unit) in a Ubuntu box, the linker still complained symbol not found(all are symbols from cmocka, IIUC. I encountered the same problem in Windows with mingw, so I gave up there and move to a Ubuntu box) . I've built cmocka-1.0.1 and install it already, and I even read the Makefile, know it include the root of unicorn directory as a library search directory, so I manually copy libcmocka.so(and the sym links) to here, and it still doesn't work. anything wrong?

@JCYang
Copy link
Contributor Author

JCYang commented Dec 21, 2015

The patch is ready for test, but since test can not be built successfully, I'm not sure enough whether it will break them or not.

@aquynh
Copy link
Member

aquynh commented Dec 21, 2015

On Dec 21, 2015 3:45 PM, "JCYang" notifications@github.com wrote:

The patch is ready for test, but since test can not be built successfully, I'm not sure enough whether it will break them or not.

Ok, just send a pull request so we can review and test it. Thanks.

@farmdve
Copy link
Contributor

farmdve commented Dec 22, 2015

I did have a problem with the ZF flag not being cleared when it should, perhaps it's related to this?

@JCYang
Copy link
Contributor Author

JCYang commented Dec 23, 2015

@farmdve Maybe, any code result in brcond in the middle of a TB may introduce a scenario trigger this bug, I haven't look into x86 translator, though IIRC no condition-per-instruction code there. I've added a test case for this bug yesterday which contain detail explanation, which can help you better understand this bug.

@JCYang
Copy link
Contributor Author

JCYang commented Dec 23, 2015

Some words about the initial weird behavior I found:
A global code hook(even an empty one as place holder) effectively 'disable' this bug because, Unicorn implement code hook by inserting helper function into the TCG ops before every instruction-translated-TCG-code-block(this is a term named by me) to call our callback, every this kind of call contains one call op, which is also a point to refresh/revalidate the TCG globals. So If we do global code hook or maybe only hook the portion of codes which can trigger this bug, can effectively 'disable' or 'hide' it.

@JCYang
Copy link
Contributor Author

JCYang commented Dec 23, 2015

But I don't think global hook can be used as a workaround, because global hook, first will slow down the emulation greatly, second, global hook has the highest priority in Unicorn, it will effectively disable all other hooks, force the user to write all hook logic in one place can be annoying, and sometimes even possible, is awful.

@aquynh
Copy link
Member

aquynh commented Dec 28, 2015

merged the pull request to fix this issue, thanks!

@13824125580
Copy link

Finally solved... a bug hidden very deep.
I think I need to elaborate it here before I send a pull request.(hell... I don't know how to send a git pull request yet...will learn very soon>_<)

It's a bug introduced by Unicorn modification to TCG(that's why it's not in qemu, which will surely be encountered and fixed already), in the tcg_liveness_analysis.
Unicorn prevent all temps liveness status to refresh for brcond op, for the reason in the comment, but this solution will make tcg global sometimes not synced properly(in tcg_reg_alloc_op(), especially probably the mostly hurt are the four flags register, ZF, CF, VF, NF. Nearly every ARM code can be executed conditionally, which will introduced such kind of scenario.
Consider the following ARM code(not THUMB):
...
CMP R0,#1
MOV R1,R1,LSR#4
SUBGE R2,R2,#4
BGE somewhere
...
which will be translated to TCG ops like this(after tcg_optimize()):

...
set_label $0x4
movi_i32 tmp8,$0x1
mov_i32 tmp9,r0
sub_i32 NF,r0,tmp8 // Point C
mov_i32 ZF,NF
setcond_i32 CF,r0,tmp8,geu
xor_i32 VF,NF,r0
xor_i32 tmp10,r0,tmp8
and_i32 VF,VF,tmp10
mov_i32 tmp9,NF
mov_i32 tmp8,r1
movi_i32 tmp9,$0x4
shr_i32 tmp8,r1,tmp9
mov_i32 r1,tmp8
xor_i32 tmp8,VF,NF
movi_i32 tmp9,$0x0
brcond_i32 tmp8,tmp9,lt,$0x5 // Point B
movi_i32 tmp8,$0x4
mov_i32 tmp9,r2
sub_i32 NF,r2,tmp8 // Point A
mov_i32 ZF,NF
setcond_i32 CF,r2,tmp8,geu
xor_i32 VF,NF,r2
xor_i32 tmp10,r2,tmp8
and_i32 VF,VF,tmp10
mov_i32 tmp9,NF
mov_i32 r2,ZF
set_label $0x5
xor_i32 tmp8,VF,NF
movi_i32 tmp9,$0x0
brcond_i32 tmp8,tmp9,lt,$0x6
goto_tb $0x0
movi_i32 pc,somewhere
exit_tb $0x181141a0
...

Since liveness analysis is backward, first in Point A, NF as output operand, is sentenced to death and got the corresponding In-Memory bit clear, previous op will not sync result to it if it's used as output operand again, without In-Memory state refresh, that is, Point C will failed to remember NF(in the generated host code, discard), and then in label $0x5, condition check will be wrong if the subtraction is skipped. In Qemu, the brcond in Point B comes to rescue, but Unicorn prevent it altogether. IIUC, we should prevent only dead_temps refresh for brcond, and keep refresh TCG globals In-Memory states for brcond op.

At least after the change I described above, my codes work properly, the bug is gone. I will learn to send a pull request ASAP. Any comments are welcome.

how to get the tcg code like that? it is costly to type these codes b hands. @JCYang

@aquynh
Copy link
Member

aquynh commented Dec 14, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants