Weird bug in ARM emulation(investigating but need some help about code behaviour) #287

JCYang · 2015-12-02T10:17:34Z

I emulated some ARMv7(thumb and ARM mixed) code and tried to exam the memory status at some point.
In order to emulate a known imported function, I hooked the entry of the lazy link PLT block(uc_hook_add + UC_HOOK_CODE) of the function and patched it to BX LR(I write r0 in my emulation codes), and it worked.
But a bug found, emulator seems to halt at a certain point later, yet the CPU usage indicate it is still live and running. In order to investigate what's wrong, I tried adding global instruction hook which simply print the instruction ran, in the callback of my patched hook. And weird thing happen, this very bug got "fixed", the engine continue to run to the "until" instruction I specified.

Some code snippet might better tell the story:

def GlobalHook(uc_emu, address, code_size, usr_data):
pass # even place holder helps!!!

def EmulateFunc(uc_emu, address, code_size, usr_data):
r0 = uc_emu.reg_read(UC_ARM_REG_R0)
# some calculation here
uc_emu.reg_write(UC_ARM_REG_R0, result)
uc_emu.hook_add(UC_HOOK_CODE, GlobalHook) # without this hook, the engine will halt at a certain point later

I've tried to read and modify Unicorn to help diagnose this bug, so I know it halt at a certain point.
And I guess, the hook callback codes make a different to the internal states of the engine, but I'm not familiar with Unicorn codebase, and the code generation engine it used, when I try to look for gen_helper_uc_tracecode from the source, only one invoke location and the definition is nowhere.

I'm continue investigating this bug and would like to fix it, but I really need some help from you, thank you.

aquynh · 2015-12-02T11:16:15Z

Can you please put together a sample code exposing this issue, and send in
a pull request to put it under tests/regress/ or tests/unit/ ?

Thanks.

JCYang · 2015-12-02T14:30:06Z

I'm sorry that I can't provide the codes which trigger this bug for some reasons, and I don't know how to properly construct a minimal test case. So I asked for help, try to find and fix it locally. Of course I will send a pull request once I identify and fix it, if I can't fix it by myself, I will also share my discover with you here.

aquynh · 2015-12-03T01:58:08Z

if you can narrow down the input to just some bytes triggering the issue, i think you can show it without disclosing anything.

JCYang · 2015-12-03T07:33:06Z

@aquynh Once I know the range of bytes trigger this issue, I will share it.

Now some new discovery, I found the point probably should be responsible for this issue or somehow relate. I attach to the thread continue occupy the CPU(one core full load), and the stack trace is like this:

(gdb) bt
#0 0x051cd70b in static_code_gen_buffer ()
from C:\Python27\lib\site-packages\unicorn\unicorn.dll
#1 0x00000015 in ?? ()
#2 0x02b0b83e in save_globals_arm (s=0x0, allocated_regs=27)
at E:/MyProjects/unicorn/qemu/tcg/tcg.c:1950
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I run the same debug session in a linux build and the backtrace is also similar, gdb complain stack corrupt.

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f2c68b03700 (LWP 6064))]
#0 0x00007f2c6d86ed7e in static_code_gen_buffer () from /usr/lib/libunicorn.so
(gdb) bt
#0 0x00007f2c6d86ed7e in static_code_gen_buffer () from /usr/lib/libunicorn.so
#1 0x00007f2c6a70b0f0 in ?? ()
#2 0x000000000000001c in ?? ()
#3 0x0000000010000000 in ?? ()
#4 0x00007f2c6a70b0f0 in ?? ()
#5 0x00007f2c68b02970 in ?? ()
#6 0x00007f2c6af38819 in tcg_temp_free_internal_arm (
s=<error reading variable: Cannot access memory at address 0xffffffffffffffe8>,
idx=<error reading variable: Cannot access memory at address 0xffffffffffffffe4>) at /home/jcyang/development/unicorn/qemu/tcg/tcg.c:628
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

And both point that static_code_gen_buffer() as the top of the stack.

aquynh · 2015-12-04T02:53:43Z

you can find out which instruction triggers the crash by hooking every instructions, and print out its bytes before the instruction is executed.

this is possible by having a callback like this: https://github.com/unicorn-engine/unicorn/blob/master/samples/sample_x86.c#L41. then you have to register the callback before emulation, with UC_HOOK_CODE hook, like this https://github.com/unicorn-engine/unicorn/blob/master/samples/sample_x86.c#L205

JCYang · 2015-12-05T08:35:03Z

@aquynh Not crash, my first post has detail description of this issue.
And now I approach the root by several more steps,
it is tcg_qemu_tb_exec() in static tcg_target_ulong cpu_tb_exec(CPUState *cpu, uint8_t *tb_ptr) continue running and not return, cpu is full loaded, seems like wrong code generated result in infinite loop in the TB? I would like to print the codes in the TB for further analyze but add codes to cpu-exec.c is really difficult, how can I tell unicorn to build my .c and link to it at last?

JCYang · 2015-12-05T08:50:26Z

hmm... gdb can help me better... I think I don't need to modify the code~

JCYang · 2015-12-19T04:57:06Z

Finally solved... a bug hidden very deep.
I think I need to elaborate it here before I send a pull request.(hell... I don't know how to send a git pull request yet...will learn very soon>_<)

It's a bug introduced by Unicorn modification to TCG(that's why it's not in qemu, which will surely be encountered and fixed already), in the tcg_liveness_analysis.
Unicorn prevent all temps liveness status to refresh for brcond op, for the reason in the comment, but this solution will make tcg global sometimes not synced properly(in tcg_reg_alloc_op(), especially probably the mostly hurt are the four flags register, ZF, CF, VF, NF. Nearly every ARM code can be executed conditionally, which will introduced such kind of scenario.
Consider the following ARM code(not THUMB):
...
CMP R0,#1
MOV R1,R1,LSR#4
SUBGE R2,R2,#4
BGE somewhere
...
which will be translated to TCG ops like this(after tcg_optimize()):

...
set_label $0x4
movi_i32 tmp8,$0x1
mov_i32 tmp9,r0
sub_i32 NF,r0,tmp8 // Point C
mov_i32 ZF,NF
setcond_i32 CF,r0,tmp8,geu
xor_i32 VF,NF,r0
xor_i32 tmp10,r0,tmp8
and_i32 VF,VF,tmp10
mov_i32 tmp9,NF
mov_i32 tmp8,r1
movi_i32 tmp9,$0x4
shr_i32 tmp8,r1,tmp9
mov_i32 r1,tmp8
xor_i32 tmp8,VF,NF
movi_i32 tmp9,$0x0
brcond_i32 tmp8,tmp9,lt,$0x5 // Point B
movi_i32 tmp8,$0x4
mov_i32 tmp9,r2
sub_i32 NF,r2,tmp8 // Point A
mov_i32 ZF,NF
setcond_i32 CF,r2,tmp8,geu
xor_i32 VF,NF,r2
xor_i32 tmp10,r2,tmp8
and_i32 VF,VF,tmp10
mov_i32 tmp9,NF
mov_i32 r2,ZF
set_label $0x5
xor_i32 tmp8,VF,NF
movi_i32 tmp9,$0x0
brcond_i32 tmp8,tmp9,lt,$0x6
goto_tb $0x0
movi_i32 pc,somewhere
exit_tb $0x181141a0
...

Since liveness analysis is backward, first in Point A, NF as output operand, is sentenced to death and got the corresponding In-Memory bit clear, previous op will not sync result to it if it's used as output operand again, without In-Memory state refresh, that is, Point C will failed to remember NF(in the generated host code, discard), and then in label $0x5, condition check will be wrong if the subtraction is skipped. In Qemu, the brcond in Point B comes to rescue, but Unicorn prevent it altogether. IIUC, we should prevent only dead_temps refresh for brcond, and keep refresh TCG globals In-Memory states for brcond op.

At least after the change I described above, my codes work properly, the bug is gone. I will learn to send a pull request ASAP. Any comments are welcome.

JCYang · 2015-12-19T05:07:55Z

This bug is really deep and very subtle, in the process of tracing it, it teach me many things about TCG... hooray for the solution...

aquynh · 2015-12-19T05:42:19Z

looks interesting, but with your fix, does everything else (especially those under tests/unit/ & tests/regress/) still works as before?

here is what you should do to make a pull request. (i suppose you are using commandline on Mac or Linux)

fork Unicorn again on Github website (so you have a fork at https://github.com/your-id/unicorn)
clone this fork to your local machine
$ git clone https://github.com/your-id/unicorn.git
make a new branch in this fork on your local machine:
$ git checkout -b mybranch
modify source code for your fix ...
commit this fix to your local git repo:
$ git commit -a -m "commit description ...."
push to your fork on Github
$ git push origin mybranch
go to your fork on Github website (https://github.com/your-id/unicorn), then do pull request on this "mybranch" from Github website to the "master" branch of https://github.com/unicorn-engine/unicorn

you can repeat all the steps above for each pull request, but with different branch name each time

JCYang · 2015-12-19T07:04:21Z

Can I do the unit test in Windows? the default build configure of windows does not contain unit test code built, is it possible to build the unit test in Windows?
Though I can try it inside a VM...

aquynh · 2015-12-19T07:06:48Z

yes you can try on Windows, but everything is much easier on Unix.

JCYang · 2015-12-21T02:35:37Z

It's not as fluid as it should be. I try to build the unit test(tests/unit) in a Ubuntu box, the linker still complained symbol not found(all are symbols from cmocka, IIUC. I encountered the same problem in Windows with mingw, so I gave up there and move to a Ubuntu box) . I've built cmocka-1.0.1 and install it already, and I even read the Makefile, know it include the root of unicorn directory as a library search directory, so I manually copy libcmocka.so(and the sym links) to here, and it still doesn't work. anything wrong?

JCYang · 2015-12-21T07:45:51Z

The patch is ready for test, but since test can not be built successfully, I'm not sure enough whether it will break them or not.

aquynh · 2015-12-21T08:33:50Z

On Dec 21, 2015 3:45 PM, "JCYang" notifications@github.com wrote:

The patch is ready for test, but since test can not be built successfully, I'm not sure enough whether it will break them or not.

Ok, just send a pull request so we can review and test it. Thanks.

the tcg_liveness_analysis(). Refer to unicorn-engine#287 for further info.

farmdve · 2015-12-22T09:56:16Z

I did have a problem with the ZF flag not being cleared when it should, perhaps it's related to this?

JCYang · 2015-12-23T01:00:54Z

@farmdve Maybe, any code result in brcond in the middle of a TB may introduce a scenario trigger this bug, I haven't look into x86 translator, though IIRC no condition-per-instruction code there. I've added a test case for this bug yesterday which contain detail explanation, which can help you better understand this bug.

JCYang · 2015-12-23T01:39:52Z

Some words about the initial weird behavior I found:
A global code hook(even an empty one as place holder) effectively 'disable' this bug because, Unicorn implement code hook by inserting helper function into the TCG ops before every instruction-translated-TCG-code-block(this is a term named by me) to call our callback, every this kind of call contains one call op, which is also a point to refresh/revalidate the TCG globals. So If we do global code hook or maybe only hook the portion of codes which can trigger this bug, can effectively 'disable' or 'hide' it.

JCYang · 2015-12-23T01:48:55Z

But I don't think global hook can be used as a workaround, because global hook, first will slow down the emulation greatly, second, global hook has the highest priority in Unicorn, it will effectively disable all other hooks, force the user to write all hook logic in one place can be annoying, and sometimes even possible, is awful.

aquynh · 2015-12-28T06:06:38Z

merged the pull request to fix this issue, thanks!

the tcg_liveness_analysis(). Refer to unicorn-engine#287 for further info.

13824125580 · 2018-12-14T02:35:51Z

Finally solved... a bug hidden very deep.
I think I need to elaborate it here before I send a pull request.(hell... I don't know how to send a git pull request yet...will learn very soon>_<)

It's a bug introduced by Unicorn modification to TCG(that's why it's not in qemu, which will surely be encountered and fixed already), in the tcg_liveness_analysis.
Unicorn prevent all temps liveness status to refresh for brcond op, for the reason in the comment, but this solution will make tcg global sometimes not synced properly(in tcg_reg_alloc_op(), especially probably the mostly hurt are the four flags register, ZF, CF, VF, NF. Nearly every ARM code can be executed conditionally, which will introduced such kind of scenario.
Consider the following ARM code(not THUMB):
...
CMP R0,#1
MOV R1,R1,LSR#4
SUBGE R2,R2,#4
BGE somewhere
...
which will be translated to TCG ops like this(after tcg_optimize()):

...
set_label $0x4
movi_i32 tmp8,$0x1
mov_i32 tmp9,r0
sub_i32 NF,r0,tmp8 // Point C
mov_i32 ZF,NF
setcond_i32 CF,r0,tmp8,geu
xor_i32 VF,NF,r0
xor_i32 tmp10,r0,tmp8
and_i32 VF,VF,tmp10
mov_i32 tmp9,NF
mov_i32 tmp8,r1
movi_i32 tmp9,$0x4
shr_i32 tmp8,r1,tmp9
mov_i32 r1,tmp8
xor_i32 tmp8,VF,NF
movi_i32 tmp9,$0x0
brcond_i32 tmp8,tmp9,lt,$0x5 // Point B
movi_i32 tmp8,$0x4
mov_i32 tmp9,r2
sub_i32 NF,r2,tmp8 // Point A
mov_i32 ZF,NF
setcond_i32 CF,r2,tmp8,geu
xor_i32 VF,NF,r2
xor_i32 tmp10,r2,tmp8
and_i32 VF,VF,tmp10
mov_i32 tmp9,NF
mov_i32 r2,ZF
set_label $0x5
xor_i32 tmp8,VF,NF
movi_i32 tmp9,$0x0
brcond_i32 tmp8,tmp9,lt,$0x6
goto_tb $0x0
movi_i32 pc,somewhere
exit_tb $0x181141a0
...

Since liveness analysis is backward, first in Point A, NF as output operand, is sentenced to death and got the corresponding In-Memory bit clear, previous op will not sync result to it if it's used as output operand again, without In-Memory state refresh, that is, Point C will failed to remember NF(in the generated host code, discard), and then in label $0x5, condition check will be wrong if the subtraction is skipped. In Qemu, the brcond in Point B comes to rescue, but Unicorn prevent it altogether. IIUC, we should prevent only dead_temps refresh for brcond, and keep refresh TCG globals In-Memory states for brcond op.

At least after the change I described above, my codes work properly, the bug is gone. I will learn to send a pull request ASAP. Any comments are welcome.

how to get the tcg code like that? it is costly to type these codes b hands. @JCYang

aquynh · 2018-12-14T02:44:32Z

can you propose a fix, with pull req?

JCYang added a commit to JCYang/unicorn that referenced this issue Dec 21, 2015

Fix possible wrong conditional branch in generated host code by fixing

8ef018a

the tcg_liveness_analysis(). Refer to unicorn-engine#287 for further info.

JCYang mentioned this issue Dec 21, 2015

Fix possible wrong conditional branch in generated host code by fixing the tcg_liveness_analysis(). #333

Merged

aquynh closed this as completed Dec 28, 2015

farmdve pushed a commit to farmdve/unicorn that referenced this issue Dec 28, 2015

Fix possible wrong conditional branch in generated host code by fixing

6f0a45e

the tcg_liveness_analysis(). Refer to unicorn-engine#287 for further info.

ekse mentioned this issue Apr 28, 2018

Methods that alter the unicorn instance state now require mut references unicorn-rs/unicorn-rs#30

Closed

pgarba mentioned this issue Aug 16, 2018

X86_64 code behaves differnt execute single stepped compared to just let it run #999

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Weird bug in ARM emulation(investigating but need some help about code behaviour) #287

Weird bug in ARM emulation(investigating but need some help about code behaviour) #287

JCYang commented Dec 2, 2015

aquynh commented Dec 2, 2015

JCYang commented Dec 2, 2015

aquynh commented Dec 3, 2015

JCYang commented Dec 3, 2015

aquynh commented Dec 4, 2015

JCYang commented Dec 5, 2015

JCYang commented Dec 5, 2015

JCYang commented Dec 19, 2015

JCYang commented Dec 19, 2015

aquynh commented Dec 19, 2015

JCYang commented Dec 19, 2015

aquynh commented Dec 19, 2015

JCYang commented Dec 21, 2015

JCYang commented Dec 21, 2015

aquynh commented Dec 21, 2015

farmdve commented Dec 22, 2015

JCYang commented Dec 23, 2015

JCYang commented Dec 23, 2015

JCYang commented Dec 23, 2015

aquynh commented Dec 28, 2015

13824125580 commented Dec 14, 2018

aquynh commented Dec 14, 2018 via email

Weird bug in ARM emulation(investigating but need some help about code behaviour) #287

Weird bug in ARM emulation(investigating but need some help about code behaviour) #287

Comments

JCYang commented Dec 2, 2015

aquynh commented Dec 2, 2015

JCYang commented Dec 2, 2015

aquynh commented Dec 3, 2015

JCYang commented Dec 3, 2015

aquynh commented Dec 4, 2015

JCYang commented Dec 5, 2015

JCYang commented Dec 5, 2015

JCYang commented Dec 19, 2015

JCYang commented Dec 19, 2015

aquynh commented Dec 19, 2015

JCYang commented Dec 19, 2015

aquynh commented Dec 19, 2015

JCYang commented Dec 21, 2015

JCYang commented Dec 21, 2015

aquynh commented Dec 21, 2015

farmdve commented Dec 22, 2015

JCYang commented Dec 23, 2015

JCYang commented Dec 23, 2015

JCYang commented Dec 23, 2015

aquynh commented Dec 28, 2015

13824125580 commented Dec 14, 2018

aquynh commented Dec 14, 2018 via email