Incorrect memory view after running self-modifying code #820

jbremer · 2017-05-05T10:19:34Z

Hi!

We're in the process of integrating Unicorn Engine into Cuckoo Sandbox. Its first purpose is unpacking payloads encoded with shikata ga nai (a Metasploit encoder). While working on this we encountered an interesting shellcode sample that behaves incorrectly in Unicorn. Some additional information follows.

First of all, the decoding stub is shown below. What's notable (and, I'm quite sure, related to the bug) is that the shikata ga nai stub decodes not only the payload, but also parts of the decoder stub itself.
In this particular sample the immediate operand of the loop instruction is decoded during the first xor operation, and I suspect that this causes some out-of-sync issues with the TCG.

➜  cuckoo git:(master) ✗ ndisasm -b32 tests/files/shellcode/shikata/5.bin|head -n20
00000000  DBD0              fcmovnbe st0
00000002  D97424F4          fnstenv [esp-0xc]
00000006  5F                pop edi
00000007  B8E67741BC        mov eax,0xbc4177e6
0000000C  31C9              xor ecx,ecx
0000000E  B158              mov cl,0x58
00000010  31471A            xor [edi+0x1a],eax
00000013  03471A            add eax,[edi+0x1a]
00000016  83C704            add edi,byte +0x4
00000019  E213              loop 0x2e
0000001B  8BA93EDB742A      mov ebp,[ecx+0x2a74db3e]
00000021  5F                pop edi
00000022  52                push edx
00000023  91                xchg eax,ecx
00000024  1B5F00            sbb ebx,[edi+0x0]
00000027  D10C6F            ror dword [edi+ebp*2],1
0000002A  43                inc ebx
0000002B  B7A0              mov bh,0xa0
0000002D  0401              add al,0x1
0000002F  2C32              sub al,0x32
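
The loop in the listing above can be modeled outside the emulator to obtain the bytes a correct run should leave in memory. The following is a minimal sketch, not Unicorn or Metasploit code: `shikata_decode` is a hypothetical helper name, and it assumes that `edi` holds the address of the first stub instruction after `pop edi` (so the first xor hits offset 0x1a) and that the loop always branches back to the xor.

```python
import struct


def shikata_decode(buf, offset, key, count):
    """Model of: xor [edi+off],eax / add eax,[edi+off] / add edi,4 / loop."""
    out = bytearray(buf)
    for _ in range(count):
        # Read the encoded little-endian dword and decode it in place.
        (enc,) = struct.unpack_from("<I", out, offset)
        dec = enc ^ key
        struct.pack_into("<I", out, offset, dec)
        # Additive feedback: the next key depends on the decoded dword.
        key = (key + dec) & 0xFFFFFFFF
        offset += 4
    return bytes(out)


# The 12 bytes found at offsets 0x1a..0x25 of the sample, before decoding.
encoded = bytes.fromhex("138ba93edb742a5f52911b5f")
decoded = shikata_decode(encoded, 0, 0xBC4177E6, 3)
print(decoded.hex())  # f5fce8820000006089e531c0
```

The first decoded byte, 0xf5, is the new loop immediate; 0xfc and 0xe8 that follow are the `cld` and `call` opcodes. For the full sample the call would be `shikata_decode(sc, 0x1a, 0xBC4177E6, 0x58)`.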

The expected output after running the shellcode is shown in the x64dbg screenshot attached to the original report.

The actual output may be demonstrated by the following script (running unicorn==1.0.0).

import unicorn
import unicorn.x86_const as x86

sc = (
    "dbd0d97424f45fb8e67741bc31c9b15831471a03471a83c704e2138ba93edb742"
    "a5f52911b5f00d10c6f43b7a004012c32688d43f3c7eb6a047bcfed868603ceb7"
    "48560fffb59a5da8b20872dd8f90f9ad1e901e6520b1b0fd7b1132d1f7182c363"
    "dd3c78cc9e201dd32486cd1c091a8d63ae4c024c6fe16561c8b8cf0d72b69003b"
    "adfa0ef0baa512076fde2f8c8e31a6d6b495e28dd58c4e63eacf30dc4e9bdd09e"
    "3c689a39e8c4954170424cd83bef47a0d38fa50609d5708d1720bc6ef22d2b1f0"
    "1e77ed64a22b4210ffda64e0175064e0e7460ca6d7ad862648a641aff7f0917a8"
    "e3b3eec91f12168c2a6f227b61e9d2c6db1664d5b5bf2bb3b0c8388c3cc0a0ea9"
    "c85ca43187344d08b94352419618ff394ff7d2bb777cd31102425e90423679cca"
    "c0ddb5bb2bb7124244495a4b42c95a4f4acc6ccac08bbe9b284a8a11fae2912c8"
    "b0959d08e283f51a92a2e4e44f31286ebdb2ae8efe4170e5e511b2590ed4cb993"
    "1614311fda3c8b573d46323595f0c85c5fe98bc05"
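)
sc = bytes.fromhex(sc)  # raw shellcode bytes (Python 3)

def main():
    uc = unicorn.Uc(unicorn.UC_ARCH_X86, unicorn.UC_MODE_32)
    # Map 0x2000 bytes at 0x1000: shellcode at the bottom, stack pointer in the middle.
    uc.mem_map(0x1000, 0x2000)
    uc.mem_write(0x1000, sc)
    uc.reg_write(x86.UC_X86_REG_ESP, 0x2000)
    # Stop after 0x166 instructions; the end address 0 is never reached.
    uc.emu_start(0x1000, 0, count=0x166)
    out = uc.mem_read(0x1000, 0x2000)

    # Compare every original shellcode byte with the emulator's memory view.
    for x in range(len(sc)):
        print("0x%08x: 0x%02x => 0x%02x" % (x, sc[x], out[x]))
    print(bytes(out))

if __name__ == "__main__":
    main()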

On one hand you'll see that the remainder of the shellcode is decoded correctly, as you'll be able to find the www3.chrome-up.date string somewhere at the end of it. On the other hand, however, you'll find that the bytes just after the loop instruction aren't decoded properly. The output of the script is as follows.

0x0000000f: 0x58 => 0x58
0x00000010: 0x31 => 0x31
0x00000011: 0x47 => 0x47
0x00000012: 0x1a => 0x1a
0x00000013: 0x03 => 0x03
0x00000014: 0x47 => 0x47
0x00000015: 0x1a => 0x1a
0x00000016: 0x83 => 0x83
0x00000017: 0xc7 => 0xc7
0x00000018: 0x04 => 0x04
0x00000019: 0xe2 => 0xe2  ; the loop instruction
0x0000001a: 0x13 => 0xf5  ; the correct immediate after decoding
0x0000001b: 0x8b => 0x8b  ; definitely not the `cld` instruction (see `x64dbg` screenshot)
0x0000001c: 0xa9 => 0xa9  ; definitely not a `call` instruction
0x0000001d: 0x3e => 0x3e
0x0000001e: 0xdb => 0x00  ; interestingly enough this byte is correct!
0x0000001f: 0x74 => 0x77  ; ^ that's the first byte of the 2nd xor
0x00000020: 0x2a => 0xc1
0x00000021: 0x5f => 0xa5
0x00000022: 0x52 => 0x89  ; and so is this one (first byte, 3rd xor)
0x00000023: 0x91 => 0xeb
0x00000024: 0x1b => 0xb7
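
Rather than eyeballing a per-byte dump, the emulator's memory view can be compared against bytes decoded by hand. A minimal sketch: `mismatched_offsets` is a hypothetical helper, the `actual` bytes are copied from the listing above (offsets 0x1a to 0x24), and the `expected` bytes were derived by applying the stub's xor/add arithmetic outside the emulator.

```python
def mismatched_offsets(expected, actual, base=0):
    """Return the offsets at which two byte strings differ."""
    return [base + i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


expected = bytes.fromhex("f5fce8820000006089e531")  # decoded by hand
actual = bytes.fromhex("f58ba93e0077c1a589ebb7")    # Unicorn 1.0.0 listing
bad = mismatched_offsets(expected, actual, base=0x1A)
print([hex(o) for o in bad])
```

Only offsets 0x1a, 0x1e and 0x22 match, i.e. the first byte of each decoded dword.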

At this point I don't know much about the Unicorn internals, but I sure do hope that somebody can pick up this issue! Thanks in advance! :-)

While working on proper interpretation of shellcode payloads from [1] we found some shikata ga nai-encoded payloads. Using Unicorn Engine we attempt to unpack this first layer to reach the actual shellcode. For now this only works partially due to one or more suspected bugs in Unicorn Engine's support for self-modifying code [2].

[1]: http://researchcenter.paloaltonetworks.com/2017/03/unit42-pulling-back-the-curtains-on-encodedcommand-powershell-attacks/
[2]: unicorn-engine/unicorn#820

jbremer · 2017-05-05T18:02:17Z

This appears to be a duplicate of #364 and #562 as far as I can see.
Any update @aquynh @egberts (I'm guessing not)? :-)

egberts · 2017-05-05T18:24:29Z

Yep. I ran into that repeatedly with several pieces of self-modifying, stack-based code. The problem is basically that the QEMU translation cache has a blind spot about 16 bytes deep. That is, if the code modifies memory within 16 bytes of its own EIP/PC, then QEMU isn't going to 'taint' the cache.

And I've spent about 23 hours on it only to find that this is a design issue.

agraf · 2017-05-05T21:10:25Z

Are you sure it's within 16 bytes, not within the same TB? Maybe something in the logic that aborts the currently executing TB on memory writes to itself is broken?

egberts · 2017-05-05T21:39:16Z

Yes, exactly as you described. When the code being modified is beyond the reach of the TB cache, it's flushed by the current logic.

The TB cache taint algorithm is peppered throughout the QEMU source code and it is hard to work out the state diagram of what it should be, so it's pretty wonky there.

agraf · 2017-05-05T21:59:57Z

Ideally you should land in tb_invalidate_phys_page_range() which then would detect you're in the same TB and exit the cpu loop forcefully.

We did change the way most cpu loop exiting works a while ago, from an aggressive longjmp to a softer "check a flag at the beginning of each TB" method. Maybe it broke there.

jbremer · 2017-05-05T23:17:44Z

Thanks for chiming in @agraf. I wanted to add one additional piece of information which suggests that Unicorn is at fault here. As mentioned above, the xor decoding doesn't work correctly for the first X bytes, but it does work correctly for most of the remaining data (hence you will see URLs or hostnames used by the shellcode at the end of it).
To me this indicates that the executed/emulated/tcg'd x86 assembly is interpreted correctly: while the exact basic block of the decoding stub isn't known during the first loop execution (because of the xor decoding), it does decode the data properly. The decoding stub is a simple xor loop with feedback (I believe that's what it's called, i.e., the next xor key is the current key plus the decoded value), so the entire data stream would be corrupted if an invalid value were added to the key early on. As we can clearly identify assembly and hostnames/URLs, this is not the case.
I hope this example makes sense, but simply put, I think that TCG is seeing the correct bytes and the Unicorn memory API is returning some kind of wrong bytes.

I also noted earlier that 1 out of every 4 bytes (namely the first byte, in little-endian speak) seemed to be correct. Furthermore, when adding a unicorn.UC_HOOK_MEM_WRITE callback, one may find that for every xor operation one 32-bit write is performed as well as four 8-bit writes, which on its own is weird already.. ;-)
Hopefully that's enough information to further pinpoint the exact issue.
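
The write pattern described above can be recorded with a memory-write hook. A minimal sketch: `make_write_logger` and `install_write_logger` are hypothetical helper names; the callback signature and `hook_add` are those of the Unicorn Python bindings, and `unicorn` is imported lazily so the logger itself can be exercised without an emulator instance.

```python
def make_write_logger(log):
    """Build a UC_HOOK_MEM_WRITE callback that appends (address, size, value)."""
    def on_write(uc, access, address, size, value, user_data):
        # The bindings may pass the value as a signed integer; mask to `size` bytes.
        log.append((address, size, value & ((1 << (size * 8)) - 1)))
    return on_write


def install_write_logger(uc, log):
    """Attach the logger to an existing unicorn.Uc instance."""
    import unicorn  # imported here so the logger above has no hard dependency
    return uc.hook_add(unicorn.UC_HOOK_MEM_WRITE, make_write_logger(log))


# Simulated callback invocation (no emulator involved), for illustration only.
log = []
make_write_logger(log)(None, 0, 0x101A, 4, 0x82E8FCF5, None)
for address, size, value in log:
    print("write of %d bytes at 0x%x: 0x%x" % (size, address, value))
```

With a real `Uc` object, calling `install_write_logger(uc, log)` before `uc.emu_start(...)` and grouping the entries in `log` by address should make it visible whether one xor produces a single 32-bit write or additional 8-bit writes.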

egberts · 2017-05-06T02:20:27Z

I do know that nearby code memory was modified by an XOR being emulated TWICE, hence the appearance that nothing got done, and that no invalidation of the TB cache was being done but (I think) is needed.

Yeah, emulated TWICE, that is the flaw.
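
Why a doubled XOR looks like "nothing got done" follows from XOR being its own inverse. The arithmetic below is only an illustration of that claim, using the initial key and the first encoded dword from the sample (offset 0x1a); it does not model the key feedback or what Unicorn actually does internally.

```python
key = 0xBC4177E6        # initial key loaded by `mov eax, 0xbc4177e6`
encoded = 0x3EA98B13    # little-endian dword at offset 0x1a (13 8b a9 3e)

once = encoded ^ key    # a single XOR decodes the dword
twice = once ^ key      # a second XOR with the same key restores the original

print(hex(once), hex(twice))  # 0x82e8fcf5 0x3ea98b13
```

Bytes 0x1b to 0x1d in the output listing keep their original values (8b a9 3e), which is what a second XOR with the same key would produce for them.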

lunixbochs · 2017-05-06T18:18:53Z

Are all XORs pointing into their own TB emulated twice? Can someone come up with a minimal test case for this?

egberts · 2017-05-08T11:50:07Z

Yes. I wrote a test case exactly for that, and it has been accepted and pushed here. Look for the word XOR in its source file name.

egberts · 2017-05-08T11:52:20Z

More specifically, I remember that operand 2 of the XOR got emulated twice.

Coldzer0 · 2018-07-21T17:40:13Z

Is this solved? If not, is any work being done on it?

I see here
https://bugs.chromium.org/p/project-zero/issues/detail?id=1122

that it's already fixed here
qemu/qemu@30663fd#diff-b01d071f4eb5dc04039d748944225598

I don't know if it's the same issue, but I'll try to test it.

egberts · 2018-07-21T18:33:50Z

The greater-than-14 patch basically forces an invalid instruction, which is NOT the behavior of the hardware CPU.

In other words, the patch makes QEMU stricter than hardware, which is not what we want for emulation of malicious code on pseudo-real hardware.

egberts · 2018-07-21T18:44:16Z

According to the Google bug report given earlier, it only has an impact when running QEMU in TCG mode, not in KVM mode.

This means that ARM-based QEMU cannot emulate x86 accurately, because only TCG mode is supported there.

Whereas Intel-based QEMU in KVM mode will properly emulate this.

lunixbochs · 2018-07-21T18:52:10Z

Unicorn doesn't use KVM mode.

egberts · 2018-07-21T19:06:53Z

Right... I meant to say QEMU... re-edited that.

Coldzer0 · 2018-08-31T07:11:58Z

Is there any fix for this yet?

egberts · 2018-08-31T15:34:02Z

Nope.

Coldzer0 · 2018-09-15T15:19:18Z

Can anyone help by at least showing me where to look in the code, so I can check whether I can do something to solve this?

And @egberts, can you please explain why "KVM" mode can emulate this but Unicorn can't, if both run on the same CPU emulation core?

Thanks

egberts · 2018-09-15T16:24:28Z

@Coldzer0, fully detailed in issue #364

egberts · 2018-09-15T16:34:36Z

KVM mode (which Unicorn doesn't support) runs just fine, but QEMU emulation mode (the only one Unicorn supports) does not.

The bug is in emulation mode, specifically when an XOR instruction modifies memory within the 16-byte region around itself (in my supplied test case in this repo under the test directory, the IMUL instruction) and within the same TB.

One could run this same IMUL/XOR code in KVM mode, which I have not done. So the answer to your question is: I don't know.

wtdcode · 2021-04-02T06:29:06Z

Confirmed that it is fixed in UC2. Link to #1217.

Coldzer0 mentioned this issue Sep 22, 2018

Multiply instead of shifting negative values #1020

Closed

alxchk added a commit to alxchk/unicorn that referenced this issue Jul 29, 2019

Add slow_self_unpack mode (x86)

5027421

This mode forces EOB generation after each instruction. This is work-around to emulate self-modifying shell-code, maybe also unicorn-engine#820

alxchk added a commit to alxchk/unicorn that referenced this issue Jul 29, 2019

Add slow_self_unpack mode (x86)

a195b31

This mode forces EOB generation after each instruction. This is work-around to emulate self-modifying shell-code, maybe also unicorn-engine#820

wonderkun mentioned this issue Oct 25, 2020

Incorrect memory view after running self-modifying code qilingframework/qiling#561

Closed

wonderkun pushed a commit to wonderkun/unicorn that referenced this issue Nov 4, 2020

fix issue unicorn-engine#820

5a30a16

wonderkun pushed a commit to wonderkun/unicorn that referenced this issue Nov 4, 2020

fix issue unicorn-engine#820

d5fbfa1

wonderkun mentioned this issue Nov 4, 2020

Fix Incorrect memory view after running self-modifying code #820 #1352

Closed

wtdcode closed this as completed Oct 3, 2021
