Unicorn Preload #1356

Merged (74 commits) on Feb 26, 2019

Conversation

2 participants
@ehennenfent
Contributor

commented Jan 17, 2019

Modifies the Unicorn emulator module to allow it to be used for "preloading" large binaries. It uses Unicorn to execute x64 instructions in bulk while Manticore handles IO and syscalls. State changes are aggressively written from Manticore back to Unicorn, and lazily written from Unicorn to Manticore before a syscall.
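
The synchronization policy described above can be sketched in plain Python. This is a toy model under stated assumptions, not the code from this PR: `ToyEmulator`, `SyncedCpu`, and all of their methods are hypothetical names, and no Manticore or Unicorn API is used.

```python
class ToyEmulator:
    """Stand-in for the fast emulator; it keeps its own copy of the registers."""

    def __init__(self):
        self.regs = {}

    def run(self, program):
        # Execute a batch of (register, delta) "instructions" in bulk.
        for reg, delta in program:
            self.regs[reg] = self.regs.get(reg, 0) + delta


class SyncedCpu:
    """Stand-in for the analysis engine's CPU model (hypothetical)."""

    def __init__(self, emulator):
        self.regs = {}
        self.emu = emulator
        self.copies_from_emulator = 0

    def write_register(self, reg, value):
        # Eager direction: every write made on this side is pushed to the
        # emulator immediately, so the emulator never runs on stale state.
        self.regs[reg] = value
        self.emu.regs[reg] = value

    def sync_before_syscall(self):
        # Lazy direction: emulator state is copied back only when this side
        # actually needs it, for example right before handling a syscall.
        self.regs.update(self.emu.regs)
        self.copies_from_emulator += 1


emu = ToyEmulator()
cpu = SyncedCpu(emu)
cpu.write_register("rax", 1)        # visible to the emulator at once
emu.run([("rax", 2), ("rbx", 5)])   # bulk execution; cpu.regs is now stale
stale_rax = cpu.regs["rax"]
cpu.sync_before_syscall()           # one copy-back for the whole batch
```

The asymmetry is the point of the design: writes toward the emulator are cheap and must never be missed, while copies back are deferred until a syscall forces them.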

The following script demonstrates this. The user can register a plugin at startup that tells Manticore to use Unicorn to quickly execute the initialization instructions leading up to the start of main. The binary in question is multiple-styles from an old example.

from manticore.native import Manticore
from manticore.core.plugin import Plugin

address_of_main = 0x4009ae

class concretePlugin(Plugin):

    def will_start_run_callback(self, state, *_args):
        state.cpu.emulate_until(address_of_main)

m = Manticore("multiple-styles", concrete_start='coldlikeminisoda')
m.register_plugin(concretePlugin())
m.run()

Once main is reached, Manticore takes over and continues execution as normal. In the case of this example, the full solution is coldlikeminisodas, so Manticore generates two different test cases for the remaining byte. The performance improvement for this example is only marginal because the bulk of the time is taken up by the solver, and relatively little initialization is required before executing main. However, more complex binaries can see very significant speedups.

There are still some limitations, but work on those will come in future PRs. They include:

  • Unicorn can't handle symbolic data, so we skip over writing it back. We should instead try to concretize it to a single value, and only continue executing in Unicorn if we can do that successfully.
  • Full memory maps are often copied lazily into Unicorn, and unmaps and permissions changes are not well modeled, which can lead to Unicorn's memory maps becoming desynchronized from Manticore.
  • Copying large amounts of data between Manticore and Unicorn can be dismally slow. A number of optimizations could improve this, including mapping files directly into Unicorn's memory rather than copying Manticore's memory maps.
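
The first limitation suggests concretizing symbolic data to a single value and continuing in the emulator only when that succeeds. A minimal sketch of that policy follows, assuming a toy representation in which a symbolic byte is the set of values it may take. The names `concretize_single` and `write_back` are hypothetical, and a real implementation would ask the solver for feasible values instead of inspecting a set.

```python
from typing import Dict, FrozenSet, Optional, Union

# A concrete byte, or (toy model) the set of values a symbolic byte may take.
Value = Union[int, FrozenSet[int]]


def concretize_single(value: Value) -> Optional[int]:
    """Return the only feasible value, or None if it is not unique."""
    if isinstance(value, int):
        return value
    if len(value) == 1:
        return next(iter(value))
    return None


def write_back(memory: Dict[int, Value], emulator_memory: Dict[int, int]) -> bool:
    """Copy memory into the emulator; report whether emulation may continue."""
    staged = {}
    for address, value in memory.items():
        concrete = concretize_single(value)
        if concrete is None:
            # Genuinely symbolic: leave the emulator untouched and hand
            # control back to the symbolic engine.
            return False
        staged[address] = concrete
    emulator_memory.update(staged)
    return True


emu_mem: Dict[int, int] = {}
ok = write_back({0x1000: 0x41, 0x1001: frozenset({0x42})}, emu_mem)
blocked = write_back({0x2000: frozenset({1, 2})}, emu_mem)
```

Staging the values before committing them keeps the emulator's memory unchanged when any byte turns out to have more than one feasible value.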

This PR also raises the log level for unimplemented Linux system calls to warning, which I've found to be a helpful UX change when debugging.



ehennenfent added some commits Oct 25, 2017

Push to github
    Partially working implementation of unicorn for instruction emulation.
    Still not working:
      • Mapping symbolic memory
      • Global Descriptor tables
Fixed import errors
    It still doesn't run, but we get a _little_ farther
Fix runtime errors
    Enough to execute a few instructions now, though not enough to completely analyze everything.
Execute to completion
    Runs multiple-styles without failing
Update memory mapping techniques
    Still copying the maps into unicorn instead of mapping the pointers, but right now it only takes a fraction of a second, so we'll let it slide for now.

ehennenfent and others added some commits Jan 23, 2019

Apply suggestions from code review
    Thanks Dominik!

    Co-Authored-By: ehennenfent <ecapstone@gmail.com>
Bump Travis to Py 3.7
    I just wanna see what happens
Fix updated Capstone register limits
    In 4.01, they added a handy REG_ENDING so that we wouldn't have to hard-code the final register
Resolved review thread (outdated): manticore/native/cpu/abstractcpu.py
Resolved review thread (outdated): manticore/native/cpu/abstractcpu.py
"""
map = self.memory.map_containing(where)
start = map._get_offset(where)
if type(map) is FileMap:

disconnect3d Feb 15, 2019

Member

Is there any reason to not use isinstance here?

If we want to stick with type(map), we might want to assign it to a variable, since in the worst case we calculate it twice.

ehennenfent Feb 15, 2019

Author Contributor

It does make sense to use type in this case because we access protected members of the FileMap that aren't guaranteed to behave the same way in a subclass. Strictly speaking, the right way to do this is probably to add unprotected bulk read and write functions to filemaps and anonmaps, but for now I'll just save the type in a variable so we won't need to calculate it twice.
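
To illustrate the distinction discussed in this thread, here is a small sketch. The `FileMap` and `CachedFileMap` classes below are stand-ins written for this example, not Manticore's classes; the subclass exists only to show why an exact type check can be the safer choice when protected members are accessed.

```python
class FileMap:
    """Stand-in for a file-backed memory map (not Manticore's class)."""

    def _read_bulk(self):
        return "file-backed read"


class CachedFileMap(FileMap):
    """Hypothetical subclass that changes the protected helper's behaviour."""

    def _read_bulk(self):
        return "cached read"


def uses_fast_path_exact(m):
    # Exact type match: subclasses are rejected, so the protected helper
    # is guaranteed to be FileMap's own implementation.
    return type(m) is FileMap


def uses_fast_path_isinstance(m):
    # Accepts subclasses too, including ones that override _read_bulk.
    return isinstance(m, FileMap)


plain = FileMap()
cached = CachedFileMap()
results = (
    uses_fast_path_exact(plain),
    uses_fast_path_exact(cached),
    uses_fast_path_isinstance(cached),
)
```

With `isinstance`, the `cached` object would take the fast path even though its `_read_bulk` no longer behaves like the base class's.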

Resolved review thread (outdated): manticore/native/cpu/abstractcpu.py
Resolved review thread: manticore/native/cpu/abstractcpu.py
@@ -540,6 +547,9 @@ def page_mask(self):
     def maps(self):
         return self._maps

     def setCPU(self, cpu):
         self.cpu = cpu

disconnect3d Feb 15, 2019

Member

Why don't we just do mem.cpu = cpu instead of mem.setCPU(cpu)?

Also, it might be good to add a comment explaining why we need to do it in the first place :S.

ehennenfent Feb 15, 2019

Author Contributor

Good point. A getter/setter would make more sense.
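
For reference, the getter/setter form mentioned here would look roughly like the sketch below. `ToyMemory` is a hypothetical class written for this example, and the sketch makes no claim about how Manticore's memory class needs to be changed.

```python
class ToyMemory:
    """Hypothetical memory object that keeps a back-reference to its CPU."""

    def __init__(self):
        self._cpu = None

    @property
    def cpu(self):
        # Getter: callers read mem.cpu as if it were a plain attribute.
        return self._cpu

    @cpu.setter
    def cpu(self, value):
        # Setter: callers write mem.cpu = cpu; validation or bookkeeping
        # could be added here later without changing any call site.
        self._cpu = value


mem = ToyMemory()
mem.cpu = "cpu0"   # same syntax as a plain attribute assignment
current = mem.cpu
```

Call sites look identical to the plain `mem.cpu = cpu` assignment suggested in the review, so switching between the two forms would not affect callers.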

Resolved review thread (outdated): manticore/platforms/linux.py
@@ -387,6 +392,7 @@ class Linux(Platform):

# from /usr/include/asm-generic/resource.h
RLIMIT_NOFILE = 7 # /* max number of open files */
RLIMIT_STACK = 3

         return 0

     def sys_rt_sigprocmask(self, cpu, how, newset, oldset):
         '''Wrapper for sys_sigprocmask'''
         return self.sys_sigprocmask(cpu, how, newset, oldset)

     def sys_sigprocmask(self, cpu, how, newset, oldset):
-        logger.debug(f"SIGACTION, Ignoring changing signal mask set cmd:{how}", )
+        logger.warning(f"SIGACTION, Ignoring changing signal mask set cmd:{how}", )

disconnect3d Feb 15, 2019

Member

Since this is logging, we should rather use its internal formatting (i.e. we don't want to format the message if the log record is discarded later on). That said, if we change it to a warning, is it okay to format it right away?
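
The difference between the two styles can be shown with the standard `logging` module. In this sketch, `CountingStr` is a helper invented for the demonstration; it counts how often its value is rendered, which makes the deferred formatting visible.

```python
import io
import logging


class CountingStr:
    """Counts how many times it is rendered to a string."""

    def __init__(self, value):
        self.value = value
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return str(self.value)


stream = io.StringIO()
logger = logging.getLogger("lazy_format_demo")
logger.setLevel(logging.WARNING)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

how = CountingStr(2)

# %-style arguments: the record is discarded before any formatting happens.
logger.debug("Ignoring changing signal mask set cmd:%s", how)
renders_after_lazy_debug = how.renders

# f-string: the message is built first, then discarded.
logger.debug(f"Ignoring changing signal mask set cmd:{how}")
renders_after_fstring_debug = how.renders

# An enabled level formats the message either way.
logger.warning("Ignoring changing signal mask set cmd:%s", how)
```

For a message that is emitted at the default verbosity, the cost is paid in both styles, so the saving only matters for levels that are usually filtered out.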

Resolved review thread (outdated): manticore/utils/fallback_emulator.py
@ehennenfent

Contributor Author

commented Feb 15, 2019

Sample invocation using the multiple-styles example

from manticore.native import Manticore
from manticore.core.plugin import Plugin

class concretePlugin(Plugin):

    def will_start_run_callback(self, state, *_args):
        state.cpu.emulate_until(0x4009ae)

m = Manticore("multiple-styles", concrete_start='coldlikeminisoda')
m.register_plugin(concretePlugin())
m.verbosity(2)
m.run()

disconnect3d and others added some commits Feb 15, 2019

Apply suggestions from code review
    More suggestions still to be applied, these are just the ones I could auto-accept from Github

    Co-Authored-By: ehennenfent <ecapstone@gmail.com>
Add partial suggestion implementations
    Seems to be something broken with memory right now
Partial fix for setCPU suggestion
    A getter/setter would be more idiomatic, but for some reason it doesn't work....

ehennenfent moved this from In progress to Review Needed in Manticore Releases Feb 19, 2019

ehennenfent added some commits Feb 19, 2019

Fix clone/ptregs return code
    Not all syscalls can be cheez'd to 0 apparently
Made __repr__ idiomatic
    Angle brackets should surround anything you can't construct directly
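
The `__repr__` convention named in the last commit can be illustrated as follows. Both classes are hypothetical examples written for this sketch and are not taken from the PR.

```python
class Register:
    """Constructible directly, so repr() mirrors the constructor call."""

    def __init__(self, name, size):
        self.name = name
        self.size = size

    def __repr__(self):
        return f"Register({self.name!r}, {self.size!r})"


class EmulatorHandle:
    """Hypothetical object whose repr cannot be pasted back as code."""

    def __init__(self, arch):
        self.arch = arch

    def __repr__(self):
        # Angle brackets signal "description only, not a constructor call".
        return f"<EmulatorHandle arch={self.arch}>"


register_repr = repr(Register("rax", 64))
handle_repr = repr(EmulatorHandle("x86_64"))
```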

ehennenfent merged commit bc77660 into master Feb 26, 2019

4 checks passed

  • ci/dockercloud: Your image was successfully built in Docker Cloud
  • codeclimate: 2 fixed issues
  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • continuous-integration/travis-ci/push: The Travis CI build passed

Manticore Releases automation moved this from Review Needed to Included in 0.2.4! Feb 26, 2019

ehennenfent deleted the dev-unicorn-revival branch Feb 26, 2019
