Unicorn Preload #1356

Merged
merged 74 commits into from Feb 26, 2019
Changes from 73 commits
Commits
e59b06e
Push to github
Oct 25, 2017
60e6266
Bypass smtlib when initializing memory
Oct 25, 2017
f40577b
Revert "Bypass smtlib when initializing memory"
Oct 25, 2017
cb3b733
Fixed extra boxes checked...
Oct 25, 2017
bf266e6
Save changes to emulator before implementing other concretization tec…
ehennenfent Oct 31, 2017
15fee00
Cleaned up miscellaneous print statements
ehennenfent Nov 2, 2017
8907225
Fixed small merge conflict
ehennenfent Nov 2, 2017
9f60360
Cleaned up implementation selection
ehennenfent Nov 2, 2017
ab48f25
Working FS register in Unicorn
ehennenfent Nov 3, 2017
b0ed8f4
Improved register sync performance
ehennenfent Nov 5, 2017
fc6b1c4
Switched to memory delta model
ehennenfent Nov 8, 2017
dbb9e49
Added timing information
ehennenfent Nov 8, 2017
dc55c78
Stripped print statements
ehennenfent Nov 25, 2017
d5f3f99
Hid concrete mode behind kwarg
ehennenfent Nov 25, 2017
bfb4634
Fixed merge, but not functionality
ehennenfent Nov 25, 2017
267c825
Propagated missing kwargs
ehennenfent Nov 27, 2017
968fca6
Re-implemented abstractcpu changes from master
ehennenfent Nov 28, 2017
f9c81ca
Added transition doc to repo
ehennenfent Dec 10, 2017
d983747
Merge branch 'master' into dev-unicorn-revival
Dec 5, 2018
c9763a7
Fixed import errors
Dec 6, 2018
ae55816
Fix py3 integer division
Dec 6, 2018
bd2777d
Fix runtime errors
Dec 6, 2018
1aaa3e0
Update MSR setting
Dec 7, 2018
d9ff226
Execute to completion
Dec 7, 2018
01b7e5a
Bugfixes before bulk rewrite
Dec 12, 2018
d21c01e
Add memory map event
Dec 14, 2018
c55c0cf
Merge branch 'master' into dev-unicorn-revival
Dec 14, 2018
c16ec49
Refine debug printouts a bit
Dec 14, 2018
45070d9
Update memory mapping techniques
Dec 14, 2018
d20ccbd
Disable write backs during sync
Dec 15, 2018
89a935f
Properly hook syscalls
Jan 9, 2019
f8bf58a
Fixed Unicorn's refusal to stop
Jan 9, 2019
e3c68a7
Upgrade skipped syscalls to warnings
Jan 9, 2019
211e690
Improve debug statement
Jan 9, 2019
7be5206
Merge branch 'master' into dev-unicorn-revival
Jan 9, 2019
83d617e
Restore old argparse behavior
Jan 9, 2019
d9f787b
More syscall warnings
Jan 11, 2019
e14107e
Undo some upstream changes
Jan 11, 2019
87e631a
Fix upstream permissions changes
Jan 11, 2019
547d312
More permisisons
Jan 11, 2019
7127119
Refactor out upstream changes and optimize imports
Jan 11, 2019
003e0d0
Add API for emulating until an address is reached
Jan 12, 2019
195e1ac
Clean up for PR
Jan 12, 2019
c3314b0
Fix Travis failure
Jan 14, 2019
9764e58
Merge branch 'master' into dev-unicorn-revival
Jan 15, 2019
9d0482d
Remove vestigial __cmp__ method
ehennenfent Jan 17, 2019
6958702
Add a few more stubbed-out syscalls
Jan 17, 2019
394b43b
Docstrings for emulate and abstractcpu
Jan 17, 2019
0b10f3b
Add a simple unit test
Jan 17, 2019
499ea0d
Merge branch 'master' into dev-unicorn-revival
Jan 17, 2019
81f6010
Move tests to native directory
Jan 17, 2019
d0b1ea2
Tell codeclimate I'm sorry
Jan 18, 2019
6b77b80
More exhaustive memory callbacks
Jan 23, 2019
16aa817
Check for missing bytes in abstractCPU
Jan 24, 2019
fd821f5
Add more missing syscalls
Feb 14, 2019
a4bbec4
Supporting callbacks in emulate.py
Feb 14, 2019
61ea03c
Add fast writing method for anonmaps
Feb 14, 2019
c1b0a27
Merge branch 'master' into dev-unicorn-revival
Feb 14, 2019
3f4fe74
Apply suggestions from code review
disconnect3d Feb 14, 2019
b766452
Fix CPU being None on manually constructed memories
Feb 14, 2019
2c5f1cc
Limit fast write_bytes to concrete mode
Feb 14, 2019
861d11c
Bump Travis to Py 3.7
Feb 14, 2019
8f8da49
Okay fine, that was a mistake
Feb 14, 2019
55acf73
Fix updated Capstone register limits
Feb 15, 2019
cafea31
Make codeclimate happy
Feb 15, 2019
911127e
Apply suggestions from code review
disconnect3d Feb 15, 2019
fa6572f
Add partial suggestion implementations
Feb 15, 2019
655a463
Partial fix for setCPU suggestion
Feb 15, 2019
14b79f1
Merge branch 'dev-unicorn-revival' of https://github.com/trailofbits/…
Feb 15, 2019
fb85c8f
Downgrade f/madvise to info
Feb 19, 2019
deb302a
More idiomatic emulator initialization
Feb 22, 2019
1848a8a
Add stub prlimit64 implementation
Feb 22, 2019
8f1ca7d
Fix clone/ptregs return code
Feb 22, 2019
66ac42d
Made __repr__ idiomatic
Feb 26, 2019
6 changes: 3 additions & 3 deletions .travis.yml
@@ -4,7 +4,7 @@ os:
- linux
language: python
python:
- 3.6.5
- 3.6.6

stages:
- prepare
@@ -30,8 +30,8 @@ branches:
cache:
pip: true
directories:
- $HOME/virtualenv/python3.6.5/lib/python3.6/site-packages
- $HOME/virtualenv/python3.6.5/bin/
- $HOME/virtualenv/python3.6.6/lib/python3.6/site-packages
- $HOME/virtualenv/python3.6.6/bin/

jobs:
include:
3 changes: 3 additions & 0 deletions manticore/core/smtlib/expression.py
@@ -1026,6 +1026,9 @@ def array(self):
def index(self):
return self.operands[1]

def __repr__(self):
return f"ArraySelect obj with index={self.index}:\n{self.array}"


class BitVecSignExtend(BitVecOperation):
def __init__(self, operand, size_dest, *args, **kwargs):
169 changes: 146 additions & 23 deletions manticore/native/cpu/abstractcpu.py
@@ -1,20 +1,27 @@
import inspect
import io
import logging
import struct
from functools import wraps
from itertools import islice

import io
import struct
import unicorn
from functools import wraps

from .disasm import init_disassembler
from ..memory import ConcretizeMemory, InvalidMemoryAccess, LazySMemory
from ...core.smtlib import BitVec, Operators, Constant, visitors
from ..memory import (
ConcretizeMemory, InvalidMemoryAccess, FileMap, AnonMap
)
from ..memory import LazySMemory
from ...core.smtlib import Expression, BitVec, Operators, Constant
from ...core.smtlib import visitors
from ...core.smtlib.solver import solver
from ...utils.emulate import UnicornEmulator
from ...utils.emulate import ConcreteUnicornEmulator
from ...utils.event import Eventful
from ...utils.fallback_emulator import UnicornEmulator
from ...utils.helpers import issymbolic

from capstone.x86 import X86_REG_ENDING

logger = logging.getLogger(__name__)
register_logger = logging.getLogger(f'{__name__}.registers')

@@ -139,6 +146,9 @@ def _reg_name(self, reg_id):

:param int reg_id: Register ID
'''
if reg_id >= X86_REG_ENDING:
logger.warning("Trying to get register name for a non-register")
return None
cs_reg_name = self.cpu.instruction.reg_name(reg_id)
if cs_reg_name is None or cs_reg_name.lower() == '(invalid)':
return None
@@ -442,7 +452,7 @@ class Cpu(Eventful):
'''

_published_events = {'write_register', 'read_register', 'write_memory', 'read_memory', 'decode_instruction',
'execute_instruction'}
'execute_instruction', 'set_descriptor', 'map_memory', 'protect_memory', 'unmap_memory'}

def __init__(self, regfile, memory, **kwargs):
assert isinstance(regfile, RegisterFile)
@@ -453,6 +463,9 @@ def __init__(self, regfile, memory, **kwargs):
self._instruction_cache = {}
self._icount = 0
self._last_pc = None
self._concrete = kwargs.pop("concrete", False)
self.emu = None
self._break_unicorn_at = None
if not hasattr(self, "disasm"):
self.disasm = init_disassembler(self._disasm, self.arch, self.mode)
# Ensure that regfile created STACK/PC aliases
@@ -466,15 +479,19 @@ def __getstate__(self):
state['icount'] = self._icount
state['last_pc'] = self._last_pc
state['disassembler'] = self._disasm
state['concrete'] = self._concrete
state['break_unicorn_at'] = self._break_unicorn_at
return state

def __setstate__(self, state):
Cpu.__init__(self, state['regfile'],
state['memory'],
disasm=state['disassembler'])
disasm=state['disassembler'], concrete=state['concrete'])
self._icount = state['icount']
self._last_pc = state['last_pc']
self._disasm = state['disassembler']
self._concrete = state['concrete']
self._break_unicorn_at = state['break_unicorn_at']
super().__setstate__(state)

@property
@@ -563,6 +580,18 @@ def __setattr__(self, name, value):
except AttributeError:
object.__setattr__(self, name, value)

def emulate_until(self, target: int):
"""
Tells the CPU to set up a concrete unicorn emulator and use it to execute instructions
until target is reached.

:param target: Where Unicorn should hand control back to Manticore. Set to 0 for all instructions.
"""
self._concrete = True
self._break_unicorn_at = target
if self.emu:
self.emu._stop_at = target

#############################
# Memory access
@property
@@ -589,6 +618,40 @@ def write_int(self, where, expression, size=None, force=False):

self._publish('did_write_memory', where, expression, size)

def _raw_read(self, where: int, size=1) -> bytes:
"""
Selects bytes from memory. Attempts to do so faster than via read_bytes.

:param where: address to read from
:param size: number of bytes to read
:return: the bytes in memory
"""
map = self.memory.map_containing(where)
start = map._get_offset(where)
mapType = type(map)
if mapType is FileMap:
end = map._get_offset(where + size)

if end > map._mapped_size:
logger.warning(f"Missing {end - map._mapped_size} bytes at the end of {map._filename}")

raw_data = map._data[map._get_offset(where): min(end, map._mapped_size)]
if len(raw_data) < end:
raw_data += b'\x00' * (end - len(raw_data))

data = b''
for offset in sorted(map._overlay.keys()):
data += raw_data[len(data):offset]
data += map._overlay[offset]
data += raw_data[len(data):]

elif mapType is AnonMap:
data = bytes(map._data[start:start + size])
else:
data = b''.join(self.memory[where:where + size])
assert len(data) == size, 'Raw read resulted in wrong data read which should never happen'
return data

def read_int(self, where, size=None, force=False):
'''
Reads int from memory
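Reviewer-style note on `_raw_read`: the FileMap branch combines three steps (slice the backing data, zero-pad a short read, patch in overlay bytes). The sketch below is a simplified standalone model of that idea, not the code in this PR. The function name `read_window`, its parameters, and the assumption that the overlay maps an absolute offset to a single byte are all hypothetical.

```python
def read_window(backing: bytes, overlay: dict, start: int, size: int) -> bytes:
    """Return `size` bytes starting at `start`, zero-padded and patched by `overlay`.

    Assumption: `overlay` maps an absolute offset to a one-byte `bytes` value.
    """
    # Slice whatever the backing data can provide for the requested window.
    window = bytearray(backing[start:start + size])
    # Pad reads that run past the end of the backing data with zero bytes.
    window.extend(b"\x00" * (size - len(window)))
    # Apply only the overlay entries that fall inside the window.
    for offset, value in overlay.items():
        if start <= offset < start + size:
            window[offset - start] = value[0]
    # Mirrors the length check at the end of _raw_read.
    assert len(window) == size
    return bytes(window)
```

Computing every index relative to `start` keeps the result exactly `size` bytes long regardless of where the window begins.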
@@ -620,8 +683,28 @@ def write_bytes(self, where, data, force=False):
:type data: str or list
:param force: whether to ignore memory permissions
'''
for i in range(len(data)):
self.write_int(where + i, Operators.ORD(data[i]), 8, force)

mp = self.memory.map_containing(where)
# TODO (ehennenfent) - fast write can have some yet-unstudied unintended side effects.
# At the very least, using it in non-concrete mode will break the symbolic strcmp/strlen models. The 1024 byte
# minimum is intended to minimize the potential effects of this by ensuring that if there _are_ any other
# issues, they'll only crop up when we're doing very large writes, which are fairly uncommon.
can_write_raw = type(mp) is AnonMap and \
isinstance(data, (str, bytes)) and \
(mp.end - mp.start + 1) >= len(data) >= 1024 and \
not issymbolic(data) and \
self._concrete

if can_write_raw:
logger.debug("Using fast write")
offset = mp._get_offset(where)
if isinstance(data, str):
data = bytes(data.encode('utf-8'))
mp._data[offset:offset + len(data)] = data
self._publish('did_write_memory', where, data, 8 * len(data))
else:
for i in range(len(data)):
self.write_int(where + i, Operators.ORD(data[i]), 8, force)

def read_bytes(self, where, size, force=False):
'''
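Reviewer-style note on the fast path in `write_bytes`: the eligibility test can be read as "concrete mode, plain str/bytes, at least 1024 bytes, and it fits in an anonymous map". The sketch below models that gate with a toy map. `ToyAnonMap`, `fast_write`, and `FAST_WRITE_MINIMUM` are hypothetical names, and the sketch deliberately differs from the diff in two ways that are assumptions on my part: it checks the length after UTF-8 encoding (encoding a non-ASCII `str` can change its byte length), and it bounds-checks from the write offset rather than against the whole map size.

```python
from dataclasses import dataclass, field

FAST_WRITE_MINIMUM = 1024  # same threshold as the diff above


@dataclass
class ToyAnonMap:
    """Hypothetical stand-in for an anonymous mapping backed by a bytearray."""
    start: int
    size: int
    data: bytearray = field(init=False)

    def __post_init__(self):
        self.data = bytearray(self.size)


def fast_write(mp: ToyAnonMap, where: int, data, concrete: bool) -> bool:
    """Return True if the raw slice write was used, False if the caller should fall back."""
    if not concrete or not isinstance(data, (str, bytes)):
        return False
    if isinstance(data, str):
        data = data.encode("utf-8")
    offset = where - mp.start
    too_small = len(data) < FAST_WRITE_MINIMUM
    out_of_bounds = offset < 0 or offset + len(data) > mp.size
    if too_small or out_of_bounds:
        return False
    # A single slice assignment replaces the per-byte write loop.
    mp.data[offset:offset + len(data)] = data
    return True
```

Returning a flag instead of raising lets the caller keep the byte-by-byte loop as the fallback, which is the shape the `else` branch in the diff already has.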
@@ -778,7 +861,7 @@ def decode_instruction(self, pc):
policy='INSTRUCTION')
text += c

#Pad potentially incomplete instruction with zeroes
# Pad potentially incomplete instruction with zeroes
code = text.ljust(self.max_instr_width, b'\x00')

try:
@@ -840,17 +923,25 @@ def execute(self):
register_logger.debug(l)

try:
implementation = getattr(self, name, None)

if implementation is not None:
implementation(*insn.operands)

else:
text_bytes = ' '.join('%02x' % x for x in insn.bytes)
logger.warning("Unimplemented instruction: 0x%016x:\t%s\t%s\t%s",
insn.address, text_bytes, insn.mnemonic, insn.op_str)
if self._concrete and 'SYSCALL' in name:
self.emu.sync_unicorn_to_manticore()
if self._concrete and 'SYSCALL' not in name:
self.emulate(insn)
if self.PC == self._break_unicorn_at:
logger.debug("Switching from Unicorn to Manticore")
self._break_unicorn_at = None
self._concrete = False
else:
implementation = getattr(self, name, None)

if implementation is not None:
implementation(*insn.operands)

else:
text_bytes = ' '.join('%02x' % x for x in insn.bytes)
logger.warning("Unimplemented instruction: 0x%016x:\t%s\t%s\t%s",
insn.address, text_bytes, insn.mnemonic, insn.op_str)
self.backup_emulate(insn)
except (Interruption, Syscall) as e:
e.on_handled = lambda: self._publish_instruction_as_executed(insn)
raise e
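Reviewer-style note on the handoff in `execute`: together with `emulate_until`, the new branch behaves like a two-state machine that runs concretely until the program counter equals the break address, then drops back to the regular path. The toy model below illustrates only that switching logic. `ToyCpu` and its attributes are hypothetical, and it advances one instruction per step, whereas the real concrete emulator may run many instructions per call.

```python
class ToyCpu:
    """Hypothetical model of the concrete/symbolic handoff, one instruction per step."""

    def __init__(self):
        self.PC = 0
        self.concrete = False
        self.break_at = None
        self.trace = []  # records which engine handled each PC

    def emulate_until(self, target: int) -> None:
        # Mirrors the PR's API: switch to concrete mode and remember the stop address.
        self.concrete = True
        self.break_at = target

    def step(self) -> None:
        engine = "unicorn" if self.concrete else "manticore"
        self.trace.append((engine, self.PC))
        self.PC += 1
        # Hand control back once the break address is reached.
        if self.concrete and self.PC == self.break_at:
            self.concrete = False
            self.break_at = None


cpu = ToyCpu()
cpu.emulate_until(3)
for _ in range(5):
    cpu.step()
```

In this model the instruction at the break address itself is executed by the regular path, which matches the comparison against `self.PC` happening after emulation in the diff.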
@@ -868,16 +959,48 @@ def _publish_instruction_as_executed(self, insn):
self._publish('did_execute_instruction', self._last_pc, self.PC, insn)

def emulate(self, insn):
"""
Pick the right emulate function (maintains API compatibility)

:param insn: single instruction to emulate/start emulation from
"""

if self._concrete:
self.concrete_emulate(insn)
else:
self.backup_emulate(insn)

def concrete_emulate(self, insn):
"""
Start executing in Unicorn from this point until we hit a syscall or reach break_unicorn_at

:param capstone.CsInsn insn: The instruction object to emulate
"""

if not self.emu:
self.emu = ConcreteUnicornEmulator(self)
self.emu._stop_at = self._break_unicorn_at
try:
self.emu.emulate(insn)
except unicorn.UcError as e:
if e.errno == unicorn.UC_ERR_INSN_INVALID:
text_bytes = ' '.join('%02x' % x for x in insn.bytes)
logger.error("Unimplemented instruction: 0x%016x:\t%s\t%s\t%s",
insn.address, text_bytes, insn.mnemonic, insn.op_str)
raise InstructionEmulationError(str(e))

def backup_emulate(self, insn):
'''
If we could not handle emulating an instruction, use Unicorn to emulate
it.

:param capstone.CsInsn insn: The instruction object to emulate
'''

emu = UnicornEmulator(self)
if not hasattr(self, 'backup_emu'):
self.backup_emu = UnicornEmulator(self)
try:
emu.emulate(insn)
self.backup_emu.emulate(insn)
except unicorn.UcError as e:
if e.errno == unicorn.UC_ERR_INSN_INVALID:
text_bytes = ' '.join('%02x' % x for x in insn.bytes)
@@ -888,7 +1011,7 @@ def emulate(self, insn):
# We have been seeing occasional Unicorn issues with it not clearing
# the backing unicorn instance. Saw fewer issues with the following
# line present.
del emu
del self.backup_emu

def render_instruction(self, insn=None):
try:
4 changes: 3 additions & 1 deletion manticore/native/cpu/cpufactory.py
@@ -11,7 +11,9 @@ class CpuFactory:

@staticmethod
def get_cpu(mem, machine):
return CpuFactory._cpus[machine](mem)
cpu = CpuFactory._cpus[machine](mem)
mem.cpu = cpu
return cpu

@staticmethod
def get_function_abi(cpu, os, machine):
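Reviewer-style note on the `get_cpu` change: the factory now links the memory back to the CPU it was created for. The sketch below shows that back-reference pattern with toy classes. All names here are hypothetical. It also shows why callers may still need to handle `cpu` being `None`: a memory constructed without going through the factory never gets the link, which appears to be what the commit "Fix CPU being None on manually constructed memories" addresses.

```python
class ToyMemory:
    def __init__(self):
        # No CPU is attached until a factory (or the caller) sets one.
        self.cpu = None


class ToyCpu:
    def __init__(self, mem: ToyMemory):
        self.memory = mem


class ToyCpuFactory:
    _cpus = {"toy": ToyCpu}

    @staticmethod
    def get_cpu(mem: ToyMemory, machine: str) -> ToyCpu:
        cpu = ToyCpuFactory._cpus[machine](mem)
        # Back-reference so memory-side code can reach CPU state.
        mem.cpu = cpu
        return cpu


linked = ToyMemory()
cpu = ToyCpuFactory.get_cpu(linked, "toy")
manual = ToyMemory()  # built by hand, so manual.cpu stays None
```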
6 changes: 4 additions & 2 deletions manticore/native/cpu/x86.py
@@ -715,7 +715,9 @@ def set_descriptor(self, selector, base, limit, perms):
assert base >= 0 and base < (1 << self.address_bit_size)
assert limit >= 0 and limit < 0xffff or limit & 0xfff == 0
# perms ? not used yet Also is not really perms but rather a bunch of attributes
self._publish('will_set_descriptor', selector, base, limit, perms)
self._segments[selector] = (base, limit, perms)
self._publish('did_set_descriptor', selector, base, limit, perms)

def get_descriptor(self, selector):
return self._segments.setdefault(selector, (0, 0xfffff000, 'rwx'))
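Reviewer-style note on the `set_descriptor` events: publishing `will_set_descriptor` before the assignment and `did_set_descriptor` after it lets a subscriber observe the state on both sides of the change. The sketch below demonstrates that ordering with a minimal publisher. `ToyEventful`, `subscribe`, and `ToySegments` are hypothetical stand-ins, not Manticore's `Eventful` API.

```python
class ToyEventful:
    """Hypothetical minimal publish/subscribe helper."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, name, callback):
        self._subscribers.setdefault(name, []).append(callback)

    def _publish(self, name, *args):
        for callback in self._subscribers.get(name, []):
            callback(*args)


class ToySegments(ToyEventful):
    def __init__(self):
        super().__init__()
        self._segments = {}

    def set_descriptor(self, selector, base, limit, perms):
        # Same ordering as the diff: will_*, mutate, did_*.
        self._publish("will_set_descriptor", selector, base, limit, perms)
        self._segments[selector] = (base, limit, perms)
        self._publish("did_set_descriptor", selector, base, limit, perms)


segs = ToySegments()
seen = []
# Each callback records whether the selector is already stored when it fires.
segs.subscribe("will_set_descriptor", lambda sel, *rest: seen.append(("will", sel in segs._segments)))
segs.subscribe("did_set_descriptor", lambda sel, *rest: seen.append(("did", sel in segs._segments)))
segs.set_descriptor(0x63, 0x1000, 0xFFFF, "rw")
```

A subscriber that mirrors segment state into another engine would presumably hook the `did_` event, since only then is the new descriptor visible.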
@@ -4604,7 +4606,7 @@ def PMAXUB(cpu, dest, src):
PMAXUB: returns maximum of packed unsigned byte integers in the dest operand

Performs a SIMD compare of the packed unsigned byte in the second source operand
and the first source operand and returns the maximum value for each pair of
and the first source operand and returns the maximum value for each pair of
integers to the destination operand.

Example :
@@ -4981,7 +4983,7 @@ def PSLLD(cpu, op0, op1):
"""
PSLLD: Packed shift left logical with double words

Shifts the destination operand (first operand) to the left by the number of bytes specified
Shifts the destination operand (first operand) to the left by the number of bytes specified
in the count operand (second operand). The empty low-order bytes are cleared (set to all 0s).
If the value specified by the count operand is greater than 15, the destination operand is
set to all 0s. The count operand is an 8-bit immediate.