mishegos

A differential fuzzer for x86 decoders.

Usage

Start with a clone, including submodules:

git clone --recurse-submodules https://github.com/trailofbits/mishegos

Building

mishegos is most easily built within Docker:

docker build -t mishegos .

Alternatively, you can try building it directly.

Make sure you have binutils-dev (or however your system provides libopcodes) installed, then run:

make
# or
make debug

Running

Run the fuzzer for a bit:

./src/mishegos/mishegos ./workers.spec > /tmp/mishegos

mishegos checks for three environment variables:

  • V=1 enables verbose output on stderr
  • D=1 enables the "dummy" mutation mode for debugging purposes
  • M=1 enables the "manual" mutation mode (i.e., read from stdin)
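
If you drive mishegos from a script rather than a shell, the three variables above can be set on a copy of the environment. The sketch below is only an illustration: the helper name `mishegos_env` and its keyword arguments are made up for this example, and it assumes that leaving a variable unset is the way to disable a mode (the README only documents the `=1` form).

```python
import os


def mishegos_env(verbose=False, dummy=False, manual=False, base=None):
    """Return a copy of the environment with V, D and M set as requested.

    Hypothetical helper, not part of mishegos. Disabled flags are removed
    from the copy instead of being set to "0", since only "=1" is documented.
    """
    env = dict(os.environ if base is None else base)
    flags = {"V": verbose, "D": dummy, "M": manual}
    for name, enabled in flags.items():
        if enabled:
            env[name] = "1"
        else:
            env.pop(name, None)
    return env
```

The returned dictionary can be passed as the `env=` argument of `subprocess.run` when launching `./src/mishegos/mishegos ./workers.spec`.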

Convert mishegos's raw output into JSONL suitable for analysis:

./src/mish2jsonl/mish2jsonl /tmp/mishegos > /tmp/mishegos.jsonl

mish2jsonl checks for V=1 to enable verbose output on stderr.
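
Before running an analysis pass, it can help to sanity-check the converted file. This README does not describe the record schema, so the sketch below deliberately assumes nothing about field names: it only counts records and tallies whichever top-level keys appear. The function name `summarize_jsonl` is an assumption for this example.

```python
import json
from collections import Counter


def summarize_jsonl(path):
    """Count JSONL records and tally the top-level keys seen in them.

    Makes no assumption about the mish2jsonl schema; blank lines are skipped.
    """
    count = 0
    keys = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys
```

Calling `summarize_jsonl("/tmp/mishegos.jsonl")` returns the record count and a `Counter` of key names, which is a quick way to see what the analysis passes will be working with.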

Run an analysis/filter pass group on the results:

./src/analysis/analysis -p same-size-different-decodings < /tmp/mishegos.jsonl > /tmp/mishegos.interesting

Generate a pretty visualization of the filtered results:

./src/mishmat/mishmat < /tmp/mishegos.interesting > /tmp/mishegos.html
open /tmp/mishegos.html
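
The three post-processing steps above (convert, analyze, visualize) can be chained from a small script. This is a sketch under stated assumptions, not something shipped with mishegos: the function names are made up, the binary paths and the `-p same-size-different-decodings` pass group are copied from the commands above, and the fuzzer itself is left out because it is meant to be run "for a bit" and stopped by hand.

```python
import subprocess
from pathlib import Path


def postprocess_steps(raw="/tmp/mishegos",
                      pass_group="same-size-different-decodings"):
    """Describe each step as (argv, stdin path or None, stdout path)."""
    raw = Path(raw)
    jsonl = Path(f"{raw}.jsonl")
    interesting = Path(f"{raw}.interesting")
    html = Path(f"{raw}.html")
    return [
        (["./src/mish2jsonl/mish2jsonl", str(raw)], None, jsonl),
        (["./src/analysis/analysis", "-p", pass_group], jsonl, interesting),
        (["./src/mishmat/mishmat"], interesting, html),
    ]


def run_steps(steps):
    """Run each step in order, wiring files to stdin/stdout like the shell
    redirections in the README. Stops at the first failing command."""
    for argv, stdin_path, stdout_path in steps:
        stdin = open(stdin_path, "rb") if stdin_path is not None else None
        try:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, stdin=stdin, stdout=out, check=True)
        finally:
            if stdin is not None:
                stdin.close()
```

From the repository root, `run_steps(postprocess_steps())` would reproduce the three commands with their default `/tmp/mishegos*` paths.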

Contributing

We welcome contributors to mishegos!

A guide for adding new disassembler workers can be found here.

Performance notes

All numbers below correspond to the following run:

V=1 timeout 60s ./src/mishegos/mishegos ./workers.spec > /tmp/mishegos

Within Docker:

  • On a Linux server (40 cores, 128GB RAM):
    • 3.5M outputs/minute
    • 5 cores pinned
  • On a 2018 MacBook Pro (2+2 cores, 16GB RAM):
    • 300K outputs/minute
    • (All) 4 cores pinned
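
To compare the two machines on a per-second and per-core basis, the figures above can be normalized with some quick arithmetic. The sketch assumes throughput divides evenly across pinned cores, which is a simplification; the function name is made up for this example.

```python
def per_second_and_per_core(outputs_per_minute, pinned_cores):
    """Convert outputs/minute into outputs/second, overall and per pinned core."""
    per_second = outputs_per_minute / 60
    return per_second, per_second / pinned_cores


# Figures taken from the list above: (outputs per minute, pinned cores).
runs = {
    "Linux server": (3_500_000, 5),
    "2018 MacBook Pro": (300_000, 4),
}

for name, (opm, cores) in runs.items():
    per_s, per_core = per_second_and_per_core(opm, cores)
    print(f"{name}: {per_s:,.0f} outputs/s, {per_core:,.0f} outputs/s per pinned core")

# Linux server: 58,333 outputs/s, 11,667 outputs/s per pinned core
# 2018 MacBook Pro: 5,000 outputs/s, 1,250 outputs/s per pinned core
```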

TODO

  • Performance improvements
    • Break cohort collection out into a separate process (requires re-addition of semaphores)
    • Maybe use a better data structure for input/output/cohort slots
  • Add a scaling factor for workers, e.g. spawn N of each worker
  • Pre-analysis normalization (whitespace, immediate representation, prefixes)
  • Analysis strategies:
    • Filter by length, decode status discrepancies
    • Easy: lexical comparison
    • Easy: reassembly + effects modeling (maybe with microx?)
  • Scoring ideas:
    • Low value: Flag/prefix discrepancies
    • Medium value: Decode success/failure/crash discrepancies
    • High value: Decode discrepancies with differing control flow, operands, maybe some immediates
  • Visualization ideas:
    • Basic but not really basic: some kind of mouse-over differential visualization
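
As an illustration of the "filter by length, decode status discrepancies" idea in the list above, here is a minimal sketch. It is not mishegos code: the `Decoding` record, its fields, and the worker names are all hypothetical stand-ins for whatever the real JSONL records contain, and "cohort" is used loosely to mean every worker's result for one input.

```python
from typing import NamedTuple, Optional


class Decoding(NamedTuple):
    """Hypothetical per-worker result for one input (not the real schema)."""
    worker: str
    ok: bool             # did this worker decode the input at all?
    length: int          # number of bytes consumed, when ok
    text: Optional[str]  # disassembly text, when ok


def discrepancies(cohort):
    """Return which kinds of disagreement appear within one cohort."""
    found = set()
    # Status discrepancy: some workers decoded the input, others did not.
    if len({d.ok for d in cohort}) > 1:
        found.add("status")
    # Length discrepancy: successful workers disagree on instruction length.
    successes = [d for d in cohort if d.ok]
    if len({d.length for d in successes}) > 1:
        found.add("length")
    return found


cohort = [
    Decoding("worker-a", True, 2, "example text a"),
    Decoding("worker-b", True, 3, "example text b"),
    Decoding("worker-c", False, 0, None),
]
print(sorted(discrepancies(cohort)))  # ['length', 'status']
```

A filter built this way keeps only cohorts where `discrepancies` returns a non-empty set; scoring could then weight "status" and "length" differently, in the spirit of the tiers listed above.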