# Fuzzing In the Large

In this chapter, we walk through the basic steps of working with FuzzManager: submitting and triaging crashes, and measuring code coverage.

## Server Setup

This Docker image already has a FuzzManager demo server running in the background for you. It will be used during the following exercises. If you are prompted to log in at any point, use the following credentials:

<center>**Username: demo**</center>
<center>**Password: demo**</center>

At the end of the chapter, you will find a link to setup instructions, in case you would like to run your own instance for future fuzzing work.

In [None]:
import fuzzingbook_utils

In [None]:
import Fuzzer

## Running the Server

In [None]:
import os

In [None]:
if not os.path.exists('FuzzManager'):
    os.system("git clone https://github.com/MozillaSecurity/FuzzManager")
    os.system("pip install -r FuzzManager/server/requirements.txt")
    os.system("cd FuzzManager/server; python manage.py migrate")

In [None]:
import sqlite3

In [None]:
db_connection = sqlite3.connect("FuzzManager/server/db.sqlite3")
db_connection.execute("DELETE FROM crashmanager_crashentry;")
db_connection.commit()

In [None]:
from multiprocessing import Process

In [None]:
def run_fuzzmanager():
    def run_fuzzmanager_forever():
        os.system("python FuzzManager/server/manage.py runserver")
    
    fuzzmanager_process = Process(target=run_fuzzmanager_forever)
    fuzzmanager_process.start()

    return fuzzmanager_process

In [None]:
fuzzmanager_process = run_fuzzmanager()

In [None]:
import time

In [None]:
time.sleep(2)

In [None]:
fuzzmanager_url = "http://127.0.0.1:8000"

In [None]:
from IPython.display import display, Image

In [None]:
from fuzzingbook_utils import HTML, rich_output

In [None]:
from GUIFuzzer import start_webdriver

In [None]:
gui_driver = start_webdriver(headless=False, zoom=1.2)

In [None]:
gui_driver.set_window_size(1400, 600)

In [None]:
gui_driver.get(fuzzmanager_url)

In [None]:
Image(gui_driver.get_screenshot_as_png())

In [None]:
username = gui_driver.find_element_by_name("username")
username.send_keys("demo")

In [None]:
password = gui_driver.find_element_by_name("password")
password.send_keys("demo")

In [None]:
login = gui_driver.find_element_by_tag_name("button")
login.click()

In [None]:
Image(gui_driver.get_screenshot_as_png())

## Collecting Crashes

To get started with basic steps in crash processing, let's take a look at *simply-buggy*, an example repository containing trivial C++ programs for illustration purposes.

In [None]:
!git clone https://github.com/choller/simply-buggy

In [None]:
!(cd simply-buggy && make)

The `make` command compiles our target programs, including our first target, the *simple-crash* example. Alongside each program, a configuration file is generated. Let's take a look at that file and the source code:

In [None]:
from fuzzingbook_utils import print_file

In [None]:
print_file("simply-buggy/simple-crash.cpp")

In [None]:
from pygments.lexers.configs import IniLexer

In [None]:
print_file("simply-buggy/simple-crash.fuzzmanagerconf", lexer=IniLexer())

As you can see, the source code is fairly simple: a forced crash by writing to a (near-)NULL pointer. The configuration file generated for the binary also contains some straightforward information, like the revision of the program and other metadata that is required or at least useful later on when submitting things.
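Since the configuration file uses an INI-style layout, it can be inspected with Python's standard `configparser`. The following is only a sketch: the section names, keys, and values in `sample_conf` are illustrative assumptions, not the actual contents generated for `simple-crash`.

```python
import configparser

# Hypothetical contents of a .fuzzmanagerconf file. All keys and values
# here are placeholders chosen for illustration.
sample_conf = """
[Main]
platform = x86-64
product = simply-buggy
product_version = 0123abcd
os = linux

[Metadata]
buildFlags = -fsanitize=address -g
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep key names case-sensitive (default lowercases them)
parser.read_string(sample_conf)

main = dict(parser["Main"])
metadata = dict(parser["Metadata"])

print(main["product"], main["product_version"], main["platform"], main["os"])
print(metadata)
```

To inspect the real file, `parser.read("simply-buggy/simple-crash.fuzzmanagerconf")` could be used in place of `read_string`.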

Running the program shows us a crash trace as expected:

In [None]:
!simply-buggy/simple-crash

Now, what we would actually like to do is to run this binary from Python instead, detect that it crashed, collect the trace and submit it to the server. Let's start with a simple script that would just run the program we give it and detect the presence of the ASan trace:

In [None]:
import subprocess

In [None]:
cmd = ["simply-buggy/simple-crash"]

In [None]:
result = subprocess.run(cmd, stderr=subprocess.PIPE)
stderr = result.stderr.decode().splitlines()
crashed = False

for line in stderr:
    if "ERROR: AddressSanitizer" in line:
        crashed = True
        break

if crashed:
    print("Yay, we crashed!")
else:
    print("Move along, nothing to see...")
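The detection loop above can also be factored into a small helper so that it can be reused by later scripts. This is a minimal sketch; the name `has_asan_error` and the sample lines are made up for illustration.

```python
ASAN_MARKER = "ERROR: AddressSanitizer"


def has_asan_error(stderr_lines):
    """Return True if any line of the given output contains the ASan marker."""
    return any(ASAN_MARKER in line for line in stderr_lines)


# Hypothetical stderr output, shortened for illustration
sample = [
    "=================================================================",
    "==1234==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000",
]

print(has_asan_error(sample))
print(has_asan_error(["all good"]))
```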

Nice, we can now run the binary and detect that it crashed. But how do we send this information to the server now? Let's make a few modifications...

In [None]:
import subprocess

In [None]:
from Collector.Collector import Collector
from FTB.ProgramConfiguration import ProgramConfiguration
from FTB.Signatures.CrashInfo import CrashInfo

We instantiate the collector instance; this will be our entry point for talking to the server.

In [None]:
collector = Collector()

cmd = ["simply-buggy/simple-crash"]

result = subprocess.run(cmd, stderr=subprocess.PIPE)
stderr = result.stderr.decode().splitlines()
crashed = False

for line in stderr:
    if "ERROR: AddressSanitizer" in line:
        crashed = True
        break

if crashed:
    print("Yay, we crashed, processing...")
    
    # This reads the simple-crash.fuzzmanagerconf file
    configuration = ProgramConfiguration.fromBinary(cmd[0])
    
    # This reads and parses our ASan trace into a more generic format,
    # returning us a generic "CrashInfo" object that we can inspect
    # and/or submit to the server.
    crashInfo = CrashInfo.fromRawCrashData([], stderr, configuration)
    
    # Submit the crash
    collector.submit(crashInfo)
    
    print("Crash submitted!")
else:
    print("Move along, nothing to see...")

We have now submitted something to our local FuzzManager demo instance. If you go to http://127.0.0.1:8000/crashmanager/crashes/ you should see your crash.

In [None]:
gui_driver.refresh()

In [None]:
Image(gui_driver.get_screenshot_as_png())

## Crash Buckets

Now click on the crash to inspect the submitted data.

In [None]:
crash = gui_driver.find_element_by_xpath('//td/a[contains(@href,"/crashmanager/crashes/")]')
crash.click()

In [None]:
Image(gui_driver.get_screenshot_as_png())

Then click the orange *Create* button to create a bucket for this crash.  A *crash signature* will be proposed to you for matching this and future crashes of the same type:

In [None]:
create = gui_driver.find_element_by_xpath('//a[contains(@href,"/signatures/new/")]')
create.click()

In [None]:
gui_driver.set_window_size(1400, 1200)

In [None]:
Image(gui_driver.get_screenshot_as_png())

Accept it by clicking *Save*.

In [None]:
save = gui_driver.find_element_by_name("submit_save")
save.click()

You will be redirected to the newly created bucket, which shows you its size (how many crashes it holds), its bug report status (buckets can be linked to bugs in an external bug tracker like Bugzilla), and much other useful information.

### Crash Signatures

If you click on the *Signatures* entry in the top menu, you should also see your newly created entry.

In [None]:
Image(gui_driver.get_screenshot_as_png())

Buckets and their signatures are a central concept in FuzzManager. If you receive a lot of crash reports from various sources, bucketing allows you to easily group crashes and filter duplicates.

### Coarse-Grained Signatures

The flexible signature system starts out with an initially proposed fine-grained signature, but it can be adjusted as needed to capture variations of the same bug and make tracking easier. Next, we will look at a more complex program that reads data from a file and produces multiple crash signatures.

In [None]:
print_file("simply-buggy/out-of-bounds.cpp")

This program looks way more elaborate compared to the last one, but don't worry, it isn't really doing a whole lot. The code in the `main` function simply reads a file provided on the command line and puts its contents into a buffer that is passed to `validateAndPerformAction()`. That function pulls out two bytes of the buffer (`action` and `count`) and considers the rest `data`. Depending on the value of `action`, it then calls either `printFirst()` or `printLast()`, which prints either the first or the last `count` bytes of `data`. Sounds pointless, and yes, it is. The whole idea of this program is that the security check (that `count` is not larger than the length of `data`) is missing in `validateAndPerformAction()` but that the illegal access happens later in either of the two print functions. Hence, we would expect this program to generate at least two (slightly) different crash signatures.
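The missing check can be modeled in a few lines of Python. This is a sketch only: the byte layout (first byte `action`, second byte `count`) and the helper names `parse_input` and `count_is_in_bounds` are assumptions for illustration and are not taken from `out-of-bounds.cpp`.

```python
def parse_input(buf):
    """Split a buffer into (action, count, data).

    Assumed layout: byte 0 is the action, byte 1 is the count,
    and everything after that is data.
    """
    if len(buf) < 2:
        raise ValueError("need at least two bytes for action and count")
    return buf[0], buf[1], buf[2:]


def count_is_in_bounds(buf):
    """The check described as missing: count must not exceed len(data)."""
    _, count, data = parse_input(buf)
    return count <= len(data)


safe_input = bytes([0, 3]) + b"abc"      # count == len(data)
unsafe_input = bytes([1, 200]) + b"abc"  # count far exceeds len(data)

print(count_is_in_bounds(safe_input))
print(count_is_in_bounds(unsafe_input))
```

With random input, `count` is an arbitrary value between 0 and 255, so short files are likely to fail this check, which is what the fuzzer exploits.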

Let's try it out with very simple fuzzing based on the last Python script:

In [None]:
import os
import random
import subprocess
import tempfile
import sys

In [None]:
from Collector.Collector import Collector
from FTB.ProgramConfiguration import ProgramConfiguration
from FTB.Signatures.CrashInfo import CrashInfo

In [None]:
# Instantiate the collector instance, this will be our entry point
# for talking to the server.

In [None]:
collector = Collector()

cmd = ["simply-buggy/out-of-bounds"]

crash_count = 0
TRIALS = 100

for itnum in range(0, TRIALS):
    rand_len = random.randint(1, 1024)
    rand_data = bytearray(os.urandom(rand_len))
    
    (fd, current_file) = tempfile.mkstemp(prefix="fuzztest")
    os.write(fd, rand_data)
    os.close(fd)
    
    current_cmd = []
    current_cmd.extend(cmd)
    current_cmd.append(current_file)
    
    result = subprocess.run(current_cmd, stderr=subprocess.PIPE)
    stderr = result.stderr.decode().splitlines()
    crashed = False

    for line in stderr:
        if "ERROR: AddressSanitizer" in line:
            crashed = True
            break

    if crashed:
        sys.stdout.write("C")

        # This reads the out-of-bounds.fuzzmanagerconf file
        configuration = ProgramConfiguration.fromBinary(cmd[0])

        # This reads and parses our ASan trace into a more generic format,
        # returning us a generic "CrashInfo" object that we can inspect
        # and/or submit to the server.
        crashInfo = CrashInfo.fromRawCrashData([], stderr, configuration)

        # Submit the crash
        collector.submit(crashInfo, testCase=current_file)
        
        crash_count += 1
    else:
        sys.stdout.write(".")
    
    os.remove(current_file)

print("")
print("Done, submitted %d crashes after %d runs." % (crash_count, TRIALS))

If you run this script, you will see its progress and notice that it produces quite a few crashes. And indeed, if you visit [FuzzManager](http://127.0.0.1:8000/crashmanager/crashes/), you will notice a variety of crashes that have accumulated:

In [None]:
gui_driver.get(fuzzmanager_url + "/crashmanager/crashes")

In [None]:
Image(gui_driver.get_screenshot_as_png())

Pick the first crash and create a bucket for it, like you did the last time. After saving, you will notice that not all of your crashes went into the bucket. The reason is that our program created several different stacks that are somewhat similar but not identical. This is a common problem when fuzzing real-world applications.
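To see how stacks differ, one can extract the function names from each ASan trace and count the distinct stacks. The sketch below assumes the usual ASan frame format (`#N 0xADDR in function ...`); the traces are made-up examples, and `crash_stack` is a hypothetical helper, not part of FuzzManager.

```python
import re
from collections import Counter

# Matches ASan frame lines such as "    #0 0x4f1a2b in main /src/file.cpp:5:3"
FRAME_RE = re.compile(r"^\s*#\d+ 0x[0-9a-fA-F]+ in (\S+)")


def crash_stack(stderr_lines):
    """Return function names of the first stack in an ASan report."""
    frames = []
    for line in stderr_lines:
        match = FRAME_RE.match(line)
        if match:
            # Drop the argument list, e.g. "printFirst(char" -> "printFirst"
            frames.append(match.group(1).split("(")[0])
        elif frames:
            break  # first stack ended; ignore allocation stacks that follow
    return frames


# Hypothetical, shortened traces
trace_a = [
    "==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000015",
    "    #0 0x4f1a2b in printFirst(char const*, unsigned long) /src/out-of-bounds.cpp:10:5",
    "    #1 0x4f1c3d in validateAndPerformAction(char const*, unsigned long) /src/out-of-bounds.cpp:30:9",
    "    #2 0x4f1e4f in main /src/out-of-bounds.cpp:50:12",
    "",
    "allocated by thread T0 here:",
    "    #0 0x4c0000 in operator new[](unsigned long) /src/asan_new_delete.cpp:102:3",
]
trace_b = [line.replace("printFirst", "printLast") for line in trace_a]

stacks = Counter(tuple(crash_stack(t)) for t in (trace_a, trace_b, trace_a))
for stack, count in stacks.items():
    print(count, " <- ".join(stack))
```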

Fortunately, there is an easy way to deal with this. While on the bucket page, hit the *Optimize* button for the bucket. FuzzManager will then automatically propose a change to your signature. Accept the change by hitting *Edit with Changes* and then *Save*. Repeat these steps until all crashes are part of the bucket. After 3 to 4 iterations, your signature will likely look like this:

```json
{
  "symptoms": [
    {
      "type": "output",
      "src": "stderr",
      "value": "/ERROR: AddressSanitizer: heap-buffer-overflow/"
    },
    {
      "type": "stackFrames",
      "functionNames": [
        "?",
        "?",
        "?",
        "validateAndPerformAction",
        "main",
        "__libc_start_main",
        "_start"
      ]
    },
    {
      "type": "crashAddress",
      "address": "> 0xFF"
    }
  ]
}
```

As you can see in the *stackFrames* symptom, the `validateAndPerformAction` stack frame is still present because it is common to all crashes (in fact, this is where the bug lives). The three innermost frames, however, have been replaced by `?` placeholders because they vary across the set of submitted crashes.
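How such a *stackFrames* symptom could be matched against a crash can be sketched as follows. This is a simplified model that assumes `?` stands for exactly one arbitrary frame; it is not FuzzManager's actual matcher, and the crash stacks are hypothetical.

```python
def frames_match(signature_frames, crash_frames):
    """Check whether a crash stack matches a list of signature frames.

    Simplifying assumption: "?" matches exactly one arbitrary frame,
    and frames are compared starting at the innermost frame.
    """
    if len(crash_frames) < len(signature_frames):
        return False
    return all(
        expected == "?" or expected == actual
        for expected, actual in zip(signature_frames, crash_frames)
    )


signature = ["?", "?", "?", "validateAndPerformAction",
             "main", "__libc_start_main", "_start"]

# Hypothetical crash stacks (innermost frame first)
first_crash = ["vfprintf", "printf", "printFirst", "validateAndPerformAction",
               "main", "__libc_start_main", "_start"]
last_crash = ["vfprintf", "printf", "printLast", "validateAndPerformAction",
              "main", "__libc_start_main", "_start"]
other_crash = ["crash", "main", "__libc_start_main", "_start"]

print(frames_match(signature, first_crash))
print(frames_match(signature, last_crash))
print(frames_match(signature, other_crash))
```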

The *Optimize* function is designed to automate this process as much as possible: It attempts to broaden the signature by fitting it to untriaged crashes and then checks if the modified signature would touch other existing buckets. This works with the assumption that other buckets are indeed other bugs, i.e. if you had created two buckets from your crashes first, optimizing wouldn't work anymore. Also, if the existing bucket data is sparse and you have a lot of untriaged crashes, the algorithm could propose changes that include crashes of different bugs in the same bucket. There is no way to fully automatically detect and prevent this, hence the process is semi-automated and requires you to review all proposed changes.
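The idea of broadening a signature and then checking it against other buckets can be illustrated with a toy model. The sketch below is not FuzzManager's algorithm: `generalize` and `touches_other_bucket` are hypothetical helpers, stacks are plain lists of function names, and `?` is assumed to match exactly one frame.

```python
def frames_match(signature_frames, crash_frames):
    """Simplified matcher: "?" matches exactly one arbitrary frame."""
    if len(crash_frames) < len(signature_frames):
        return False
    return all(
        expected == "?" or expected == actual
        for expected, actual in zip(signature_frames, crash_frames)
    )


def generalize(stacks):
    """Keep frames shared by all stacks; replace differing ones with "?"."""
    return [
        frames[0] if len(set(frames)) == 1 else "?"
        for frames in zip(*stacks)  # zip stops at the shortest stack
    ]


def touches_other_bucket(candidate, other_bucket_stacks):
    """Reject a candidate signature if it would match another bucket's crashes."""
    return any(frames_match(candidate, stack) for stack in other_bucket_stacks)


# Hypothetical data: one bucketed crash plus one untriaged crash
bucket_and_untriaged = [
    ["printFirst", "validateAndPerformAction", "main"],
    ["printLast", "validateAndPerformAction", "main"],
]
other_bucket = [["crash", "main"]]

candidate = generalize(bucket_and_untriaged)
print(candidate)
print(touches_other_bucket(candidate, other_bucket))
```

In this model, a candidate that is too broad (say, `["?", "?"]`) would also match the crash in the other bucket, which is why every proposal still needs a human review.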

## Collecting Code Coverage

As we have heard before, measuring code coverage can be beneficial to assess how well a fuzzer is performing on the target code. Holes in code coverage can reveal particularly hard-to-reach locations as well as bugs in the fuzzer itself. Because this is an important part of the overall fuzzing operations, FuzzManager supports visualizing per-fuzzing code coverage of repositories. To illustrate this, we are first going to look at another simple program, the `maze` example:

In [None]:
print_file("simply-buggy/maze.cpp")

As you can see, all this program does is read some numbers from the command line, compare them to some magical constants and arbitrary criteria, and if everything works out, you reach one of the two secrets in the program. Also, one secret is buggy.

Before we start to work on this program, we recompile the programs with coverage support. In order to emit code coverage with either Clang or GCC, programs typically need to be built and linked with special `CFLAGS` like `--coverage`. In our case, the Makefile does this for us:

In [None]:
!(cd simply-buggy && make clean && make coverage)

Also, if we want to use FuzzManager to look at our code, we need to do the initial repository setup (essentially giving the server its own working copy of our Git repository to pull the source from). Normally, the client and server run on different machines, so this involves checking out the repository on the server and telling it where to find it (and what version control system it uses):

In [None]:
!git clone https://github.com/choller/simply-buggy $HOME/simply-buggy-server    

In [None]:
!python3 FuzzManager/server/manage.py setup_repository simply-buggy GITSourceCodeProvider $HOME/simply-buggy-server

We now assume that we know some of the magic constants (in practice, we often know some things about the target but miss a detail) and fuzz the program with them:

In [None]:
import random
import subprocess

In [None]:
random.seed(0)
cmd = ["simply-buggy/maze"]

constants = [3735928559, 1111638594]

TRIALS = 1000

for itnum in range(0, TRIALS):
    current_cmd = []
    current_cmd.extend(cmd)
    
    for _ in range(0,4):
        if random.randint(0, 9) < 3:
            current_cmd.append(str(constants[random.randint(0, len(constants) - 1)]))
        else:
            current_cmd.append(str(random.randint(-2147483647, 2147483647)))
    
    result = subprocess.run(current_cmd, stderr=subprocess.PIPE)
    stderr = result.stderr.decode().splitlines()
    crashed = False
    
    if stderr and "secret" in stderr[0]:
        print(stderr[0])

    for line in stderr:
        if "ERROR: AddressSanitizer" in line:
            crashed = True
            break

    if crashed:
        print("Found the bug!")
        break

print("Done!")
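The argument generation in the loop above can be isolated into a function, which makes the probability of picking a known constant an explicit parameter. This is a sketch; `random_maze_args` and its parameters are hypothetical names, and the default of 0.3 mirrors the `random.randint(0, 9) < 3` test used above.

```python
import random


def random_maze_args(constants, rng, num_args=4, constant_probability=0.3):
    """Generate command-line arguments for the maze program.

    Each argument is a known constant with the given probability,
    and a random signed 32-bit-range integer otherwise.
    """
    args = []
    for _ in range(num_args):
        if rng.random() < constant_probability:
            args.append(str(rng.choice(constants)))
        else:
            args.append(str(rng.randint(-2147483647, 2147483647)))
    return args


rng = random.Random(0)  # private generator, so results are reproducible
print(random_maze_args([3735928559, 1111638594], rng))
```

Raising `constant_probability` makes it more likely that several arguments are constants at once, at the cost of exploring fewer random values.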

As you can see, with 1000 runs we found secret 1 a few times, but secret 2 (and the crash) are still missing. In order to determine how to improve this, we are now going to look at the coverage data:

In [None]:
!grcov simply-buggy/ -t coveralls+ --commit-sha $(cd simply-buggy && git rev-parse HEAD) --token NONE -p `pwd`/simply-buggy/ > coverage.json

In [None]:
!python3 -mCovReporter --repository simply-buggy --description "Test1" --submit coverage.json
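The report can also be summarized locally before looking at it on the server. The sketch below assumes that the report follows the Coveralls layout, that is, a `source_files` list whose entries carry a `name` and a per-line `coverage` array with `null` for non-executable lines. The sample data and the helper `line_coverage` are made up for illustration.

```python
def line_coverage(report):
    """Return the fraction of executable lines hit at least once, per file."""
    summary = {}
    for source_file in report["source_files"]:
        # None (null in JSON) marks lines that are not executable
        counters = [c for c in source_file["coverage"] if c is not None]
        covered = sum(1 for c in counters if c > 0)
        summary[source_file["name"]] = covered / len(counters) if counters else 0.0
    return summary


# Hypothetical report data in the assumed layout
sample_report = {
    "source_files": [
        {"name": "maze.cpp", "coverage": [None, 1000, 1000, 40, 0, None]},
        {"name": "simple-crash.cpp", "coverage": [None, 0, 0]},
    ]
}

for name, fraction in line_coverage(sample_report).items():
    print(f"{name}: {fraction:.0%}")
```

For the real file, the report could be loaded with `json.load(open("coverage.json"))` and passed to the same function, provided the layout assumption holds.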

We can now go to http://127.0.0.1:8000/covmanager/ to take a look at our source code and its coverage.

In [None]:
gui_driver.get(fuzzmanager_url + "/covmanager")

In [None]:
Image(gui_driver.get_screenshot_as_png())

Click on the first ID to browse the coverage data that you just submitted.

In [None]:
first_id = gui_driver.find_element_by_xpath('//td/a[contains(@href,"/browse")]')
first_id.click()

In [None]:
Image(gui_driver.get_screenshot_as_png())

You will first see the full list of files in the `simply-buggy` repository, with all but the `maze.cpp` file showing 0% coverage (because we didn't do anything with these binaries since we rebuilt them with coverage support). Now click on `maze.cpp` and inspect the coverage line by line:

In [None]:
maze_cpp = gui_driver.find_element_by_xpath("//*[contains(text(), 'maze.cpp')]")
maze_cpp.click()

In [None]:
Image(gui_driver.get_screenshot_as_png())

There are two observations to make:

1. The if-statement in line 34 is still covered, but the lines following it are red. This is because our fuzzer misses the constant checked in that statement, so it is fairly obvious that we need to add it to our constants list.

2. From line 26 to line 27 there is a sudden drop in coverage. Both lines are covered, but the counters show that we fail that check in more than 95% of the cases. This explains why we find secret 1 so rarely. If this was a real program, we would now try to figure out how much additional code is behind that branch and adjust probabilities such that we hit it more often, if necessary.
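The second observation boils down to simple arithmetic on two line counters. Here is a small sketch; the counter values are hypothetical and not read from the actual report.

```python
def pass_rate(hits_at_check, hits_after_check):
    """Fraction of executions that got past a check, from two line counters."""
    if hits_at_check == 0:
        return 0.0
    return hits_after_check / hits_at_check


# Hypothetical counters: the check was reached 1000 times,
# the line following it only 40 times.
rate = pass_rate(1000, 40)
print(f"{rate:.1%} of executions pass the check")  # prints "4.0% of executions pass the check"
```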

Of course, the `maze` program is so small that one could see these issues with the naked eye. But with complex real-world programs, it is usually not obvious where a fuzzing tool gets stuck. Identifying these cases can greatly help to improve fuzzing results. For the sake of completeness, let's rerun the program now with the missing constant added:

In [None]:
import random
import subprocess

In [None]:
random.seed(0)
cmd = ["simply-buggy/maze"]

constants = [3735928559, 1111638594, 3405695742]  # Added the missing constant here

for itnum in range(0,1000):
    current_cmd = []
    current_cmd.extend(cmd)
    
    for _ in range(0,4):
        if random.randint(0, 9) < 3:
            current_cmd.append(str(constants[random.randint(0, len(constants) - 1)]))
        else:
            current_cmd.append(str(random.randint(-2147483647, 2147483647)))
    
    result = subprocess.run(current_cmd, stderr=subprocess.PIPE)
    stderr = result.stderr.decode().splitlines()
    crashed = False
    
    if stderr:
        print(stderr[0])

    for line in stderr:
        if "ERROR: AddressSanitizer" in line:
            crashed = True
            break

    if crashed:
        print("Found the bug!")
        break

print("Done!")

As expected, we now found secret 2 including our crash.

We're done, so we clean up:

In [None]:
fuzzmanager_process.terminate()

In [None]:
gui_driver.quit()

In [None]:
import shutil

In [None]:
if os.path.exists('coverage.json'):
    os.remove('coverage.json')
if os.path.exists('coverage'):
    shutil.rmtree('coverage')
if os.path.exists('simply-buggy'):
    shutil.rmtree('simply-buggy')
