Merge pull request #817 from tlsfuzzer/more-docs

More documentation
tlsfuzzer · May 22, 2023 · 5f2069b · 5f2069b
2 parents d0d14c2 + 8766c9e
commit 5f2069b
Show file tree

Hide file tree

Showing 3 changed files with 155 additions and 11 deletions.
diff --git a/docs/source/failure-analysis.rst b/docs/source/failure-analysis.rst
@@ -0,0 +1,114 @@
+================
+Failure analysis
+================
+
+As the scripts in ``tlsfuzzer`` expect very specific behaviour from the server
+not all failures and errors reported by the script necessarily mean that the
+server is buggy.
+
+Note that those are just some of the tips, and this article is far from
+complete.
+To understand the failures and analyse them you will likely need to have
+a deep understanding of the TLS protocol.
+Remember to first understand what is happening and why it's happening
+before dismissing test failure as "irrelevant" or a "false positive".
+Also, when investigating, don't be afraid to edit the test scripts:
+they're intentionally written with little code reuse to make that process
+easy and contained.
+
+Execution summary
+=================
+
+All scripts print the following summary after execution:
+
+.. code::
+
+    Test end
+    ====================
+    version: 8
+    ====================
+    TOTAL: 2
+    SKIP: 0
+    PASS: 2
+    XFAIL: 0
+    FAIL: 0
+    XPASS: 0
+    ====================
+
+The ``version`` field is self-explanatory, it's the version number of the
+script.
+Every time the script is modified in a way that its execution *may*
+change, that version number is incremented.
+
+The ``TOTAL`` is the number of conversations executed by the script.
+While typically "conversations" is equivalent to "connections", in case the
+script is testing session resumption, the actual number of connections
+can be larger.
+
+The ``SKIP`` is the number of conversations that were excluded from execution
+through the use of the ``-e`` command line option.
+
+The ``PASS`` is the number of conversations in which the server behaved
+in expected manner (either the connection was successful when we expected
+success or the server correctly detected a malformed message when we
+sent one).
+
+The ``XFAIL`` is the number of conversations that *eXpectedly FAILed*.
+That is, the number of conversations that were specified with the ``-x`` option
+and subsequently failed (without the specific error message, or
+with the exact error message specified with the ``-X`` option).
+
+The ``FAIL`` is the number of conversations in which the server behaved
+in unexpected manner (refused the connection when we expected a
+successful handshake, didn't error out when we sent malformed messaged, etc.).
+
+The ``XPASS`` is the number of conversations that *unXepectedly PASSed*.
+That is, the conversations that were specified with the ``-x`` option but
+didn't fail.
+
+Those names are taken from the ``pytest`` project and have the same
+meaning there.
+
+The overall script will return a non-zero exit code if either the ``FAIL``
+or the ``XPASS`` count is more than zero.
+
+Failure in sanity
+=================
+
+For the overall test to be valid in the first place, we need to establish
+a baseline: verify that we can connect to the server and that we can
+perform a basic handshake, exchange some data, and perform an orderly
+connection shutdow.
+
+This is done by the ``sanity`` conversations inside the scripts.
+
+For example, if the script is testing that the server is rejecting
+malformed ClientHello messages, we first need to verify that it will
+accept a non-malformed message first: that happens in the the ``sanity``
+case.
+If that one doesn't pass, we may not be testing the ClientHello parser at
+all, as the server may be rejecting the connection because of the advertised
+ciphersuite by the client, not anything else.
+
+Now, since we want to detect wrong behaviour from the server,
+the scripts are fairly strict with the expected behaviour in the ``sanity``
+case.
+In particular, the *number* of ApplicationData packets expected as the
+reply to the HTTP GET request is exactly one.
+This is to detect situations in which the server sends each line of
+reply in single ApplicationData record (very bad),
+or a situation in which an HTTP server sends headers and body of the
+response in separate records (bad, as that leaks the exact size of both,
+which can have privacy consequences).
+
+That of course means that you generally can't execute the scripts against
+servers that reply with a large web page that doesn't fit into a single
+TLS ApplicationData record (at least, not without modifying the scripts).
+It's also like that, as some servers prefer to fragment the response
+to smaller records, ones that fit into MTU (to improve latency), while
+others will happily send records of maximum size and as few of them as
+possible.
+What is the expected behaviour for a particular server is not something
+we should specify for each and every test script, so they expect
+one and only one record as the response (though there are scripts that
+test large responses specifically, like the ``test-record-size-limit.py``).
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -37,6 +37,7 @@ to see wanted, but not yet implemented features.
    quickstart
    installation
    theory
+   failure-analysis
    writing-tests
    advanced-decision-graph
    modifying-messages

diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
@@ -168,11 +168,17 @@ This command should provide the following output if everything went fine:
     cipher, TLS 1.2 or earlier and RSA key exchange (or (EC)DHE if
     -d option is used)
 
-    version: 4
-
     Test end
-    successful: 2
-    failed: 0
+    ====================
+    version: 8
+    ====================
+    TOTAL: 2
+    SKIP: 0
+    PASS: 2
+    XFAIL: 0
+    FAIL: 0
+    XPASS: 0
+    ====================
 
 All the test scripts support at least ``--help`` option. For this script it
 will provide the following information:
@@ -187,9 +193,18 @@ will provide the following information:
                     names and not all of them, e.g "sanity"
      -e probe-name  exclude the probe from the list of the ones run
                     may be specified multiple times
-     -n num         only run `num` random tests instead of a full set
+     -x probe-name  expect the probe to fail. When such probe passes despite being marked like this
+                    it will be reported in the test summary and the whole script will fail.
+                    May be specified multiple times.
+     -X message     expect the `message` substring in exception raised during
+                    execution of preceding expected failure probe
+                    usage: [-x probe-name] [-X exception], order is compulsory!
+     -n num         run 'num' or all(if 0) tests instead of default(all)
                     ("sanity" tests are always executed)
-     -d             negotiate (EC)DHE instead of RSA key exchange
+     -d             negotiate (EC)DHE instead of RSA key exchange, send
+                    additional extensions, usually used for (EC)DHE ciphers
+     -C ciph        Use specified ciphersuite. Either numerical value or
+                    IETF name.
      --help         this message
 
 Almost all scripts support this set of command line options.
@@ -214,11 +229,17 @@ This produces similar output:
     Check if communication with typical group and cipher works with
     the TLS 1.3 server.
 
-    version: 2
-
     Test end
-    successful: 2
-    failed: 0
+    ====================
+    version: 7
+    ====================
+    TOTAL: 2
+    SKIP: 0
+    PASS: 2
+    XFAIL: 0
+    FAIL: 0
+    XPASS: 0
+    ====================
 
 Similarly to the :term:`TLS` 1.2 script, this one supports a set of options:
 
@@ -232,8 +253,16 @@ Similarly to the :term:`TLS` 1.2 script, this one supports a set of options:
                     names and not all of them, e.g "sanity"
      -e probe-name  exclude the probe from the list of the ones run
                     may be specified multiple times
-     -n num         only run `num` random tests instead of a full set
+     -x probe-name  expect the probe to fail. When such probe passes despite being marked like this
+                    it will be reported in the test summary and the whole script will fail.
+                    May be specified multiple times.
+     -X message     expect the `message` substring in exception raised during
+                    execution of preceding expected failure probe
+                    usage: [-x probe-name] [-X exception], order is compulsory!
+     -n num         run 'num' or all(if 0) tests instead of default(all)
                     ("sanity" tests are always executed)
+     -C ciph        Use specified ciphersuite. Either numerical value or
+                    IETF name.
      --help         this message
 
 As cryptographic parameter negotiation happens differently in :term:`TLS` 1.3