
Timing attacks - docs and minor fixes #679

Merged
tomato42 merged 20 commits into master from timing-docs on Jul 20, 2020

Conversation

@tomato42 (Member) commented Jun 27, 2020

Description

Improve documentation around tests for timing attacks and make the scripts easier to use.

Motivation and Context

#673

Checklist

  • I have read the CONTRIBUTING.md document and my PR follows change requirements therein
  • the changes are also reflected in documentation and code comments
  • all new and existing tests pass (see Travis CI results)
  • test script checklist was followed for new scripts
  • new test script added to tlslite-ng.json and tlslite-ng-random-subset.json
  • new and modified scripts were run against popular TLS implementations:
    • OpenSSL
    • NSS
    • GnuTLS
  • required version of tlslite-ng updated in requirements.txt and README.md


@tomato42 added the "enhancement" (new feature to be implemented) label on Jun 27, 2020
@tomato42 self-assigned this on Jun 27, 2020
@tomato42 added this to "In progress" in Vulnerability testers via automation on Jun 27, 2020
@tomato42 force-pushed the timing-docs branch 3 times, most recently from c1997c5 to c4848c9, on July 2, 2020
@tomato42 force-pushed the timing-docs branch 2 times, most recently from 8ff92e7 to 3aff94e, on July 2, 2020
@tomato42 force-pushed the timing-docs branch 2 times, most recently from 6a53e65 to 9a74304, on July 8, 2020
@lgtm-com (bot) commented Jul 8, 2020

This pull request introduces 1 alert when merging 9a74304 into 49ae595 - view on LGTM.com

new alerts:

  • 1 for Unused import

@lgtm-com (bot) commented Jul 9, 2020

This pull request introduces 1 alert when merging 96d7606 into 49ae595 - view on LGTM.com

new alerts:

  • 1 for Unused import

@lgtm-com (bot) commented Jul 9, 2020

This pull request introduces 1 alert when merging ab8d154 into 49ae595 - view on LGTM.com

new alerts:

  • 1 for Unused import

tlsfuzzer/analysis.py Show resolved Hide resolved
tlsfuzzer/analysis.py Show resolved Hide resolved
@lgtm-com (bot) commented Jul 10, 2020

This pull request introduces 1 alert when merging 5d7f74e into 49ae595 - view on LGTM.com

new alerts:

  • 1 for Unused import

tlsfuzzer/analysis.py (diff context):

            ncol=6,
            loc='upper center',
            bbox_to_anchor=(0.5, -0.15)
        )

    def generate_report(self):
    def calc_diff_conf_int(self, pair, reps=5000, ci=0.95):
        """

@codeclimate[bot] commented:

Function _write_individual_results has a Cognitive Complexity of 13 (exceeds 10 allowed). Consider refactoring.

@ueno (Collaborator) commented Jul 17, 2020

I was only able to find minor typos; looks good to me otherwise.

@tomato42 (Member, Author) left a comment

Dismissed @codeclimate[bot] from 5 discussions.
Reviewable status: 1 of 15 files reviewed, 3 unresolved discussions (waiting on @ueno)


docs/source/timing-analysis.rst, line 5 at r39 (raw file):

Previously, ueno (Daiki Ueno) wrote…

"cryptographic"

Done.


scripts/test-bleichenbacher-timing.py, line 844 at r37 (raw file):

Previously, ueno (Daiki Ueno) wrote…

Can't you write that output as a single multi-line string?

good idea, fixed


tlsfuzzer/timing_runner.py, line 34 at r20 (raw file):

Previously, ueno (Daiki Ueno) wrote…

duplicate "of"

Done.

ueno previously approved these changes Jul 18, 2020

@ueno (Collaborator) left a comment

Reviewed 14 of 14 files at r40, 4 of 4 files at r41, 2 of 2 files at r42, 2 of 2 files at r43, 1 of 1 files at r44, 1 of 1 files at r45, 2 of 2 files at r46, 2 of 2 files at r47, 1 of 1 files at r48, 1 of 1 files at r49, 2 of 2 files at r50, 2 of 2 files at r51, 3 of 3 files at r52, 1 of 1 files at r53, 1 of 1 files at r54, 2 of 2 files at r55, 1 of 1 files at r56, 1 of 1 files at r57, 1 of 1 files at r58, 2 of 2 files at r59.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ueno)

Commits

Since running tests with large population sizes can take hours, if
not days, report progress of the process as it's running.

Since the assignment of IDs to test names is random by design, it's
hard to find which IDs (class numbers) correspond to which tests;
create a simple legend file that puts that data in an easy-to-read,
easy-to-find place.

When the script is killed (for any reason, including Ctrl+C), the
tcpdump process was left running; this fixes that issue.
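
A minimal sketch of the kind of cleanup this implies, assuming tcpdump
is started via subprocess; the function names, interface, and capture
file here are illustrative, not the actual timing_runner.py code:

    import atexit
    import signal
    import subprocess
    import sys

    # start tcpdump as a child process (interface and file are examples)
    tcpdump = subprocess.Popen(
        ["tcpdump", "-i", "lo", "-w", "capture.pcap"])

    def _stop_tcpdump():
        # guard against double invocation (atexit plus signal handler)
        if tcpdump.poll() is None:
            tcpdump.terminate()
            tcpdump.wait()

    # run the cleanup on normal interpreter exit...
    atexit.register(_stop_tcpdump)

    # ...and on SIGTERM, which bypasses atexit unless handled;
    # handle SIGINT the same way for symmetry
    def _on_signal(signum, frame):
        _stop_tcpdump()
        sys.exit(128 + signum)

    signal.signal(signal.SIGINT, _on_signal)
    signal.signal(signal.SIGTERM, _on_signal)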
It's not necessary to send the AppData; the server is expected to
reject the Finished message.

Since the timing test randomises the executed tests, it's not
necessary to randomise them before that (and doing so makes comparing
results from two runs harder).

Because we collect observations in sets and the collection happens
over long periods of time (so noise that is constant for a significant
portion of collection will be visible in a portion of observations),
our samples *are not* independent.

Because the KS-test requires independent samples, we can't use it and
expect correct results (in practice, the KS-test overestimates
p-values for dependent samples). The Wilcoxon signed-rank test does
work with dependent samples. The downside is that it won't see a
change in variance between samples; the upside is that it is
significantly more sensitive to changes in expected value.

Since it's normal to observe multiple false positives with 200 tests
executed (as in the Bleichenbacher test), we should rather look at the
overall distribution of p-values of the Wilcoxon test to decide
whether the small p-values match the expected frequency or not.
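
A synthetic illustration of how the two tests above would be applied
to paired timing samples, using scipy; the data and names are made up,
not taken from analysis.py. The signed-rank test works on pairwise
differences, in which noise shared by both samples cancels out, while
the KS-test compares marginal distributions and assumes independence:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    # shared, slowly drifting noise makes the two samples dependent
    drift = np.cumsum(rng.normal(0, 1e-6, 1000))
    a = drift + rng.normal(100e-6, 5e-6, 1000)
    b = drift + rng.normal(100e-6, 5e-6, 1000)

    # Wilcoxon tests the pairwise differences a - b; the drift drops out
    print(stats.wilcoxon(a, b))
    # KS compares the two marginal distributions as if independent
    print(stats.ks_2samp(a, b))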
We have tests with very short pre-master secret values (0, 1 and
2 bytes long), and while we also have a test with a slightly larger
PMS (49 bytes long), we don't have a test with a very long PMS.

Add tests with a PMS twice as long as a normal one, and also one that
is as large as a 1024-bit RSA key can handle (124 bytes long).

As it's normal to verify whether results are reproducible by running
the same test twice (or to compare results before and after a change),
make that easier by ensuring that the order of data on graphs is
constant.

The plots are not really informative with large or noisy samples; it
is also not really practical to generate them for the full lucky13
test, as that requires generating over 200k individual plots.

Write the result of the test for uniformity of p-values to a text
file.

Also add the calculation (using bootstrapping) of the confidence
interval for the truncated mean of differences between the worst pair
of samples (the ones that have the smallest p-value from the Wilcoxon
test). Write that result to the text file too.
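
A sketch of the bootstrap idea: the reps/ci parameters and the final
quantile call mirror the calc_diff_conf_int fragment visible in the
review diff, but the trimming level and the rest of the body are
assumptions, not the actual analysis.py implementation:

    import numpy as np
    from scipy import stats

    def diff_conf_int(a, b, reps=5000, ci=0.95, trim=0.05):
        """Bootstrap a confidence interval for the trimmed (truncated)
        mean of pairwise differences between two dependent samples."""
        diffs = a - b
        rng = np.random.default_rng()
        # resample the differences with replacement, computing the
        # trimmed mean of each resample
        cent_tend = [
            stats.trim_mean(
                rng.choice(diffs, len(diffs), replace=True), trim)
            for _ in range(reps)
        ]
        # lower bound, median of the bootstrap estimates, upper bound
        return np.quantile(cent_tend, [(1-ci)/2, 0.5, 1-(1-ci)/2])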
Document how to run the timing tests, why to run them like this, and
how to interpret the results.

Also move the info from TIMING.md to the main documentation.

Because the code in analysis.py requires dependencies that are not
installed on the "clean" 3.6 run, it under-reports the code coverage.

If we don't specify the data type for the array, pandas reads
header-less data as Python object()s, which translates to gigabytes of
memory needed for processing 20M data points. Coerce them into numpy
arrays of float64 values to significantly reduce the required memory.
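
The fix amounts to pinning the dtype at read time; a minimal sketch,
where the file name is an example rather than the script's actual
input:

    import numpy as np
    import pandas as pd

    # without an explicit dtype, a header-less CSV may be read in as
    # Python objects; float64 keeps 20M data points at 8 bytes each
    data = pd.read_csv("timing.csv", header=None, dtype=np.float64)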
Matplotlib *really* wants to hold onto graphs it has once drawn, and
it's not really possible to free them (see
https://stackoverflow.com/q/28516828/462370), so plot them in separate
processes; that way the OS has no option but to free the memory when
each process exits.
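
A minimal sketch of that workaround (function names are illustrative):

    import multiprocessing as mp

    def _plot_worker(data, name):
        # import matplotlib inside the worker so all of its state
        # lives, and dies, in the child process
        import matplotlib
        matplotlib.use("Agg")  # file output only, no display needed
        import matplotlib.pyplot as plt

        fig, ax = plt.subplots()
        ax.boxplot(data)
        fig.savefig(name)
        plt.close(fig)

    def plot_in_subprocess(data, name):
        proc = mp.Process(target=_plot_worker, args=(data, name))
        proc.start()
        proc.join()  # once the child exits, the OS reclaims its memory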
While nominally this is no different from the "too short PKCS padding"
test, the number of zero bytes may have an impact on the timing of
decryption, and thus on the server's response time to malformed CKE
messages.

Use a number of zero bytes that is still possible with 1024-bit RSA
keys, but large enough that any timing side-channel will be much more
apparent than with the "too short PKCS padding" test.

Make the test description mention the different groups of tests, to
ease debugging and make interpretation of results easier.

While the probes that modify the Finished message to create a
malformed ciphertext are very consistent, they also detect Lucky13,
masking the signal from Bleichenbacher. Use a probe that intentionally
causes the master secret calculation to use a wrong pre-master secret
to randomise the plaintext the server will see on decryption.

Leave the old probes in, to keep combining the results from multiple
runs possible.

tlsfuzzer/analysis.py (diff context):

        return np.quantile(cent_tend, [(1-ci)/2, 0.5, 1-(1-ci)/2])

    def _write_individual_results(self):

@codeclimate[bot] commented:

Function _write_individual_results has a Cognitive Complexity of 13 (exceeds 10 allowed). Consider refactoring.

@tomato42 merged commit 4b990ab into master on Jul 20, 2020
4 of 6 checks passed
Vulnerability testers (automation) moved this from In progress to Done on Jul 20, 2020
@tomato42 deleted the timing-docs branch on July 22, 2020
Labels: enhancement (new feature to be implemented)
Projects: Vulnerability testers
2 participants