Timing attacks - docs and minor fixes #679
Conversation
This pull request introduces 1 alert when merging 9a74304 into 49ae595 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging 96d7606 into 49ae595 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging ab8d154 into 49ae595 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging 5d7f74e into 49ae595 - view on LGTM.com new alerts:
```python
        ncol=6,
        loc='upper center',
        bbox_to_anchor=(0.5, -0.15)
    )
```

```python
    def generate_report(self):

    def calc_diff_conf_int(self, pair, reps=5000, ci=0.95):
        """
```
Function _write_individual_results has a Cognitive Complexity of 13 (exceeds 10 allowed). Consider refactoring.
I was only able to find minor typos; looks good to me otherwise.
Dismissed @codeclimate[bot] from 5 discussions.
Reviewable status: 1 of 15 files reviewed, 3 unresolved discussions (waiting on @ueno)
docs/source/timing-analysis.rst, line 5 at r39 (raw file):
Previously, ueno (Daiki Ueno) wrote…
"cryptographic"
Done.
scripts/test-bleichenbacher-timing.py, line 844 at r37 (raw file):
Previously, ueno (Daiki Ueno) wrote…
Can't you write those output as a single multi-line string?
good idea, fixed
tlsfuzzer/timing_runner.py, line 34 at r20 (raw file):
Previously, ueno (Daiki Ueno) wrote…
duplicate "of"
Done.
Reviewed 14 of 14 files at r40, 4 of 4 files at r41, 2 of 2 files at r42, 2 of 2 files at r43, 1 of 1 files at r44, 1 of 1 files at r45, 2 of 2 files at r46, 2 of 2 files at r47, 1 of 1 files at r48, 1 of 1 files at r49, 2 of 2 files at r50, 2 of 2 files at r51, 3 of 3 files at r52, 1 of 1 files at r53, 1 of 1 files at r54, 2 of 2 files at r55, 1 of 1 files at r56, 1 of 1 files at r57, 1 of 1 files at r58, 2 of 2 files at r59.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ueno)
Since running tests with large population sizes can take hours, if not days, report the progress of the process as it's running.
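For illustration, a minimal sketch of this kind of progress reporting (the function name and output format are made up):

```python
import sys
import time

def report_progress(done, total, start_time):
    """Print completed/total and a rough ETA for a long-running job."""
    elapsed = time.monotonic() - start_time
    # extrapolate remaining time from the average time per test so far
    eta = elapsed / done * (total - done) if done else float("inf")
    sys.stderr.write("\r{0}/{1} done, ETA: {2:.0f}s".format(
        done, total, eta))
    sys.stderr.flush()
```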
Since the assignment of IDs to test names is random by design, it's hard to tell which IDs (class numbers) correspond to which tests. Create a simple legend file that puts that mapping in an easy-to-read, easy-to-find place.
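For illustration, a legend file like that could be written as below (the probe names and CSV layout are assumptions, not the actual format):

```python
import csv

# hypothetical mapping of randomly assigned class IDs to test names
classes = {0: "sanity", 1: "too short PKCS padding", 2: "wrong version"}

with open("legend.csv", "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["ID", "Probe"])  # assumed column names
    for class_id, name in sorted(classes.items()):
        writer.writerow([class_id, name])
```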
When the script is killed (for any reason, including Ctrl+C), the tcpdump process would be left running; this fixes that issue.
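A minimal sketch of the cleanup pattern, assuming the sniffer is started with subprocess (the actual invocation in timing_runner.py may differ):

```python
import subprocess
import time

# start tcpdump as a child process
sniffer = subprocess.Popen(
    ["tcpdump", "-i", "lo", "-w", "capture.pcap"])
try:
    time.sleep(5)  # stands in for the actual test execution
finally:
    # runs even on KeyboardInterrupt, so tcpdump never outlives the script
    sniffer.terminate()  # SIGTERM lets tcpdump flush its buffers
    sniffer.wait()
```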
It's not necessary to send the AppData; the server is expected to reject the Finished message.
Since the timing test randomises the order of the executed tests, it's not necessary to randomise it before that (and doing so makes comparing results from two runs harder).
Because we collect observations in sets, and the collection happens over long periods of time (so noise that is constant for a significant portion of the collection will be visible in a portion of the observations), our samples *are not* independent. Since the KS-test requires independent samples, we can't use it and expect correct results (in practice, the KS-test overestimates p-values for dependent samples). The Wilcoxon signed-rank test does work with dependent samples. The downside is that it won't see a change in variance between samples; the upside is that it is significantly more sensitive to changes in expected value.
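A self-contained illustration of the paired test (the sample sizes and distributions are simulated; only the scipy call reflects the test being described):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# two paired samples of response times; the second has a small,
# constant shift in expected value (a simulated timing side channel)
a = rng.normal(loc=1.0, scale=0.1, size=1000)
b = a + rng.normal(loc=0.01, scale=0.1, size=1000)

# the Wilcoxon signed-rank test works on paired (dependent)
# observations, unlike the two-sample KS test
stat, p_value = stats.wilcoxon(a, b)
print("Wilcoxon p-value:", p_value)
```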
Since it's normal to observe multiple false positives with 200 tests executed (as with the Bleichenbacher tests), we should rather look at the overall distribution of p-values of the Wilcoxon test to decide whether the small p-values match the expected frequency or not.
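Sketching that idea with simulated data: under the null hypothesis p-values are uniform on [0, 1], so their distribution can itself be tested (the use of a one-sample KS test for this step is an assumption):

```python
import numpy as np
from scipy import stats

# 200 simulated p-values, as if from 200 Wilcoxon tests; under the
# null hypothesis (no timing difference) they are uniform on [0, 1],
# so a handful of small values is expected and not alarming by itself
rng = np.random.default_rng(0)
p_values = rng.uniform(0, 1, size=200)

# one-sample KS test against the uniform distribution: a small
# p-value *here* suggests the p-values are not uniform, i.e. that
# some class pairs genuinely differ
stat, p_uniform = stats.kstest(p_values, "uniform")
print("uniformity p-value:", p_uniform)
```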
We have tests with very short pre-master secret values (0, 1 and 2 bytes long), and while we also have a test with a slightly larger PMS (49 bytes long), we don't have a test with a very long PMS. Add tests with a PMS twice as long as a normal one, and also one that is as large as a 1024-bit RSA key can handle (124 bytes long).
As it's normal to verify whether results are reproducible by running the same test twice (or to compare results before and after a change), make that easier by ensuring that the order of data on the graphs is constant.
The plots are not really informative with large or noisy samples; it is also not really feasible to generate them for the full Lucky Thirteen test, as that requires generating over 200k individual plots.
Write the result of the test for uniformity of p-values to a text file. Also add the calculation (using bootstrapping) of the confidence interval for the truncated mean of differences between the worst pair of samples (the ones that have the smallest p-value from the Wilcoxon test), and write that result to the text file too.
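A sketch of what that bootstrap could look like, mirroring the reps and ci parameters of calc_diff_conf_int() visible in the review diffs; the trimming proportion and the use of scipy's trim_mean are assumptions:

```python
import numpy as np
from scipy import stats

def boot_ci_trim_mean(diffs, reps=5000, ci=0.95, trim=0.25):
    """Bootstrap a confidence interval for the truncated (trimmed)
    mean of a sample of paired timing differences."""
    rng = np.random.default_rng()
    cent_tend = []
    for _ in range(reps):
        # resample the differences with replacement
        resample = rng.choice(diffs, size=len(diffs), replace=True)
        cent_tend.append(stats.trim_mean(resample, trim))
    # lower bound, median and upper bound of the interval
    return np.quantile(cent_tend, [(1 - ci) / 2, 0.5, 1 - (1 - ci) / 2])
```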
Document how to run the timing tests, why to run them this way, and how to interpret the results. Also move the information from TIMING.md to the main documentation.
Because the code in analysis.py requires dependencies that are not installed in the "clean" 3.6 run, that run underreports the code coverage.
If we don't specify the data type for the array, pandas reading a headerless file uses Python object()s, which translates to gigabytes of memory needed for processing 20M datapoints. Coerce them into numpy arrays of float64 objects to significantly reduce the required memory.
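For example (the file name is a placeholder):

```python
import numpy as np
import pandas as pd

# forcing float64 up front keeps pandas from boxing every value in a
# Python object, which is what costs gigabytes for 20M datapoints
data = pd.read_csv("timing.csv", header=None, dtype=np.float64)
```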
Matplotlib *really* wants to hold onto graphs it has drawn, and it's not really possible to free them (see https://stackoverflow.com/q/28516828/462370), so plot them in separate processes, leaving it no option but to free the memory.
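A minimal sketch of the separate-process pattern (the real analysis.py may structure this differently):

```python
import multiprocessing as mp

import matplotlib
matplotlib.use("Agg")  # headless backend, safe in worker processes
import matplotlib.pyplot as plt

def plot_one(args):
    """Draw one figure in a worker process; when the process exits,
    the OS reclaims everything matplotlib was holding onto."""
    data, filename = args
    fig, ax = plt.subplots()
    ax.plot(data)
    fig.savefig(filename)
    plt.close(fig)

if __name__ == "__main__":
    jobs = [([1, 2, 3], "a.png"), ([3, 2, 1], "b.png")]
    # maxtasksperchild=1 gives every figure a fresh process, so leaked
    # figures never accumulate
    with mp.Pool(maxtasksperchild=1) as pool:
        pool.map(plot_one, jobs)
```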
While nominally it's not different from the "too short PKCS padding" test, the number of zero bytes may have an impact on the timing of decryption, and thus on the server response time to malformed CKE messages. Use a number of zero bytes that is still possible with 1024-bit RSA keys, but large enough that any timing side channel will be much more apparent than with "too short PKCS padding".
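To make the byte layout concrete, a hypothetical construction of such a message (the exact byte counts used by the probe are assumptions):

```python
import os

key_size = 128   # a 1024-bit RSA modulus is 128 bytes
zeros = 100      # long run of zeros that still fits a 1024-bit key

# PKCS#1 v1.5 type 2 header, a single padding byte, then the zeros;
# the first zero byte terminates the padding string, so this parses
# as "too short PKCS padding", just with many more zero bytes
block = (b"\x00\x02"
         + b"\x01"               # single non-zero padding byte
         + b"\x00" * zeros
         + os.urandom(key_size - 3 - zeros))
assert len(block) == key_size
```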
Make the test description mention the different groups of tests, to ease debugging and make interpretation of the results easier.
While the probes that modify the Finished message to create a malformed ciphertext are very consistent, they also detect Lucky Thirteen, masking the signal from Bleichenbacher. Use a probe that intentionally causes the master secret calculation to use a wrong pre-master secret, to randomise the plaintext the server will see on decryption. Leave the old probes in so that combining the results from multiple runs remains possible.
```python
        # lower bound, median and upper bound of the confidence interval
        return np.quantile(cent_tend, [(1-ci)/2, 0.5, 1-(1-ci)/2])
```

```python
    def _write_individual_results(self):
```
Function _write_individual_results has a Cognitive Complexity of 13 (exceeds 10 allowed). Consider refactoring.
Description
Improve documentation around tests for timing attacks and make the scripts easier to use.
Motivation and Context
#673
Checklist
tlslite-ng.json and tlslite-ng-random-subset.json