Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run benchmarks at microsecond level and record advanced stats #6868

Merged
merged 17 commits into from Nov 18, 2023

Conversation

lealem47
Copy link
Contributor

@lealem47 lealem47 commented Oct 13, 2023

Description

First two commits: Allow benchmarks to be run on microsecond level. Not sure if using the μ character in the code is controversial or not.

Next two commits: Record mean, max, min and sd of operation durations. Allow user input for hash and cipher plain input. Allow users to specify a minimum number of operation runs.

Testing

Tested on linux, windows, android, IOS

Checklist

  • added tests
  • updated/added doxygen
  • updated appropriate READMEs
  • Updated manual and documentation

@lealem47 lealem47 self-assigned this Oct 13, 2023
@lealem47 lealem47 changed the title Microsecond Run benchmarks at microsecond level Oct 13, 2023
@lealem47 lealem47 requested a review from douzzer October 23, 2023 22:38
@lealem47 lealem47 assigned douzzer and unassigned lealem47 Oct 23, 2023
@lealem47 lealem47 changed the title Run benchmarks at microsecond level Run benchmarks at microsecond level and record advanced stats Oct 23, 2023
Copy link
Contributor

@douzzer douzzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.

In a first round of review, I found a few nits to pick and a couple bugs to fix.

The output without the new features activated is indeed as before, accepted by multi-test benchmark-wolfcrypt-intelasm-all (which parses the output in detail).

The statistics look to be getting calculated correctly but in the next review round I can do a formal confirmation of the mathematical correctness if you'd like.

The PR rebased on current master passed multi-test super-quick-check, modulo one typo found by clang-tidy.

* BENCH_MICROSECOND
*
* display mean, max, min and sd of operation durations:
* ADVANCED_STATS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this something descriptive? How about MULTI_VALUE_STATISTICS?

if (bench_plain == NULL)
goto exit;

if ((size_t)XFREAD(bench_plain, 1, bench_buf_size, file) != (size_t)bench_buf_size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line length

if (bench_cipher == NULL)
goto exit;

if ((size_t)XFREAD(bench_cipher, 1, bench_buf_size, file) != (size_t)bench_buf_size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line length

#ifndef MAX_SAMPLE_RUNS
#define MAX_SAMPLE_RUNS 1000
#endif
#define NEW_LINE ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this something descriptive? If the NEW_LINE macro is being used as a separator, then it should be called something like STATS_CLAUSE_SEPARATOR. Mainly, don't give macros names that are intrinsically contradictory, since that inevitably leads to confusion.

#define SHOW_INTEL_CYCLES(b, n, s) b[XSTRLEN(b)] = '\n'
#define SHOW_INTEL_CYCLES_CSV(b, n, s) b[XSTRLEN(b)] = '\n'
#ifdef ADVANCED_STATS
#define SHOW_INTEL_CYCLES(b, n, s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically these statement-type macros should be defined to WC_DO_NOTHING to make them no-ops.

milliEach = each * 1000; /* milliseconds */

#endif

#ifdef BENCH_MICROSECOND
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move this up to the declaration: turn digits into a const int, and conditionalize the initializer on BENCH_MICROSECOND. Not important, but it's neater and more efficient.

"-print ベンチマーク統計の要約を表示する\n",
/* TODO: translate below */
"-hash_input <file> Input data to use for hash benchmarking\n",
"-cipher_input <file> Input data to use for cipher benchmarking\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comma at the end of the line. And nope, I didn't spot that with my bare eyes -- thank clang-tidy bugprone-suspicious-missing-comma for that catch.

#endif
#ifdef ADVANCED_STATS
printf("max duration,min duration,mean duration,sd,\n");
Copy link
Contributor

@douzzer douzzer Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is missing a comma before "max duration" when GENERATE_MACHINE_PARSEABLE_REPORT. The output looks like this:

"asym",Algorithm,key size,operation,avg ms,ops/μsec,ops,μsecs,cycles,cycles/opmax duration,min duration,mean duration,sd,

@lealem47
Copy link
Contributor Author

Thanks for that first round of review @douzzer. Should be ready for a second round. Lmk if I missed anything

@lealem47 lealem47 requested a review from douzzer October 24, 2023 20:11
Copy link
Contributor

@douzzer douzzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes for the previous round all look good.

Now on to round 2! Limiting my review to just the code orthogonalization.

Comment on lines 3676 to 3697
#ifdef MULTI_VALUE_STATISTICS
if (runs == 0) {
delta = current_time(0) - start;
min = delta;
max = delta;
}
else {
delta = current_time(0) - prev;
if (max < delta)
max = delta;
else if (min > delta)
min = delta;
}

if (runs < MAX_SAMPLE_RUNS) {
deltas[runs] = delta;
sum += delta;
runs++;
}

prev = current_time(0);
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lengthy code pattern repeats something like 125 times -- you should use a macro to orthogonalize the implementation.

The declarations of the working variables should also be done with a macro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@lealem47 lealem47 requested a review from douzzer October 25, 2023 19:54
douzzer
douzzer previously approved these changes Oct 26, 2023
Copy link
Contributor

@douzzer douzzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool new capability!

@lealem47
Copy link
Contributor Author

lealem47 commented Nov 8, 2023

Jenkins Retest this please

@lealem47
Copy link
Contributor Author

lealem47 commented Nov 8, 2023

Jenkins Retest this please

@douzzer
Copy link
Contributor

douzzer commented Nov 14, 2023

retest this please.

Copy link
Contributor

@douzzer douzzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, tests good, one last opportunistic nit pick from me and then I'll call it good!


#ifdef HAVE_AES_DECRYPT
XMEMSET(dec, 0, sizeof(dec));

#ifdef MULTI_VALUE_STATISTICS
prev = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reset code pattern occurs over and over, and tracks the declarations from DECLARE_MULTI_VALUE_STATS_VARS(), so it should also be done with a macro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@lealem47
Copy link
Contributor Author

Jenkins Retest this please

1 similar comment
@lealem47
Copy link
Contributor Author

Jenkins Retest this please

Copy link
Contributor

@douzzer douzzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitizer is reporting overruns:

$ ./configure --quiet --disable-jobserver --enable-fips=disabled --enable-all --enable-intelasm 'CFLAGS=-ggdb -fno-omit-frame-pointer -fsanitize=address,pointer-subtract,leak,undefined,float-cast-overflow,float-divide-by-zero,bounds-strict -fsanitize-recover=all --param=max-vartrack-size=128000000 -O2 -DBENCH_MICROSECOND -DMULTI_VALUE_STATISTICS' --disable-optflags
$ wolfcrypt/benchmark/benchmark -hash_input /etc/hosts -cipher_input /etc/services -min_runs 10
------------------------------------------------------------------------------
 wolfSSL version 5.6.4
------------------------------------------------------------------------------
Math:   Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
=================================================================
==21782==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x624000003e20 at pc 0x7f18bfb75a2f bp 0x7fff88218ea0 sp 0x7fff88218e90
WRITE of size 32 at 0x624000003e20 thread T0
    #0 0x7f18bfb75a2e in Hash_gen wolfcrypt/src/random.c:543
    #1 0x7f18bfb75a2e in Hash_DRBG_Generate wolfcrypt/src/random.c:623
    #2 0x7f18bfb79047 in wc_RNG_GenerateBlock wolfcrypt/src/random.c:1831
    #3 0x556e41bd6011 in bench_rng wolfcrypt/benchmark/benchmark.c:3721
    #4 0x556e41c0db31 in benchmarks_do wolfcrypt/benchmark/benchmark.c:2777
    #5 0x556e41c0effc in benchmark_test wolfcrypt/benchmark/benchmark.c:3681
    #6 0x556e41c0effc in benchmark_test wolfcrypt/benchmark/benchmark.c:3612
    #7 0x556e41c0effc in wolfcrypt_benchmark_main wolfcrypt/benchmark/benchmark.c:12652
    #8 0x7f18be84bc05 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #9 0x7f18be84bcc6 in __libc_start_main_impl ../csu/libc-start.c:360
    #10 0x556e41bc53b0 in _start ../sysdeps/x86_64/start.S:115

0x624000003e2e is located 0 bytes to the right of 7470-byte region [0x624000002100,0x624000003e2e)
allocated by thread T0 here:
    #0 0x7f18c12bd48f in __interceptor_malloc /tmp/portage/sys-devel/gcc-12.3.1_p20230825/work/gcc-12-20230825/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x556e41c0d89a in benchmarks_do wolfcrypt/benchmark/benchmark.c:2697
    #2 0x556e41c0effc in benchmark_test wolfcrypt/benchmark/benchmark.c:3681
    #3 0x556e41c0effc in benchmark_test wolfcrypt/benchmark/benchmark.c:3612
    #4 0x556e41c0effc in wolfcrypt_benchmark_main wolfcrypt/benchmark/benchmark.c:12652
    #5 0x7f18be84bc05 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #6 0x7f18be84bcc6 in __libc_start_main_impl ../csu/libc-start.c:360
    #7 0x556e41bc53b0 in _start ../sysdeps/x86_64/start.S:115

SUMMARY: AddressSanitizer: heap-buffer-overflow wolfcrypt/src/random.c:543 in Hash_gen
Shadow bytes around the buggy address:
  0x0c487fff8770: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff8780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff8790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff87a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c487fff87b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c487fff87c0: 00 00 00 00[00]06 fa fa fa fa fa fa fa fa fa fa
  0x0c487fff87d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff87e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff87f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff8800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c487fff8810: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==21782==ABORTING

I suspect it's because the setup code for hash_input in benchmarks_do() isn't properly rounding the length up to the nearest 16-byte boundary (cursory read), and maybe also because it isn't properly setting the new value for bench_size.

@lealem47
Copy link
Contributor Author

Jenkins Retest this please

Copy link
Contributor

@douzzer douzzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

caught on final check-over:

    configure...   real 0m16.660s  user 0m8.609s  sys 0m9.381s
    build...wolfcrypt/benchmark/benchmark.c: In function ‘void bench_aesecb_internal(int, const byte*, word32, const char*, const char*)’:
9bece34ad0 (<lealem47@github.com> 2023-11-17 13:28:30 -0700 4371)     const int benchSz = (const int)bench_size;
wolfcrypt/benchmark/benchmark.c:4371:25: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
 4371 |     const int benchSz = (const int)bench_size;
      |                         ^~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make[2]: *** [Makefile:6311: wolfcrypt/benchmark/benchmark.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:7980: all-recursive] Error 1
make: *** [Makefile:4708: all] Error 2
   real 0m14.307s  user 1m19.023s  sys 0m6.010s
    all-g++ fail_build
    failed config: '--enable-all' '--enable-testcert' '--enable-sp-math-all' 'CC=g++' 'CPPFLAGS=-DNO_WOLFSSL_CIPHER_SUITE_TEST -DWOLFSSL_OLD_PRIME_CHECK'

Copy link
Contributor

@douzzer douzzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good!

@douzzer douzzer merged commit 06dcc11 into wolfSSL:master Nov 18, 2023
105 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants