Fix the `connectblockslow` testvector generator script to be deterministic. #2388

nathan-at-least · 2017-05-19T03:11:19Z

In this PR review @bitcartel determined that he could not reproduce the same hash for the connectblockslow testvector that @str4d created.

We timeboxed a fix for that non-determinism, hit the time limit, and decided to deploy with a 'baked in' non-deterministic testvector for the time being.

The purpose of this ticket is to fix that non-determinism so that anyone can recreate the same benchmark on identical data. After that is done, we should measure that against today's 'special instance' of the test vector to ensure there's no performance difference, then update our benchmarking system to use the deterministic dataset.

This is important for at least two reasons: it lets us detect regressions in the determinism itself, and it ensures that measurements are repeatable.

str4d · 2017-05-19T03:31:47Z

@bitcartel sent me the archive he made, and the non-determinism is obvious:

$ diffoscope block-107134-str4d.tar.gz block-107134-bitcartel.tar.gz 
 |################################################################################################################|  100%                             Time: 0:00:00 
--- block-107134-str4d.tar.gz
+++ block-107134-bitcartel.tar.gz
│   --- block-107134-str4d.tar
├── +++ block-107134-bitcartel.tar
├── file list
│ │ @@ -1,7 +1,7 @@
│ │ -drwxrwxr-x   0 str4d     (1000) str4d     (1000)        0 2017-05-17 00:00:00.000000 benchmark/
│ │ -drwxr-xr-x   0 str4d     (1000) str4d     (1000)        0 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/
│ │ --rw-r--r--   0 str4d     (1000) str4d     (1000)        0 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/LOCK
│ │ --rw-rw-r--   0 str4d     (1000) str4d     (1000)       50 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/MANIFEST-000002
│ │ --rw-rw-r--   0 str4d     (1000) str4d     (1000)       16 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/CURRENT
│ │ --rw-rw-r--   0 str4d     (1000) str4d     (1000)  1170023 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/000003.log
│ │ --rw-rw-r--   0 str4d     (1000) str4d     (1000)  1996220 2017-05-17 00:00:00.000000 benchmark/block-107134.dat
│ │ +drwxr-xr-x   0 bitcartel   (1000) bitcartel   (1000)        0 2017-05-17 00:00:00.000000 benchmark/
│ │ +-rw-r--r--   0 bitcartel   (1000) bitcartel   (1000)  1996220 2017-05-17 00:00:00.000000 benchmark/block-107134.dat
│ │ +drwxr-xr-x   0 bitcartel   (1000) bitcartel   (1000)        0 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/
│ │ +-rw-r--r--   0 bitcartel   (1000) bitcartel   (1000)       16 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/CURRENT
│ │ +-rw-r--r--   0 bitcartel   (1000) bitcartel   (1000)        0 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/LOCK
│ │ +-rw-r--r--   0 bitcartel   (1000) bitcartel   (1000)       50 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/MANIFEST-000002
│ │ +-rw-r--r--   0 bitcartel   (1000) bitcartel   (1000)  1170023 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/000003.log

The owner and group can easily be made deterministic with the tar flags --owner=NAME and --group=NAME. The directory listing order might take a bit more work.
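As a rough illustration of the owner and group flags mentioned above, the sketch below builds a tiny archive with GNU tar and prints its listing. The directory and file names are placeholders, not the real benchmark data. The `--numeric-owner` and `--mode` options are additions that go beyond what the comment proposes: the diff also shows differing write bits (`-rw-rw-r--` versus `-rw-r--r--`), which presumably come from different umasks, and `--mode` is one way to normalize them.

```shell
# Sketch only: assumes GNU tar; all paths below are placeholders.
set -eu
workdir=$(mktemp -d)
mkdir -p "$workdir/benchmark"
printf 'example' > "$workdir/benchmark/data.dat"

# Store a fixed numeric owner and group, and drop group/other write bits,
# so the builder's account name and umask do not leak into the archive.
tar --create --file "$workdir/out.tar" \
    --directory "$workdir" \
    --owner=0 --group=0 --numeric-owner \
    --mode='go-w' \
    benchmark

# Show the stored metadata; each entry should list 0/0 as owner/group.
listing=$(tar --list --verbose --numeric-owner --file "$workdir/out.tar")
printf '%s\n' "$listing"
```

Using numeric IDs rather than a name avoids depending on which user names exist on the build machine.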

str4d · 2017-05-19T04:42:44Z

Looked at how the Gitian descriptors do it - find | sort | tar | gzip.
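A minimal sketch of that find | sort | tar | gzip pattern follows, assuming GNU tar and gzip. It is not the actual generator script: the `build_archive` helper, the sample file names, and the specific flag values are illustrative assumptions. The fixed `--mtime` is chosen to mirror the 2017-05-17 timestamps visible in the diffoscope listing. Two trees with the same contents but a different file creation order should produce byte-identical archives.

```shell
# Sketch only: assumes GNU tar and gzip; names and paths are hypothetical.
set -eu
export LC_ALL=C   # byte-wise sort order, independent of the builder's locale

# build_archive PARENT_DIR TOP_LEVEL_NAME OUTPUT_FILE (OUTPUT_FILE must be absolute)
build_archive() {
    ( cd "$1" && find "$2" | sort | \
        tar --no-recursion \
            --owner=0 --group=0 --numeric-owner \
            --mtime='2017-05-17 00:00Z' \
            --create --files-from=- | gzip -9n > "$3" )
}

tmp=$(mktemp -d)
mkdir -p "$tmp/a/benchmark/inputs" "$tmp/b/benchmark/inputs"

# Same contents, created in a different order in each tree.
printf 'block' > "$tmp/a/benchmark/block.dat"
printf 'log'   > "$tmp/a/benchmark/inputs/000003.log"
printf 'log'   > "$tmp/b/benchmark/inputs/000003.log"
printf 'block' > "$tmp/b/benchmark/block.dat"

build_archive "$tmp/a" benchmark "$tmp/a.tar.gz"
build_archive "$tmp/b" benchmark "$tmp/b.tar.gz"

hash_a=$(sha256sum < "$tmp/a.tar.gz" | cut -d' ' -f1)
hash_b=$(sha256sum < "$tmp/b.tar.gz" | cut -d' ' -f1)
printf '%s\n%s\n' "$hash_a" "$hash_b"
```

`--no-recursion` matters here because `find` already lists every entry; without it tar would descend into each directory again, in filesystem order, and add duplicates. `gzip -n` keeps the original name and timestamp out of the gzip header.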

nathan-at-least added the A-CI (Area: Continuous Integration), I-performance (Problems and improvements with respect to performance), and A-testing (Area: Tests and testing infrastructure) labels May 19, 2017

nathan-at-least mentioned this issue May 19, 2017

Benchmark for calling ConnectBlock on a block with many inputs #2372

Merged

str4d mentioned this issue May 19, 2017

Remove additional sources of nondeterminism from benchmark archive #2389

Merged

str4d added the M-has-pr To-be-removed (GitHub has linked:pr filter) label May 19, 2017

str4d added a commit to str4d/zcash that referenced this issue May 20, 2017

Remove additional sources of determinism from benchmark archive

08dc788

The archive has also been moved from .tar.gz to .tar.xz for a 33% reduction in size. Closes zcash#2388.

zkbot added a commit that referenced this issue May 22, 2017

Auto merge of #2389 - str4d:2388-bench-archive-determinism, r=nathan-…

7ea88c9

…at-least Remove additional sources of nondeterminism from benchmark archive Closes #2388.

zkbot closed this as completed in #2389 May 22, 2017

nathan-at-least added this to Work Queue in Development Infrastructure Jul 3, 2017

nathan-at-least added this to In Progress in Development Infrastructure Jul 3, 2017
