Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the connectblockslow testvector generator script to be deterministic. #2388

Closed
nathan-at-least opened this issue May 19, 2017 · 2 comments
Labels
A-CI Area: Continuous Integration A-testing Area: Tests and testing infrastructure I-performance Problems and improvements with respect to performance M-has-pr To-be-removed (GitHub has linked:pr filter)

Comments

@nathan-at-least
Copy link
Contributor

In this PR review @bitcartel determined that he could not reproduce the same hash for the connectblockslow testvector that @str4d created.

We timeboxed fixing that non-determinism and hit the time limit, and decided to deploy with a 'baked in' non-deterministic testvector for the time being.

The purpose of this ticket is to fix that non-determinism so that anyone can recreate the same benchmark on identical data. After that is done, we should measure that against today's 'special instance' of the test vector to ensure there's no performance difference, then update our benchmarking system to use the deterministic dataset.

This is important for a variety of reasons. One is to detect regressions in the determinism itself, as well as to ensure repeatability of measurements.

@nathan-at-least nathan-at-least added A-CI Area: Continuous Integration I-performance Problems and improvements with respect to performance A-testing Area: Tests and testing infrastructure labels May 19, 2017
@str4d
Copy link
Contributor

str4d commented May 19, 2017

@bitcartel sent me the archive he made, and the non-determinism is obvious:

$ diffoscope block-107134-str4d.tar.gz block-107134-bitcartel.tar.gz 
 |################################################################################################################|  100%                             Time: 0:00:00 
--- block-107134-str4d.tar.gz
+++ block-107134-bitcartel.tar.gz
│   --- block-107134-str4d.tar
├── +++ block-107134-bitcartel.tar
├── file list
│ │ @@ -1,7 +1,7 @@
│ │ -drwxrwxr-x   0 str4d     (1000) str4d     (1000)        0 2017-05-17 00:00:00.000000 benchmark/
│ │ -drwxr-xr-x   0 str4d     (1000) str4d     (1000)        0 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/
│ │ --rw-r--r--   0 str4d     (1000) str4d     (1000)        0 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/LOCK
│ │ --rw-rw-r--   0 str4d     (1000) str4d     (1000)       50 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/MANIFEST-000002
│ │ --rw-rw-r--   0 str4d     (1000) str4d     (1000)       16 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/CURRENT
│ │ --rw-rw-r--   0 str4d     (1000) str4d     (1000)  1170023 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/000003.log
│ │ --rw-rw-r--   0 str4d     (1000) str4d     (1000)  1996220 2017-05-17 00:00:00.000000 benchmark/block-107134.dat
│ │ +drwxr-xr-x   0 bitcartel   (1000) bitcartel   (1000)        0 2017-05-17 00:00:00.000000 benchmark/
│ │ +-rw-r--r--   0 bitcartel   (1000) bitcartel   (1000)  1996220 2017-05-17 00:00:00.000000 benchmark/block-107134.dat
│ │ +drwxr-xr-x   0 bitcartel   (1000) bitcartel   (1000)        0 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/
│ │ +-rw-r--r--   0 bitcartel   (1000) bitcartel   (1000)       16 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/CURRENT
│ │ +-rw-r--r--   0 bitcartel   (1000) bitcartel   (1000)        0 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/LOCK
│ │ +-rw-r--r--   0 bitcartel   (1000) bitcartel   (1000)       50 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/MANIFEST-000002
│ │ +-rw-r--r--   0 bitcartel   (1000) bitcartel   (1000)  1170023 2017-05-17 00:00:00.000000 benchmark/block-107134-inputs/000003.log

The owner and group can easily be made deterministic with that tar flags --owner=NAME and --group=NAME. The directory listing order might take a bit more work.

@str4d
Copy link
Contributor

str4d commented May 19, 2017

Looked at how the Gitian descriptors do it - find | sort | tar | gzip.

@str4d str4d added the M-has-pr To-be-removed (GitHub has linked:pr filter) label May 19, 2017
str4d added a commit to str4d/zcash that referenced this issue May 20, 2017
The archive has also been moved from .tar.gz to .tar.xz for a
33% reduction in size.

Closes zcash#2388.
zkbot added a commit that referenced this issue May 22, 2017
…at-least

Remove additional sources of nondeterminism from benchmark archive

Closes #2388.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-CI Area: Continuous Integration A-testing Area: Tests and testing infrastructure I-performance Problems and improvements with respect to performance M-has-pr To-be-removed (GitHub has linked:pr filter)
Projects
No open projects
Development

No branches or pull requests

2 participants