New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the build reproducible #148
Conversation
|
I'm not entirely sure what happened in #146, but trying again here assuming good faith all round :) |
|
@lamby fs.readdirSync uses readdir(3) underneath which has non-deterministic ordering according to the documentation: I think this list needs to be sorted explicitly. |
a6d0176
to
1250870
Compare
Thanks, updated. (Curiously, when I was trying the |
build.js
Outdated
| ids.push(id); | ||
| names[name] = id; | ||
| return id; | ||
| return '_' + crypto.pbkdf2Sync(name, salt, 0, 2, 'sha512').toString('hex'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that two names could collide? 4 chars doesn't seem like very many but I also want to keep these identifiers as short as possible since minifiers rarely handle mangling property names.
We could keep the old code that caches the mapping from name to id, and keeps trying new ids until a unique one is found:
if (name in names) return names[name];
var id;
var keyLength = 1;
do {
id = '_' + crypto.pbkdf2Sync(
name,
salt,
0, Math.ceil(keyLength / 2),
'sha512'
).toString('hex').substr(0, keyLength);
keyLength++;
} while (ids.indexOf(id) !== -1)
ids.push(id);
names[name] = id;
return id;This would start us with one character identifiers, but simply add another character whenever a collision occurred, until a unique identifier was found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mm, 16^4 == 65536 so 32768 with the birthday paradox. I think a cleaner solution than the re-use loop would be to increase the bytes but change the representation/base from [0-9a-e] to something like [0-9a-zA-Z] to keep it at 4 ascii chars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to remove all risk of a collision rather than just adding entropy. Swapping to [0-9a-zA-Z] would be a good idea, but ideally I'd like it to use single character identifiers where possible. I remember it making a measurable difference to performance benchmarks when going from full names to these mangled, shorter names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we select one character in the range of [a-z] randomly to ensure no variable starts with a number, then append a base36 ([0-9a-z]) counter for each name we issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kpcyrd (That's not seeded on the contents of src/.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They need to start with _ anyway to mark them as private. It therefore doesn't need to start with a letter. We could use:
const names = {};
const characterSet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
let i = characterSet.indexOf(shasum.digest('base64').replace(/[^0-9a-zA-Z]/g, '')[0]);
function getIdFor(name) {
if (names[name]) return names[name];
return names[name] = '_' + characterSet[i++ % characterSet.length]
}This would be simpler than the what we have been using, and it would give us a single character identifier for up to 62 identifiers (and we definitely don't have anywhere near that many).
|
Hey folks, any update on this? |
|
@lamby The ball is in your court. I've explained what I'd like to see in order to merge this:
|
Whilst working on the Reproducible Builds effort [0], we noticed that node-promise could not be built reproducibly as it uses random numbers as throwaway identifiers. This patch uses a determinstic name for these identifiers based on the contents of the src/ directory. This was filed in @Debian as https://bugs.debian.org/886277. There was also a previous aborted attempt in then#146. [0] https://reproducible-builds.org/ [1] then#146 Signed-off-by: Chris Lamb <chris@chris-lamb.co.uk>
1250870
to
c62e840
Compare
|
Thanks @ForbesLindesay, I must have missed that comment. Updated patch pushed, rebased against |
Whilst working on the Reproducible Builds effort [0], we noticed
that node-promise could not be built reproducibly as it uses
random numbers as throwaway identifiers.
This patch uses a determinstic suffix for these identifiers based
on the contents of the src/ directory.
This was filed in @Debian as https://bugs.debian.org/886277. There was
also a previous aborted attempt in #146.
[0] https://reproducible-builds.org/
[1] #146
Signed-off-by: Chris Lamb chris@chris-lamb.co.uk