New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the build reproducible #148

Merged
merged 1 commit into from Sep 15, 2018

Conversation

Projects
None yet
4 participants
@lamby
Contributor

lamby commented Feb 11, 2018

Whilst working on the Reproducible Builds effort [0], we noticed
that node-promise could not be built reproducibly as it uses
random numbers as throwaway identifiers.

This patch uses a determinstic suffix for these identifiers based
on the contents of the src/ directory.

This was filed in @Debian as https://bugs.debian.org/886277. There was
also a previous aborted attempt in #146.

[0] https://reproducible-builds.org/
[1] #146

Signed-off-by: Chris Lamb chris@chris-lamb.co.uk

@lamby

This comment has been minimized.

Contributor

lamby commented Feb 11, 2018

I'm not entirely sure what happened in #146, but trying again here assuming good faith all round :)

@coveralls

This comment has been minimized.

coveralls commented Feb 11, 2018

Coverage Status

Coverage remained the same at 95.783% when pulling c62e840 on lamby:reproducible-build-2 into cf30e9f on then:master.

@kpcyrd

This comment has been minimized.

kpcyrd commented Feb 11, 2018

@lamby fs.readdirSync uses readdir(3) underneath which has non-deterministic ordering according to the documentation:

       The order in which filenames are read by successive calls to
       readdir() depends on the filesystem implementation; it is unlikely
       that the names will be sorted in any fashion.

I think this list needs to be sorted explicitly.

@lamby

This comment has been minimized.

Contributor

lamby commented Feb 11, 2018

I think this list needs to be sorted explicitly.

Thanks, updated. (Curiously, when I was trying the ++idx patch I was running under disorderfs and, as I wasn't seeing any variation, I thus assumed readdirSync was sorted.)

build.js Outdated
ids.push(id);
names[name] = id;
return id;
return '_' + crypto.pbkdf2Sync(name, salt, 0, 2, 'sha512').toString('hex');

This comment has been minimized.

@ForbesLindesay

ForbesLindesay Feb 12, 2018

Member

Is it possible that two names could collide? 4 chars doesn't seem like very many but I also want to keep these identifiers as short as possible since minifiers rarely handle mangling property names.

We could keep the old code that caches the mapping from name to id, and keeps trying new ids until a unique one is found:

  if (name in names) return names[name];
  var id;
  var keyLength = 1;
  do {
    id = '_' + crypto.pbkdf2Sync(
      name,
      salt,
      0, Math.ceil(keyLength / 2),
      'sha512'
    ).toString('hex').substr(0, keyLength);
    keyLength++;
  } while (ids.indexOf(id) !== -1)
  ids.push(id);
  names[name] = id;
  return id;

This would start us with one character identifiers, but simply add another character whenever a collision occurred, until a unique identifier was found.

This comment has been minimized.

@lamby

lamby Feb 12, 2018

Contributor

Mm, 16^4 == 65536 so 32768 with the birthday paradox. I think a cleaner solution than the re-use loop would be to increase the bytes but change the representation/base from [0-9a-e] to something like [0-9a-zA-Z] to keep it at 4 ascii chars.

This comment has been minimized.

@ForbesLindesay

ForbesLindesay Feb 12, 2018

Member

I'd prefer to remove all risk of a collision rather than just adding entropy. Swapping to [0-9a-zA-Z] would be a good idea, but ideally I'd like it to use single character identifiers where possible. I remember it making a measurable difference to performance benchmarks when going from full names to these mangled, shorter names.

This comment has been minimized.

@kpcyrd

kpcyrd Feb 12, 2018

What if we select one character in the range of [a-z] randomly to ensure no variable starts with a number, then append a base36 ([0-9a-z]) counter for each name we issue?

This comment has been minimized.

@lamby

lamby Feb 12, 2018

Contributor

@kpcyrd (That's not seeded on the contents of src/.)

This comment has been minimized.

@ForbesLindesay

ForbesLindesay Feb 12, 2018

Member

They need to start with _ anyway to mark them as private. It therefore doesn't need to start with a letter. We could use:

const names = {};
const characterSet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
let i = characterSet.indexOf(shasum.digest('base64').replace(/[^0-9a-zA-Z]/g, '')[0]);

function getIdFor(name) {
  if (names[name]) return names[name];
  return names[name] = '_' + characterSet[i++ % characterSet.length]
}

This would be simpler than the what we have been using, and it would give us a single character identifier for up to 62 identifiers (and we definitely don't have anywhere near that many).

@lamby

This comment has been minimized.

Contributor

lamby commented Aug 27, 2018

Hey folks, any update on this?

@ForbesLindesay

This comment has been minimized.

Member

ForbesLindesay commented Aug 27, 2018

@lamby The ball is in your court. I've explained what I'd like to see in order to merge this:

  1. Single character identifier names where possible
  2. Zero chance of collision in generated names (simple to achieve using something like the code I posted)
Make the build reproducible
Whilst working on the Reproducible Builds effort [0], we noticed
that node-promise could not be built reproducibly as it uses
random numbers as throwaway identifiers.

This patch uses a determinstic name for these identifiers based on the
contents of the src/ directory.

This was filed in @Debian as https://bugs.debian.org/886277. There was
also a previous aborted attempt in #146.

 [0] https://reproducible-builds.org/
 [1] #146

Signed-off-by: Chris Lamb <chris@chris-lamb.co.uk>
@lamby

This comment has been minimized.

Contributor

lamby commented Aug 28, 2018

Thanks @ForbesLindesay, I must have missed that comment. Updated patch pushed, rebased against master. Thanks o/

@ForbesLindesay ForbesLindesay merged commit 6a376fe into then:master Sep 15, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage remained the same at 95.783%
Details

@lamby lamby deleted the lamby:reproducible-build-2 branch Sep 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment