Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the build reproducible #148

Merged
merged 1 commit into from Sep 15, 2018
Merged

Conversation

lamby
Copy link
Contributor

@lamby lamby commented Feb 11, 2018

Whilst working on the Reproducible Builds effort [0], we noticed
that node-promise could not be built reproducibly as it uses
random numbers as throwaway identifiers.

This patch uses a determinstic suffix for these identifiers based
on the contents of the src/ directory.

This was filed in @Debian as https://bugs.debian.org/886277. There was
also a previous aborted attempt in #146.

[0] https://reproducible-builds.org/
[1] #146

Signed-off-by: Chris Lamb chris@chris-lamb.co.uk

@lamby
Copy link
Contributor Author

lamby commented Feb 11, 2018

I'm not entirely sure what happened in #146, but trying again here assuming good faith all round :)

@coveralls
Copy link

coveralls commented Feb 11, 2018

Coverage Status

Coverage remained the same at 95.783% when pulling c62e840 on lamby:reproducible-build-2 into cf30e9f on then:master.

@kpcyrd
Copy link

kpcyrd commented Feb 11, 2018

@lamby fs.readdirSync uses readdir(3) underneath which has non-deterministic ordering according to the documentation:

       The order in which filenames are read by successive calls to
       readdir() depends on the filesystem implementation; it is unlikely
       that the names will be sorted in any fashion.

I think this list needs to be sorted explicitly.

@lamby
Copy link
Contributor Author

lamby commented Feb 11, 2018

I think this list needs to be sorted explicitly.

Thanks, updated. (Curiously, when I was trying the ++idx patch I was running under disorderfs and, as I wasn't seeing any variation, I thus assumed readdirSync was sorted.)

build.js Outdated
ids.push(id);
names[name] = id;
return id;
return '_' + crypto.pbkdf2Sync(name, salt, 0, 2, 'sha512').toString('hex');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that two names could collide? 4 chars doesn't seem like very many but I also want to keep these identifiers as short as possible since minifiers rarely handle mangling property names.

We could keep the old code that caches the mapping from name to id, and keeps trying new ids until a unique one is found:

  if (name in names) return names[name];
  var id;
  var keyLength = 1;
  do {
    id = '_' + crypto.pbkdf2Sync(
      name,
      salt,
      0, Math.ceil(keyLength / 2),
      'sha512'
    ).toString('hex').substr(0, keyLength);
    keyLength++;
  } while (ids.indexOf(id) !== -1)
  ids.push(id);
  names[name] = id;
  return id;

This would start us with one character identifiers, but simply add another character whenever a collision occurred, until a unique identifier was found.

Copy link
Contributor Author

@lamby lamby Feb 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm, 16^4 == 65536 so 32768 with the birthday paradox. I think a cleaner solution than the re-use loop would be to increase the bytes but change the representation/base from [0-9a-e] to something like [0-9a-zA-Z] to keep it at 4 ascii chars.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to remove all risk of a collision rather than just adding entropy. Swapping to [0-9a-zA-Z] would be a good idea, but ideally I'd like it to use single character identifiers where possible. I remember it making a measurable difference to performance benchmarks when going from full names to these mangled, shorter names.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we select one character in the range of [a-z] randomly to ensure no variable starts with a number, then append a base36 ([0-9a-z]) counter for each name we issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kpcyrd (That's not seeded on the contents of src/.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They need to start with _ anyway to mark them as private. It therefore doesn't need to start with a letter. We could use:

const names = {};
const characterSet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
let i = characterSet.indexOf(shasum.digest('base64').replace(/[^0-9a-zA-Z]/g, '')[0]);

function getIdFor(name) {
  if (names[name]) return names[name];
  return names[name] = '_' + characterSet[i++ % characterSet.length]
}

This would be simpler than the what we have been using, and it would give us a single character identifier for up to 62 identifiers (and we definitely don't have anywhere near that many).

@lamby
Copy link
Contributor Author

lamby commented Aug 27, 2018

Hey folks, any update on this?

@ForbesLindesay
Copy link
Member

@lamby The ball is in your court. I've explained what I'd like to see in order to merge this:

  1. Single character identifier names where possible
  2. Zero chance of collision in generated names (simple to achieve using something like the code I posted)

Whilst working on the Reproducible Builds effort [0], we noticed
that node-promise could not be built reproducibly as it uses
random numbers as throwaway identifiers.

This patch uses a determinstic name for these identifiers based on the
contents of the src/ directory.

This was filed in @Debian as https://bugs.debian.org/886277. There was
also a previous aborted attempt in then#146.

 [0] https://reproducible-builds.org/
 [1] then#146

Signed-off-by: Chris Lamb <chris@chris-lamb.co.uk>
@lamby
Copy link
Contributor Author

lamby commented Aug 28, 2018

Thanks @ForbesLindesay, I must have missed that comment. Updated patch pushed, rebased against master. Thanks o/

@ForbesLindesay ForbesLindesay merged commit 6a376fe into then:master Sep 15, 2018
@lamby lamby deleted the reproducible-build-2 branch September 15, 2018 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants