New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the build reproducible #146
Conversation
|
The identifiers are purposely not predictable from release to release, they should at least incorporate some version-dependent seed. |
|
The identifiers are purposely not predictable from release to release
How come?
…--
Chris Lamb
chris-lamb.co.uk / @lolamby
|
|
This is to guard against people using these internal properties. Alternative approaches like Symbols still have too much of a performance hit, so we just randomise it each time we do a new release, that way anyone who attempts to directly access internal state will find their module broken on each new release of I would be happy to accept something that seeded the random number generator using, say, the contents of all the files in src - that way it would change in an unpredictable way between releases, but the build would be reproducible providing the exact source was used. |
Whilst working on the Reproducible Builds effort [0], we noticed that node-promise could not be built reproducibly as it uses random numbers as throwaway identifiers. This patch uses determinstic numbers based on SOURCE_DATE_EPOCH[1]. [0] https://reproducible-builds.org/ [1] https://reproducible-builds.org/specs/source-date-epoch/ Signed-off-by: Chris Lamb <chris@chris-lamb.co.uk>
566ebe9
to
6e6cf52
Compare
|
@ForbesLindesay Cool, that kinda makes sense. I've updated the PR to accomodate this. |
| @@ -5,16 +5,12 @@ var rimraf = require('rimraf'); | |||
| var acorn = require('acorn'); | |||
| var walk = require('acorn/dist/walk'); | |||
|
|
|||
| var ids = []; | |||
| var idx = process.env.SOURCE_DATE_EPOCH || parseInt(Math.random() * 1000, 10); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we set SOURCE_DATE_EPOCH, which we're not going to do, you're not going to be able to reproduce our build, which is what I assume you're going to want to do. The parseInt belongs around SOURCE_DATE_EPOCH which is the bit that's actually a string. You can just use Math.floor or Math.round to convert the floating point number returned by Math.random into an integer.
My advice would be to have this read in all the files in src and use them to generate a sha512 hash, then base64 encode that and remove characters that don't match [a-zA-Z0-9]. This should give you a number of fairly random seeming (but consistent as long as the src doesn't change) characters. You can then have the getIdFor function take the first unused character from that string, and failing that the first un-used pair from that string, and failing that the first un-used tripple from that string and so on. That should comfortably provide enough identifiers given that we only really need about 5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we set SOURCE_DATE_EPOCH, which we're not going to do, you're not going to be able to reproduce our build
Not sure why you would want to reproduce your build per-se? Thse use-case is people wanting to reproduce a specific build given a known build environment (ie. the value of SOURCE_DATE_EPOCH).
My advice would be to have this read in all the files in src and use them to generate a sha512 hash, then base64 encode [..]
Whilst this would certianly work, this seems a somewhat excessive amount of work for such a smallish sub-feature.. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why you would want to reproduce your build
We already publish a build, so there’s no reason to need a way to build it yourself, other than verifying that out build is genuinely the result of building from source. People might want to do that to check we didn’t introduce a backdoor into the compiled code. To do that, they would need to reproduce our build. I can’t think why anybody would need to reproduce a separate build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can’t think why anybody would need to reproduce a separate build.
In Debian, or example. Or anyone rebuilding from source, really... :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would anyone build from source? This isn’t platform dependent, everyone is going to want to run JavaScript at the end of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would anyone build from source?
Hm? Why would they not? :p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They wouldn't because we already provide pre-built versions. There is no reason to build from source. It is up to you to provide reasons why you want to build this module from source, not me to tell you why I think you don't need it. This is your feature request, it's up to you to justify its value! Without some proper justification, I'm going to lose interest in this discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, building from source is the part that empowers all the free/open source software world to do the things they do. E.g. in openSUSE we try to build everything from source and sometimes we even add patches on top of upstream sources.
Sorry for disturb, but why? Why this library needs to guard against people using these internal properties? Why is documentation warning not enough? |
They're too tempting/convenient. If they weren't well hidden, people would use them. Most people using this would probably (incorrectly) decide it's worth the risk of breakages in the future to have a little convenience now. By rotating these properties at random, we make sure that you feel that pain sooner rather than later. Lots of libraries on npm depend on internal features of other libraries, sometimes that's not too big a deal. In this case, that would be disastrous, so we prevent it. |
|
I'm going to close this until a reasonable justification for the feature request is provided. Making it possible to verify our build, I would support, but this doesn't do that. Helping people who just want to waste electricity reproducing work we've already done for them, I do not understand. |
Hm, I think one of us has some confusion about the goal of reproducible builds. Indeed, this change would surely make it possible to verify your build, ie. from the same sources we get the exact same outputs. :) |
|
No. This change would make it possible to verify a build, but not our build. We wouldn't be setting the |
True, and apologies as I now see you already wrote that. However I would disagree with
.. in that downstreams (eg. Debian) are building this library and distributing binaries/builds of it. There is great value in being able to reproduce this result. |
|
Debian should not be building this library from source. It is a waste of electricity and time. I am happy to make a firm commitment to never helping them do something that wasteful. |
Please could you elaborate why in more detail? Debian is a free software distribution - simply using upstream's provided artefacts as-is does not align with our goals of letting our users (or downstream distributions) make modifications to the code which — naturally — requires that binaries are built from their sources. |
|
@ForbesLindesay how can one be sure that the build he has is from you and not altered by a third party if it's not reproducible from sources? |
|
@P-EB Exactly, if @lamby was asking us to make our build reproducible, that would be fine. That would allow him to verify our published build against our sources. What @lamby is asking for does not let him reproduce our build. It lets him create a separate, reproducible build. This has no value.
This is already open source software. If people want to modify our promise implementation, they can fork it here, publish a new package, and build a fork of debian that points at that forked promise implementation. This is already how the entire node ecosystem works. Modifications require that it is possible to go and build something from source, doing so if you're not making modifications is pointless. |
|
I grant the distinction between your builds and the builds of others, but putting aside the issue of validating that your published build matches its sources, I'm afraid I simply don't follow why you think reproducing a separate build is of no value. :) |
Should your build be reproducible then this question wouldn't exist at all. The fact that it isn't also means that someone taking your blobs and your codebase can't verify that everything went well. So, may we agree that there is an issue in the current situation of your codebase that requires to be fixed by putting deterministic identifiers (the fact that they change across releases is something simple to achieve)? |
|
@P-EB Yes, as I've always maintained, I'm happy to accept a pull request that has this outcome. I have rejected this one because it does not. @lamby I'm tired of trying to explain this. The burden of proof is on you to provide an example of a benefit that results from being able to create a reproducible build of a library, but not being able to reproduce the official build. It is not an end unto itself; you must provide an example of a positive benefit to this. I cannot conclusively prove it is useless, because there could always be a use case that I have yet to think of, but I haven't been able to come up with one, and you've had plenty of time and opportunity to provide one. You are asking me to review, and then maintain, code that does something that I see no evidence needs doing. |
|
I'm missing something then as, to me, the use case has been explicit and implicit throughout the PR - Debian is shipping this library, building it from source. (Not that it makes much difference but FYI you certainly hold the minority opinion in the free software community on this :p.) |
I doubt it: npm had 753,071,751 downloads in the last day and almost no npm packages are set up to build from source on download. This package alone had 586,003 downloads in the last day, none of those will be building from source, because if they were they would get it from GitHub. As I've said:
I will accept a well written pull request to make our build reproducible, but I'm going to lock this issue as debating with you is not a productive use of anyone's time. |
|
I feel decently sure that we're not going to get Debian to change their policy (nor do I feel they should), and we could quite reasonably derive |
|
Verifying our build has broad benefits, it lets you verify that the build everyone is using is correct. Building from source for yourself only verifies your own personal build. Re changing Debian's policy - just because I can't fix what they're doing doesn't mean I have to support it. They're not paying me so I have no obligation to help them and I believe the approach they are taking is wasting people's time and computing power. If this pull request included code to set |
Whilst working on the Reproducible Builds effort [0], we noticed
that node-promise could not be built reproducibly as it uses
random numbers as throwaway identifiers.
This patch uses determinstic numbers instead.
This was filed in @Debian as https://bugs.debian.org/886277
[0] https://reproducible-builds.org/
Signed-off-by: Chris Lamb chris@chris-lamb.co.uk