Verify that build is reproducible #21733
Great approach.
@hboutemy I agree. I happen to know that the build is not sensitive to the username, hostname, or current directory, because I've tested that manually, but I don't know how to test that in CI in a reliable way without pushing artifacts somewhere for another runner node to pick up. I could use GitHub caching for that, I guess.
Yes, finding a reasonably simple setup is key: one way is to compare a local build against a local build done in a container, or just accept that CI won't check absolutely everything :)
b4b2317 to ea40fc0
.github/workflows/ci.yml
cache: 'false'
cleanup-node: true
- name: First build to a local repository
# All dependencies are downloaded to and artifacts generated to build_1 directory, .m2 isn't used
- .m2 -> ~/.m2/repository (in particular, I assume other files from ~/.m2 may still be used)
- isn't used -> isn't used because ...
ref: |
${{ github.event_name == 'repository_dispatch' &&
github.event.client_payload.pull_request.head.sha == github.event.client_payload.slash_command.args.named.sha &&
format('refs/pull/{0}/head', github.event.client_payload.pull_request.number) || '' }}
@nineinchnick review this
diff \
--exclude=maven-metadata-local.xml \
--exclude=_remote.repositories \
--exclude="*.lastUpdated" \
I think we should compare only what the build produces, i.e. our binaries.
We should not touch binaries downloaded from the internet, or metadata files about such downloads.
Hence the diff on the io/trino directory.
ea40fc0 to 67567b7
67567b7 to 2d8c460
How much time does this add to the build/CI?
@martint around
@martint we can run this only on master if we want to save some time.
- name: Compare builds
run: |
diff \
`# Excluding maven metadata files - downloaded dependencies and generated artifacts should be exactly the same` \
Make this a YAML comment above the run command? Having it as a shell script comment looks weird to me.
--exclude=_remote.repositories \
--exclude="*.lastUpdated" \
--exclude="resolver-status.properties" \
-bur build_1/io/trino build_2/io/trino
Move the -bur flags above, so that it's easier to see what this command is doing. I also prefer to make them separate args for readability:
diff -r -u -b \
--exclude ... \
build_1/io/trino build_2/io/trino
Could also remove the duplication with brace expansion:
{build_1,build_2}/io/trino
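For illustration, the suggested form could look roughly like the script below. This is a sketch, not the actual workflow step: the `compare_builds` function and the throwaway demo directories are made up here, and only the `diff` flags and `--exclude` patterns are taken from the snippets in this PR. The two paths are spelled out so the script also runs in shells without brace expansion.

```shell
#!/usr/bin/env bash
# Sketch only: compare_builds and the demo directories are hypothetical.

compare_builds() {
  # $1 is the parent directory that holds build_1 and build_2.
  # -r recurses, -u prints unified diffs, -b ignores changes in the amount of whitespace.
  # The subshell keeps the cd from leaking into the caller.
  (
    cd "$1" || exit 2
    # In Bash the two paths could be written as {build_1,build_2}/io/trino.
    diff -r -u -b \
      --exclude=maven-metadata-local.xml \
      --exclude=_remote.repositories \
      --exclude="*.lastUpdated" \
      --exclude=resolver-status.properties \
      build_1/io/trino build_2/io/trino
  )
}

# Demo with throwaway data: the two trees differ only in an excluded metadata file.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/build_1/io/trino" "$demo_dir/build_2/io/trino"
echo "same content" > "$demo_dir/build_1/io/trino/artifact.pom"
echo "same content" > "$demo_dir/build_2/io/trino/artifact.pom"
echo "timestamp=1" > "$demo_dir/build_1/io/trino/_remote.repositories"
echo "timestamp=2" > "$demo_dir/build_2/io/trino/_remote.repositories"

if compare_builds "$demo_dir"; then
  echo "builds match"
else
  echo "builds differ"
fi
```

`diff` exits with 0 when it finds no differences and 1 when it finds some, so the function's exit status can fail the CI step directly.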
Why do we need the -b flag? What text files do we have, other than POMs? I assume the primary goal here is to compare the binary JAR files.
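If the main goal is to compare the binary JARs, one alternative to a text-oriented diff is to compare checksums. The following is only a sketch under assumptions: `checksum_manifest` is a hypothetical helper, the demo files are fake, and it assumes `sha256sum` from GNU coreutils is available on the runner.

```shell
# Sketch only: checksum_manifest and the demo data are hypothetical.

checksum_manifest() {
  # Print "checksum  ./relative/path" for every .jar under $1, sorted by path,
  # so that two manifests can be compared byte for byte.
  (
    cd "$1" || exit 2
    find . -type f -name '*.jar' -exec sha256sum {} + | sort -k 2
  )
}

jar_demo="$(mktemp -d)"
mkdir -p "$jar_demo/build_1/io/trino" "$jar_demo/build_2/io/trino"
printf 'fake jar bytes' > "$jar_demo/build_1/io/trino/example.jar"
printf 'fake jar bytes' > "$jar_demo/build_2/io/trino/example.jar"

checksum_manifest "$jar_demo/build_1/io/trino" > "$jar_demo/manifest_1"
checksum_manifest "$jar_demo/build_2/io/trino" > "$jar_demo/manifest_2"

if cmp -s "$jar_demo/manifest_1" "$jar_demo/manifest_2"; then
  echo "JAR checksums match"
else
  echo "JAR checksums differ"
fi
```

Because the manifests hold paths relative to each build directory, a JAR that is missing or extra in one build also shows up as a difference.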
cache: false # cache not used, dependencies will be downloaded to `maven.repo.local`
cleanup-node: true # not enough space to fetch dependencies and build twice
- name: First build to a local repository
# ~/.m2/repository isn't used because we are using `maven.repo.local` to store all downloaded dependencies and generated artifacts
How about moving this comment above, so that it's shared? We could also reword it to describe what we are doing here:
# Build twice, using `maven.repo.local` to specify a unique local repository for each build.
# ~/.m2/repository is not used here, as the downloaded dependencies and generated artifacts use the specified repo.
- name: First build to a local repository
run: ...
- name: Second build to a local repository
run: ...
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
I just added an independent rebuild comparison: https://github.com/jvm-repo-rebuild/reproducible-central/tree/master/content/io/trino (README will be generated tonight)
@hboutemy thanks for letting me know. I'll try to fix the rpm to be independent of the hostname
@hboutemy It is merged now and will be in the 449 release
@hboutemy 449 is out. Can you try it?
Oh, I can see that it was already updated: https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/io/trino/trino-root-449.buildcompare
@wendigo yes, batch is now over, result is visible in README: https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/io/trino/README.md
@hboutemy show me the badge :)
Which artifacts should we use, This displays as:
@hboutemy that's what I've added to readme :)
No description provided.