Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify that build is reproducible #21733

Closed
wants to merge 1 commit into from

Conversation

wendigo
Copy link
Contributor

@wendigo wendigo commented Apr 27, 2024

No description provided.

@hboutemy
Copy link
Contributor

great approach
notice that given the 2 builds happen on the exact same environment, you won't detect any impact of hostname, username, current directory: it should be easy and very valuable to do the second build from a renamed directory to detect if any current directory is used (that's what happens the most in real life)

@wendigo
Copy link
Contributor Author

wendigo commented Apr 28, 2024

@hboutemy I agree. I happen to know that the build is not username/hostname/current directory sensitive, because I've tested that manually but I don't know how to test that in the CI in a reliable way without pushing artifacts somewhere for other runner node to pick it up. I could use github caching for that I guess.

@hboutemy
Copy link
Contributor

yes, finding a reasonably simple setup is key: one way is to compare a local build against a local build done in a container

or just accept that CI won't absolutely check everything :)

.github/workflows/ci.yml Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
@wendigo wendigo force-pushed the serafin/reproducible-build-test branch from b4b2317 to ea40fc0 Compare April 29, 2024 11:47
@wendigo wendigo requested a review from findepi April 29, 2024 12:01
.github/workflows/ci.yml Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
cache: 'false'
cleanup-node: true
- name: First build to a local repository
# All dependencies are downloaded to and artifacts generated to build_1 directory, .m2 isn't used
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • .m2 -> ~/.m2/repository (in particular, i assume other files from ~/.m2 may still be used)
  • isn't used -> isn't used because ...

.github/workflows/ci.yml Outdated Show resolved Hide resolved
Comment on lines +97 to +100
ref: |
${{ github.event_name == 'repository_dispatch' &&
github.event.client_payload.pull_request.head.sha == github.event.client_payload.slash_command.args.named.sha &&
format('refs/pull/{0}/head', github.event.client_payload.pull_request.number) || '' }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nineinchnick review this

.github/workflows/ci.yml Outdated Show resolved Hide resolved
diff \
--exclude=maven-metadata-local.xml \
--exclude=_remote.repositories \
--exclude="*.lastUpdated" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should be comparing only the stuff produced by the build -- ie our binaries
we should not touch binaries downloaded from internet, or metadata files about such downloads

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hence the diff on the io/trino

@wendigo wendigo force-pushed the serafin/reproducible-build-test branch from ea40fc0 to 67567b7 Compare April 29, 2024 12:25
@wendigo wendigo force-pushed the serafin/reproducible-build-test branch from 67567b7 to 2d8c460 Compare April 29, 2024 13:03
@martint
Copy link
Member

martint commented Apr 29, 2024

How much time does this add to the build/CI?

@wendigo
Copy link
Contributor Author

wendigo commented Apr 29, 2024

@martint around 12 17 minutes

@wendigo
Copy link
Contributor Author

wendigo commented Apr 29, 2024

@martint we can run this only on master if we want to save some time.

- name: Compare builds
run: |
diff \
`# Excluding maven metadata files - downloaded dependencies and generated artifacts should be exactly the same` \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a YAML comment above the run command? Having it as a shells script comment looks weird to me

--exclude=_remote.repositories \
--exclude="*.lastUpdated" \
--exclude="resolver-status.properties" \
-bur build_1/io/trino build_2/io/trino
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the -bur above, so that it's easier to see what this command is during. I also prefer to make them separate args for readability:

diff -r -u -b \
    --exclude ... \
    build_1/io/trino build_2/io/trino

Could also remove the duplication

{build_1,build2}/io/trino

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the -b flag? What text files do we have, other than POMs? I assume the primary goal here is to compare the binary JAR files.

cache: false # cache not used, dependencies will be downloaded to `maven.repo.local`
cleanup-node: true # not enough space to fetch dependencies and build twice
- name: First build to a local repository
# ~/.m2/repository isn't used because we are using `maven.repo.local` to store all downloaded dependencies and generated artifacts
Copy link
Member

@electrum electrum May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving this comment above, so that it's shared. We could also reword this to describe what we are doing here:

# Build twice, using `maven.repo.local` to specify a unique local repository for each build.
# ~/.m2/repository is not used here, as the downloaded dependencies and generated artifacts use the specified repo.
- name: First build to a local repository
  run: ...
- name: Second build to a local repository
  run: ...

Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label May 24, 2024
@wendigo wendigo added the stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. label May 24, 2024
@hboutemy
Copy link
Contributor

I just added an independent rebuild comparison: https://github.com/jvm-repo-rebuild/reproducible-central/tree/master/content/io/trino (README will be generated tonight)
result is really good: only 1 different file = the rpm file, that at least contains the hostname of the machine creating it https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/io/trino/trino-root-448.diffoscope

@wendigo
Copy link
Contributor Author

wendigo commented May 25, 2024

@hboutemy thanks for letting me know. I'll try to fix the rpm to be independent of the hostname

@wendigo
Copy link
Contributor Author

wendigo commented May 25, 2024

@hboutemy this will fix the rpm: #22135

@electrum
Copy link
Member

@hboutemy It is merged now and will be in the 449 release

@wendigo
Copy link
Contributor Author

wendigo commented Jun 1, 2024

@hboutemy 449 is out. Can you try it?

@wendigo
Copy link
Contributor Author

wendigo commented Jun 1, 2024

@hboutemy
Copy link
Contributor

hboutemy commented Jun 1, 2024

@wendigo yes, batch is now over, result is visible in READMEhttps://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/io/trino/README.md
congrats, release 449 is confirmed fully reproducible! you can add your badge :)

@wendigo
Copy link
Contributor Author

wendigo commented Jun 3, 2024

@hboutemy show me the badge :)

@nineinchnick
Copy link
Member

nineinchnick commented Jun 3, 2024

https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/doc/TOOLS.md#4-add-reproducible-builds-badge-to-a-project-with-reproducible-releases

Which artifacts should we use, io.trino:trino-server?

This displays as:

[![Reproducible Builds](https://img.shields.io/badge/Reproducible_Builds-ok-success?labelColor=1e5b96)](https://github.com/jvm-repo-rebuild/reproducible-central#io.trino:trino-root)

Reproducible Builds

@hboutemy
Copy link
Contributor

hboutemy commented Jun 5, 2024

@wendigo
Copy link
Contributor Author

wendigo commented Jun 5, 2024

@hboutemy that's what I've added to readme :)

@wendigo wendigo closed this Jun 21, 2024
@wendigo wendigo deleted the serafin/reproducible-build-test branch June 27, 2024 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed stale stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.
Development

Successfully merging this pull request may close these issues.

6 participants