[Tracking] Reproducible builds #1666

Open
7 of 12 tasks
gendx opened this issue Mar 5, 2020 · 17 comments

@gendx
Contributor

gendx commented Mar 5, 2020

Context

While working on google/OpenSK#67, we realized that some boards failed to build on my desktop, but not on Travis-CI. The reason is that filesystem paths are stored in the binary. Rust includes them, for example, to indicate the file and line where some code panicked. You can check how many of these are in your binary by running strings on it.

Because the stored paths are absolute, the size they take depends on which directory is used for building Tock, and therefore the resulting binary size will increase or decrease accordingly (can be a few KB).
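
As a rough stand-in for running strings by hand, the following Python sketch extracts printable ASCII runs from a binary, lists those that contain a given build-directory prefix, and estimates how many bytes that prefix occupies. The function names and the prefix argument are illustrative assumptions, not part of Tock's tooling.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII at least min_len long (similar in spirit to `strings`)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def embedded_paths(data: bytes, prefix: str):
    """Return the printable strings that contain the build-directory prefix."""
    return [s for s in printable_strings(data) if prefix in s]


def prefix_overhead(data: bytes, prefix: str) -> int:
    """Bytes spent on the prefix itself: occurrences times prefix length."""
    return data.count(prefix.encode()) * len(prefix)
```

For a kernel image read with open(path, "rb").read(), prefix_overhead gives a first estimate of how much the binary would grow or shrink when the build directory's path gets longer or shorter.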

But because we use custom linker scripts with a fixed layout such as:

MEMORY
{
rom (rx) : ORIGIN = 0x00000000, LENGTH = 128K
prog (rx) : ORIGIN = 0x00030000, LENGTH = 832K
ram (rwx) : ORIGIN = 0x20000000, LENGTH = 256K
}
MPU_MIN_ALIGN = 8K;
INCLUDE ../../kernel_layout.ld

it's possible that someone hits the limit of the rom section's size, whereas Travis-CI and/or other developers don't hit this threshold.
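
To make the threshold effect concrete, here is a small Python sketch under invented numbers: the code size and the count of embedded paths are hypothetical, and only the 128K rom length comes from the linker script above. It models the image as fixed code plus one copy of the build-directory prefix per embedded path.

```python
ROM_LENGTH = 128 * 1024  # LENGTH of the rom region in the linker script above


def image_size(code_bytes: int, n_paths: int, prefix_len: int) -> int:
    """Toy model: fixed code plus n_paths embedded strings, each carrying the build-directory prefix."""
    return code_bytes + n_paths * prefix_len


def fits_in_rom(size: int, rom_length: int = ROM_LENGTH) -> bool:
    """True if an image of the given size fits in the rom region."""
    return size <= rom_length


# Hypothetical figures: 129000 bytes of code and 150 embedded paths.
short_dir = "/tock"
long_dir = "/home/developer/projects/embedded/tock"
fits_short = fits_in_rom(image_size(129_000, 150, len(short_dir)))
fits_long = fits_in_rom(image_size(129_000, 150, len(long_dir)))
```

With these made-up figures, the build in the short directory fits while the build in the long one does not, even though the source code is identical.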

Reproducible builds

Reproducible builds have multiple advantages.

  • One is to be able to re-build Tock on another environment and still obtain binaries with the exact same cryptographic hash.
  • Another is specific to Tock: because it uses custom linker scripts with hard constraints on code size, reproducible builds ensure that all developers have the same view of the layout, rather than having a layout file that works for Alice but not for Bob.

Making builds reproducible isn't trivial, as it requires the Rust compiler (and LLVM) to give us reproducible results. But some things (such as rust-toolchain) are in Tock's control, and therefore I think it makes sense to track what's currently allowing and preventing reproducible builds.

  • Pinned Rust compiler. The rust-toolchain file in the root directory pins a given version of (nightly) Rust, so that everyone uses the same compiler.
  • (Add --remap-path-prefix to make builds more deterministic and smaller. #1668) Build directory present in the binary. As explained here, the build directory's absolute path is present in the binary. This affects binary size. Potential solutions:
    • Trimming all paths (e.g. with a "no debug" build without debug information).
    • Making all paths relative to the build directory. This requires support in the Rust compiler, which the --remap-path-prefix parameter provides.
  • ([Bug] [Bad nightly version] Running make ci-travis twice triggers "unused attribute" errors. #1657) Buggy nightly compiler (formerly listed as "Idempotent make rules"). On some nightly version(s) (observed on nightly-2020-02-27), running make ci-travis twice in a row on the same code triggers different warnings the second time.
  • (Print a SHA-256 sum of built binaries, to check for reproducibility. #1669) Printing a cryptographic hash of built boards. This will allow one to check whether they obtained the same binaries as Travis (or other CI services).
  • (Use a Cargo workspace to make builds reproducible and speed up CI. #1714) Building everything in a common Cargo workspace. This will allow Cargo to identify all the involved crates as relative paths within the workspace, instead of (non-reproducible) absolute paths on the filesystem. See [Tracking] Reproducible builds #1666 (comment).
  • Check the cryptographic hashes of boards on a single machine. Run make ci-travis on two distinct build folders on the same machine (e.g. /path/to/tock and /path/to/tock2) and compare the checksums. If they don't match, investigate why and add item(s) to this list.
  • (Implement a --deterministic mode for reproducibility. elf2tab#16) Deterministic mode for elf2tab. Sources of non-determinism in elf2tab are filesystem metadata in the TAR layer (timestamp, permissions, user ID), and a build timestamp.
  • Builds are reproducible regardless of the OS. As shown in Add GitHub workflow to check that binaries are reproducible. google/OpenSK#94, the cryptographic hashes obtained on Linux currently don't match those obtained on macOS.
  • Publish cryptographic hashes of builds. There could be some CI workflow in GitHub (running on each commit), which would build all the boards and publish the cryptographic hashes of the binaries. Then anyone can build the binaries on their machine and compare them with the published hashes.
  • Use a fixed TOCK_KERNEL_VERSION value. Currently, this value (embedded in some panic error messages) depends on the latest git commit. Because Travis-CI adds a merge commit to check each pull request, TOCK_KERNEL_VERSION will differ from the value on a local development branch. There could be a rule or option in the Makefiles to build with a fixed or provided TOCK_KERNEL_VERSION value instead.
    # Ask git what version of the Tock kernel we are compiling, so we can include
    # this within the binary. If Tock is not within a git repo then we fallback to
    # a set string which should be updated with every release.
    export TOCK_KERNEL_VERSION := $(shell git describe --tags --always 2> /dev/null || echo "1.4+")
  • Check the cryptographic hashes of boards across machines. Run make ci-travis locally and compare the checksums with those of Travis-CI. If they don't match, investigate why and add item(s) to this list.
  • Enforce reproducibility in CI? Have some CI pipeline that performs some reproducibility checks, and prevents merging a pull-request if builds are not reproducible anymore. For example, a bad pull request removes the --remap-path-prefix parameter, making builds not reproducible.
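
The "two build folders on the same machine" check from the list above could be sketched in Python as follows. This is only an illustration: the *.bin glob and the function names are assumptions, not existing Tock tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums(root: Path, pattern: str = "*.bin") -> dict:
    """Map each matching file (path relative to root) to its SHA-256."""
    return {str(p.relative_to(root)): sha256_of(p) for p in sorted(root.rglob(pattern))}


def mismatches(tree_a: Path, tree_b: Path, pattern: str = "*.bin") -> list:
    """Relative paths whose checksum differs, or that exist in only one tree."""
    a, b = checksums(tree_a, pattern), checksums(tree_b, pattern)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling mismatches(Path("/path/to/tock"), Path("/path/to/tock2")) after building both trees returns an empty list when every binary matches.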
@bradjc
Contributor

bradjc commented Mar 5, 2020

Wow.

Looks like --remap-path-prefix might be able to help with this? https://www.reddit.com/r/rust/comments/95ylb8/rust_release_binary_contains_absolute_paths_from/e3wnemq/

diff --git a/boards/Makefile.common b/boards/Makefile.common
index 0d4eae0f2..3625f467d 100644
--- a/boards/Makefile.common
+++ b/boards/Makefile.common
@@ -16,7 +16,8 @@ RUSTUP    ?= rustup
 # lld the actual page size so it doesn't have to be conservative.
 RUSTFLAGS_FOR_CARGO_LINKING ?= -C link-arg=-Tlayout.ld -C linker=rust-lld \
 -C linker-flavor=ld.lld -C relocation-model=dynamic-no-pic \
--C link-arg=-zmax-page-size=512
+-C link-arg=-zmax-page-size=512 \
+--remap-path-prefix=/Users/bradjc/git=

 # Disallow warnings for continuous integration builds. Disallowing them here
 # ensures that warnings during testing won't prevent compilation from succeeding.

@gendx
Contributor Author

gendx commented Mar 6, 2020

Nice find @bradjc! We still need to extract the main Makefile's directory instead of hard-coding a given user's build directory.

@gendx
Contributor Author

gendx commented Mar 6, 2020

Updated the list to refer to #1657, as the Makefile doesn't seem idempotent.

@bradjc
Contributor

bradjc commented Mar 18, 2020

#1657 was merged, has this been addressed at this point?

@gendx
Contributor Author

gendx commented Mar 18, 2020

> #1657 was merged, has this been addressed at this point?

I can check the box, although the reason is still unclear, so it might come back when we update to another nightly.

@gendx
Contributor Author

gendx commented Mar 25, 2020

I did a quick test, building the imix board in two build folders (/path/to/tock and /path/to/tock2) on the same machine. The SHA-256 checksums were different.

Using V=1 make in the boards/imix/ folder shows different metadata parameters passed to rustc.

# in /path/to/tock
rustc --crate-name tock_cells ... \
    -C metadata=a72da9d1edb1895f \
    -C extra-filename=-a72da9d1edb1895f \
    ...
# in /path/to/tock2
rustc --crate-name tock_cells ... \
    -C metadata=e0d01c51ad799de2 \
    -C extra-filename=-e0d01c51ad799de2 \
    ...

I think that this metadata parameter is used as a seed for non-deterministic elements in the compiler (e.g. ordering of elements inside HashMaps, maybe also allocation of registers, or choice of equivalent instructions), so ultimately the code is different inside the binary.


Relatedly, rust-lang/cargo#6966 removed --remap-path-prefix from the computation of this metadata parameter. But there are still a bunch of absolute paths passed to rustc, which could be the issue.

@gendx
Contributor Author

gendx commented Mar 25, 2020

Following rust-secure-code/wg#28 (comment), I also tried to use --remap-path-prefix for $HOME (but this overrode the current --remap-path-prefix), as well as $HOME/.cargo and $HOME/.rustup (these two had no effect on the SHA-256).

I also tried to remap the sysroot (rust-lang/rust#63505), but again didn't observe any effect.


I also found reproducibility issues related to the linker:

as well as debug symbols (we use -C debuginfo=2):

Tweaking these bits of the configuration may make Tock builds more reproducible.

@gendx
Contributor Author

gendx commented Mar 25, 2020

I did a few more tests.

  • If I copy the boards/imix folder into boards/imix2, both build the same binary.
  • If I copy the boards/components folder into boards/components2, and make boards/imix2 point to ../components2 (in the Cargo.toml), the metadata for the components crate is different, and in turn imix/imix2 have different metadata (the other crates have the same metadata). The resulting binary is different.
  • As before, if I copy /build-dir/tock into /build-dir/tock2, then /build-dir/tock/boards/imix and /build-dir/tock2/boards/imix build different binaries.
  • However, if I make /build-dir/tock2/boards/imix have all its dependencies point back to /build-dir/tock (with relative paths in the Cargo.toml), then this builds the same binary as /build-dir/tock/boards/imix.

The invocation for crates internal to Tock looks like the following.

rustc \
    --crate-name components \
    --edition=2018 \
    /build-path/tock/boards/components/src/lib.rs \
    ...

On the other hand, the board invocation looks like the following.

rustc \
    --crate-name imix \
    --edition=2018 \
    src/main.rs \
    ...

What strikes me is the absolute path for local dependencies (/build-path/tock/boards/components/src/lib.rs) vs. a relative path for the current binary (src/main.rs). My guess is that locally defined crates (as opposed to those from crates.io) are identified by their absolute path, even though the Cargo.toml uses a relative path.

This is consistent with the results I observed in my tests.


Update

After a few more tests, it seems that defining a Cargo workspace encapsulating all the Tock crates would make the build reproducible (at least on a given machine), because then Cargo identifies all the crates by a relative path to the workspace root.
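
The hypothesis above can be illustrated with a toy model in Python. This is explicitly not Cargo's actual algorithm (the real metadata hash has more inputs); toy_metadata, absolute_id and workspace_id are hypothetical names. The sketch only shows why hashing an absolute path gives a different value per build directory, while hashing a path relative to the workspace root does not.

```python
import hashlib
from pathlib import PurePosixPath


def toy_metadata(crate_name: str, source_id: str) -> str:
    """Toy stand-in for the -C metadata value: short hash of crate name and source identifier."""
    return hashlib.sha256(f"{crate_name}\0{source_id}".encode()).hexdigest()[:16]


def absolute_id(build_dir: str, crate_path: str) -> str:
    """Identify a local crate by its absolute path (the behaviour guessed at above)."""
    return str(PurePosixPath(build_dir) / crate_path)


def workspace_id(build_dir: str, crate_path: str) -> str:
    """Identify a crate by its path relative to the workspace root."""
    workspace_root = PurePosixPath(build_dir)
    return str((workspace_root / crate_path).relative_to(workspace_root))
```

In this model, the components crate gets different metadata in /path/to/tock and /path/to/tock2 when identified by absolute_id, and identical metadata when identified by workspace_id.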

@gendx
Contributor Author

gendx commented Mar 30, 2020

With #1714 (currently in review), we're getting close to having reproducible builds of Tock!

One remaining discrepancy is due to the TOCK_KERNEL_VERSION value embedded in some panic messages, because this value changes for every git commit. In particular, Travis-CI adds a merge commit when testing any pull-request. There could be some additional rule or option in the Makefiles to use a fixed TOCK_KERNEL_VERSION value instead.

bors bot added a commit that referenced this issue Apr 8, 2020
1714: Use a Cargo workspace to make builds reproducible and speed up CI. r=gendx a=gendx

### Pull Request Overview

This pull request adds a [Cargo workspace](https://doc.rust-lang.org/book/ch14-03-cargo-workspaces.html) at the top-level directory to make builds reproducible (see the analysis in #1666 (comment)).

With a workspace, Cargo now only writes to a single `target/` directory in the root folder, instead of having a separate `target/` in each board. This means that some changes have to be made in the Makefiles and tools.

Another benefit of a single workspace is that shared dependencies can be built once (per CPU architecture) and cached across boards. For example, the kernel or the capsules only need to be compiled once per CPU architecture, instead of once per board, which can significantly speed up commands like `make allboards`.


### Testing Strategy

This pull request was tested by:
- Duplicating the Tock directory (two identical copies in `/path/to/tock` and `/path/to/tock2`), running `make allboards` in each, and checking that the SHA-256 checksums match.
- Running `make flash` in the nRF52840-DK board's folder, and checking that flashing the kernel worked properly.
- Travis-CI.


### TODO or Help Wanted

This pull request still needs:
- [x] Fixing netlify. Having a single `target/` folder allows to greatly simplify the `tools/build-all-docs.sh` script. I checked the output on a few files and it looks consistent, but some pages might break. On the other hand, some crates such as the [arty_e21](https://docs.tockos.org/arty_e21/index.html) board are currently not indexed on the navigation bar, and the `arty_e21` chip was not even documented so far on https://docs.tockos.org/.
- [x] A bit of cleanup throughout Makefiles. ~~For now I made just enough changes in `boards/Makefile.common` for `make ci-travis` to work, but other make rules should be checked.~~
- [ ] Checking that the `tools/` still work properly with a single target folder now at the top-level.
  - [x] `build-all-docs.sh` Simplified, as mentioned above, now that all docs are built in the same folder.
  - [x] `post_size_changes_to_github.sh` Updated, but it's currently broken for other reasons (see #1722).
- [x] Now fixed with a minimum search depth of 2. ~~I had to remove the `tools/check_wildcard_imports.sh` in `make ci-travis` because otherwise it reported the following error. The tool would need some refactoring.~~
  ```
  Wildcard import(s) found in ..
  Tock style rules prohibit this use of wildcard imports.
  
  The following wildcard imports were found:
  libraries/tock-register-interface/src/macros.rs:                use super::super::*;
  tools/check_wildcard_imports.sh:        if $(git grep -q 'use .*\*;' -- ':!src/macros.rs'); then
  tools/check_wildcard_imports.sh:                git grep 'use .*\*;'
  ```
- [x] Cargo complained that the `[profile.dev]` and `[profile.release]` rules must be in the top-level workspace. I put them there given that all boards had the same profiles, but it's something to keep in mind if we want to use custom workspace for specific boards. I'm also not sure what's the impact of these profiles on the other libraries in the workspace. This didn't seem to cause any issue though.
- [Optional] Should `tools/` also be part of the workspace? Have their own workspace?


### Documentation Updated

- [x] Updated the relevant files in `/docs`, or no updates are required. Now the arty-e21 chip and board appear in the generated docs :)

### Formatting

- [x] Ran `make formatall`.


Co-authored-by: Guillaume Endignoux <guillaumee@google.com>
Co-authored-by: gendx <gendx@users.noreply.github.com>
@gendx
Contributor Author

gendx commented Apr 8, 2020

With #1714 merged, we've reached a big milestone in terms of reproducibility. I now have the same hashes when building from master on various folders on the same machine, as well as on various machines.

The only remaining part is the TOCK_KERNEL_VERSION that changes on every commit, including Travis-CI's implicit merge commits, because this is stored inside the binary as some debugging information.

  • Would it make sense to have some make repro rule that would use a fixed TOCK_KERNEL_VERSION?
  • Should there be some make reprocheck rule that checks the checksums against a checksum file kept up to date on every commit, so that we catch any regression in reproducibility? That is, it would run make repro, compute the checksums, and compare them to some boards.sha256 file containing the expected checksums. Travis-CI, or some other CI tool, would then run this make reprocheck test.
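
A minimal Python sketch of what such a reprocheck step might do, assuming boards.sha256 uses the line format printed by sha256sum ("<hex digest>  <path>"). The file format and the function names are assumptions for illustration; no such rule exists in Tock at this point.

```python
import hashlib
from pathlib import Path


def parse_checksum_file(text: str) -> dict:
    """Parse sha256sum-style lines ("<hex digest>  <path>") into {path: digest}."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary mode with a leading "*" on the file name.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def reprocheck(root: Path, checksum_text: str) -> list:
    """Return the listed paths that are missing or whose SHA-256 differs from the expected one."""
    failures = []
    for name, digest in parse_checksum_file(checksum_text).items():
        path = root / name
        actual = hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
        if actual != digest:
            failures.append(name)
    return failures
```

A CI job would fail whenever reprocheck returns a non-empty list.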

@ppannuto
Member

ppannuto commented Apr 8, 2020

I might argue that TOCK_KERNEL_VERSION is currently doing the right thing -- it captures exactly the state of the repository that was built for a given image. The addition of a merge commit means a different point in the history of the repository. Unless we rebase every branch before pulling, it is quite possible/likely that the travis merge commit will introduce meaningful code changes compared to the candidate branch.

Travis has both "pr" and "push" builds:

  • pr: what would happen if this branch were merged to master (adds a commit)
  • push: what is the status of this branch (no added commit)

According to the travis docs:

> TRAVIS_PULL_REQUEST is set to the pull request number if the current job is a pull request build, or false if it’s not.

So I think what might make the most sense would be enforcing the reproducibility check on the push build but not on the pr build.
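
As a sketch of that gating in Python: TRAVIS_PULL_REQUEST is the variable from the Travis docs quoted above, while the function names and the choice to treat a missing variable (e.g. a local build) like a push build are assumptions.

```python
import os


def is_pull_request_build(env=None) -> bool:
    """TRAVIS_PULL_REQUEST holds the PR number on pr builds and the string "false" otherwise."""
    env = os.environ if env is None else env
    # Assumption: an unset variable (e.g. a local build) is treated like a push build.
    return env.get("TRAVIS_PULL_REQUEST", "false") != "false"


def should_enforce_repro(env=None) -> bool:
    """Enforce the reproducibility check only on push builds."""
    return not is_pull_request_build(env)
```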

@gendx
Contributor Author

gendx commented Apr 9, 2020

> Unless we rebase every branch before pulling, it is quite possible/likely that the travis merge commit will introduce meaningful code changes compared to the candidate branch.

That's a good point indeed!

> I might argue that TOCK_KERNEL_VERSION is currently doing the right thing -- it captures exactly the state of the repository that was built for a given image. The addition of a merge commit means a different point in the history of the repository.

One thing to note is that a commit's hash covers not only the state of the repository (as a filesystem), but also the commit message, commit author, and timestamp. So Travis' merge commit will generally differ from what one can obtain locally by creating their own merge commit.

For example, the latest commit on master contains the following git object.

$ git cat-file -p HEAD
tree 629404bf87e4cc0ac1544ff95cd5b4e242a97640
parent a4ad8070469c8403c81104efefa1cb2bbcba726a
parent d3a6cd58ec31682dba16d56a46269d91f26c51b7
author bors[bot] <26634292+bors[bot]@users.noreply.github.com> 1586357094 +0000
committer GitHub <noreply@github.com> 1586357094 +0000
gpgsig -----BEGIN PGP SIGNATURE-----
 
 wsBcBAABCAAQBQJejeNmCRBK7hj4Ov3rIwAAdHIIAAuCeUnrhZWwbrF4T/8Jjba7
 z+hMeosGRjWh5IpCGh7RvOSajtUrtNDdiVjkET3BoJYTjrg3C3xwAPGFw9OMDRxG
 Z07iT4rBrgv6YnTqpOaqLYlrDcEnOSHfupHX+6CKK3r8ebySaASzT7MGAYRG8+1f
 KwOrC1C5W5BnfFIz7TWbb968r+ZGw7dyrgQWIGPcdoY+vYz4yi4jGTZh4ahJufb4
 xlr6mDFX5KjbDbMZcPn+7qFMHnYQ0tMdaYouuJf9rrOLWUCTz5zWb/cydKDJ9bK4
 1yOEpw73AwN1IBmqEPllsd+Uy2GhtvkAbyz1zM6mK0cvX0oOBpsjxHg2maMxwYY=
 =1AW5
 -----END PGP SIGNATURE-----
 

Merge #1743

1743: Update stm32 boards makefiles r=ppannuto a=alexandruradovici

[...]

The commit hash is f920947, which can be recomputed from the command line by running git hash-object on the commit object.

$ git cat-file -p HEAD | git hash-object -t commit --stdin
f9209471923d4a6dbacdb6464ffe548bd7108aa9
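
What git hash-object computes for SHA-1 repositories can be sketched in a few lines of Python: the SHA-1 of a "<type> <size>" header, a NUL byte, and the object body. The commit_body helper and its author line are hypothetical and only serve to show that the timestamp feeds into the commit hash.

```python
import hashlib


def git_object_hash(obj_type: str, body: bytes) -> str:
    """SHA-1 over "<type> <size>", a NUL byte, then the body: git's object naming scheme."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, author_line: str, message: str) -> bytes:
    """Build a minimal, parentless commit object body (hypothetical helper)."""
    return f"tree {tree}\nauthor {author_line}\ncommitter {author_line}\n\n{message}\n".encode()


tree = "629404bf87e4cc0ac1544ff95cd5b4e242a97640"  # tree hash from the commit above
first = git_object_hash("commit", commit_body(tree, "Alice <alice@example.com> 1586357094 +0000", "Merge"))
later = git_object_hash("commit", commit_body(tree, "Alice <alice@example.com> 1586357095 +0000", "Merge"))
# first != later: same tree, but the timestamp differs by one second.
```

As a sanity check, git_object_hash("blob", b"") returns e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the hash git assigns to an empty file.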

Regarding reproducibility, I think that a more interesting description of the state of the repository would be the "tree" (first line in the pretty-printed commit), which is a hash of all the files checked into the repository.

$ git cat-file -p HEAD^{tree}
040000 tree b633c8fd62afc5bfdb32db071e4dd00daa3efce7	.github
100644 blob 5230cda22cc728e454dde2b0e3e9c5361821a60a	.gitignore
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	.gitmodules
100644 blob 6ddd26fc3f7b7964723dec145f26e4d90720c52b	.travis.yml
[...]

> So I think what might make the most sense would be enforcing the reproducibility check on the push build but not on the pr build

Enforcing the push build would be simpler for developers of pull requests. But that would mean that the actual state of the repository on master would almost always be different, due to the merge commits put on top of it.

Requiring pull requests to always be rebased on the current master would avoid the merge commit problem (and I think GitHub can be configured to require rebased pull requests), at the expense of more burden for developers of pull requests (especially when there are unrelated pull requests in parallel, i.e. without any merge conflicts, in which case the current model works smoothly).

@ppannuto
Member

ppannuto commented Apr 9, 2020

> One thing to note is that a commit's hash includes both the state of the repository (as a filesystem), but also the commit message...

Hmm, this is a very good point. We're partially running into a usability vs. correctness challenge here I think. While we could dig into the git internals, the intent of including a hash in the panic trace is that it's easy for developers to investigate reported errors (i.e. can reproduce by simply git checkout xxx); that falls apart a bit with more nuanced tracing of git.

$ git checkout b633c8fd62af
fatal: Cannot switch branch to a non-commit 'b633c8fd62af'

Of course, you could do something like https://stackoverflow.com/questions/41088069/how-to-find-the-commits-that-point-to-a-git-tree-object -- and we could put something in tools/ that does this, but that's getting complex I think

One idea I thought about briefly would be modifying the label to be git describe --tags --always --all, which would effectively make the label (1) a tag if a tagged release, (2) the branch name [most common case], (3) a hash. But branches are a bit too ephemeral and could easily be pointing to a later/different commit between deployment and bug reporting (particularly since I think the travis build would always report the name heads/staging, as that's the branch bors uses to build)


Stepping back a bit, I think we should ask what we would like the reproducibility mechanism to look like (this became very stream-of-consciousness, but hopefully useful for discussion):

  • Should there be an in-file-system record of reproduction artifact?

    • pro: highly audit-able, deterministic, and strong enforcement of reproducible builds
    • con: large amount of churn / noise in git history
      • mitigation: store just one hash-of-hashes of all boards?
    • con: possibly annoying for development, effectively requires an allboards build on all commits or pushes (consider the 'fix a spelling error in a comment' type commit having a several minute latency)
    • my current thinking: against
  • Should reproducibility be enforced on every commit, every commit to master?

    • every commit too fine-grained for the development reasons listed above
    • every commit to master basically necessary to make sure things don't break (this is the ethos of CI really)
    • my current thinking: somehow
  • Can we do this wholly in the cloud? Idea: (ab)using tags

    • Add a github action / cloud service / etc that runs on PRs that tags every commit git tag repro-{hash_of_hashes} with a hash of all the hashes of all the boards
    • Add a github action that triggers on the creation of a tag that adds a status check to the PR that validates the reproduction of the build
    • pro: reproduction artifacts are part of the repository without cluttering the typical log / history mechanism
    • con: (maybe?) are lots of lightweight tags bad in any way?
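
The "hash of all the hashes of all the boards" could be computed along these lines. This Python sketch assumes the per-board SHA-256 digests are already available as a dict; the function names and the exact serialization (one "<digest>  <board>" line per board, sorted by board name) are illustrative choices, not something the thread specifies.

```python
import hashlib


def hash_of_hashes(board_hashes: dict) -> str:
    """Combine per-board SHA-256 hex digests into one digest, independent of dict ordering."""
    combined = hashlib.sha256()
    for board in sorted(board_hashes):
        combined.update(f"{board_hashes[board]}  {board}\n".encode())
    return combined.hexdigest()


def repro_tag(board_hashes: dict) -> str:
    """Tag name in the repro-{hash_of_hashes} form suggested above."""
    return f"repro-{hash_of_hashes(board_hashes)}"
```

Sorting by board name keeps the result stable regardless of the order in which boards are built.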

@gendx
Contributor Author

gendx commented Apr 9, 2020

> Stepping back a bit, I think we should ask what we would like the reproducibility mechanism to look like (this became very stream-of-consciousness, but hopefully useful for discussion):
>
>   • Should there be an in-file-system record of reproduction artifact?
>     • pro: highly audit-able, deterministic, and strong enforcement of reproducible builds
>     • con: large amount of churn / noise in git history
>       • mitigation: store just one hash-of-hashes of all boards?
>     • con: possibly annoying for development, effectively requires an allboards build on all commits or pushes (consider the 'fix a spelling error in a comment' type commit having a several minute latency)
>     • my current thinking: against

Definitely agree with the concerns. That would be a lot of annoyance to keep in sync on every commit.

>   • Should reproducibility be enforced on every commit, every commit to master?
>     • every commit too fine-grained for the development reasons listed above
>     • every commit to master basically necessary to make sure things don't break (this is the ethos of CI really)
>     • my current thinking: somehow

Similarly, the overhead on developers would be quite high.

>   • Can we do this wholly in the cloud? Idea: (ab)using tags
>     • Add a github action / cloud service / etc that runs on PRs that tags every commit git tag repro-{hash_of_hashes} with a hash of all the hashes of all the boards
>     • Add a github action that triggers on the creation of a tag that adds a status check to the PR that validates the reproduction of the build
>     • pro: reproduction artifacts are part of the repository without cluttering the typical log / history mechanism
>     • con: (maybe?) are lots of lightweight tags bad in any way?

I quite like this idea (providing transparency) - some action builds the boards on each commit (or each master branch commit), and publishes the hashes. Tags may add quite some extra overhead, but I don't think they would be necessary. The UX could be similar to the size reporting tool (#1701). Then anyone can reproduce this on their machine and compare the hashes.


The question that remains is how to enforce reproducibility after each commit? Ideally, there should be some CI workflow checking that builds remain reproducible.

For example, if some commit removes the --remap-path-prefix argument in the Makefile

--remap-path-prefix=$(TOCK_ROOT_DIRECTORY)= \

then the builds are not reproducible anymore across directories, but there is currently no tool in Tock's CI to prevent that.


  • In any case, I think that providing transparency by publishing the hashes is a good step forward.
  • If we don't find a scalable solution for enforcing reproducibility (as @ppannuto mentioned, updating the hashes on each commit wouldn't be scalable in terms of developer overhead), then it's not too dramatic, and we can notice problems every now and then + do more thorough manual checks on every release. Similarly to the OSX support, which isn't enforced in CI but works most of the time and occasionally an OSX developer notices a problem and we fix the bug.

@gendx
Contributor Author

gendx commented Apr 15, 2020

While working on google/OpenSK#94, I noticed that although builds seem reproducible across Linux machines (my local development machine vs. GitHub workflows), the binaries obtained on macOS are currently not the same.

@gendx
Contributor Author

gendx commented Apr 28, 2020

I added a reference to tock/elf2tab#16 to track reproducibility of elf2tab packaging.
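
To illustrate the TAR-layer sources of non-determinism listed in the task list (timestamp, permissions, user ID), here is a Python sketch using the standard tarfile module. It is not elf2tab's implementation (elf2tab is written in Rust) and the function name is hypothetical; it only shows the general technique of pinning metadata and member order.

```python
import io
import tarfile


def deterministic_tar(files: dict) -> bytes:
    """Build a tar archive from {name: bytes} with fixed metadata and sorted member order."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0                 # fixed timestamp
            info.mode = 0o644              # fixed permissions
            info.uid = info.gid = 0        # fixed numeric owner
            info.uname = info.gname = ""   # no user or group names
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()
```

Two calls with the same contents then produce byte-identical archives, whatever the insertion order, the current time, or the user running the build.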

@bradjc
Contributor

bradjc commented Sep 5, 2023

Does anyone know of any github/rust projects that do something similar in terms of using CI to verify builds are reproducible and publishing SHAs?
