Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable pinning with stack_snapshot #1376

Merged
merged 8 commits into from Jul 2, 2020
Merged

Enable pinning with stack_snapshot #1376

merged 8 commits into from Jul 2, 2020

Conversation

aherrmann
Copy link
Member

@aherrmann aherrmann commented Jun 26, 2020

depends on #1379

Closes #1310

This implements pinning for stack_snapshot similar to rules_jvm_external's maven_install. By default stack_snapshot works as before, calling stack ls dependencies json to determine package versions and the dependency graph on the fly and calling stack unpack to fetch the package sources. However, if the optional attribute stack_snapshot_json is specified then it is read as a lock file that stores all relevant information to be able to fetch and build the snapshot packages without invoking stack. The lock file can be generated or updated with the script @<workspace>-unpinned//:pin.

This makes re-fetching stack_snapshot repositories faster as Hackage and archive dependencies can be cached in Bazel's repository cache, and stack does not need to update it's database to determine package versions and dependencies.
Closes #1112
Closes #1168

Additionally, this avoids issues due to concurrent invocations of stack which can fail due to a race condition on the Hackage security lock.
Closes #1167

The generated snapshot.json file is meant to be checked into the user's repository. It should not be updated manually, only using bazel run @<workspace>-unpinned//:pin. Similar to rules_jvm_external we validate the contents of the lock file and store and check a checksum to detect a corrupted lock file.

This PR extends the stack_snapshot API docs and adds a comment to examples/WORKSPACE and the WORKSPACE generated by the start script instructing the user how to pin stack_snapshot dependencies. Additionally, this pins the stack_snapshot repositories @stackage, @stackage-zlib, and @ghcide in rules_haskell. The @stackage repository in examples remains unpinned and serves as a test-case for an unpinned stack_snapshot.

Notes

  • The resulting stack_snapshot workspaces are identical with the exception of revisions of Cabal files.
  • stack ls dependencies json does not expose the Hackage revision of any given package used in the snapshot, only the version. The original Cabal file may not work in the given Stackage snapshot, meaning a later revision is required, but downloading the latest revision is not reproducible. To work around this we pin the current git revision of the all-cabal-hashes repository and download Cabal files from that git revision. This may potentially use a later Hackage revision than used in the Stackage snapshot.
  • stack ls dependencies json does not give us access to sha256 of Hackage or archive dependencies. See Feature Request: Expose sha256 in ls dependencies json commercialhaskell/stack#5274 and sha256 and size in ls dependencies json for archive dependencies commercialhaskell/stack#5280. We use the all-cabal-hashes repository to obtain sha256 of Hackage dependencies, and fetch archive dependencies during pinning to determine the hash.
  • stack ls dependencies json does not provide subdirectory information of archive dependencies. This change includes a heuristic which expects the Cabal file at the archive root or underneath a top-level directory and defines stripPrefix accordingly.
  • Bazel cannot cache git (or hg) dependencies in the repository cache as of now, see Skylark API for accessing repository cache in repository rules bazelbuild/bazel#5086. This implementation keeps using stack unpack for these kinds of dependencies. To avoid issues around stack update this should also be switched to a Bazel implementation in the future.

@aherrmann aherrmann force-pushed the stack-lock branch 2 times, most recently from c1e46ac to a64128f Compare June 29, 2020 17:38
@aherrmann aherrmann changed the base branch from master to refactor June 29, 2020 17:39
@Profpatsch
Copy link
Contributor

Hyped for this. It removes the 5–10 second stack overhead that happened every time the bazel daemon was started, right?

@aherrmann
Copy link
Member Author

It removes the 5–10 second stack overhead that happened every time the bazel daemon was started, right?

That's right, to be precise it removes any calls to stack for stack_snapshot that use the lock file and, for now, don't use git or hg dependencies.

@aherrmann aherrmann marked this pull request as ready for review June 30, 2020 16:00
Base automatically changed from refactor to master July 1, 2020 07:14
@dpulls
Copy link

dpulls bot commented Jul 1, 2020

🎉 All dependencies have been resolved !

Copy link
Contributor

@hanshoglund hanshoglund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great improvement to me. I have not reviewed the implementation in haskell/cabal.bzl in detail, but the switch to pinned Stackage in the rules_haskell workspace is reassuring enough for a new feature in my opinion.

It would be good to add more regression tests for the unpinned use case, as this is what current projects will see.

Otherwise my concerns are mostly about documentation. I suggest opening an issue to track improvements there.

examples/WORKSPACE Outdated Show resolved Hide resolved
start Outdated Show resolved Hide resolved
Copy link
Contributor

@Profpatsch Profpatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fairly straight-forward, just a lot of footwork to implement everything. LGTM

haskell/cabal.bzl Outdated Show resolved Hide resolved
haskell/cabal.bzl Outdated Show resolved Hide resolved
haskell/cabal.bzl Show resolved Hide resolved
aherrmann added a commit that referenced this pull request Jul 2, 2020
aherrmann added a commit that referenced this pull request Jul 2, 2020
aherrmann added a commit that referenced this pull request Jul 2, 2020
aherrmann added a commit that referenced this pull request Jul 2, 2020
aherrmann added a commit that referenced this pull request Jul 2, 2020
aherrmann added a commit that referenced this pull request Jul 2, 2020
aherrmann and others added 8 commits July 2, 2020 11:56
Adds an optional attribute `stack_snapshot_json` to `stack_snapshot`
which pins the result of `stack ls dependencies json` and additional
reproducibility information. If specified, then `stack_snapshot` will
not need to invoke `stack` to resolve package versions and dependencies
or to fetch Hackage or archive dependencies.

- This makes re-fetching `stack_snapshot` repositories faster as Hackage
  and archive dependencies can be cached in Bazel's repository cache,
  and `stack` does not need to update it's database to determine package
  versions and dependencies.
- Additionally, this avoids issues due to concurrent invocations of
  `stack` which can fail due to a race condition on the Hackage security
  lock.

The lock file can be generated or updated by invoking the script
`@<workspace>-unpinned//:pin`.

If the `stack_snapshot_json` attribute is not specified, then
`stack_snapshot` will behave as before and resolve and fetch external
packages using `stack` on every Bazel fetch.

The resulting `stack_snapshot` workspaces are identical with the
exception of revisions of Cabal files. `stack ls dependencies json` does
not expose the Hackage revision of any given package used in the
snapshot, only the version. However, Cabal file downloads are not
reproducible as Hackage revisions may change them over time and a
revision >0 may be required for a given snapshot to avoid issues with
version ranges. The pinning implementation pins the current git revision
of the [all-cabal-hashes][all-cabal-hashes] repository and downloads
Cabal files from that revision. This may potentially use a later
revision than used in a particular snapshot.

[all-cabal-hashes]: https://github.com/commercialhaskell/all-cabal-hashes

`stack ls dependencies json` does not provide subdirectory information
of `archive` dependencies. This change includes a heuristic which
expects the Cabal file at the archive root or underneath a top-level
directory and defines `stripPrefix` accordingly.
Co-authored-by: Hans Höglund <hanshoglund@users.noreply.github.com>
The Stackage snapshot is already pinned by the `snapshot` or
`local_snapshot` attribute. It's the `stack_snapshot` specific package
metadata that is unpinned and requires invoking `stack` on every build.
@aherrmann
Copy link
Member Author

It would be good to add more regression tests for the unpinned use case, as this is what current projects will see.

Otherwise my concerns are mostly about documentation. I suggest opening an issue to track improvements there.

That's a good idea, I've opened #1385 to that end. Feel free to edit and add to the list.

@aherrmann aherrmann added the merge-queue merge on green CI label Jul 2, 2020
@mergify mergify bot merged commit a4bf003 into master Jul 2, 2020
@mergify mergify bot deleted the stack-lock branch July 2, 2020 10:47
@mergify mergify bot removed the merge-queue merge on green CI label Jul 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants