Enable pinning with stack_snapshot #1376

aherrmann · 2020-06-26T16:34:02Z

depends on #1379

This implements pinning for stack_snapshot similar to rules_jvm_external's maven_install. By default stack_snapshot works as before, calling stack ls dependencies json to determine package versions and the dependency graph on the fly and calling stack unpack to fetch the package sources. However, if the optional attribute stack_snapshot_json is specified then it is read as a lock file that stores all relevant information to be able to fetch and build the snapshot packages without invoking stack. The lock file can be generated or updated with the script @<workspace>-unpinned//:pin.

This makes re-fetching stack_snapshot repositories faster as Hackage and archive dependencies can be cached in Bazel's repository cache, and stack does not need to update it's database to determine package versions and dependencies.
Closes #1112
Closes #1168

Additionally, this avoids issues due to concurrent invocations of stack which can fail due to a race condition on the Hackage security lock.
Closes #1167

The generated snapshot.json file is meant to be checked into the user's repository. It should not be updated manually, only using bazel run @<workspace>-unpinned//:pin. Similar to rules_jvm_external we validate the contents of the lock file and store and check a checksum to detect a corrupted lock file.

This PR extends the stack_snapshot API docs and adds a comment to examples/WORKSPACE and the WORKSPACE generated by the start script instructing the user how to pin stack_snapshot dependencies. Additionally, this pins the stack_snapshot repositories @stackage, @stackage-zlib, and @ghcide in rules_haskell. The @stackage repository in examples remains unpinned and serves as a test-case for an unpinned stack_snapshot.

Notes

The resulting stack_snapshot workspaces are identical with the exception of revisions of Cabal files.
stack ls dependencies json does not expose the Hackage revision of any given package used in the snapshot, only the version. The original Cabal file may not work in the given Stackage snapshot, meaning a later revision is required, but downloading the latest revision is not reproducible. To work around this we pin the current git revision of the all-cabal-hashes repository and download Cabal files from that git revision. This may potentially use a later Hackage revision than used in the Stackage snapshot.
stack ls dependencies json does not give us access to sha256 of Hackage or archive dependencies. See Feature Request: Expose sha256 in ls dependencies json commercialhaskell/stack#5274 and sha256 and size in ls dependencies json for archive dependencies commercialhaskell/stack#5280. We use the all-cabal-hashes repository to obtain sha256 of Hackage dependencies, and fetch archive dependencies during pinning to determine the hash.
stack ls dependencies json does not provide subdirectory information of archive dependencies. This change includes a heuristic which expects the Cabal file at the archive root or underneath a top-level directory and defines stripPrefix accordingly.
Bazel cannot cache git (or hg) dependencies in the repository cache as of now, see Skylark API for accessing repository cache in repository rules bazelbuild/bazel#5086. This implementation keeps using stack unpack for these kinds of dependencies. To avoid issues around stack update this should also be switched to a Bazel implementation in the future.

Profpatsch · 2020-06-30T00:44:28Z

Hyped for this. It removes the 5–10 second stack overhead that happened every time the bazel daemon was started, right?

aherrmann · 2020-06-30T08:30:27Z

It removes the 5–10 second stack overhead that happened every time the bazel daemon was started, right?

That's right, to be precise it removes any calls to stack for stack_snapshot that use the lock file and, for now, don't use git or hg dependencies.

dpulls · 2020-07-01T07:14:52Z

🎉 All dependencies have been resolved !

hanshoglund

This looks like a great improvement to me. I have not reviewed the implementation in haskell/cabal.bzl in detail, but the switch to pinned Stackage in the rules_haskell workspace is reassuring enough for a new feature in my opinion.

It would be good to add more regression tests for the unpinned use case, as this is what current projects will see.

Otherwise my concerns are mostly about documentation. I suggest opening an issue to track improvements there.

examples/WORKSPACE

start

Profpatsch

Looks fairly straight-forward, just a lot of footwork to implement everything. LGTM

haskell/cabal.bzl

Addressing review comment #1376 (comment)

Adds an optional attribute `stack_snapshot_json` to `stack_snapshot` which pins the result of `stack ls dependencies json` and additional reproducibility information. If specified, then `stack_snapshot` will not need to invoke `stack` to resolve package versions and dependencies or to fetch Hackage or archive dependencies. - This makes re-fetching `stack_snapshot` repositories faster as Hackage and archive dependencies can be cached in Bazel's repository cache, and `stack` does not need to update it's database to determine package versions and dependencies. - Additionally, this avoids issues due to concurrent invocations of `stack` which can fail due to a race condition on the Hackage security lock. The lock file can be generated or updated by invoking the script `@<workspace>-unpinned//:pin`. If the `stack_snapshot_json` attribute is not specified, then `stack_snapshot` will behave as before and resolve and fetch external packages using `stack` on every Bazel fetch. The resulting `stack_snapshot` workspaces are identical with the exception of revisions of Cabal files. `stack ls dependencies json` does not expose the Hackage revision of any given package used in the snapshot, only the version. However, Cabal file downloads are not reproducible as Hackage revisions may change them over time and a revision >0 may be required for a given snapshot to avoid issues with version ranges. The pinning implementation pins the current git revision of the [all-cabal-hashes][all-cabal-hashes] repository and downloads Cabal files from that revision. This may potentially use a later revision than used in a particular snapshot. [all-cabal-hashes]: https://github.com/commercialhaskell/all-cabal-hashes `stack ls dependencies json` does not provide subdirectory information of `archive` dependencies. This change includes a heuristic which expects the Cabal file at the archive root or underneath a top-level directory and defines `stripPrefix` accordingly.

Co-authored-by: Hans Höglund <hanshoglund@users.noreply.github.com>

The Stackage snapshot is already pinned by the `snapshot` or `local_snapshot` attribute. It's the `stack_snapshot` specific package metadata that is unpinned and requires invoking `stack` on every build.

Addressing review comment #1376 (comment)

aherrmann · 2020-07-02T10:02:50Z

It would be good to add more regression tests for the unpinned use case, as this is what current projects will see.

Otherwise my concerns are mostly about documentation. I suggest opening an issue to track improvements there.

That's a good idea, I've opened #1385 to that end. Feel free to edit and add to the list.

aherrmann force-pushed the stack-lock branch 2 times, most recently from c1e46ac to a64128f Compare June 29, 2020 17:38

aherrmann changed the base branch from master to refactor June 29, 2020 17:39

aherrmann mentioned this pull request Jun 30, 2020

Refactor implementation of stack_snapshot #1379

Merged

aherrmann force-pushed the stack-lock branch from a64128f to 8c1a2fe Compare June 30, 2020 15:41

aherrmann marked this pull request as ready for review June 30, 2020 16:00

aherrmann requested review from mboes, hanshoglund and Profpatsch June 30, 2020 16:00

Base automatically changed from refactor to master July 1, 2020 07:14

hanshoglund approved these changes Jul 1, 2020

View reviewed changes

examples/WORKSPACE Outdated Show resolved Hide resolved

start Outdated Show resolved Hide resolved

Profpatsch approved these changes Jul 1, 2020

View reviewed changes

haskell/cabal.bzl Outdated Show resolved Hide resolved

haskell/cabal.bzl Outdated Show resolved Hide resolved

haskell/cabal.bzl Show resolved Hide resolved

aherrmann added a commit that referenced this pull request Jul 2, 2020

_parse_json_field - use explicit keyword arguments

89c164c

Addressing review comment #1376 (comment)

aherrmann added a commit that referenced this pull request Jul 2, 2020

Clarify target of _write_snapshot_json

0992597

Addressing review comment #1376 (comment)

aherrmann added a commit that referenced this pull request Jul 2, 2020

Expand stack_snapshot_json documentation

5f1650f

Addressing review comment #1376 (comment)

aherrmann added a commit that referenced this pull request Jul 2, 2020

_parse_json_field - use explicit keyword arguments

ec7380c

Addressing review comment #1376 (comment)

aherrmann added a commit that referenced this pull request Jul 2, 2020

Clarify target of _write_snapshot_json

0ddf37e

Addressing review comment #1376 (comment)

aherrmann added a commit that referenced this pull request Jul 2, 2020

Expand stack_snapshot_json documentation

e37450b

Addressing review comment #1376 (comment)

aherrmann force-pushed the stack-lock branch from 5f1650f to e37450b Compare July 2, 2020 09:13

aherrmann and others added 8 commits July 2, 2020 11:56

Pin rules_haskell stack_snapshot repositories

fa5aa3c

Show stack_snapshot pinning in start and examples

f0076d8

Update examples/WORKSPACE

eda2f21

Co-authored-by: Hans Höglund <hanshoglund@users.noreply.github.com>

Clarify stack_snapshot pinning comment

7606102

The Stackage snapshot is already pinned by the `snapshot` or `local_snapshot` attribute. It's the `stack_snapshot` specific package metadata that is unpinned and requires invoking `stack` on every build.

_parse_json_field - use explicit keyword arguments

9903805

Addressing review comment #1376 (comment)

Clarify target of _write_snapshot_json

07bd966

Addressing review comment #1376 (comment)

Expand stack_snapshot_json documentation

e0c6c09

Addressing review comment #1376 (comment)

aherrmann force-pushed the stack-lock branch from e37450b to e0c6c09 Compare July 2, 2020 09:56

aherrmann mentioned this pull request Jul 2, 2020

stack_snapshot pinning improvements #1385

Open

2 tasks

aherrmann added the merge-queue merge on green CI label Jul 2, 2020

mergify bot merged commit a4bf003 into master Jul 2, 2020

mergify bot deleted the stack-lock branch July 2, 2020 10:47

mergify bot removed the merge-queue merge on green CI label Jul 2, 2020

aherrmann mentioned this pull request Jul 13, 2020

Suboptimal caching on Actions CI #1391

Closed

patrickt mentioned this pull request Jul 17, 2020

Add stackage_snapshot.json lockfile to save build time. github/semantic#610

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable pinning with stack_snapshot #1376

Enable pinning with stack_snapshot #1376

aherrmann commented Jun 26, 2020 •

edited

Profpatsch commented Jun 30, 2020

aherrmann commented Jun 30, 2020

dpulls bot commented Jul 1, 2020

hanshoglund left a comment

Profpatsch left a comment

aherrmann commented Jul 2, 2020

Enable pinning with stack_snapshot #1376

Enable pinning with stack_snapshot #1376

Conversation

aherrmann commented Jun 26, 2020 • edited

Profpatsch commented Jun 30, 2020

aherrmann commented Jun 30, 2020

dpulls bot commented Jul 1, 2020

hanshoglund left a comment

Choose a reason for hiding this comment

Profpatsch left a comment

Choose a reason for hiding this comment

aherrmann commented Jul 2, 2020

aherrmann commented Jun 26, 2020 •

edited