Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stack_snapshot: Precise extra dependencies #1068

Merged
merged 4 commits into from Sep 18, 2019
Merged

Conversation

aherrmann
Copy link
Member

Precisely control which packages to add additional dependencies to.

This PR changes stack_snapshot so that the deps attribute no longer takes a list of targets to add as dependencies to all packages in the snapshot. Instead, it takes a dictionary mapping Stack package names to lists of targets to precisely control which dependencies to add to which package in the snapshot.

Motivation

This change is motivated by #1062. Simply adding all deps to all packages in the snapshot has the effect that the library-dirs and include-dirs of these deps will be added to the package-db entry of each package in the snapshot. GHC doesn't deduplicate the include-dirs and this leads to overflowing the maximum command-line length, even on Linux.

Unfortunately, we cannot avoid adding to the library-dirs and include-dirs of snapshot packages that actually depend on extra libraries. Their cabal files will list extra-libraries and includes, and Setup.hs enforces that these can be found in the library-dirs and include-dirs of the resulting package configuration.

An advantage is that this thins the dependency graph and gives the user more control. The downside is that stack_snapshot becomes more cumbersome to use.

Alternatives

An alternative approach would be to automatically detect which snapshot packages depend on which extra libraries and only add the extra dependencies where necessary. This would require either parsing the Cabal files, or for stack dot (or stack ls dependencies json) to list extra-libraries dependencies. However, that would also require some user input or heuristics to map from extra-libraries to Bazel targets, and would assume that all includes are provided by targets providing extra-libraries.

Closes #1062

@aherrmann aherrmann requested a review from mboes September 3, 2019 09:28
@thufschmitt
Copy link
Member

Really nice! This fixes a similar error on my side.
But for some reason, the error reappears if I apply this patch on top of 844a4e2, any idea what might cause the incompatibility?

@thufschmitt
Copy link
Member

Really nice! This fixes a similar error on my side.
But for some reason, the error reappears if I apply this patch on top of 844a4e2, any idea what might cause the incompatibility?

This is actually fixed by #1081, so it's not too much of an issue

@aherrmann
Copy link
Member Author

Really nice! This fixes a similar error on my side.
But for some reason, the error reappears if I apply this patch on top of 844a4e2, any idea what might cause the incompatibility?

GHC doesn't normalize paths. E.g. due to package environment files and ${pkgroot} we might get a path like path/to/pkg-env/../../../path/to/pkgroot/../../../path/to/lib. These can grow longer than the absolute paths, so that's how #1063 can introduce that issue. #1039 would work around that by normalizing the paths.

@aherrmann
Copy link
Member Author

rebased on master

Copy link
Member

@thufschmitt thufschmitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor comment, but apart from that, 👍

haskell/cabal.bzl Show resolved Hide resolved
@mboes
Copy link
Member

mboes commented Sep 17, 2019

Unfortunately, we cannot avoid adding to the library-dirs and include-dirs of snapshot packages that actually depend on extra libraries. Their cabal files will list extra-libraries and includes, and Setup.hs enforces that these can be found in the library-dirs and include-dirs of the resulting package configuration.

The problem is that --extra-include-dirs seems to be cumulative. They will appear in the include-dirs line of the package's .conf file, and it appears that downstream packages are allowed to include headers from directories listed there. Take for example integer-gmp. The include-dirs will list the include folder where gmp.h can be found. Any package that transitively depends on integer-gmp is allowed to #include <gmp.h> in source files. So if the user says deps = {"integer-gmp": "@gmp//:include"} we must allow #include <gmp.h> in downstream packages. We would otherwise not be compatible with Cabal in that case. I can't remember in which package I first noticed this, but packages in the wild do exploit this Cabal "feature".

@mboes
Copy link
Member

mboes commented Sep 17, 2019

#385 and #1008 showed how dangerous it can be to invoke precedent to name fields, but to use a different attribute type. It's usually a sign that the intended semantics is different from the one in precedents, in which case we should really use a different attribute name. So given your new semantics, I'm wary of keeping deps as the attribute name.

Since include dirs (and probably library dirs) are transitive, as noted in my previous comment, we're not gaining much cachability by letting the user state dependencies in a more fine grained way. The original motivation for this change was the absence of dedup was leading to long command-line length. Couldn't we instead try deduping ourselves in some place appropriate, instead of changing the API?

@aherrmann
Copy link
Member Author

Couldn't we instead try deduping ourselves in some place appropriate, instead of changing the API?

We can try to deduplicate the list in the cc_wrapper. The downside is that the include paths semantics (-I/-isystem/...) are a bit complicated, so we need to be careful not to change the meaning of the command-line.

So given your new semantics, I'm wary of keeping deps as the attribute name.

I'm happy to choose a different name. How about extra_deps?

The problem is that --extra-include-dirs seems to be cumulative. They will appear in the include-dirs line of the package's .conf file, and it appears that downstream packages are allowed to include headers from directories listed there. [...] So if the user says deps = {"integer-gmp": "@gmp//:include"} we must allow #include <gmp.h> in downstream packages. We would otherwise not be compatible with Cabal in that case.

Note, this PR does not change those semantics. Cabal will still accumulate the include paths from all transitive package dependencies.

Since include dirs (and probably library dirs) are transitive, as noted in my previous comment, we're not gaining much cachability by letting the user state dependencies in a more fine grained way.

That depends on the particular library dependency. E.g. grpc-haskell-core depends on libgrpc and appears relatively close to the leaves of the dependency graph. With more finely grained dependencies changing libgrpc would not cause the rest of the stack snapshot to be rebuilt.

Granted, most use-cases don't depend on this feature, and if it's really required the same can be achieved using vendored_packages.

@mboes
Copy link
Member

mboes commented Sep 18, 2019

Granted, most use-cases don't depend on this feature, and if it's really required the same can be achieved using vendored_packages.

Hm, that's a good point. Now that we have vendored_packages, what about deprecating stack_snapshot.deps altogether?

@aherrmann
Copy link
Member Author

Now that we have vendored_packages, what about deprecating stack_snapshot.deps altogether?

That's an interesting idea. It places stack_snapshot further away from what a similar stack.yaml would look like. Defining C library dependencies is a common use-case and currently vendored_packages is a bit verbose, due to the http_arcive side. So, I think if we want to go that way we'd need to provide some helpers to reduce boilerplate. In particular, fetching the correct version according to the chosen snapshot and dependency resolution. We could define stack_unpack as a replacement for http_archive which takes care of fetching the correct version. That could look something like this:

stack_unpack(
    name = "zlib",
    snapshot = "lts-14.1",  # snapshot determines package version
    deps_snapshot = "stackage",  # loads dependencies of `zlib` from this `stack_snapshot`
    extra_deps = ["@z//:z"],
)
stack_snapshot(
    name = "stackage",
    snapshot = "lts-14.1",
    packages = [
        ...
    ]
    vendored_packages = { "zlib": ["@zlib//:zlib"] },
)

@mboes
Copy link
Member

mboes commented Sep 18, 2019

We could define stack_unpack as a replacement for http_archive which takes care of fetching the correct version.

I like the idea. But it looks like we'll run into other difficulties if we take that route. We have to state the snapshot twice. But worse, we need a circular dependency between the stack_unpack rule and the stack_snapshot one. That's pretty horrid. The way you work around it is by cunningly not making the deps_snapshot field a true label. But that's not very intuitive.

I guess extra_deps (of type dictionary), is the least bad option.

@aherrmann
Copy link
Member Author

But worse, we need a circular dependency between the stack_unpack rule and the stack_snapshot one. That's pretty horrid. The way you work around it is by cunningly not making the deps_snapshot field a true label. But that's not very intuitive.

Agreed, that's a bit of a hack.

I guess extra_deps (of type dictionary), is the least bad option.

Okay, I've renamed the deps attribute to extra_deps.

@mboes mboes added the merge-queue merge on green CI label Sep 18, 2019
@mergify mergify bot merged commit dafc0f8 into master Sep 18, 2019
@mergify mergify bot deleted the stack-precise-deps branch September 18, 2019 16:18
@mergify mergify bot removed the merge-queue merge on green CI label Sep 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stack_snapshot can exceed command-line length limit on C pre-processor with include paths
3 participants