Support fine grained module level build definitions #1552
Comments
The implementation of this feature has progressed (#1553, #1564, #1565, #1568, #1579, #1590) and we have discovered some design questions with the API that need to be resolved. Haskell module compilation is trickier than, say, C module compilation, because we need metadata about the final library to compile the individual module, most often
Extremes
It's a large design space, but, I think it can help to consider two extremes in-between which
Explicit
The explicit extreme could look something like this:
Where Note, many attributes are repeated across modules and library, e.g. package name or
Abstract
The abstract extreme could look something like this instead:
Where In this case
Pros & Cons
Explicit
A clear advantage of the explicit approach is that it is very simple to understand. A clear downside is that it requires a lot of very mechanical duplication. This is perhaps less of a concern when using a Gazelle extension to generate the
Another downside is that the
It's also easy to make mistakes, such as using a
Abstract
The abstract approach on the other hand is much less verbose and the
The downside is of course that it is less explicit: It's not the
Targets that don't themselves produce anything seem a bit strange. This is not without precedent though,
Current Status
The current implementation lies a little in between both approaches. It largely follows the explicit approach, but, the package name is implicitly forwarded from the
Going Forward
I'd argue that we should commit more closely to one of those approaches. Either, take the explicit approach and incur the need for duplication of the package name. But, have
Or make
@Radvendii @facundominguez thoughts? |
How do you avoid compiling twice if the module needs to be built with different Also, I have some recollections of |
Well, if it needs to be built with different unit-ids then you can't avoid building it multiple times. But, the point is that, right now, it is being built twice even if it is only needed with one unit-id.
If you build a |
Another option is to modify what we currently have slightly by having the Pros:
Cons:
There is another option which combines all three approaches which has the advantage of enabling both approaches depending on need and the disadvantage of being more complex. This would be that Now that I write it out that seems not worth the complexity |
Do |
That's a good point. We could make them do that, but that might be even more confusing. I'm not sure I quite understand the relationship between binaries and modules. Why not bundle modules up in a library and have the binary depend on that? |
It looks feasible, but more bureaucratic than defining a binary with a single rule. |
As I understand it, this falls into the abstract side. Building a Or phrased as a question: Is there an observable difference between this approach and just never returning an object in
Agreed, it seems unnecessary to enforce an intermediate
Yeah, that sounds like adding a lot of complexity without clear benefit. |
Ohh I didn't realize it was this simple. This would be my vote, then, because it's the way I think I'd prefer to use On the other hand, if the way people are expected to use Do we know how it's going to be used in practice? I don't have a sense for how ubiquitous Gazelle is. |
At first, I expected the abstract approach to help avoid setting I would expect the abstract approach to be simpler, because all the information is gathered in |
The way I intended that distinction this would fall closer to the "abstract" side. With the "explicit" vs. "abstract" distinction I was referring to the exposed API, not to the implementation. Whether we use transitions to achieve this or not is an implementation choice. But, as far as the user is concerned, if That's what I meant above by
As stated above, I think this now discusses the implementation more than the API design. I assume that you mean an implementation where the build actions are not emitted by |
I don't see much benefit in the explicit API. If a module needs to be compiled with specific |
Plugins come to mind, as discussed in #1566 (comment) in the past. That said, the "abstract" approach does not preclude that option. |
IIUC, the advantage of the explicit approach is that it is simpler to understand and implement, meaning (potentially) fewer bugs. Is that right? Is there anything else to it? |
At this point I'm totally confused about whether we are discussing the implementation or the interface. :) If we are discussing the interface, then the "abstract" approach is to allow Both interfaces can be implemented by deferring or not deferring compilation actions to the
I think the easiest to use is probably (3), and the simplest to implement is (2). |
Yes, I agree. It is the only option that allows the
Yes, the interface. Thanks for the good summary, I agree. |
Yeah, good summary @facundominguez that helped clear things up in my head. For reference, (4) is what we currently have implemented. I agree that (2) and (3) have significant advantages though. I don't really have a good grasp on using |
I think the abstract interface is preferable. From a user's perspective it provides a higher level and more declarative interface: The user defines the module dependency graph, not the mechanics behind building individual modules. The explicit approach not just introduces more tedious duplication, it also increases the potential for user errors, e.g. by trying to bundle modules with different package names into the same library target - something that rules_haskell would then have to detect to produce a meaningful error message. Regarding implementation, i.e. the choice between (3) or (4), I think (3) is preferable. I did a test to see if we are affected by bazelbuild/bazel#13281, and we are indeed. I've pushed an example that triggers the issue in acb030b. This constructs a diamond shaped dependency graph like this
Meaning In this case we'd want
Meaning the library is actually built three times. That's precisely the issue described in bazelbuild/bazel#13281. So, transitions are not a good approach in this case. EDIT
(Tested with Nix and direnv installed and direnv enabled in the repository.) |
I'm currently trying to implement (3), however, it feels like going against the current so far. The task is creating the compile actions for the modules, given the descriptions/targets of the haskell_module rules that have been deferred. My first try was making a recursive function, for every The above is very simple, but it doesn't work because bazel starlark doesn't allow recursive functions! My next attempt was to encode the recursion with an explicit stack. This seems to work, but since starlark doesn't offer Is this the right way to go? Here's the function: rules_haskell/haskell/experimental/private/module.bzl Lines 236 to 298 in a7d412e
|
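For readers who want to see the shape of the explicit-stack encoding mentioned above, here is a minimal sketch in plain Python rather than Starlark. It is not the code from module.bzl; the function name `postorder` and the `deps` dictionary are hypothetical, chosen only for illustration. The loop is a `for` over a precomputed bound, since Starlark allows neither recursion nor unbounded loops.

```python
def postorder(roots, deps):
    """Return modules so that each module appears after its dependencies.

    roots: list of module names to start from (hypothetical input shape).
    deps:  dict mapping a module name to the list of modules it imports.
    """
    order = []
    done = set()
    # Each stack entry is (module, expanded). expanded=False means the
    # module's dependencies still have to be pushed onto the stack.
    stack = [(m, False) for m in reversed(list(roots))]
    edges = sum(len(ds) for ds in deps.values())
    # In an acyclic graph there are at most len(roots) + edges unexpanded
    # pushes, each followed by at most one expanded push.
    bound = 2 * (len(stack) + edges)
    for _ in range(bound):
        if not stack:
            break
        module, expanded = stack.pop()
        if module in done:
            continue
        if expanded:
            # All dependencies were handled; emit the module itself.
            done.add(module)
            order.append(module)
            continue
        stack.append((module, True))
        for dep in reversed(deps.get(module, [])):
            if dep not in done:
                stack.append((dep, False))
    if stack:
        # Work is left over after the bound: the graph must have a cycle.
        raise ValueError("dependency cycle detected")
    return order


diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(postorder(["A"], diamond))
```

In a rule implementation the point where a module is appended to `order` is where its compile action could be declared, because all of its dependencies have already been visited.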
And here's another question: if a module is used in two This is something that affects erasing library boundaries when defining |
Correct, Starlark does not allow recursion, or unbounded loops in general. That said, I don't think this is required in this case. One difficulty seems to be to calculate the set of transitive module dependencies in Another important feature of Bazel is that the build actions don't need to be emitted in any particular order. So, you could do something along the lines of this (just a sketch):
Yes, that is true. Given the |
When trying to decide what data structure to use to collect transitive dependencies of This is But the other alternatives also seem to impose some merging overhead as explained in the documentation. https://docs.bazel.build/versions/main/skylark/depsets.html Looks like the most efficient data structure would be Any thoughts on how to proceed here? |
A The more concerning part, performance-wise, is that we need to flatten the |
We have to flatten the transitive dependencies once per compile action for a module in order to produce the list of input interface files. This does look quadratic in the number of modules in a library. Aren't we better off in this respect with the stack-encoded recursion? |
Ah, yes, you're right. I was looking at it the wrong way. This reminds me that we have a somewhat similar situation with the Haskell toolchain libraries, implemented here. In that case we exploit the fact that a I think something like this could be used here as well. If we can iterate over the modules in the right order, such that A appears before B if B depends on A, then we can avoid iterating over each module's transitive dependencies. Another sketch illustrates what I have in mind:
This flattens the |
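The ordering idea in the comment above can be modelled in a few lines of Python. This is not the sketch from the original comment and it is not Starlark: `NestedSet` is a hypothetical, much simplified stand-in for a Bazel depset, and `transitive_interfaces` and the `.hi` naming are assumptions made for illustration. It shows that when modules are visited dependencies first, each module's transitive set can be assembled from the already built sets of its direct dependencies without flattening anything.

```python
class NestedSet:
    """Simplified stand-in for a depset: direct items plus shared child sets."""

    def __init__(self, direct=(), transitive=()):
        self.direct = tuple(direct)
        self.transitive = tuple(transitive)

    def to_list(self):
        """Flatten the set, visiting every shared child node only once."""
        seen_nodes = set()
        seen_items = set()
        items = []
        stack = [self]
        while stack:
            node = stack.pop()
            if id(node) in seen_nodes:
                continue
            seen_nodes.add(id(node))
            for item in node.direct:
                if item not in seen_items:
                    seen_items.add(item)
                    items.append(item)
            stack.extend(node.transitive)
        return items


def transitive_interfaces(modules_in_dep_order, deps):
    """Build one NestedSet per module, reusing the sets of its dependencies.

    modules_in_dep_order must list a module after everything it imports.
    Each set holds the module's own interface file plus those of its
    transitive dependencies, i.e. what a dependent module would consume.
    """
    sets = {}
    for module in modules_in_dep_order:
        sets[module] = NestedSet(
            direct=[module + ".hi"],
            transitive=[sets[dep] for dep in deps[module]],
        )
    return sets


deps = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
sets = transitive_interfaces(["D", "B", "C", "A"], deps)
print(sorted(sets["A"].to_list()))
```

Constructing the sets takes work proportional to the number of dependency edges, and the child sets are shared by reference rather than copied. Flattening, here `to_list`, is the expensive step, which is why the discussion focuses on when and how often it happens.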
I'm impressed. That's a clever way to copy the structure of a depset. However, I'm failing to see the essential difference with the previous sketch. Your last sketch still flattens the transitive inputs once per action, only that it is deferred until the very moment the action is executed. Isn't it still quadratic? Which also brings to the fore the fact that once we require transitive inputs for all actions, there is no approach to implement it that isn't potentially quadratic, not even running the actions in the |
The difference is that it benefits from the sharing of
Yes, if we need those inputs, then we can't avoid that. But, by using |
Should dependencies be inherited from libraries and binaries? |
Good question. I can see two ideals there:
Option 1. is not optimal for incremental builds and parallelism, because a module may be rebuilt when an unused library dependency changes or its build may need to wait for an unused library to finish building. The Option 2. is better for incremental builds and parallelism. But, it's harder to get right with I don't see an obvious winner between the two. I would err on the side of option 2 and consider option 1 if we know that we can make |
How about: I think option (1) could be preferable if we can make |
Agreed, let's do that.
It's essentially letting the user switch between option 1 or 2. Instead of solving the problems of 1 with |
I had an idea regarding cross package module dependencies that came to my mind while looking at #1623. I'm not sure if this is a good idea, but I thought I'd put it out here for consideration. In this approach
Let me know what you think. |
Could work. One aspect that I'd like us to figure out is what the story is when using Template Haskell. Right now a module which uses TH can't be built until the libraries that need to be loaded are built. This means that compiling the module must wait on entire libraries to be built and linked. Could we only load the object files of the imported modules (and their transitive module dependencies) instead of libraries in these cases? |
We solved the discussion of the abstract vs explicit interface, but it looks like this issue is to stay open until we have implemented all aspects of |
A recap of what else needs to be done to complete this issue would be helpful. |
Agreed, I think in terms of defining and building modules we've covered all that comes to mind:
There are some things that may work already, but we should have tests for to be sure:
And there are integrations with other rules_haskell features:
@facundominguez Let me know if something else comes to mind. |
Additionally, we should cast |
Goal
Add a rule haskell_module to support fine grained build definitions at the level of individual Haskell modules.
Motivation
The existing haskell_library|binary|test rules operate on the level of Haskell packages. In these rules building consists of two main steps with corresponding Bazel build actions: Compile all modules of a package in one build action and link all modules together into a library in a separate action (one for static and one for dynamic linking). Additionally, there is an action to construct the package's package database and several supporting actions.
This means that all modules of such a package need to be recompiled even if only a single module in a package changed compared to a previous cached build.
With a haskell_module rule we could instead issue one compile action per Haskell module. This would allow us to benefit from Bazel's caching and parallel builds on the level of individual modules.
Outline
Not all the details are worked out at this stage, however, the idea is that the haskell_module rule may be used like this:
In this example the haskell_module rules perform the compilation of modules to object and interface files, and the haskell_library rule performs the linking of these compiled modules into a static archive or shared object and the generation of a package database for the package.
Alternatives considered
The goal is to provide what we called "sub-target incremental builds" in the context of the current haskell_library|binary|test rules. I.e. avoid recompiling modules that have not changed between builds even if another module that is part of the same library or executable target did change. An alternative approach that we considered is to use Bazel's persistent worker mechanism to provide a persistent GHC session that caches previous compilation outputs (object and interface files). Downsides of this approach are that:
Anticipated problems
Slower uncached builds
Starting one GHC session per module may well be slower than starting a single GHC session that compiles multiple modules at once. This means there is a tradeoff between uncached builds from scratch and incremental, cached builds. For everyday development the incremental, cached build is likely to be more common than the uncached build from scratch, so the benefit may outweigh the cost in the common case. Furthermore, we could still use a persistent worker to save on the repeated start-up of GHC sessions similar to how it is already done for Java or Scala in Bazel. In this case we would not need to maintain a cache of compilation outputs in the persistent worker, reducing the downside of this approach. rules_haskell already contains an implementation of a persistent worker that could be adapted to this use-case.
More verbose build definitions
Haskell compilation has to occur in dependency order: The Haskell compiler requires interface files of modules that the current module depends on as an input in order to compile the current module. This means that haskell_module rules need to spell out the Haskell module dependency graph in the BUILD files. Manually maintaining this dependency graph in both the Haskell sources and the BUILD files is tedious. BUILD file generation, e.g. a Gazelle extension for Haskell, can solve this issue.
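To make the dependency-order requirement and the incremental-build motivation concrete, here is a small Python model (Python 3.9+ for `graphlib`). It is only a sketch: the module names, the `imports` dictionary, and the helpers `compile_order` and `to_recompile` are hypothetical and do not correspond to anything in rules_haskell.

```python
from graphlib import TopologicalSorter


def compile_order(imports):
    """Return an order in which every module comes after the modules it imports.

    imports: dict mapping a module name to the list of modules it imports.
    """
    return list(TopologicalSorter(imports).static_order())


def to_recompile(imports, changed):
    """Return the changed module plus every module that transitively imports it."""
    # Invert the import graph so we can walk from a module to its importers.
    importers = {module: set() for module in imports}
    for module, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(module)
    dirty = set()
    todo = [changed]
    while todo:
        module = todo.pop()
        if module in dirty:
            continue
        dirty.add(module)
        todo.extend(importers.get(module, ()))
    return dirty


imports = {
    "Main": ["Lib.A", "Lib.B"],
    "Lib.A": ["Lib.Base"],
    "Lib.B": [],
    "Lib.Base": [],
}
print(compile_order(imports))
print(sorted(to_recompile(imports, "Lib.B")))
```

With one compile action per package, a change to any module reruns the whole package's compile action. With one action per module, `to_recompile` gives an upper bound on what has to be rerun, and the bound can be pessimistic: if recompiling a module yields an unchanged interface file, Bazel can reuse the cached results of its importers.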