
Add support for hrepl. #1210

Merged
merged 9 commits into tweag:master on Jan 27, 2020

Conversation

judah
Collaborator

@judah judah commented Jan 12, 2020


`hrepl` is a standalone binary that runs REPLs for Bazel Haskell targets.
Its repository is located at: https://github.com/google/hrepl

`hrepl` takes a list of labels on the command line, in contrast to `haskell_repl()`,
which requires editing BUILD files manually.  For example:

```
$ hrepl tests/binary-with-lib:lib
```

This change adds specific output groups to `rules_haskell` which allow `hrepl`
to understand its rule types. Internally, `hrepl` runs the following steps:

- Run `bazel build --output_groups=... haskell:toolchain_info` and read the resulting
  protobuf file to get the path to GHC as well as any relevant command-line options.
- Use `bazel query` to figure out exactly what targets to build.
- "bazel build --output_groups=..." to compile the immediate dependencies of
  the targets.
- Read the "LibraryInfo" protobuf file which contains information about
  them (e.g., the paths of the package DBs).
- "bazel build --output_groups=..." the targets to be interpreted.  This makes
  sure their `srcs`, runfiles, C++ dependencies, etc are available, but doesn't
  actually compile them.  Instead, it writes the necessary information to a "CompileInfo"
  protobuf file.
- Run `ghc --interactive`, combining the flags obtained from all the previous steps.
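
For illustration, the sequence above corresponds roughly to the shell session below. This is only a hedged sketch: the target labels, dependency labels, and the toolchain output group name are made up; only the `haskell_library_info` and `haskell_compile_info` group names are the ones this change introduces (they also appear in the test commands later in this thread).

```
# Hedged sketch; labels and file names are illustrative.
# 1. Toolchain info: path to GHC and common command-line options.
bazel build --output_groups=... haskell:toolchain_info

# 2. Decide exactly which targets to build.
bazel query 'deps(tests/binary-with-lib:lib, 1)'

# 3. Compile the immediate dependencies and read each LibraryInfo proto
#    (package DB paths, etc.).
bazel build --output_groups=haskell_library_info //some:dep //another:dep

# 4. "Build" the targets to be interpreted: srcs, runfiles and C++ deps are
#    staged, and a CompileInfo proto is written instead of compiled output.
bazel build --output_groups=haskell_compile_info tests/binary-with-lib:lib

# 5. Start the REPL with the flags collected from all of the above.
ghc --interactive <flags from toolchain_info, LibraryInfo and CompileInfo>
```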

So this change adds two kinds of output groups:

- Library info: For targets that `hrepl` should recognize as dependencies, such as
  `haskell_library`, `haskell_toolchain_library` and `haskell_cabal_library`.
- Compilation info: For targets that `hrepl` should load directly, such as `haskell_library`
  and `haskell_binary`.

This change also makes the `protoc` binary a dependency of all Haskell build rules,
rather than just `haskell_proto_library`.  It writes files in the human-readable
text format, and then uses the `protoc` binary to convert them to the binary proto format.
The text format itself doesn't have a good forwards/backwards compatibility story,
which is important since `hrepl` is a separate, standalone executable.  (For example,
a text format parser will fail if it encounters a field name that it doesn't understand,
which makes it more difficult to deprecate old fields.)
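
As a small, hypothetical illustration of that conversion step: `protoc`'s generic `--encode` flag turns a text-format message into the binary wire format. Only `rule_info/rule_info.proto` and the `LibraryInfo` message name come from this PR; the package qualifier and file names below are guesses.

```
# Hypothetical invocation; --encode itself is standard protoc behaviour.
protoc -I. --encode=rule_info.LibraryInfo rule_info/rule_info.proto \
  < library_info.textproto > library_info.pb
```
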
@judah judah requested a review from aherrmann January 12, 2020 04:07
Member

@aherrmann aherrmann left a comment


Thank you for open sourcing hrepl and for opening this PR!

I've added a couple of review comments with questions or change requests.

rule_info/rule_info.proto (outdated, resolved)
}

// Information about a built library.
message LibraryInfo {
Member

LibraryInfo only mentions shared libraries. How does hrepl handle static libraries?

There are two situations where this is relevant.

  1. A static-mode GHC toolchain (e.g. on Windows) where Haskell libraries are only compiled to static archives. GHCi can load these, provided they are built with -fPIC (and -fexternal-dynamic-refs on Unix); see the sketch after this list.
  2. A C library dependency that only provides a pic_static_library. GHCi's loader can usually load these.
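
For situation 1, a minimal sketch of what "GHCi can load these" means in practice, assuming a static-mode GHC whose GHCi uses the in-memory linker; the module names are made up, only the flags come from the comment above:

```
ghc -c -fPIC -fexternal-dynamic-refs Dep.hs   # compile to static object code (Dep.o)
ghc --interactive Main.hs                     # Main imports Dep; GHCi links the
                                              # prebuilt Dep.o rather than interpreting it
```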

Collaborator Author

It doesn't currently handle them. We originally used static archives, but switched a year ago to dynamically-linked GHC to work around bugs in the static loader. It seems pretty reasonable to me that it could support both.

I filed google/hrepl#7 to track this, though as you've noted it would also require changes to rules_haskell itself.

Member

@judah do you remember what bugs those were? We're likewise interested in making the static linker work better for everyone. It would be quite a win: in the common case we could avoid building dynamic libraries altogether.

Member

though as you've noted it would also require changes to rules_haskell itself.

@judah which required changes are you referring to?

rules_haskell supports static rts GHC: it sets -fPIC and -fexternal-dynamic-refs as required and uses the pic_static_library of dependencies where available. On Windows that's the default, as the Windows bindist uses static rts mode. On Unix it's optional, currently exposed for nixpkgs-provisioned GHC through the is_static attribute, since there are no static rts GHC bindists available for Unix.

For example, Digital Asset's open source daml repository uses static linking in this way. As a test, I've run hrepl with this PR's patches on that repository and was able to load a large target with static C library dependencies into GHCi using

LIBRARY_PATH=$PWD/bazel-out/k8-opt/bin/compiler/damlc/_ghci_libs-damlc hrepl damlc

Looking at cc_libs in compile_info_output_groups, it doesn't seem to distinguish static and dynamic libraries, so transitive_cc_shared_libs will end up holding both. So, maybe just the name is inaccurate.

Collaborator Author

Ah, I didn't realize that static libraries got passed through the same field of cc_libs. I've renamed the field to transitive_cc_libs. (I don't have access to a static GHC at the moment, so we may need to leave that to future work.)

We got hit by a few bugs around the static linker in the transition from 8.0->8.4 and 8.4->8.6, including multiple instances of segfaults on ANN pragmas. One example was:
https://gitlab.haskell.org/ghc/ghc/issues/14675
But there were a couple others we never tracked down well enough to file a useful bug report (in particular we weren't able to rule out effects of our internal toolchain).

At the time (early 2019), the existing comments on the GHC issues didn't give us a lot of confidence in the static RTS. But it's good to hear you've taken it up in earnest, and your use of -fexternal-dynamic-refs sounds promising; we never discovered that flag, and the problem mentioned in that link (R_X86_64_PC32) does sound familiar. I'll make a note to try using that in our codebase.

Member

FWIW my understanding is that interest in the static linker from GHC HQ has varied over the years. When support for dynamic libraries was first introduced, I suppose the vision was to eventually retire the static linker, because it's duplicating so much OS functionality. However, the story around dynamic libraries turned out less than ideal. And at this point, I believe even Facebook (who employs @simonmar) might be using a static toolchain, or so https://engineering.fb.com/security/fighting-spam-with-haskell/ would have me believe.

At any rate, the static toolchain is what Digital Asset uses too.

@simonmar

Actually I believe we're not using the static runtime linker any more at Facebook. The static runtime linker requires all linked objects to fit in the low 2GB of the address space, and we eventually ran into that limit and had to switch to building a standard .so and using the system dynamic linker to load it.

Member

@simonmar very interesting. And thanks for chiming in! So out of curiosity, do you now only use standard .so during your build, or do you still use -dynamic-too when building? If the former, that would be nice to speed up the build, but it's not an option for our macOS users, where the Mach-O header size limits are intentionally very low (hence, there are only so many dynamic dependencies one can have).

rule_info/rule_info.proto (outdated, resolved)
haskell/repositories.bzl (outdated, resolved)
haskell/private/haskell_impl.bzl (outdated, resolved)
haskell/private/haskell_impl.bzl (outdated, resolved)
haskell/private/actions/info.bzl (outdated, resolved)
haskell/private/actions/info.bzl (outdated, resolved)
haskell/private/actions/info.bzl (outdated, resolved)
haskell/private/actions/info.bzl (outdated, resolved)
@Profpatsch
Contributor

This change also makes the protoc binary a dependency of all Haskell build rules,
rather than just haskell_proto_library

This sounds weird to me, can’t we make that optional?

@aherrmann
Member

This change also makes the protoc binary a dependency of all Haskell build rules,
rather than just haskell_proto_library

This sounds weird to me, can’t we make that optional?

One way that comes to mind is to factor out the rule info output groups into an aspect. That aspect would then depend on protoc, and the Haskell rules wouldn't have to. @judah What do you think?


Though, my understanding is that protoc will only have to be built if the rule info output groups are actually requested. To test this, I ran the following two commands:

$ bazel clean --expunge && bazel build --execution_log_json_file=build_execlog.json //tests/binary-simple
$ bazel clean --expunge && bazel build --execution_log_json_file=info_execlog.json //tests/binary-simple --output_groups=haskell_compile_info,haskell_library_info

Looking at the logs, build_execlog.json contains no references to protoc. So, as I understand, this confirms that protoc is only built if the output groups are requested. Is that correct @judah?
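
A quick way to double-check that from the two logs above (just a plain text search over the JSON execution logs; the expected counts are my assumption of what a confirmation would look like):

```
grep -c protoc build_execlog.json   # expected: 0 matches, i.e. no protoc actions
grep -c protoc info_execlog.json    # expected: non-zero once the output groups are requested
```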

@Profpatsch
Contributor

If it’s only lazily requested, then it’s fine.

@judah
Collaborator Author

judah commented Jan 16, 2020

Yeah, protoc is built lazily as @aherrmann observed. I worded my original comment poorly; it's more that the com_google_protobuf repo (which defines how to build protoc) now needs to be included in haskell/repositories.bzl, whereas before it was only needed if you used haskell_proto_library. But that seems pretty low-overhead, especially since most of the necessary dependencies were already included (like rules_python, as pointed out in the review).

My preference would be to keep using output groups rather than aspects. The former is a little more straightforward, while the latter would also require us to make new providers to pass the relevant information between the regular rules and the aspect(s). (It could also be a bit tricky to integrate with haskell_proto_aspect, which itself needs to emit some information and currently does so via the same output groups as the other rules.)

We'll pass those automatically in hrepl itself.
@aherrmann
Member

My preference would be to keep using output groups rather than aspects. The former is a little more straightforward, while the latter would also require us to make new providers to pass the relevant information between the regular rules and the aspect(s).

That's a good point. Let's stick to the output groups then. I'm also fine with a lazy dependency on protoc.

(@judah Sorry, I somehow managed to edit your comment instead of replying. I tried to fix it again.)

@aherrmann
Member

The CI failures should be fixed by a rebase on master (see #1214).

It may contain either static or shared libraries.
Member

@aherrmann aherrmann left a comment


@judah It would be good if we could also test the hrepl integration in rules_haskell, so that we don't accidentally break it. What do you think would be a good way to do this? Do you think there's a way to do this in a bazel test, or will that have to be tested outside of Bazel?

That doesn't have to be included in this PR, but can go into a separate one. Just wanted to start the discussion.

haskell/private/actions/info.bzl (outdated, resolved)
haskell/private/actions/info.bzl (outdated, resolved)
Since it will also hold static libraries (google/hrepl#7).
@aherrmann
Member

/azp run

@judah
Collaborator Author

judah commented Jan 22, 2020

@aherrmann yes, let's do the testing in a follow-up PR. The hrepl repo itself has tests which are (sort of) driven by bazel:
https://github.com/google/hrepl/blob/master/hrepl/tests/run_tests.sh
https://github.com/google/hrepl/blob/d946c54471fa5f1ca7dc5b8baf0db6a19429b319/hrepl/tests/BUILD.bazel#L5

They work by pointing bazel at the current directory, but with --output_base pointing to a temporary directory, using --test_env to inject those values into a given bazel test invocation. However, because they need to download and install the GHC bindist, etc. in a new location (due to the different output_base), they take ~10m to run on my laptop, and probably more in CircleCI.
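
A hedged sketch of that arrangement (the environment variable names are invented for illustration; only --test_env and --output_base are real Bazel flags):

```
# Outer invocation: hand the workspace and a scratch output base to the test.
bazel test //hrepl/tests:all \
  --test_env=HREPL_TEST_WORKSPACE="$PWD" \
  --test_env=HREPL_TEST_OUTPUT_BASE="$(mktemp -d)"

# Inside the test, the inner bazel is pointed at that separate output base, so
# the GHC bindist etc. are fetched and installed there from scratch:
bazel --output_base="$HREPL_TEST_OUTPUT_BASE" build tests/binary-with-lib:lib
```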

I can think of a few possibilities:

  1. Run hrepl's tests in CI (or maybe a subset of them).
  2. Run a simplified version of hrepl's tests in CI that doesn't try to be as hermetic and reuses the same output_base as the rest of the build. It should be much faster as a result. It would still miss some edge cases, but those would be covered by the hrepl repo's own testing.
  3. Do something much simpler, similar to what RunTests.hs does now. For example, instead of `bazel run //...@repl -- ...`, do `bazel build @hrepl//hrepl; bazel-bin/external/hrepl/hrepl/hrepl //... ...`

`protoc` depends on `@zlib` and that collides with the `@zlib` workspace
in the `examples`. We cannot directly use `protoc`'s `@zlib` as that
library's name is mangled and Cabal won't recognize it.
@aherrmann
Member

I took a look at CI:

  • the Azure failure is unrelated to this PR and fixed by Update rules_sh 0.1.0 --> 0.1.1 #1220.
  • on buildkite linting failed and is fixed by bazel run //:buildifier-fix.
  • on buildkite the "example" test case failed, because it collided with the @zlib external workspace due to the protoc dependency.

I've pushed fixes to this PR.

Member

@aherrmann aherrmann left a comment


@judah Once again, thanks for open sourcing hrepl and upstreaming support to rules_haskell.

I think this is ready to merge. I've opened #1221 to continue the testing discussion there.

@aherrmann aherrmann added the merge-queue merge on green CI label Jan 24, 2020
@aherrmann aherrmann merged commit 9be6214 into tweag:master Jan 27, 2020
@mergify mergify bot removed the merge-queue merge on green CI label Jan 27, 2020
aherrmann added a commit that referenced this pull request Jan 27, 2020
The buildkite pipeline is either called
```
buildkite/rules-haskell/pr
```
or
```
buildkite/rules-haskell
```

See for example #1215
or #1210.

The latter seems to be the case for external PRs where the buildkite
pipeline needs to be initiated manually by pushing to a branch within
the repository. However, I have also observed it for internal PRs, e.g.
#1214.

In any case, we want to accept both pipeline names as mergify merge
criteria. I have tested the configuration in the [mergify
simulator](https://dashboard.mergify.io/).
@judah judah deleted the hrepl branch January 27, 2020 12:50