Combining efforts with dream2nix #1

Open
DavHau opened this issue Nov 9, 2021 · 4 comments

Comments

@DavHau

DavHau commented Nov 9, 2021

Hello, I'm the maintainer of dream2nix.
I just learned about that project via nix-community/dream2nix#49
It seems like we're working on very similar things. Should we combine our efforts?
Cheers, David.

@timbertson
Owner

Hi! There are definitely similarities in the goals, although our implementation ideas do seem a bit different. I'm aiming to have one big (rust) program for coercing lockfiles -> nix expressions, and then individual nix APIs (with a shared core) for building each kind of library. I'm also focussing on "just do everything consistently (where possible)", whereas you're aiming to give the user more flexibility.

From what I gather your approach is a lot more decomposed into modular pieces. I'm not sure how well that'll work in practice, but I haven't dug in too much yet. Implementation-wise I'm trying to put as much logic as I can in rust (at codegen time), to keep the build time dependencies and nix logic minimal. I'm a huge fan of types, and the build logic for my original project (opam2nix) got complex from doing too much at build time.

One thing I'd love to reuse though is nix build logic. I'm using nixpkgs' buildCargoPackage for my cargo backend, and buildRubyGem for the bundler backend. Those things seem like a good candidate for sharing since the API isn't that large, and the effort to get the implementation right is much higher than the effort required to massage arguments into the right structure. They also don't really affect the user experience of fetlock itself, as long as they work correctly. On the other hand the build logic for node2nix seems to build a full packageset, and I'm really trying to stick to "one derivation per package".

The other thing that would be good to share is a collection of backend-specific tricks / techniques, e.g. how do you deal with cyclical dependencies in yarn? (I was thinking about this yesterday, and haven't found a satisfactory approach.) I will take a look at https://github.com/DavHau/dream2nix/blob/main/src/builders/nodejs/granular/default.nix . At first blush it looks like you're using node2nix when there's a cyclic dep involved, but otherwise producing your own flat derivations.

FYI you may be interested in https://github.com/timbertson/fetlock/blob/master/nix/core.nix - it's the base nix overlay underneath each backend, and it's the basis for the consistency in how you override things, regardless of backend (as well as shared code between all backends).

@DavHau
Author

DavHau commented Nov 11, 2021

> From what I gather your approach is a lot more decomposed into modular pieces. I'm not sure how well that'll work in practice

I haven't dug into your code much, but from your description, the way we both split the process into modules is already quite similar.

You described the following stages:

  1. rust-program for coercing lockfiles
  2. nix expressions (lock.nix)
  3. and then individual nix APIs

Translated into dream2nix terms this is equivalent to:

  1. Translator
  2. Generic Lock (dream-lock)
  3. Builders

... So to me it doesn't seem like there are a lot more modular pieces in dream2nix.

Your rust program translates arbitrary upstream lock files into your own lock format (lock.nix), which is shared across all languages/frameworks, and in a third stage you read your lock.nix with a nix library that generates derivations.

This is exactly the idea behind dream2nix.
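To make the three-stage split concrete, here is a minimal Python sketch. The toy `name==version` input, the dictionary layout, and the names `translate_requirements` and `build_plan` are all hypothetical; neither dream-lock.json nor lock.nix looks like this. The point is only that stage 2 is plain data, so any translator that emits it can feed any builder that reads it.

```python
import json
from typing import Dict, List

# Hypothetical, simplified generic lock: package key -> metadata.
# This is NOT the real dream-lock.json or lock.nix schema.
GenericLock = Dict[str, dict]

def translate_requirements(text: str) -> GenericLock:
    """Stage 1 (translator): turn a toy 'name==version' lockfile into the generic lock."""
    lock: GenericLock = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, version = line.split("==")
        lock[f"{name}@{version}"] = {"name": name, "version": version, "deps": []}
    return lock

def build_plan(lock: GenericLock) -> List[str]:
    """Stage 3 (builder): one build step per package ("one derivation per package")."""
    return [f"build {pkg['name']}-{pkg['version']}" for _, pkg in sorted(lock.items())]

# Stage 2: the generic lock is plain data, so it can be serialized and handed
# to any builder, regardless of which language produced it.
lock = translate_requirements("requests==2.26.0\nidna==3.3\n")
serialized = json.dumps(lock, sort_keys=True)
print(build_plan(json.loads(serialized)))  # ['build idna-3.3', 'build requests-2.26.0']
```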

Though there are some differences:

  • You restrict the translation phase to be done by rust only. Why? If your lock format is specified well enough, it shouldn't really matter which language is used to generate that lock, as long as it follows the format. Why not leave it up to the maintainer of a certain translator to choose the language?
    Reasons why other languages than rust can be useful:
    • using nix to parse the lock-file enables the ability for an on-the-fly build (no code-gen or IFD necessary)
    • often it is easier to process metadata in the language they are intended for: For example parsing python requirement constraints is best done in python as python already provides libraries to do that.
  • Your lock file is expressed in nix vs. the one from dream2nix in json. I assume the original reason why you have two different nix files (lock.nix and default.nix) for each package is because you want to separate logic from data, so that the lock.nix only expresses data and default.nix expresses the logic. Assuming this is your intention, there is no need to have the lock in nix format. It could as well be in json.
    Your calls to fetchurl could be replaced by type = "fetchurl", which is interpreted by your build API as a fetchurl call. Then you would strictly separate logic from data.
    Also I believe JSON is better because:
    • It has the benefit that it can be read and modified easily by any tool written in any programming language. In a scenario where this ends up in big repositories like nixpkgs, this has many advantages, I believe. It provides a much better basis for all kinds of automation. Parsing and modifying nix code is hard, and it is hard to reason about relations between packages, as you'd have to evaluate or parse nix code. Expressing package relations in json allows building all kinds of intelligence on top of it.
    • By having nix logic in your lock file, you end up with massive amounts of duplicated logic living alongside each generated package. Massively duplicated logic is of no good use in any programming language, I believe.
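A minimal sketch of the data-only idea, assuming a made-up JSON layout (not the real dream-lock.json schema): each source carries a `type` field, and a builder-side function decides what that type means. Here the hypothetical `render_fetcher` only prints a Nix-like `fetchurl` call where a real builder would invoke the fetcher, and the hash is a placeholder.

```python
import json

# Hypothetical data-only lock: it says *what* to fetch, never *how*.
LOCK_JSON = """
{
  "left-pad@1.3.0": {
    "source": {"type": "fetchurl",
               "url": "https://registry.npmjs.org/left-pad/-/left-pad-1.3.0.tgz",
               "hash": "sha512-PLACEHOLDER"}
  },
  "my-lib@0.1.0": {
    "source": {"type": "path", "path": "./my-lib"}
  }
}
"""

def render_fetcher(source: dict) -> str:
    """Builder side: interpret a source entry according to its "type" field."""
    kind = source["type"]
    if kind == "fetchurl":
        return f'fetchurl {{ url = "{source["url"]}"; hash = "{source["hash"]}"; }}'
    if kind == "path":
        return source["path"]
    raise ValueError(f"unknown source type: {kind}")

lock = json.loads(LOCK_JSON)
for key in sorted(lock):
    print(key, "->", render_fetcher(lock[key]["source"]))
```

Because the lock is plain JSON, other tools can read or rewrite it with an ordinary JSON parser, without evaluating any Nix.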

> Implementation-wise I'm trying to put as much logic as I can in rust (at codegen time), to keep the build time dependencies and nix logic minimal

I agree with that point of view, but let's explore for a moment how minimal "minimal" eventually turns out to be.
As you want to have a generic lock format (lock.nix), by design it cannot possibly be the perfect format for all languages. The format will be the lowest common denominator of every language that you support. So you will have to do some post-processing to translate that format into nix derivations for the specific languages, and that logic needs to be in nix.
That is exactly the concept behind builders in dream2nix. They simply translate the generic lock format into nix derivations. How exactly those builders do that, and how much of the nixpkgs infrastructure is re-used, depends entirely on the implementation of each builder.

> The other thing that would be good to share is to have a collection of backend-specific tricks / techniques. e.g. how do you deal with cyclical dependencies in yarn?

I do not consider cyclic dependencies a backend-specific problem. It is a general problem in package management which is seen in multiple ecosystems. Therefore detecting cyclic dependencies is a generic feature in dream2nix which is part of the core framework and doesn't need to be handled by each individual subsystem. Cyclic dependencies are an extra field in our lock file format. Of course the builder still needs to handle them correctly.
I totally agree that we should share approaches. Please see my comments below on how I believe that could be achieved.
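As an illustration of detecting cycles generically, the sketch below finds the edges that close a cycle using a depth-first search with the usual three-state (white/grey/black) marking. The graph and package keys are invented, and this is not dream2nix's actual implementation; a translator could store the returned edges in a separate field of the lock so that each builder decides how to break them.

```python
from typing import Dict, List, Tuple

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished

def find_cycle_edges(deps: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return the (parent, child) back edges; removing them leaves the graph acyclic."""
    color: Dict[str, int] = {}
    back_edges: List[Tuple[str, str]] = []

    def visit(node: str) -> None:
        color[node] = GREY
        for child in deps.get(node, []):
            state = color.get(child, WHITE)
            if state == GREY:
                # child is an ancestor of node on the current path: a cycle
                back_edges.append((node, child))
            elif state == WHITE:
                visit(child)
        color[node] = BLACK

    # Sorted iteration keeps the result deterministic across runs.
    for node in sorted(deps):
        if color.get(node, WHITE) == WHITE:
            visit(node)
    return back_edges

graph = {
    "app@1.0.0": ["pkg-a@1.0.0"],
    "pkg-a@1.0.0": ["pkg-b@1.0.0"],
    "pkg-b@1.0.0": ["pkg-a@1.0.0"],
}
print(find_cycle_edges(graph))  # [('pkg-b@1.0.0', 'pkg-a@1.0.0')]
```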

> At first blush it looks like you're using node2nix when there's a cyclic dep involved, but otherwise producing your own flat derivations.

I actually just got rid of all usage of node2nix (it's not merged to master yet). Not sure what you mean by flat. The nodejs derivations we generate are granular: if there are 1000 dependencies, then there will be 1000 derivations.

> On the other hand the build logic for node2nix seems to build a full packageset, and I'm really trying to stick to "one derivation per package".

Dream2nix has exactly the same goals, and that granular approach is already implemented in the current nodejs builder.

The core idea of dream2nix is to make all those approaches re-usable.
Separating the framework into translators, data (lock-file), and builders, is what I came up with to achieve that re-usability.

For example, we're currently putting a lot of effort into making the nodejs build backend work and writing overrides for it (follow our progress on Matrix if you like: #dream2nix:nixos.org).
If you made your lock-file parser produce the dream-lock.json format instead of lock.nix, you would be able to just re-use our builder and wouldn't have to implement it yourself.

I believe that our approaches are similar enough that we should combine our efforts. That doesn't necessarily mean that we have to contribute to the same repository, but if we agree on some common standards, for example a common lock file format, we can integrate each other's work more easily.

If you have any suggestions on how I'd have to change dream2nix' interfaces to improve collaboration, let me know.

@timbertson
Owner

> Though there are some differences:

> You restrict the translation phase to be done by rust only. Why? If your lock format is specified well enough, it shouldn't really matter which language is used to generate that lock, as long as it follows the format. Why not leave it up to the maintainer of a certain translator to choose the language?

Mainly for sharing functionality. In practice, the majority of shared functionality so far revolves around sources:
 - computing sha256 digests
 - getting a local copy of a source archive
 - listing / reading individual files within a source archive
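For the first item, a shared helper might look like this sketch: it streams a downloaded archive through SHA-256 and formats the result as an SRI string (`sha256-<base64>`), a form that nixpkgs fetchers such as `fetchurl` accept through their `hash` attribute. The name `sri_sha256` is hypothetical, and fetlock does this in Rust rather than Python.

```python
import base64
import hashlib
import os
import tempfile

def sri_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 and return it as an SRI string: sha256-<base64>."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large source archives don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256-" + base64.b64encode(digest.digest()).decode("ascii")

# Usage with a throwaway file standing in for a downloaded source archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
sri = sri_sha256(tmp.name)
os.unlink(tmp.name)
print(sri)  # sha256-LPJNul+wow4m6DsqxbninhsWHlwfp0JecwQzYpOLmCQ=
```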

The other big shared part is the structure representing (output) nix syntax, although this is only necessary since I'm outputting nix, not JSON.

> Reasons why other languages than rust can be useful:

> using nix to parse the lock-file enables the ability for an on-the-fly build (no code-gen or IFD necessary)

I don't understand how you'd ever get the correct sha256 digests without codegen? No lockfile I know of happens to contain nix-compatible digests.

> often it is easier to process metadata in the language they are intended for: For example parsing python requirement constraints is best done in python as python already provides libraries to do that.

This is true, and I've ended up doing this already for bundler & ocaml. Although in these cases it's encapsulated within fetlock - the rust code spawns the language-specific translator, which contains the bare minimum language-specific logic. The result is parsed by rust so that most of the logic remains in rust (and can be shared).
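The spawn-and-parse pattern described here can be sketched as follows. fetlock implements it in Rust; this Python version is only an illustration. The helper is a stand-in one-liner that prints JSON where a real backend might run, say, a Ruby script driving Bundler, and the `run_helper` name and output shape are assumptions.

```python
import json
import subprocess
import sys
from typing import Dict, List

def run_helper(argv: List[str]) -> dict:
    """Run a language-specific helper and parse the JSON it writes to stdout."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# Stand-in helper: it contains only the bare language-specific step
# (here, pretending to report resolved versions) and prints JSON.
HELPER = [sys.executable, "-c", 'import json; print(json.dumps({"rake": "13.0.6"}))']

resolved = run_helper(HELPER)
# Everything after parsing (keys, sources, digests) stays in the main program.
lock: Dict[str, dict] = {
    f"{name}@{version}": {"name": name, "version": version}
    for name, version in resolved.items()
}
print(lock)  # {'rake@13.0.6': {'name': 'rake', 'version': '13.0.6'}}
```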

> Your lock file is expressed in nix vs. the one from dream2nix in json. I assume the original reason why you have two different nix files (lock.nix and default.nix) for each package is because you want to separate logic from data, so that the lock.nix only expresses data and default.nix expresses the logic. Assuming this is your intention, there is no need to have the lock in nix format. It could as well be in json.

Not quite. The separation is really machine-generated vs handwritten. In particular, some backends do make use of expressions which you can't write in JSON, e.g. referencing other derivations in build/install scripts, and including OS-specific logic:

fetlock/examples/esy/lock.nix, lines 2397 to 2403 (at commit 93fbed3):

installPhase = ''
  esy-installer Oni2.install
  bash -c "${if final.os == "windows" then "cp /usr/x86_64-w64-mingw32/sys-root/mingw/bin/*.dll '$cur__bin'" else "echo"}"
  bash -c "cp ${(final.getDrv "esy-sdl2@2.0.10008@d41d8cd9")}/bin/*.dll '$cur__bin' ${if final.os == "windows" then "" else "2>/dev/null || true"}"
  bash -c "cp ${(final.getDrv "esy-skia@github:revery-ui/esy-skia#91c98f6@d41d8cd9")}/bin/skia.dll '$cur__bin' ${if final.os == "windows" then "" else "2>/dev/null || true"}"
  bash -c "cp ${(final.getDrv "esy-angle-prebuilt@1.0.0@d41d8cd9")}/bin/*.dll '$cur__bin' ${if final.os == "windows" then "" else "2>/dev/null || true"}"
'';

(it's ugly, but this expressiveness is required to support opam/esy)

It also allows direct references to things like pkgs.python, for package managers like opam which contain information about system dependencies that the package manager itself doesn't provide.

You could represent all of this with JSON, but it'd be more cumbersome.
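To show what "more cumbersome" means in practice, here is a hypothetical JSON-style encoding of two constructs from the esy snippet above (an OS conditional and a reference to another package's output), plus the interpreter a builder would then need. The `depPath` and `ifOs` shapes are invented for this sketch, and the store path is fake.

```python
from typing import Dict, List, Union

Fragment = Union[str, Dict[str, str]]

# Invented encoding of one line of the esy installPhase:
#   cp ${getDrv "esy-sdl2@..."}/bin/*.dll '$cur__bin' ${if windows then "" else "..."}
INSTALL_STEP: List[Fragment] = [
    "cp ",
    {"depPath": "esy-sdl2@2.0.10008@d41d8cd9"},
    "/bin/*.dll '$cur__bin'",
    {"ifOs": "windows", "then": "", "else": " 2>/dev/null || true"},
]

def render(fragments: List[Fragment], os_name: str, dep_paths: Dict[str, str]) -> str:
    """Builder-side interpreter: every new construct needs a new JSON shape and a new case."""
    out: List[str] = []
    for frag in fragments:
        if isinstance(frag, str):
            out.append(frag)
        elif "depPath" in frag:
            out.append(dep_paths[frag["depPath"]])
        elif "ifOs" in frag:
            out.append(frag["then"] if os_name == frag["ifOs"] else frag["else"])
        else:
            raise ValueError(f"unsupported fragment: {frag}")
    return "".join(out)

paths = {"esy-sdl2@2.0.10008@d41d8cd9": "/nix/store/example-esy-sdl2"}
print(render(INSTALL_STEP, "linux", paths))
# cp /nix/store/example-esy-sdl2/bin/*.dll '$cur__bin' 2>/dev/null || true
```

In Nix the same line is a single interpolated string; in JSON, each kind of expression needs its own agreed-upon shape.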

> Also I believe JSON is better because:

> It has the benefit that it can be read and modified easily by any tool written in any programming language. In a scenario where this ends up in big repositories like nixpkgs, this has many advantages, I believe. It provides a much better basis for all kinds of automation. Parsing and modifying nix code is hard, and it is hard to reason about relations between packages, as you'd have to evaluate or parse nix code. Expressing package relations in json allows building all kinds of intelligence on top of it.

This is true, and I've wished nix were more machine-editable as well. But I'd rather use nix to support complex expressions (above) than the still-theoretical benefits from having other tools edit my already-machine-generated data.

> By having nix logic in your lock file, you end up with massive amounts of duplicated logic living alongside each generated package. Massively duplicated logic is of no good use in any programming language, I believe.

The fact that the lock is in nix doesn't lead to more duplication, it's the same amount of data you'd put in JSON. It just lets you represent some things that JSON can't. The repetitive logic is still in (my equivalent of) the builder.

> Implementation-wise I'm trying to put as much logic as I can in rust (at codegen time), to keep the build time dependencies and nix logic minimal

> I agree with that point of view, but let's explore for a moment how minimal "minimal" eventually turns out to be.
> As you want to have a generic lock format (lock.nix), by design it cannot possibly be the perfect format for all languages. The format will be the lowest common denominator of every language that you support. So you will have to do some post-processing to translate that format into nix derivations for the specific languages, and that logic needs to be in nix.
> That is exactly the concept behind builders in dream2nix. They simply translate the generic lock format into nix derivations. How exactly those builders do that, and how much of the nixpkgs infrastructure is re-used, depends entirely on the implementation of each builder.

Yep, that's essentially the same as my nix backends. And I assume that your lock format is also extensible - e.g. a given transformer will add language-specific metadata which the builder then expects / handles. The difference is perhaps more in how much logic goes into the builders, vs how much is done by the transformers. I'm trying to do as much as possible in the transformer.

As a specific example, I have now implemented support for cyclic dependencies in yarn. I do the detection in rust, and output a list of "dependencies that need to be added to NODE_PATH" in the root package:

nodePathDeps = [
  ("babel-core@6.26.3")
];

(putting cyclic deps on NODE_PATH means they can be found when not present in node_modules directly)
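A small sketch of the consuming side, assuming that keys map to store paths and that each package's output keeps its modules under `lib/node_modules` (a common nixpkgs layout, but an assumption here): the builder joins those directories with colons into `NODE_PATH`, which Node's CommonJS resolver also searches when a module isn't found in `node_modules`. The function name and paths are hypothetical.

```python
from typing import Dict, List

def node_path(node_path_deps: List[str], store_paths: Dict[str, str]) -> str:
    """Join the module directories of the root package's nodePathDeps into NODE_PATH.

    Assumes each package's output keeps its modules under lib/node_modules.
    """
    return ":".join(f"{store_paths[key]}/lib/node_modules" for key in node_path_deps)

store_paths = {"babel-core@6.26.3": "/nix/store/example-babel-core"}
print(node_path(["babel-core@6.26.3"], store_paths))
# /nix/store/example-babel-core/lib/node_modules
```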

You could put much more of this logic in nix (and I believe you have), it's just a matter of where you want to put the complexity. I believe that sharing a common language for the transformers should result in more code reuse and less complexity in the builders, but it's too early to tell if that's true.

> The other thing that would be good to share is to have a collection of backend-specific tricks / techniques. e.g. how do you deal with cyclical dependencies in yarn?

> I do not consider cyclic dependencies a backend specific problem. It is a general problem in package management which is seen in multiple ecosystems.

Interesting. I mostly use compiled languages these days, where a cyclic dependency makes compilation impossible. So I've only seen it in node so far. Where else have you seen it?

> At first blush it looks like you're using node2nix when there's a cyclic dep involved, but otherwise producing your own flat derivations.

> I actually just got rid of all usage of node2nix (it's not merged to master yet).

Got a branch I can look at? I'm curious :)

> On the other hand the build logic for node2nix seems to build a full packageset, and I'm really trying to stick to "one derivation per package".

> Dream2nix has exactly the same goals, and that granular approach is already implemented in the current nodejs builder.

Great! I think I saw you state that the user would be in control of things like whether you want many granular derivations or one big one, is that still a goal? It sounds hard to support both at the same time.

> If you made your lock-file parser produce the dream-lock.json format instead of lock.nix, you would be able to just re-use our builder and wouldn't have to implement it yourself.

> I believe that our approaches are similar enough that we should combine our efforts. That doesn't necessarily mean that we have to contribute to the same repository, but if we agree on some common standards, for example a common lock file format, we can integrate each other's work more easily.

Yep, I'm definitely open to it. I wouldn't be rewriting any rust in python (or nix), but sharing ideas and maybe concrete builders could work. Do you think any of your builders would make sense to upstream into nixpkgs as standalone functionality, or would they only make sense as part of upstreaming the whole of dream2nix itself?

> If you have any suggestions on how I'd have to change dream2nix' interfaces to improve collaboration, let me know.

One immediate thing that strikes me is the use of "name + version" pairs. I've keyed implementations by an opaque backend-specific "key" which is typically name+version, but doesn't have to be. In some cases you may want multiple copies of a given name + version (in cargo you can have the same crate with different compile-time features enabled). Each derivation still contains a name & version, it's just not (necessarily) keyed that way. That's just a surface level thing though, I haven't tried it enough to get a good understanding of the internals yet.
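A sketch of the opaque-key idea, assuming a key made of name@version plus a short digest of the enabled feature set when there is one (this scheme is invented for illustration and is not fetlock's). Two entries for the same crate version with different features get distinct keys, while each entry still records its own name and version.

```python
import hashlib
from typing import Dict, FrozenSet

def package_key(name: str, version: str, features: FrozenSet[str] = frozenset()) -> str:
    """Opaque key: name@version, plus a short digest of the enabled features if any."""
    base = f"{name}@{version}"
    if not features:
        return base
    # Sort so the key does not depend on the order the features were listed in.
    digest = hashlib.sha256(",".join(sorted(features)).encode()).hexdigest()[:8]
    return f"{base}#{digest}"

lock: Dict[str, dict] = {}
for feats in (frozenset(), frozenset({"derive"})):
    key = package_key("serde", "1.0.130", feats)
    # Name and version live inside each entry; the key is only an identifier.
    lock[key] = {"name": "serde", "version": "1.0.130", "features": sorted(feats)}

print(len(lock))  # 2
```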

@DavHau
Author

DavHau commented Nov 25, 2021

> I don't understand how you'd ever get the correct sha256 digests without codegen? No lockfile I know of happens to contain nix-compatible digests.

A few examples of lock files which contain nix-compatible digests:

  • nodejs: package-lock.json, yarn.lock
  • python: poetry.lock
  • rust: Cargo.lock

I'd say most lock files use hashes like sha1, sha256, sha512 and those are all nix compatible. An exception would be go.sum where a custom hash algo is used AFAIK.
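As a hedged illustration of why such digests are usable as-is: package-lock.json's `integrity` field is already an SRI string (`<algo>-<base64>`), and a hex digest such as Cargo.lock's `checksum` only needs re-encoding. The helper names are hypothetical, and the sketch ignores SRI's option of listing several space-separated hashes in one value.

```python
import base64
import binascii
import hashlib

DIGEST_SIZES = {"sha1": 20, "sha256": 32, "sha512": 64}  # bytes per digest

def hex_to_sri(algo: str, hex_digest: str) -> str:
    """Re-encode a hex digest (e.g. a Cargo.lock checksum) as an SRI string."""
    raw = binascii.unhexlify(hex_digest)
    if len(raw) != DIGEST_SIZES[algo]:
        raise ValueError(f"{algo} digest must be {DIGEST_SIZES[algo]} bytes")
    return f"{algo}-{base64.b64encode(raw).decode('ascii')}"

def check_sri(sri: str) -> str:
    """Sanity-check a single-hash SRI string, such as a package-lock.json integrity value."""
    algo, _, b64 = sri.partition("-")
    raw = base64.b64decode(b64, validate=True)
    if DIGEST_SIZES.get(algo) != len(raw):
        raise ValueError(f"unexpected SRI digest: {sri}")
    return sri

# Stand-in for a real Cargo.lock checksum: the SHA-256 of some example bytes.
checksum = hashlib.sha256(b"example crate bytes").hexdigest()
sri = hex_to_sri("sha256", checksum)
print(sri, check_sri(sri) == sri)
```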

> Interesting. I mostly use compiled languages these days, where a cyclic dependency makes compilation impossible. So I've only seen it in node so far. Where else have you seen it?

It is quite common in python packages as well. So far I have mainly worked on python and nodejs packaging, therefore I'm not sure how it is with most other build systems.

> Got a branch I can look at? I'm curious :)

My main branch is always the best version to look at (only last time did I point you to dev, because a change was still being worked on). For the nodejs builder, see here: https://github.com/nix-community/dream2nix/blob/main/src/builders/nodejs/granular/default.nix

> Great! I think I saw you state that the user would be in control of things like whether you want many granular derivations or one big one, is that still a goal? It sounds hard to support both at the same time.

It is not a goal to have a generic abstraction over whether to use granular or aggregated building. I just want to have an abstract concept of a builder, so we can maintain different implementations of how to turn the dream lock into nix derivations. I mentioned granular vs combined building only as an example of why that could be useful. Usually the goal should be to have only one builder for each build system, but I want to leave open the possibility of adding more. One extreme example of how this could potentially be used would be a builder that doesn't create nix derivations but something else, like guix packages.
Having this abstraction already allowed me to start with just re-using the build logic of node2nix and slowly transition away to my own implementation. Even now, where the new builder is the default, people could still switch back to using node2nix, if they want to for some reason.
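The builder abstraction might be sketched like this: a registry maps a subsystem to several interchangeable builders, one marked as the default, and the caller can ask for another. The registry, builder names, and the string "derivations" are made up for illustration; dream2nix's real builders are Nix functions.

```python
from typing import Callable, Dict, List, Optional

Lock = Dict[str, dict]
Builder = Callable[[Lock], List[str]]

def granular_builder(lock: Lock) -> List[str]:
    # One build unit ("derivation") per package.
    return [f"derivation {key}" for key in sorted(lock)]

def aggregated_builder(lock: Lock) -> List[str]:
    # A single build unit containing every package.
    return [f"derivation all({', '.join(sorted(lock))})"]

# Hypothetical registry: each subsystem can offer several builders.
BUILDERS: Dict[str, Dict[str, Builder]] = {
    "nodejs": {"granular": granular_builder, "aggregated": aggregated_builder},
}
DEFAULT_BUILDER = {"nodejs": "granular"}

def build(subsystem: str, lock: Lock, builder: Optional[str] = None) -> List[str]:
    """Use the subsystem's default builder unless the caller asks for another one."""
    name = builder or DEFAULT_BUILDER[subsystem]
    return BUILDERS[subsystem][name](lock)

lock = {"a@1.0.0": {}, "b@2.0.0": {}}
print(build("nodejs", lock))                # ['derivation a@1.0.0', 'derivation b@2.0.0']
print(build("nodejs", lock, "aggregated"))  # ['derivation all(a@1.0.0, b@2.0.0)']
```

Since both builders consume the same lock, switching back to an older builder is just a different registry lookup.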

> Yep, I'm definitely open to it. I wouldn't be rewriting any rust in python (or nix), but sharing ideas and maybe concrete builders could work. Do you think any of your builders would make sense to upstream into nixpkgs as standalone functionality, or would they only make sense as part of upstreaming the whole of dream2nix itself?

First I'd like to clarify that none of the relevant core logic of the framework is implemented in python. All the important interfaces are in nix, so you can just call dream2nix from within a nix expression and have the full functionality. Python right now is only used as a CLI, wrapping the nix functionality in a user friendly and interactive way, but using that is optional. I also use python as a scripting language during build time for some things that are just super inefficient in bash, like creating thousands of symlinks.

Regarding re-using builders in other projects like nixpkgs:
I recently re-designed the architecture a bit (this is not necessarily final), so now it roughly works like this:
dream-lock.json -> dream-lock-utils -> builder

Originally the dream-lock-utils didn't exist and builders were directly reading from the dream-lock, but I introduced this layer, as it allows us to make future changes to the lock format without breaking all existing builders and also it allows us to simplify implementing new builders by adding some useful helper functions to dream-lock-utils.

To make our current builders re-usable, the dream-lock-utils would need to be carved out from dream2nix and provided as an independent library. That shouldn't be too hard I guess.

I see two possible ways the builders could then be used in other contexts:

  1. Express your data in the dream-lock format and use the existing dream-lock-utils + the builder.
  2. Express your data in any other format and provide your own implementation of dream-lock-utils.
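Both options can be sketched with a small interface that builders depend on instead of the raw lock. `LockUtils`, its two methods, and both lock layouts are hypothetical; the point is that a builder written against the interface works unchanged whether the data comes from a JSON lock (option 1) or from another format with its own utils implementation (option 2).

```python
import json
from typing import Dict, List, Protocol

class LockUtils(Protocol):
    """Hypothetical interface that builders read from instead of the raw lock file."""
    def packages(self) -> List[str]: ...
    def dependencies(self, key: str) -> List[str]: ...

class JsonLockUtils:
    """Option 1: data in a (simplified, made-up) JSON lock format."""
    def __init__(self, text: str) -> None:
        self._data = json.loads(text)

    def packages(self) -> List[str]:
        return sorted(self._data["dependencies"])

    def dependencies(self, key: str) -> List[str]:
        return self._data["dependencies"][key]

class DictLockUtils:
    """Option 2: some other format, adapted by its own utils implementation."""
    def __init__(self, entries: Dict[str, dict]) -> None:
        self._entries = entries

    def packages(self) -> List[str]:
        return sorted(self._entries)

    def dependencies(self, key: str) -> List[str]:
        return self._entries[key].get("deps", [])

def builder(utils: LockUtils) -> List[str]:
    """The builder only talks to the utils layer, never to the raw format."""
    return [f"{key} <- {utils.dependencies(key)}" for key in utils.packages()]

a = JsonLockUtils('{"dependencies": {"app@1.0.0": ["lib@2.0.0"], "lib@2.0.0": []}}')
b = DictLockUtils({"app@1.0.0": {"deps": ["lib@2.0.0"]}, "lib@2.0.0": {}})
print(builder(a) == builder(b))  # True
```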

If you have any ideas on that, please let me know.
In general I have the feeling that agreeing on a lock file format which only contains data and no logic will make it significantly simpler to share approaches for building. I believe having a common data format that builders can just read from is the simplest way of making them exchangeable and thereby compatible. If the lock file already contains blocks of logic, it is much harder to make whatever layer comes afterwards exchangeable, because then the builder needs to be able to deal with exactly that existing hard-coded logic and cannot freely implement its own.
I understand that you have good reasons to put as much logic as possible in your translation layer. Of course, having more logic in nix comes with downsides, but in the end it is all a trade-off. I feel this trade-off is worth it, due to the flexibility gained.

To give another example: I might be interested in re-using some of your parsers / translators, or whatever we call those. Since you are outputting nix logic directly, there isn't really any place where I can hook in and re-use the result, unless I am compatible with exactly the logic you are outputting. If you instead output data only (JSON, TOML, ...), I could read that and use my own logic to build stuff. It would allow me to just treat your translator as a black box.

> One immediate thing that strikes me is the use of "name + version" pairs. I've keyed implementations by an opaque backend-specific "key" which is typically name+version, but doesn't have to be. In some cases you may want multiple copies of a given name + version (in cargo you can have the same crate with different compile-time features enabled). Each derivation still contains a name & version, it's just not (necessarily) keyed that way. That's just a surface level thing though, I haven't tried it enough to get a good understanding of the internals yet.

I don't think we necessarily need to express different variations of a piece of software inside its key. We can have interfaces where its compile-time options can be configured by the user. I don't see the need to express that inside the key. If we did, then the question is: where do we start and where do we stop? There is an unbounded number of configurations a piece of software can be built with.

If users want to create package collections with different keys for different variations, they can create that manually on top of dream2nix without any issue.


2 participants