package manager #943
Comments
My thoughts on Package Managers:
Thoughts that might not be great for Zig:
|
This is a good reference for avoiding the complexity of package managers like cargo. Minimal version selection is a unique approach that avoids lockfiles, and .modverify keeps deps from being changed out from under you. https://research.swtch.com/vgo The features around verifiable builds and library verification are also really neat, as are staged upgrades of libraries and allowing multiple major versions of the same package to be in a program at once. |
I assume you mean authors can't unpublish without admin intervention. True immutability conflicts with the hoster's legal responsibilities in most jurisdictions.
I'd wait a few years to see how that pans out for Go. |
Note that by minimal, they mean minimal that the authors said was okay. i.e. the version they actually tested. The author of the root module is always free to increase the minimum. It is just that the minimum isn't some arbitrary thing that changes over time when other people make releases. |
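To make the "minimal" point concrete, here is a rough sketch of minimal version selection as described in the vgo articles: walk every requirement reachable from the root and, for each package, keep the highest of the stated minimums. The function name, the data layout, and the package names are assumptions made for illustration; this is not how Go or Zig actually store requirements.

```python
from collections import deque


def minimal_version_selection(root_requirements, requirements):
    """Pick one version per package using stated minimums only.

    root_requirements: dict of package name -> (major, minor, patch)
    requirements: dict of (name, version) -> dict of name -> minimum version
    Both layouts are hypothetical, chosen only for this sketch.
    """
    selected = {}
    seen = set()
    queue = deque(root_requirements.items())
    while queue:
        name, version = queue.popleft()
        if (name, version) in seen:
            continue
        seen.add((name, version))
        # Keep the highest minimum that anyone in the graph asked for.
        if name not in selected or version > selected[name]:
            selected[name] = version
        # Follow the requirements declared by this exact version.
        queue.extend(requirements.get((name, version), {}).items())
    return selected


# Hypothetical graph: "a" and "b" both need "c", with different minimums.
example_root = {"a": (1, 0, 0), "b": (1, 2, 0)}
example_requirements = {
    ("a", (1, 0, 0)): {"c": (1, 1, 0)},
    ("b", (1, 2, 0)): {"c": (1, 3, 0)},
    ("c", (1, 3, 0)): {"d": (1, 0, 0)},
}
build_list = minimal_version_selection(example_root, example_requirements)
```

Here "c" resolves to 1.3.0 because that is the highest minimum requested. A newer 1.9.0 release of "c" would not change the result unless some package in the graph raised its own minimum.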
My top three things are;
A good package manager can make or break a language, which is one of the reasons why Go has ditched at least one of its official package managers and completely redid it (it may even be two, I haven't kept up to date with that scene). |
The first thing I'm going to explore is a decentralized solution. For example, this is what package dependencies might look like: const Builder = @import("std").build.Builder;
const builtin = @import("builtin");
pub fn build(b: &Builder) void {
const mode = b.standardReleaseOptions();
var exe = b.addExecutable("tetris", "src/main.zig");
exe.setBuildMode(mode);
exe.addGitPackage("clap", "https://github.com/Hejsil/zig-clap",
"0.2.0", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");
exe.addUrlPackage("png", "http://example.com/zig-png.tar.gz",
"00e27a29ead4267e3de8111fcaa59b132d0533cdfdbdddf4b0604279acbcf4f4");
b.default_step.dependOn(&exe.step);
} Here we provide a mapping of a name and a way for zig to download or otherwise acquire the source files of the package to depend on. Since the build system is declarative, zig can run it and query the set of build artifacts and their dependencies, and then fetch them in parallel. Dependencies are even stricter than version locking - they are source-locked. In both examples we provide a SHA-256 hash, so that even a compromised third party provider cannot compromise your build. When you depend on a package, you trust it. It will run Running |
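As a language-neutral illustration of the source-locking idea above, the following sketch checks downloaded bytes against a pinned SHA-256 before they are used. It assumes the hash is computed over the raw archive bytes; the proposal does not say exactly what is hashed for git packages, and `verify_package` is a hypothetical name.

```python
import hashlib
import hmac


def verify_package(archive_bytes: bytes, expected_sha256_hex: str) -> bool:
    """Return True only if the content matches the pinned SHA-256 digest."""
    actual = hashlib.sha256(archive_bytes).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())
```

A build would refuse to continue when this returns False, so a compromised host can at worst cause a failed build, not a changed one.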
although you might argue
in that case you'd have to check all the reps of all your reps recursively (manually?) on each shape change though to be really sure |
This is already true about all software dependencies. |
I've been considering how one could do this for the past few days. Here is what I generally came up with (this is based off @andrewrk's idea). I've left out hashes to keep it simple; I'm talking more about architecture than implementation details here;
This would also solve the issue of security fixes as most users would keep the second option which is intended for small bug fixes that don't introduce any new things, whereas the major version is for breaking changes and the minor is for new changes that are typically non-breaking. Your build file would have something like this in your 'build' function; ...
builder.addDependency(builder.Dependency.Git, "github.com.au", "BraedonWooding", "ZigJSON", builder.Versions.NonMajor);
// Or maybe
builder.addDependency(builder.Dependency.Git, "github.com.au/BraedonWooding/ZigJSON", builder.Versions.NonMajor);
// Or just
builder.addGitDependency("github.com.au/BraedonWooding/ZigJSON", builder.Versions.NonMajor);
... Keeping in mind that svn and mercurial (as well as plenty more) are also used quite a bit :). We could either use just a folder naming scheme to detect what we have downloaded, or have a simple file storing information about everything downloaded (note: NOT a lock file, just a file with information on what has been downloaded). It would use tags to determine versions, but could also have a simple central repository of versions linking to locations, as I believe other package managers have. |
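One way the tag-based version policy could work is sketched below. The policy names ("patch", "non_major", "any") are stand-ins invented here to mirror the `builder.Versions.NonMajor` idea, and the `v1.2.3` tag format is an assumption.

```python
import re

_TAG_PATTERN = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Turn 'v1.2.3' or '1.2.3' into (1, 2, 3); return None for other tags."""
    match = _TAG_PATTERN.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def pick_version(tags, current, policy):
    """Pick the newest tagged version allowed by the (hypothetical) policy."""
    candidates = []
    for tag in tags:
        version = parse_tag(tag)
        if version is None or version < current:
            continue  # not a version tag, or older than what we already have
        if policy == "patch" and version[:2] != current[:2]:
            continue  # bug fixes only: major and minor must stay the same
        if policy == "non_major" and version[0] != current[0]:
            continue  # new features allowed, breaking changes are not
        candidates.append(version)
    return max(candidates, default=None)
```

With tags `v1.0.0`, `v1.0.3`, `v1.2.0` and `v2.0.0` and a current version of 1.0.0, the "patch" policy selects 1.0.3, "non_major" selects 1.2.0, and "any" selects 2.0.0.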
How would you handle multiple definitions of the same function? I find this to be the most difficult part of C/C++ package management. Or does Zig use some sort of package name prefixing? |
@isaachier Well, you can't have multiple definitions of a function in Zig; function overloads aren't a thing (intended). You would import a package like this: const Json = @import("JSON/index.zig");
fn main() void {
Json.parse(...);
// And whatever
} When you 'include' things in your Zig source file they exist under a variable, kind of like a namespace (but simpler), which means that you should generally never run into multiple definitions :). If you want to 'use' an import like If for some reason you 'use' two 'libraries' that have a dual function definition you'll get an error and will most likely have to put one under a namespace/variable, very rarely should you use |
I don't expect a clash in the language necessarily, but in the linker aren't there duplicate definitions for |
@isaachier If you don't define your functions as |
OK that makes sense. About package managers, I'm sure I'm dealing with experts here
|
These are important questions. The first question brings up an even more fundamental question which we have to ask ourselves if we go down the decentralized package route: how do you even know that a given package is the same one as another version? For example, if FancyPantsJson library is mirrored on GitHub and BitBucket, and you have this:
Here, we know that the library is the same because the sha-256 matches, and that means we can use the same code for both dependencies. However, consider if one was on a slightly newer version:
Because this is decentralized, the name "fancypantsjson" does not uniquely identify the package. It's just a name mapped to code so that you can do But we want to know if this situation occurs. Here's my proposal for how this will work: comptime {
// these are random bytes to uniquely identify this package
// developers compute these once when they create a new package and then
// never change it
const package_id = "\xfb\xaf\x7f\x45\x86\x08\x10\xec\xdb\x3c\xea\xb4\xb3\x66\xf9\x47";
const package_info = @declarePackage(package_id, builtin.SemVer {
.major = 1,
.minor = 0,
.revision = 1,
});
// these are the other packages that were not even analyzed because they
// called @declarePackage with an older, but API-compatible version number.
for (package_info.superseded) |ver| {
@compileLog("using 1.0.1 instead of", ver.major, ver.minor, ver.revision);
}
// these are the other packages that have matching package ids, but
// will additionally be compiled in because they do not have compatible
// APIs according to semver
for (package_info.coexisting) |pkg| {
@compileLog("in addition to 1.0.1 this version is present",
pkg.sem_ver.major, pkg.sem_ver.minor, pkg.sem_ver.revision);
}
} The prototype of this function would be:
Packages would be free to omit a package declaration. In this case, multiple copies of the Multiple package declarations would be a compile error, as well as Let us consider for a moment, that one programmer could use someone else's package id, and then At first this may seem like a problem, but consider:
Really, I think this is a benefit of a decentralized approach. Going back to the API of const encoding_table = blk: {
const package_id = "\xfb\xaf\x7f\x45\x86\x08\x10\xec\xdb\x3c\xea\xb4\xb3\x66\xf9\x47";
const package_info = @declarePackage(package_id, builtin.SemVer {
.major = 2,
.minor = 0,
.revision = 0,
});
for (package_info.coexisting) |pkg| {
if (pkg.sem_ver.major == 1) {
break :blk pkg.namespace.FLAC_ENCODING_TABLE;
}
}
break :blk @import("flac.zig").ENCODING_TABLE;
};
// ...
pub fn lookup(i: usize) u32 {
return encoding_table[i];
} Here, even though we have bumped the major version of this package from 1 to 2, we know that the FLAC ENCODING TABLE is unchanged, and perhaps it is 32 MB of data, so best to not duplicate it unnecessarily. Now even versions 1 and 2 which coexist, at least share this table. You could also use this to do something such as: if (package_info.coexisting.len != 0) {
@compileError("this package does not support coexisting with other versions of itself");
} And then users would be forced to upgrade some of their dependencies until they could all agree on a compatible version. However for this particular use case it would be usually recommended to not do this, since there would be a general Zig command line option to make all coexisting libraries a compile error, for those who want a squeaky clean dependency chain. ReleaseSmall would probably turn this flag on by default. As for your second question,
Package caching will happen like this:
Caching is an important topic in the near future of zig, but it does not yet exist in any form. Rest assured that we will not get caching wrong. My goal is: 0 bugs filed in the lifetime of zig's existence where the cause was a false positive cache usage. |
One more note I want to make: In the example above I have: exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
"1.0.2", "dea956b9f5f44e38342ee1dff85fb5fc8c7a604a7143521f3130a6337ed90708"); Note however that the "1.0.2" only tells Zig how to download from a git repository ("download the commit referenced by So the package dependency can be satisfied by any semver-compatible version indirectly or directly depended on. With that in mind, this decentralized strategy with
You can also force your dependency's dependency's dependency (and so on) to upgrade, simply by adding a direct dependency on the same package id with a minor or revision bump. And to top it off you can purposefully inject code into your dependency's dependency's dependency (and so on), by:
This strategy could be used, for example, to add |
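The superseded/coexisting behaviour described in this proposal can be sketched outside the compiler as a grouping step. The sketch below simplifies "API-compatible" to "same major version" (it ignores the semver convention that 0.x minor bumps may break), and the function name and return layout are assumptions, not part of the proposal.

```python
from collections import defaultdict


def resolve_declared_packages(declared):
    """Group declared packages by (package id, major version).

    `declared` is an iterable of (package_id, (major, minor, revision)).
    Returns (chosen, superseded, coexisting):
      chosen:     (id, major) -> the newest version in that compatible line
      superseded: (id, major) -> older versions that are not analyzed at all
      coexisting: id -> every chosen version that ends up compiled in
    """
    versions_by_line = defaultdict(set)
    for package_id, version in declared:
        versions_by_line[(package_id, version[0])].add(version)

    chosen = {}
    superseded = {}
    for line, versions in versions_by_line.items():
        newest = max(versions)
        chosen[line] = newest
        superseded[line] = sorted(versions - {newest})

    coexisting = defaultdict(list)
    for (package_id, _major), version in sorted(chosen.items()):
        coexisting[package_id].append(version)
    return chosen, superseded, dict(coexisting)
```

A flag that turns coexisting versions into an error, as suggested for ReleaseSmall, would amount to rejecting any package id whose `coexisting` list has more than one entry.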
Another note: this proposal does not actually depend on the self hosted compiler. There is nothing big blocking us from starting to implement it. It looks like:
|
Maybe worth considering p2p distribution and content addressing with ipfs? See https://github.com/whyrusleeping/gx for example. Just a thought. |
One important thing to note, especially for adoption by larger organizations: think about a packaging format and a repo structure that is proxy/caching/mirroring friendly and that also allows an offline mode. That way the organization can easily centralize their dependencies instead of having everyone going everywhere on the internet (a big no-no for places such as banks). Play around a bit with Maven and Artifactory/Nexus if you haven't already. |
The decentralized proposal I made above is especially friendly to p2p distribution, ipfs, offline modes, mirroring, and all that stuff. The sha-256 hash ensures that software is built according to expectations, and the matter of where to fetch the resources can be provided by any number of "plugins" for how to download something:
|
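Because the hash, not the location, identifies the content, any source can serve a package. A minimal sketch of that idea follows; the "plugin" interface here (a zero-argument callable returning bytes) is purely an assumption for illustration.

```python
import hashlib


def fetch_verified(fetchers, expected_sha256_hex):
    """Try each source in order and return the first content whose hash matches.

    `fetchers` is a list of zero-argument callables returning bytes, standing
    in for hypothetical download plugins (HTTP mirror, ipfs, local cache, ...).
    """
    failures = []
    for fetch in fetchers:
        try:
            data = fetch()
        except OSError as error:
            failures.append(f"unavailable: {error}")
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256_hex:
            return data
        failures.append("hash mismatch")
    raise LookupError("no source produced the expected content: " + "; ".join(failures))
```

An offline or mirrored setup is then just a different list of fetchers; the pinned hash in the build file does not change.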
Looks good, but I'd have to try it out in practice before I can say for sure. I'd have one suggestion: for naming purposes, maybe it would be a good idea to also have a "group" or "groupId" concept? In many situations it's useful to see the umbrella organization from which the dependency comes. Made up Java examples:
Otherwise what happens is that people basically overload the name to include the group, everyone in their own way (apache-httpclient, regexutils-apache). Or they just don't include it and you end up with super generic names (httpclient). It also prevents or minimizes "name squatting". I.e. the first comers get the best names and then they abandon them... |
Structs provide the encapsulation you are looking for @costincaraivan. They seem to act as namespaces would in C++. |
I agree with @costincaraivan. npm has scoped packages for example: https://docs.npmjs.com/getting-started/scoped-packages. In addition to minimizing name squatting and its practical usefulness (being able to more easily depend on a package if it is coming from an established organization or a well-known developer), honoring the creators of a package besides their creation sounds more respectful in general, and may incentivize people to publish more of their stuff :). On the other hand, generic package names also come in handy because there is one less thing to remember when installing them. |
I didn't want to clutter the issue anymore but just today I bumped into something which is in my opinion relevant for the part I posted about groups (or scoped packages in NPM parlance): http://bitprophet.org/blog/2012/06/07/on-vendorizing/ Look at their dilemma regarding the options, one of the solutions is forking the library:
This would be easily solvable with another bit of metadata, the group. In Java world their issue would be solved by forking the library and then publishing it under the new group. Because of the group it's immediately obvious that the library was forked. Even easier to figure out in a repository browser of sorts since the original version would have presumably many versions while the fork will probably have 1 or 2. |
From the above link to Deno's docs: import { assertEquals } from "https://deno.land/std@0.73.0/testing/asserts.ts"; I don't think it's a good idea to bake URLs into Zig source code. Deno is a JS/TS runtime -- a web technology. Allowing URLs in source imports makes sense for it. Zig, on the other hand is a general purpose programming language, and needs 1st-class support for offline builds. |
@FlyingWombat I am not saying it should mimic Deno, just use as a source of inspiration since they have solved similar problems. I am not sure what you mean with offline builds. Deno is just a runtime for server side code, it has the exact same requirements for building offline as Zig. If you read the documentation you will see that all dependencies are cached the first time you run (build) your application. |
This I take no issue with. But I felt that many would just look at Deno's import syntax as the primary feature (as I did), and I disagree with that syntax for Zig. How Deno implemented its package management could, yes, offer some valuable insight. If you have a specific feature or implementation detail from Deno's package management that you would like to highlight, please mention it. What I mean by 1st-class support for offline builds is this: it should be just as easy to build a Zig project on an isolated system as it is on a connected one. It must be straightforward to recursively download all dependencies on a connected machine, and likewise to transfer and vendor them locally on the isolated machine -- which will be the one performing the build. This is one reason why I've been advocating to keep the package manager and compiler as separate entities. All the connected machine would need in order to collect build dependencies is a small static-linked executable. |
Offline builds and URL-namespaced dependencies are not mutually exclusive, but it would require intelligent caching or a vendoring mechanism. The Go toolchain is an example of this. |
I agree with everything in this comment: #943 (comment) An anecdote: Part of my job involves maintaining a .Net project. We use NuGet for that project. One of our direct dependencies had a dependency on some specific version of a library. Another one of our direct dependencies had a dependency on some other version of the same library. The standard "solution" for this problem in the .Net ecosystem is a binding redirect, which (I believe) basically means choosing one version of the library, linking it to an assembly that expects a different version, and then praying that nothing goes wrong at runtime. Using binding redirects wasn't an option for us (it's complicated) so we had to refactor the project. The rest of this comment is basically an attempt to apply the dependency inversion principle. The dependency inversion principle says that software components should depend on abstractions, not on concrete implementations. In object-oriented programming, the dependency inversion principle gives us dependency injection: Any IO class that has dependencies should express those dependencies with some set of interface types in the constructor parameter list. Before anything else happens in the application, the concrete dependencies for a given IO class are instantiated, and then the IO class itself is instantiated. IO classes do not choose their own concrete dependencies. The type system for the programming language is able to validate this process because each IO class declares which interfaces it implements, and any invalid declaration will cause an error. In package-management, I think, the dependency inversion principle gives us an analogous goal: Any library that has dependencies should express those dependencies with some set of abstract specifications in the package import list. Before anything else happens in the build process, the concrete dependencies for a given library are resolved, and then the library itself is resolved. 
Libraries do not choose their own concrete dependencies. The type system for the package manager is able to validate this process because each library declares which abstract specifications it implements, and any invalid declaration will cause an error. Obviously you would need some way to express an "abstract specification" that covers everything important about the API. In a language like C, you would probably want to use header files. And header files would probably be good enough, if you don't need the minor version distinction that SemVer tries to encode. But if you do need the minor version distinction, then you probably need some concept of subtyping in the type system for the package manager, and I'm not sure how viable that is outside of an object-oriented language. Some notes: I talk about "IO classes" because I don't really use dependency injection for datatypes or utility classes. This raises some questions, and I'm not sure what the answers are. I'm also not sure if this comment really makes sense, but I figured I might as well share my perspective. |
I think zig can already do "dependency injection" for your use case: |
Hi. I just learned about Zig a few days ago and have been learning about it for a bit. It looks like a really awesome language. I'd like to voice support for decentralized Deno-style package management: Pros:
Cons:
Ultimately I think the arguments for / against decentralized package management come down to trust. With a centralized package manager, you trust its maintainers to keep packages immutable and mostly always available. With a decentralized approach, you trust your chosen package hosts and proxy, and you get the advantage of being able to choose who you trust to fill those roles. Additionally, Deno optionally allows for import maps to let the user decide which alias they want to use in their code. Theoretically, I suppose this could allow the user to depend on two different versions of the same package under separate aliases ("my-package", "my-package-v2") which is a pretty cool plus. I like the approach proposed in #943 (comment) as it seems to map closely to what I mentioned. But I'm curious: What's the disadvantage of putting the SHAs in a separate file like a lockfile? Maybe one could optionally omit those values in the Thanks for the amazing work! |
Great comment to point out. All current package managers are terrible in this regard: they let me upgrade packages, and then later at build time I get errors that the package manager could technically have known about. It knows which functions are exported. It knows which functions I use. It could tell me whether an upgrade will let me build or not. I think I can guess why nobody ever implemented this, though... it's a lot of work for a quality of life improvement that is perceived as tiny or irrelevant by most people who don't work in very large projects with interconnected dependent libraries. |
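The check described here can be sketched as a comparison between the symbols a project uses and the symbols a candidate version exports. Everything below is hypothetical: real tooling would need the compiler to produce these symbol tables, and signatures are compared as plain strings only to keep the sketch short.

```python
def upgrade_problems(used, exported):
    """List the reasons an upgrade would break the build, if any.

    used:     symbol name -> signature the project currently relies on
    exported: symbol name -> signature offered by the candidate version
    """
    problems = []
    for name, signature in sorted(used.items()):
        if name not in exported:
            problems.append(f"{name}: removed")
        elif exported[name] != signature:
            problems.append(
                f"{name}: signature changed from {signature} to {exported[name]}"
            )
    return problems
```

An empty list means the upgrade is at least source-compatible for this project; it says nothing about behavioural changes, which is where the abstract-specification idea from the earlier comment would have to go further.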
Hey guys. Just wanted to mention that a nodejs package manager: Yarn 2 (codenamed It has a plugin-centric architecture, and there has already been experimentation for making it work with C/C++: Might be pretty easy to get an MVP ziglang package-manager quickly going, at least as an interim solution. |
Honestly, an official, language-wide standard for managing dependencies would be a big win over C/C++ and their hodgepodge of warring factions. I request that a section be devoted to development-time dependencies, such as linters and test frameworks, and an equivalent to |
I would like to remind you about the deps
Same first note + how about my private libs ? should we pay for private repos ??!!!; i suggest use the
Why there's something called Additional ideas :
Thoughts that might be so horrible for Zig:
|
This doesn't work when you want one package to be able to access another one. Like if you have a "logging" package and "auth" package, auth package cannot access the logging package, because the addPackagePath resolution only applies to the root source file. (I am using a workaround in one of my projects: instead of using addPackagePath in build.zig, have the main (root) file import specific implementations of each "package" and make them |
Hm; @CantrellD's comment, and @419928194516's comment it builds upon, reminded me somewhat of the approaches explored in:
As such, I'm wondering if it could make some sense to explore some similar API<->implementation decoupling in the package manager. Notably, if yes, I would imagine:
Just some ideas that came to my mind after reading a huge chunk of the thread above (though I can't 100% guarantee if I managed to internalize all of it). I'm not even sure if that's at all doable, notwithstanding whether this should be done at all; but I'm interpreting that this is still mostly a brainstorm phase, so throwing in my brainstorm contribution. Cheers and good luck! |
For managing C dependencies (and maybe Zig dependencies too, eventually?) it might be useful to be able to request the system-provided version of a given library. At least on Linux. Let's say you have a library that uses Curl and OpenSSL and you want to link it against the system-provided versions of those libraries. It'd be nice if the Zig package manager could be configured to skip downloading them if a compatible version of the development library is already installed on your system in the traditional way e.g. Of course, that can get tricky with the differences between families (Debian, Fedora, Arch) and within families (Debian/Ubuntu, Fedora/RHEL/CentOS). You might need to list the package names for a couple of different distros and have some way of constraining the acceptable versions. I can see how the cost/benefit could be questioned, it just strikes me as an interesting idea, especially since Zig is so nicely suited to being used with dynamic linking. |
What are your thoughts on PNPM? It makes sure (among other things) that the same package (with the same version) has only one copy on the system, instead of being duplicated across the node_modules folders of multiple projects. While this is great for JS, I wonder if this is possible for a compiled language. (Apologies in advance if this has been mentioned before in this thread; I couldn't go through all of it.) |
Go announced a security vulnerability today regarding malicious code execution when running This is something to keep in mind for the package manager, and perhaps more broadly, the build system. |
Yep that's one of the main reasons build scripts use a declarative API. Idea being that the package manager and build system could still run the build.zig script in a sandbox and collect a serialized description of how to build things. Packages which use the standard build API and do not try to run arbitrary code would be eligible for this kind of security, making it practical to make it a permissions flag you'd have to enable: to allow native code execution from a given dependency's build script. Until we have such an advanced feature, however, it should be known that Ultimately I think what we will end up with is that some projects will want to execute arbitrary code as a part of the build script, but for most packages it will not be necessary, so that it can be no big deal to explicitly allow an "arbitrary code execution flag" on those dependencies that need it. |
What if build.zig had to work completely at comptime? That way it'd already be sandboxed to whatever is possible at comptime.... (i.e. side-effect free) |
Is it really necessary to have the dependencies inside the I would expect building a package to be a step unrelated to downloading the sources and its dependencies. That is, similarly to what one does when using sources to generate binary packages in a distribution. For example in debian, after you have downloaded some debian-aware sources for a program, you use "dpkg-checkbuilddeps" to check if the dependencies are satisfied, and "apt-get build-dep" to request the download of the dependencies. And then you can start the build/test/fix/rebuild cycle (be it with make, fakeroot debian/rules binary, dpkg-buildpackage or debuild). I say this because it would make me nervous that doing a compilation could download and update a dependency (which would be bad if there are issues I want to replicate and debug, as the production code and my code would be using different versions of a dependency). Note that I say this as an outsider who has recently discovered Zig and is evaluating doing some project with it, so I may be missing something or have misunderstood the proposal. |
Requiring people who publish packages to own the domain they publish to is a basic security check that is worth it in my opinion. Say somebody publishes a zig package called "com.microsoft.uriparser". |
If you want inspiration: I have not dug into how it works internally, but Dart Pub https://dart.dev/guides/packages is the best package manager I have worked with, and I work daily with npm, go, maven, pods and some others. It just works seamlessly. I'm switching flutter and dart versions all the time, and pub reinstalls all dependencies quickly and with zero trouble every time I switch between flutter channels. |
@kidandcat would you elaborate on why Dart Pub is your best package manager compared to others? |
Maybe I'm just stupid, but wouldn't immutable packages make it really hard to push bug fixes and updates? Or would each version just be its own frozen entity? |
the latter @ElectricCoffee |
I really recommend this talk by Rich Hickey (the author of Clojure) on dependency management https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/Spec_ulation.md. You'll learn that semantic versioning is no panacea. |
Related to what @gphilipp wrote, Leiningen, the most popular build tool for Clojure, can be extended with plugins. One of these plugins takes care of versioning following the approach 1 git commit = 1 version
|
Latest Proposal
Zig needs to make it so that people can effortlessly and confidently depend on each other's code.
Depends on #89