package manager #943

Open · andrewrk opened this issue Apr 22, 2018 · 97 comments

@andrewrk (Member) commented Apr 22, 2018

Latest Proposal


Zig needs to make it so that people can effortlessly and confidently depend on each other's code.

Depends on #89

@phase (Contributor) commented Apr 23, 2018

My thoughts on Package Managers:

  • Packages should be immutable in the package repository (so the NPM problem doesn't arise).

  • Making a new release with a nice changelog should be simple. Maybe integration for reading releases from GitHub or other popular source hosts?

  • Packages should only depend on packages that are in the package repository (no GitHub urls like Go uses).

  • Private instances of the package repository should be supported from the get go (I believe this was a problem for Rust or some other newer language).


Thoughts that might not be great for Zig:

  • Enforcing Semver is an interesting concept, and projects like Elm have done it.
@andrewchambers commented Apr 23, 2018

This is a good reference for avoiding the complexity of package managers like cargo. Minimal version selection is a unique approach that avoids lockfiles, and .modverify keeps deps from being changed out from under you.

https://research.swtch.com/vgo

The features around verifiable builds and library verification are also really neat, as are staged upgrades of libraries and allowing multiple major versions of the same package to be in a program at once.

@bnoordhuis (Contributor) commented Apr 23, 2018

Packages should be immutable in the package repository (so the NPM problem doesn't arise).

I assume you mean authors can't unpublish without admin intervention. True immutability conflicts with the hoster's legal responsibilities in most jurisdictions.

minimal version selection

I'd wait a few years to see how that pans out for Go.

@andrewchambers commented Apr 23, 2018

Note that by minimal, they mean minimal that the authors said was okay. i.e. the version they actually tested. The author of the root module is always free to increase the minimum. It is just that the minimum isn't some arbitrary thing that changes over time when other people make releases.
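
A minimal sketch of that selection rule (the Requirement type is hypothetical and this is not any real package-manager code; it just uses the standard library's StringHashMap):

const std = @import("std");

const Requirement = struct {
    package: []const u8, // package identity
    min_version: u32, // lowest version the depending author tested against
};

/// Minimal version selection: for each package, take the maximum of the
/// minimums stated across the whole dependency graph; never anything newer.
fn selectVersions(
    allocator: std.mem.Allocator,
    requirements: []const Requirement,
) !std.StringHashMap(u32) {
    var selected = std.StringHashMap(u32).init(allocator);
    errdefer selected.deinit();
    for (requirements) |req| {
        const entry = try selected.getOrPut(req.package);
        if (!entry.found_existing or entry.value_ptr.* < req.min_version) {
            entry.value_ptr.* = req.min_version;
        }
    }
    return selected;
}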

@BraedonWooding (Contributor) commented Apr 23, 2018

My top three things are:

  • No lockfiles
  • KISS. I don't want to fight with the package manager; it should be fully integrated into build.zig, with no external programs.
  • Let each dependency carry a file that allows it to download its own dependencies; this file should also maintain a changelog.

A good package manager can make or break a language; it's one of the reasons Go ditched at least one of its official package managers and completely redid it (it may even be two, I haven't kept up to date with that scene).

@andrewrk (Member Author) commented Apr 23, 2018

The first thing I'm going to explore is a decentralized solution. For example, this is what package dependencies might look like:

const Builder = @import("std").build.Builder;
const builtin = @import("builtin");

pub fn build(b: &Builder) void {
    const mode = b.standardReleaseOptions();

    var exe = b.addExecutable("tetris", "src/main.zig");
    exe.setBuildMode(mode);

    exe.addGitPackage("clap", "https://github.com/Hejsil/zig-clap",
        "0.2.0", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");
    exe.addUrlPackage("png", "http://example.com/zig-png.tar.gz",
        "00e27a29ead4267e3de8111fcaa59b132d0533cdfdbdddf4b0604279acbcf4f4");

    b.default_step.dependOn(&exe.step);
}

Here we provide a mapping of a name and a way for zig to download or otherwise acquire the source files of the package to depend on.

Since the build system is declarative, zig can run it and query the set of build artifacts and their dependencies, and then fetch them in parallel.

Dependencies are even stricter than version locking - they are source-locked. In both examples we provide a SHA-256 hash, so that even a compromised third party provider cannot compromise your build.

When you depend on a package, you trust it. It will run zig build on the dependency to recursively find all of its dependencies, and so on. However, by providing a hash, you trust only the version you intend to; if the author updates the code and you want the updates, you will have to update the hash and potentially the URL.

Running zig build on dependencies is desirable because it provides a package the ability to query the system, depend on installed system libraries, and potentially run the C/C++ compiler. This would allow us to create Zig package wrappers for C projects, such as ffmpeg. You would even potentially use this feature for a purely C project - a build tool that downloads and builds dependencies for you.
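
For illustration, the source-lock check described above could look something like this (a sketch; verifyPackage is a hypothetical helper built on the standard library's Sha256, not part of the proposal):

const std = @import("std");

/// Compare a fetched archive against the pinned digest before unpacking it;
/// a mismatch means the provider served different bytes than were trusted.
fn verifyPackage(archive_bytes: []const u8, expected_digest: [32]u8) !void {
    var actual: [32]u8 = undefined;
    std.crypto.hash.sha2.Sha256.hash(archive_bytes, &actual, .{});
    if (!std.mem.eql(u8, &actual, &expected_digest))
        return error.HashMismatch;
}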

@ghost commented Apr 23, 2018

and potentially run the C/C++ compiler
cmd/go: arbitrary code execution during “go get” #23672

although you might argue

Dependencies are even stricter than version locking - they are source-locked. In both examples we provide a SHA-256 hash, so that even a compromised third party provider cannot compromise your build.

in that case you'd have to check all the deps of all your deps recursively (manually?) on each SHA change though, to be really sure

@andrewrk (Member Author) commented Apr 23, 2018

in that case you'd have to check all the deps of all your deps recursively (manually?) on each SHA change though, to be really sure

This is already true about all software dependencies.

@BraedonWooding (Contributor) commented Apr 26, 2018

I've been considering how one could do this for the past few days; here is what I generally came up with (this is based off @andrewrk's idea). I've left out hashes to make it easier; I'm talking more about architecture than implementation details here:

  • No lock files, purely source driven
  • Have a central repository of all downloaded files (like under /usr/local/zig-vendor) and have a builtin to access them, like @vImport("BraedonWooding/ZigJSON"); or have a unique vendor location for each 'zig' build file, or rather each project, in which case we autogenerate a nice index.zig file for you to access like const vendor = @import("vendor/index.zig"); const json = vendor.ZigJSON.
  • Utilise them during building; we can go and download any new dependencies.
  • Automatically download the latest dependency as per the user requirements, which is one of (for structure x.y.z):
    • Keep major version i.e. upgrade y and z but not x
    • Keep minor and major i.e. upgrade all 'z' changes.
    • Keep version explicitly stated
    • Always upgrade to latest

This would also solve the issue of security fixes, as most users would keep the second option, which is intended for small bug fixes that don't introduce anything new, whereas the major version is for breaking changes and the minor is for new, typically non-breaking changes.

Your build file would have something like this in your 'build' function;

...
builder.addDependency(builder.Dependency.Git, "github.com.au", "BraedonWooding", "ZigJSON", builder.Versions.NonMajor);
// Or maybe
builder.addDependency(builder.Dependency.Git, "github.com.au/BraedonWooding/ZigJSON", builder.Versions.NonMajor);
// Or just
builder.addGitDependency("github.com.au/BraedonWooding/ZigJSON", builder.Versions.NonMajor);
...

Keeping in mind that svn and mercurial (as well as plenty more) are also used quite a bit :). We could either use just a folder naming system to detect what we have downloaded, or have a simple file storing information about all the files downloaded (note: NOT a lock file, just a file with information on what has been downloaded). We would use tags to determine versions, but could also have a simple central repository of versions linking to locations, like I believe other ecosystems have.
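
A sketch of those four upgrade policies as a plain compatibility check (the names are made up, mirroring the hypothetical builder.Versions above):

const SemVer = struct { major: u32, minor: u32, patch: u32 };

const Policy = enum { explicit, patch_only, non_major, latest };

/// Is `candidate` an acceptable upgrade from `current` under `policy`?
fn allowed(policy: Policy, current: SemVer, candidate: SemVer) bool {
    return switch (policy) {
        // keep the version explicitly stated
        .explicit => candidate.major == current.major and
            candidate.minor == current.minor and
            candidate.patch == current.patch,
        // keep minor and major: only z may move
        .patch_only => candidate.major == current.major and
            candidate.minor == current.minor,
        // keep major: y and z may move, x may not
        .non_major => candidate.major == current.major,
        // always upgrade to latest
        .latest => true,
    };
}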

@isaachier (Contributor) commented May 2, 2018

How would you handle multiple definitions of the same function? I find this to be the most difficult part of C/C++ package management. Or does Zig use some sort of package name prefixing?

@BraedonWooding (Contributor) commented May 2, 2018

@isaachier Well you can't have multiple definitions of a function in Zig; function overloads aren't a thing (intentionally).

You would import a package like;

const Json = @import("JSON/index.zig");

pub fn main() void {
    Json.parse(...);
    // And whatever
}

When you 'include' things in your source Zig file, they exist under a variable, kinda like a namespace (but simpler); this means that you should generally never run into multiple definitions :). If you want to 'use' an import like using in C++, you can do something like use Json;, which will let you use the contents without having to refer to Json: in the above example it would just be parse(...) instead of Json.parse(...). You still can't use private functions, however.

If for some reason you 'use' two 'libraries' that define the same function, you'll get an error and will most likely have to put one under a namespace/variable; very rarely should you need use :).
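
A sketch of that collision, written with the use keyword from this era of Zig (since renamed usingnamespace); json.zig and toml.zig are hypothetical files that both export a parse function:

const json = @import("json.zig");
const toml = @import("toml.zig");

use json;
use toml;

pub fn main() void {
    // error: both used namespaces declare `parse`; the ambiguous reference
    // must instead be qualified as json.parse or toml.parse.
    parse("...");
}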

@isaachier (Contributor) commented May 2, 2018

I don't expect a clash in the language necessarily, but in the linker aren't there duplicate definitions for parse if multiple packages define it? Or is it automatically made into Json_parse?

@Hejsil (Member) commented May 2, 2018

@isaachier If you don't define your functions as export fn a() void, then Zig is allowed to rename the functions to avoid collisions.
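
In other words (a tiny sketch; the names are arbitrary):

// Not exported: Zig may rename/mangle this internal symbol, so two
// packages each defining their own `parse` cannot collide at link time.
fn parse(input: []const u8) usize {
    return input.len;
}

// Exported: the object-file symbol name is pinned to `parse_v2`, so any
// other package exporting `parse_v2` would collide when linking.
export fn parse_v2(x: usize) usize {
    return x + 1;
}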

@isaachier (Contributor) commented May 3, 2018

OK that makes sense. About package managers, I'm sure I'm dealing with experts here 😄, but wanted to make sure a few points are addressed for completeness.

  • Should multiple versions of the same library be allowed? This can occur when library A relies on libraries B and C. A needs C version 2 and B needs C version 1. How do you handle that scenario? I'm not sure about the symbol exports, but that might be an issue if you intend to link in both versions.
  • Are the packages downloaded independently for each project, or cached on the local disk (like Maven and Hunter)? In the latter case, you have to consider the use of build flags and their effect on the shared build.
@andrewrk (Member Author) commented May 3, 2018

These are important questions.

The first question brings up an even more fundamental question which we have to ask ourselves if we go down the decentralized package route: how do you even know that a given package is the same one as another version?

For example, if FancyPantsJson library is mirrored on GitHub and BitBucket, and you have this:

// in main package
exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");

// in a nested package
exe.addGitPackage("fancypantsjson", "https://bitbucket.org/mirrors-r-us/zig-fancypants.git",
    "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");

Here, we know that the library is the same because the sha-256 matches, and that means we can use the same code for both dependencies. However, consider if one was on a slightly newer version:

// in main package
exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.2", "dea956b9f5f44e38342ee1dff85fb5fc8c7a604a7143521f3130a6337ed90708");

// in a nested package
exe.addGitPackage("fancypantsjson", "https://bitbucket.org/mirrors-r-us/zig-fancypants.git",
    "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");

Because this is decentralized, the name "fancypantsjson" does not uniquely identify the package. It's just a name mapped to code so that you can do @import("fancypantsjson") inside the package that depends on it.

But we want to know if this situation occurs. Here's my proposal for how this will work:

comptime {
    // these are random bytes to uniquely identify this package
    // developers compute these once when they create a new package and then
    // never change it
    const package_id = "\xfb\xaf\x7f\x45\x86\x08\x10\xec\xdb\x3c\xea\xb4\xb3\x66\xf9\x47";

    const package_info = @declarePackage(package_id, builtin.SemVer {
        .major = 1,
        .minor = 0,
        .revision = 1,
    });

    // these are the other packages that were not even analyzed because they
    // called @declarePackage with an older, but API-compatible version number.
    for (package_info.superseded) |ver| {
        @compileLog("using 1.0.1 instead of", ver.major, ver.minor, ver.revision);
    }

    // these are the other packages that have matching package ids, but
    // will additionally be compiled in because they do not have compatible
    // APIs according to semver
    for (package_info.coexisting) |pkg| {
        @compileLog("in addition to 1.0.1 this version is present",
            pkg.sem_ver.major, pkg.sem_ver.minor, pkg.sem_ver.revision);
    }
}

The prototype of this function would be:

// these structs are declared in @import("builtin");
pub const SemVer = struct {
    major: @typeOf(1),
    minor: @typeOf(1),
    revision: @typeOf(1),
};
const Namespace = @typeOf(this);
pub const Package = struct {
    namespace: Namespace,
    sem_ver: SemVer,
};
pub const PackageUsage = struct {
    /// This is the list of packages that have declared an older,
    /// but API-compatible version number. So zig stopped analyzing
    /// these versions when it hit the @declarePackage.
    superseded: []SemVer,

    /// This is the list of packages that share a package id, but
    /// due to incompatible versions, will coexist with this version.
    coexisting: []Package,
};

@declarePackage(comptime package_id: [16]u8, comptime version: &const SemVer) PackageUsage

Packages would be free to omit a package declaration. In this case, multiple copies of the
package would always coexist, and zig package manager would be providing no more than
automatic downloading of a resource, verification of its checksum, and caching.

Multiple package declarations would be a compile error, as well as @declarePackage somewhere
other than the first Top Level Declaration in a Namespace.

Let us consider for a moment, that one programmer could use someone else's package id, and then
use a minor version greater than the existing one. Via indirect dependency, they could "hijack"
the other package because theirs would supersede it.

At first this may seem like a problem, but consider:

  • Most importantly, when a Zig programmer adds a dependency on a package and verifies the checksum,
    they are trusting that version of that package. So the hijacker has been approved.
  • If the hijacker provides a compatible API, they are intentionally trying to create a drop-in replacement
    of the package, which may be a reasonable thing to do in some cases. An open source maintainer
    can essentially take over maintenance of an abandoned project, without permission, and offer
    an upgrade path to downstream users as simple as changing their dependency URL (and checksum).
  • In practice, if they fail to provide a compatible API, runtime errors and compile time errors
    will point back to the hijacker's code, not the superseded code.

Really, I think this is a benefit of a decentralized approach.

Going back to the API of @declarePackage, here's an example of power this proposal gives you:

const encoding_table = blk: {
    const package_id = "\xfb\xaf\x7f\x45\x86\x08\x10\xec\xdb\x3c\xea\xb4\xb3\x66\xf9\x47";

    const package_info = @declarePackage(package_id, builtin.SemVer {
        .major = 2,
        .minor = 0,
        .revision = 0,
    });

    for (package_info.coexisting) |pkg| {
        if (pkg.sem_ver.major == 1) {
            break :blk pkg.namespace.FLAC_ENCODING_TABLE;
        }
    }

    break :blk @import("flac.zig").ENCODING_TABLE;
};

// ...

pub fn lookup(i: usize) u32 {
    return encoding_table[i];
}

Here, even though we have bumped the major version of this package from 1 to 2, we know that the FLAC ENCODING TABLE is unchanged, and perhaps it is 32 MB of data, so best to not duplicate it unnecessarily. Now even versions 1 and 2 which coexist, at least share this table.

You could also use this to do something such as:

if (package_info.coexisting.len != 0) {
    @compileError("this package does not support coexisting with other versions of itself");
}

And then users would be forced to upgrade some of their dependencies until they could all agree on a compatible version.

However, for this particular use case it would usually be recommended not to do this, since there would be a general Zig command line option to make all coexisting libraries a compile error, for those who want a squeaky clean dependency chain. ReleaseSmall would probably turn this flag on by default.


As for your second question,

Are the packages downloaded independently for each project or cached on the local disk (like maven and Hunter). In the latter case, you have to consider the use of build flags and their effect on the shared build.

Package caching will happen like this:

  • Caching of the source download (e.g. this .tar.gz has this sha-256, so we can skip downloading it if we already have it)
  • The same binary caching strategy that zig uses for every file in every project

Caching is an important topic in the near future of zig, but it does not yet exist in any form. Rest assured that we will not get caching wrong. My goal is: 0 bugs filed in the lifetime of zig's existence where the cause was a false positive cache usage.
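
A sketch of the first bullet, a download cache keyed by the expected hash (the cache layout and the download helper are hypothetical, not a real zig code path):

const std = @import("std");

// Hypothetical network helper; networking is out of scope for this sketch.
fn download(allocator: std.mem.Allocator, url: []const u8) ![]u8 {
    _ = allocator;
    _ = url;
    return error.NotImplemented;
}

/// Skip the network entirely when the content-addressed cache already
/// holds an archive whose file name is its expected SHA-256 (hex).
fn fetchCached(
    allocator: std.mem.Allocator,
    cache_dir: std.fs.Dir,
    url: []const u8,
    hex_digest: []const u8,
) ![]u8 {
    if (cache_dir.readFileAlloc(allocator, hex_digest, 64 * 1024 * 1024)) |bytes| {
        return bytes; // cache hit: the file name *is* the content hash
    } else |_| {}

    const bytes = try download(allocator, url);
    // (verify bytes against hex_digest here before trusting them)
    const file = try cache_dir.createFile(hex_digest, .{});
    defer file.close();
    try file.writeAll(bytes);
    return bytes;
}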

@andrewrk (Member Author) commented May 3, 2018

One more note I want to make:

In the example above I have:

exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.2", "dea956b9f5f44e38342ee1dff85fb5fc8c7a604a7143521f3130a6337ed90708");

Note however that the "1.0.2" only tells Zig how to download from a git repository ("download the commit referenced by 1.0.2"). The actual version you are depending on is the one that is set with @declarePackage in the code that matches the SHA-256.

So the package dependency can be satisfied by any semver-compatible version indirectly or directly depended on.

With that in mind, this decentralized strategy with @declarePackage even works if you do any of the following things:

  • use a git submodule for the package
    • you can use addDirPackage("name", "/path/to/dir", "a3951217c609a5a9c5a100e5f3c37a4e8b14796642138ee613db46daca7d43c7").
  • just copy+paste the package into your own source code
    • same thing, use addDirPackage

You can also force your dependency's dependency's dependency (and so on) to upgrade, simply by adding a direct dependency on the same package id with a minor or revision bump.

And to top it off you can purposefully inject code into your dependency's dependency's dependency (and so on), by:

  • forking or otherwise copy pasting the package in question
  • bumping the minor version
  • adding a direct dependency on your fork
  • do your code edits in the fork

This strategy could be used, for example, to add @optimizeFor(.Debug) in some tricky areas you're trying to troubleshoot in a third party library, or perhaps you found a bottleneck in a third party library and you want to add @optimizeFor(.ReleaseFast) to disable safety in the bottleneck. Or maybe you want to apply a patch while you're waiting for upstream to review and accept it, or a patch that will be coming out in the next version but isn't released yet.

@andrewrk (Member Author) commented May 4, 2018

Another note: this proposal does not actually depend on the self hosted compiler. There is nothing big blocking us from starting to implement it. It looks like:

  • Implementation of @declarePackage in the c++ compiler. At this point you could test package management with the CLI.
  • Add the API to zig build system, e.g. addDirPackage. At this point we have a working package manager that you could use with git submodule or copy+pasting the package you want to depend on into your codebase. (But read on - this can be further improved)
  • Networking in the standard library. Probably best to use coroutines and async I/O for this. Related: #910. I also need to document coroutines (#367)
  • Add support for .tar.gz, .tar.xz, .zip, etc. to the standard library
  • Now we can implement addUrlPackage
  • Add support for downloading a particular revision via git / svn
  • Now we can implement addGitPackage / addSvnPackage
@clownpriest (Contributor) commented Jun 7, 2018

maybe worth considering p2p distribution and content addressing with ipfs?

see https://github.com/whyrusleeping/gx for example

just a thought

@costincaraivan commented Jun 7, 2018

One important thing to note, especially for adoption by larger organizations: think about a packaging format and a repo structure that is proxy/caching/mirroring friendly and that also allows an offline mode.

That way the organization can easily centralize its dependencies instead of having everyone go everywhere on the internet (a big no-no for places such as banks).

Play around a bit with Maven and Artifactory/Nexus if you haven't already 😉

@andrewrk (Member Author) commented Jun 7, 2018

The decentralized proposal I made above is especially friendly to p2p distribution, ipfs, offline modes, mirroring, and all that stuff. The sha-256 hash ensures that software is built according to expectations, and the matter of where to fetch the resources can be provided by any number of "plugins" for how to download something:

  • http (don't even need https because of the sha-256)
  • git
  • svn
  • torrent
  • ipfs
  • nfs
  • some custom thing, it doesn't matter, as long as the bytes can be delivered
@costincaraivan commented Jun 8, 2018

Looks good but I'd have to try it out in practice before I can say for sure 😄

I'd have one suggestion: for naming purposes, maybe it would be a good idea to also have a "group" or "groupId" concept?

In many situations it's useful to see the umbrella organization from which the dependency comes. Made up Java examples:

  1. group: org.apache, name: httpclient.
  2. group: org.apache, name: regexutils.

Otherwise what happens is that people basically overload the name to include the group, everyone in their own way (apache-httpclient, regexutils-apache). Or they just don't include it and you end up with super generic names (httpclient).

It also prevents or minimizes "name squatting". I.e. the first comers get the best names and then they abandon them...

@isaachier (Contributor) commented Jun 8, 2018

Structs provide the encapsulation you are looking for, @costincaraivan. They seem to act as namespaces would in C++.

@demircancelebi commented Jun 8, 2018

I agree with @costincaraivan. npm has scoped packages for example: https://docs.npmjs.com/getting-started/scoped-packages.

In addition to minimizing name squatting and its practical usefulness (being able to more easily depend on a package if it comes from an established organization or a well-known developer), honoring the creators of a package alongside their creation sounds more respectful in general, and may incentivize people to publish more of their stuff :).

On the other hand, generic package names also come in handy because there is one less thing to remember when installing them.

@costincaraivan commented Jun 8, 2018

I didn't want to clutter the issue any more, but just today I bumped into something which is, in my opinion, relevant to the part I posted about groups (or scoped packages in NPM parlance):

http://bitprophet.org/blog/2012/06/07/on-vendorizing/

Look at their dilemma regarding the options, one of the solutions is forking the library:

Fork and release our own package on PyPI as e.g. fluidity-invoke.

  • This works, but has many of the drawbacks of the vendorizing option and offers few of the benefits.
  • It also confuses things re: project ownership and who should receive/act on bug reports. Users new to the space might focus on your fork instead of upstream, forcing you to either handle their problems, or redirect them.

This would be easily solvable with another bit of metadata, the group. In Java world their issue would be solved by forking the library and then publishing it under the new group. Because of the group it's immediately obvious that the library was forked. Even easier to figure out in a repository browser of sorts since the original version would have presumably many versions while the fork will probably have 1 or 2.

@FlyingWombat commented Oct 12, 2020

From the above link to Deno's docs:

import { assertEquals } from "https://deno.land/std@0.73.0/testing/asserts.ts";

I don't think it's a good idea to bake URLs into Zig source code.

Deno is a JS/TS runtime -- a web technology. Allowing URLs in source imports makes sense for it.

Zig, on the other hand is a general purpose programming language, and needs 1st-class support for offline builds.

@manast commented Oct 13, 2020

@FlyingWombat I am not saying it should mimic Deno, just use as a source of inspiration since they have solved similar problems.

I am not sure what you mean by offline builds. Deno is just a runtime for server side code; it has the exact same requirements for building offline as Zig. If you read the documentation you will see that all dependencies are cached the first time you run (build) your application.

@FlyingWombat commented Oct 14, 2020

... just use as a source of inspiration ...

This I take no issue with. But I felt that many would just look at Deno's import syntax as the primary feature (as I did), and I disagree with that syntax for Zig. How Deno implemented its package management could, yes, hold some valuable insight. If you have a specific feature or implementation detail from Deno's package management that you would like to highlight, please mention it.

What I mean by 1st-class support for offline builds is this: it should be just as easy to build a Zig project on an isolated system as it is on a connected one.

It must be straightforward to recursively download all dependencies on a connected machine. And likewise to transfer and vendor them locally on the isolated machine -- which will be the one performing the build. This is one reason why I've been advocating to keep the package manager and compiler as separate entities. All the connected machine would need in order to collect build dependencies is a small static-linked executable.

@jayschwa (Contributor) commented Oct 16, 2020

Offline builds and URL-namespaced dependencies are not mutually exclusive, but it would require intelligent caching or a vendoring mechanism. The Go toolchain is an example of this.

@CantrellD commented Oct 18, 2020

I agree with everything in this comment: #943 (comment)

An anecdote: Part of my job involves maintaining a .Net project. We use NuGet for that project. One of our direct dependencies had a dependency on some specific version of a library. Another one of our direct dependencies had a dependency on some other version of the same library. The standard "solution" for this problem in the .Net ecosystem is a binding redirect, which (I believe) basically means choosing one version of the library, linking it to an assembly that expects a different version, and then praying that nothing goes wrong at runtime. Using binding redirects wasn't an option for us (it's complicated) so we had to refactor the project.

The rest of this comment is basically an attempt to apply the dependency inversion principle. The dependency inversion principle says that software components should depend on abstractions, not on concrete implementations.

In object-oriented programming, the dependency inversion principle gives us dependency injection: Any IO class that has dependencies should express those dependencies with some set of interface types in the constructor parameter list. Before anything else happens in the application, the concrete dependencies for a given IO class are instantiated, and then the IO class itself is instantiated. IO classes do not choose their own concrete dependencies. The type system for the programming language is able to validate this process because each IO class declares which interfaces it implements, and any invalid declaration will cause an error.

In package-management, I think, the dependency inversion principle gives us an analogous goal: Any library that has dependencies should express those dependencies with some set of abstract specifications in the package import list. Before anything else happens in the build process, the concrete dependencies for a given library are resolved, and then the library itself is resolved. Libraries do not choose their own concrete dependencies. The type system for the package manager is able to validate this process because each library declares which abstract specifications it implements, and any invalid declaration will cause an error.

Obviously you would need some way to express an "abstract specification" that covers everything important about the API. In a language like C, you would probably want to use header files. And header files would probably be good enough, if you don't need the minor version distinction that SemVer tries to encode. But if you do need the minor version distinction, then you probably need some concept of subtyping in the type system for the package manager, and I'm not sure how viable that is outside of an object-oriented language.

Some notes: I talk about "IO classes" because I don't really use dependency injection for datatypes or utility classes. This raises some questions, and I'm not sure what the answers are. I'm also not sure if this comment really makes sense, but I figured I might as well share my perspective.
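
Zig's comptime can already express a crude form of such an "abstract specification" check; a sketch (the required decl names and the "json" package identifier are invented for illustration):

/// Fail compilation unless `Impl` provides the decls that make up the
/// abstract specification this library depends on.
fn requireJsonSpec(comptime Impl: type) void {
    if (!@hasDecl(Impl, "parse") or !@hasDecl(Impl, "stringify")) {
        @compileError("dependency does not satisfy the JSON spec");
    }
}

comptime {
    // Validate whichever implementation the build injected as "json".
    requireJsonSpec(@import("json"));
}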

@MasterQ32 (Contributor) commented Oct 18, 2020

I think zig can already do "dependency injection" for your use case:
Just provide a different file under the package identifier which needs to fulfil the same public API ("header file") and it would work, as long as the API is truly compatible. The package manager should be able to allow such overrides.
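
With the build API that already exists, such an override is roughly (assuming a local shim file that re-exports the same public decls):

// In build.zig: point the "json" package identifier at a local file
// that fulfils the same public API as the upstream implementation.
exe.addPackagePath("json", "vendor/json_shim.zig");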

@andrewrothman commented Oct 25, 2020

Hi. I just learned about Zig a few days ago and have been learning about it for a bit. It looks like a really awesome language.

I'd like to voice support for decentralized Deno-style package management:

Pros:

  • decentralized
  • identifying imports by URLs allows for adding support for new protocols later, or dynamically via registering code to fetch packages with specific URL protocols at compile time (ie. "git", "ipfs", "ssb", "dat, "magnet", etc.)

Cons:

  • cannot enforce immutability - this can be addressed with stored checksum verification and an offline cache which can be committed to source control if desired
  • cannot enforce availability - this can be addressed with a proxy, either public or private, and offline cache. Note that this can also be an issue with centralized repos: See the unpublishing of left-pad from NPM.

Ultimately I think the arguments for / against decentralized package management come down to trust. With a centralized package manager, you trust its maintainers to keep packages immutable and mostly always available. With a decentralized approach, you trust your chosen package hosts and proxy, and you get the advantage of being able to choose who you trust to fill those roles.

Additionally, Deno optionally allows for import maps to let the user decide which alias they want to use in their code. Theoretically, I suppose this could allow the user to depend on two different versions of the same package under separate aliases ("my-package", "my-package-v2") which is a pretty cool plus.

I like the approach proposed in #943 (comment) as it seems to map closely to what I mentioned. But I'm curious: What's the disadvantage of putting the SHAs in a separate file like a lockfile? Maybe one could optionally omit those values in the addUrlPackage and in that case they will be automatically added to an optional lockfile or replaced in the build.zig inline? While working, I'd like to be able to specify a URL and have the language tooling calculate and store the SHA for me. Another small benefit of this would be the ability to specify URLs directly in import statements, without adding them to the build.zig file, which would make testing out new packages incredibly easy. Any thoughts?

Thanks for the amazing work!

@Meai commented Oct 25, 2020

I agree with everything in this comment: #943 (comment)

Great comment to point out. All current package managers are terrible in this regard: they let me upgrade packages, and then later at build time I get errors that technically the package manager could have known about. It knows which functions are exported. It knows which functions I use. It could tell me whether an upgrade will build or not. I think I can guess why nobody ever implemented this... it's a lot of work for a quality-of-life improvement that is perceived as tiny or irrelevant by most people who don't work in very large projects with interconnected dependent libraries.

@andreialecu commented Oct 30, 2020

Hey guys. Just wanted to mention that a Node.js package manager, Yarn 2 (codenamed Berry), might be worth looking into.

It has a plugin-centric architecture, and there has already been experimentation for making it work with C/C++:
yarnpkg/berry#1697

Might be pretty easy to get an MVP ziglang package-manager quickly going, at least as an interim solution.

@mcandre commented Dec 8, 2020

Honestly, an official, language-wide standard for managing dependencies would be a big win over C/C++ and their hodgepodge of warring factions.

I request that a section be devoted to development-time dependencies, such as linters and test frameworks, and an equivalent to bundle exec / npm bin for managing Zig utilities on a per-project basis.

@nektro (Contributor) commented Dec 9, 2020

@annymosse commented Dec 9, 2020

@andrewchambers

  • Packages should be immutable in the package repository (so the NPM problem doesn't arise).

I would like to remind you of the node_modules dependency hell created by npm (yes, there are some solutions to mitigate that hell, but why should we have to work around it rather than avoid it from the beginning?); the best solution is the one made by Go & Deno.

  • Packages should only depend on packages that are in the package repository (no GitHub urls like Go uses).

Same as the first note. Also, what about my private libs? Should we pay for private repos? I suggest using the Deno & Go solution: save disk space by eliminating dependency folders such as node_modules, and save bandwidth for clients and the repository server (less traffic & high availability).

Thoughts that might not be great for Zig:

  • Enforcing Semver is an interesting concept, and projects like Elm have done it.

Why is there something called semver? Isn't it meant to remove dependency-compatibility hell? As a coder it is easy for me to know whether I should upgrade my lib just by reading the x.y.z (BreakingChanges.Features.PatchBugs), without reading the changelog at all.

Additional ideas :

  • something such as npm scripts to register & reuse the registered scripts (to stop rewrite long commands & args).
  • ability to show the deps graph.
  • ability to configure the cache folder for deps.
  • ability to remove dead deps (orphan libs which are not under usage anymore to keep the disk-space clean).
  • built-in audited deps inside the central repository / a bot to scan zig files and detect vulnerabilities, such as CodeQL.
  • built-in docs builder.
  • built-in unit tests.
  • built-in test coverage.


@dbandstra (Contributor) commented Dec 24, 2020

I think zig can already do "dependency injection" for your use case:
Just provide a different file under the package identifier which needs to fulfil the same public API ("header file") and it would work, as long as the API is truly compatible. The package manager should be able to allow such overrides

This doesn't work when you want one package to be able to access another one. For example, if you have a "logging" package and an "auth" package, the auth package cannot access the logging package, because the addPackagePath resolution only applies to the root source file.

(I am using a workaround in one of my projects: instead of using addPackagePath in build.zig, I have the main (root) file import specific implementations of each "package" and make them pub, and then have other files access them via @import("root"). But this doesn't work in tests.)
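
A sketch of that workaround (file names invented):

// main.zig (the root file): pick concrete implementations and re-export
// them publicly so the whole program shares one copy of each.
pub const logging = @import("logging_impl.zig");
pub const auth = @import("auth_impl.zig");

// auth_impl.zig: reach the shared logging choice through the root file.
const logging = @import("root").logging;

pub fn login(name: []const u8) void {
    logging.info("user {s} logged in", .{name});
}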

@akavel commented Dec 27, 2020

Hm; @CantrellD's comment, and @419928194516's comment it builds upon, reminded me somewhat of the approaches explored in:

As such, I'm wondering if it could make some sense to explore some similar API<->implementation decoupling in the package manager. Notably, if yes, I would imagine:

  • with static typing, maybe hopefully for some package X, the set of package interfaces it depends on could be automatically generated by some tool? maybe as part of a release process of the package? maybe those could then be published for easy browsing and reviewing?
  • similarly, maybe the exposed interface/API of a published package could also be extracted by an automatic tool to a similarly formatted spec? again, maybe it could also be published for easy browsing and reviewing?
  • then, tools could maybe exist that could e.g.:
    • allow easy checking if particular package's "provided interface" does or does not implement some "depended interface", and thus can be "linked" by end-user as dependency for some other package?
      • if not, it could possibly list the API differences making it impossible, so that I could decide to write a wrapper package or something;
    • allow easy checking of what changes some version of a package has in its API vs. a different version, thus also allowing for "detection of breaking changes in the API"? possibly issuing a warning to the package's author when releasing a package that they did so? or just allow the package author, if they want, to check compatibility with some API spec they explicitly want to align with (presumably typically "of old versions of their package", but this being just a loose presumption; maybe e.g. they want to also/only align to the API of some thirdparty package)

Just some ideas that came to my mind after reading a huge chunk of the thread above (though I can't 100% guarantee if I managed to internalize all of it). I'm not even sure if that's at all doable, notwithstanding whether this should be done at all; but I'm interpreting that this is still mostly a brainstorm phase, so throwing in my brainstorm contribution. Cheers and good luck!

@dralley commented Jan 13, 2021

Running zig build on dependencies is desirable because it provides a package the ability to query the system, depend on installed system libraries, and potentially run the C/C++ compiler. This would allow us to create Zig package wrappers for C projects, such as ffmpeg. You would even potentially use this feature for a purely C project - a build tool that downloads and builds dependencies for you.

For managing C dependencies (and maybe Zig dependencies too, eventually?) it might be useful to be able to request the system-provided version of a given library. At least on Linux.

Let's say you have a library that uses Curl and OpenSSL and you want to link it against the system-provided versions of those libraries. It'd be nice if the Zig package manager could be configured to skip downloading them if a compatible version of the development library is already installed on your system in the traditional way e.g. dnf install openssl-devel libcurl-devel, and to make sure that those copies are used.

Of course, that can get tricky with the differences between families (Debian, Fedora, Arch) and within families (Debian/Ubuntu, Fedora/RHEL/CentOS). You might need to list the package names for a couple of different distros and have some way of constraining the acceptable versions. I can see how the cost/benefit could be questioned, it just strikes me as an interesting idea, especially since Zig is so nicely suited to being used with dynamic linking.
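
For reference, today's build API can already express the linking half of this; the detection and skip-the-download policy would be the new part:

// build.zig fragment: link against the system-provided C libraries
// (resolved via the usual system paths / pkg-config) instead of
// downloading and building them from source.
exe.linkLibC();
exe.linkSystemLibrary("curl");
exe.linkSystemLibrary("ssl");
exe.linkSystemLibrary("crypto");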

@faraazahmad commented Jan 18, 2021

What are your thoughts on PNPM? It makes sure (among other things) that the same package (with the same version) has only one copy on the system, instead of being duplicated across multiple projects' node_modules folders.

While this is great for JS, I wonder if this is possible for a compiled language. (Apologies in advance if it has been mentioned before in this thread I couldn't go through all of it)

@jayschwa (Contributor) commented Jan 20, 2021

Go announced a security vulnerability today regarding malicious code execution when running go get: https://blog.golang.org/path-security

This is something to keep in mind for the package manager, and perhaps more broadly, the build system.

@andrewrk (Member Author) commented Jan 20, 2021

Yep that's one of the main reasons build scripts use a declarative API. Idea being that the package manager and build system could still run the build.zig script in a sandbox and collect a serialized description of how to build things. Packages which use the standard build API and do not try to run arbitrary code would be eligible for this kind of security, making it practical to make it a permissions flag you'd have to enable: to allow native code execution from a given dependency's build script.

Until we have such an advanced feature, however, it should be known that zig build is running arbitrary code from the dependency tree's build.zig scripts. And it is planned for build.zig logic to affect package management. However it is also planned that the set of possible dependencies will be purely declarative, so it will be practical to have a "fetch" step of package management that does not execute arbitrary code. However however, it is also planned for the package manager to support "plugin packages" that enable fetching content from new and exotic places, for example ipfs. For this feature to work, again, relies on execution of arbitrary code.

Ultimately I think what we will end up with is that some projects will want to execute arbitrary code as a part of the build script, but for most packages it will not be necessary, so that it can be no big deal to explicitly allow an "arbitrary code execution flag" on those dependencies that need it.

@daurnimator (Collaborator) commented Jan 20, 2021

Idea being that the package manager and build system could still run the build.zig script in a sandbox and collect a serialized description of how to build things.

What if build.zig had to work completely at comptime? That way it'd already be sandboxed to whatever is possible at comptime.... (i.e. side-effect free)

@antartica commented Feb 7, 2021

Is it really necessary to have the dependencies inside build.zig? Wouldn't it be better to have them in a separate dependencies.zig?

I would expect building a package to be a step unrelated to downloading the sources and their dependencies.

That is, similarly to what one does when using sources to generate binary packages in a distribution. For example in debian, after you have downloaded some debian-aware sources for a program, you use "dpkg-checkbuilddeps" to check if the dependencies are satisfied, and "apt-get build-dep" to request the download of the dependencies. And then you can start the build/test/fix/rebuild cycle (be it with make, fakeroot debian/rules binary, dpkg-buildpackage or debuild).

I say this because it would make me nervous if doing a compilation could download and update a dependency (which would be bad if there are issues I want to replicate and debug, as the production code and my code would be using different versions of a dependency).

Note that I say this as an outsider that has recently discovered Zig and is evaluating doing some project with it, so I may be missing something or misunderstood the proposal.

@Meai commented Feb 13, 2021

In many situations it's useful to see the umbrella organization from which the dependency comes. Made up Java examples:

  1. group: org.apache, name: httpclient.
  2. group: org.apache, name: regexutils.

Otherwise what happens is that people basically overload the name to include the group, everyone in their own way (apache-httpclient, regexutils-apache). Or they just don't include it and you end up with super generic names (httpclient).

It also prevents or minimizes "name squatting". I.e. the first comers get the best names and then they abandon them...

Requiring people who publish packages to own the domain they publish to is a basic security check that is worth it in my opinion. Say somebody publishes a zig package called "com.microsoft.uriparser".
I've seen this suggestion here; Maven apparently already does this: https://www.reddit.com/r/programming/comments/lhu44g/researcher_hacks_over_35_tech_firms_by_creating/gn11fwj?utm_source=share&utm_medium=web2x&context=3

@kidandcat commented Feb 15, 2021

If you want inspiration: I have not dug into how it works internally, but Dart Pub (https://dart.dev/guides/packages) is the best package manager I have worked with, and I work daily with npm, Go, Maven, Pods and some others. It just works seamlessly. I'm switching Flutter and Dart versions all the time, and Pub reinstalls all dependencies so quickly, with zero trouble, every time I switch between Flutter channels.

https://dart.dev/tools/pub/dependencies

@DrSensor commented Feb 20, 2021

@kidandcat would you elaborate on why Dart Pub is the best package manager you've worked with, compared to the others?

@ElectricCoffee commented Mar 14, 2021

Maybe I'm just stupid, but wouldn't immutable packages make it really hard to push bug fixes and updates? Or would each version just be its own frozen entity?

@nektro (Contributor) commented Mar 14, 2021

the latter @ElectricCoffee

@gphilipp commented Mar 16, 2021

I really recommend this talk by Rich Hickey (the author of Clojure) on dependency management https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/Spec_ulation.md. You'll learn that semantic versioning is no panacea.
Btw, Clojure has a tool called tools.deps which can resolve dependencies that use a git repository + a SHA (see https://clojure.org/guides/deps_and_cli#_using_git_libraries).

@jackdbd commented Mar 22, 2021

Related to what @gphilipp wrote, Leiningen, the most popular build tool for Clojure, can be extended with plugins. One of these plugins takes care of versioning following the approach 1 git commit = 1 version
https://github.com/roomkey/lein-v

Lein-v uses git metadata to build a unique, reproducible and meaningful version for every commit. Along the way, it adds useful metadata to your project and artifacts (jar and war files) to tie them back to a specific commit. Consequently, it helps ensure that you never release an irreproducible artifact.
