package manager #943

Open
andrewrk opened this Issue Apr 22, 2018 · 40 comments

@andrewrk
Member

andrewrk commented Apr 22, 2018

Latest Proposal


Zig needs to make it so that people can effortlessly and confidently depend on each other's code.

Depends on #89

@phase

phase commented Apr 23, 2018

My thoughts on Package Managers:

  • Packages should be immutable in the package repository (so the NPM problem doesn't arise).

  • Making a new release with a nice changelog should be simple. Maybe integration for reading releases from GitHub or other popular source hosts?

  • Packages should only depend on packages that are in the package repository (no GitHub urls like Go uses).

  • Private instances of the package repository should be supported from the get go (I believe this was a problem for Rust or some other newer language).


Thoughts that might not be great for Zig:

  • Enforcing Semver is an interesting concept, and projects like Elm have done it.
@andrewchambers

andrewchambers commented Apr 23, 2018

This is a good reference for avoiding the complexity of package managers like cargo: minimal version selection is a unique approach that avoids lockfiles, and .modverify avoids deps being changed out from under you.

https://research.swtch.com/vgo

The features around verifiable builds and library verification are also really neat, as are staged upgrades of libraries and allowing multiple major versions of the same package in a program at once.

@bnoordhuis

Member

bnoordhuis commented Apr 23, 2018

Packages should be immutable in the package repository (so the NPM problem doesn't arise).

I assume you mean authors can't unpublish without admin intervention. True immutability conflicts with the hoster's legal responsibilities in most jurisdictions.

minimal version selection

I'd wait a few years to see how that pans out for Go.

@andrewchambers

andrewchambers commented Apr 23, 2018

Note that by minimal, they mean the minimal version the authors said was okay, i.e. the version they actually tested. The author of the root module is always free to increase the minimum. It is just that the minimum isn't some arbitrary thing that changes over time when other people make releases.
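
To make that rule concrete, here is a minimal sketch of minimal version selection (modern Zig syntax; all names invented for illustration): the chosen version is the highest declared minimum, so a new upstream release never silently changes the build.

const std = @import("std");
const assert = std.debug.assert;

const Version = struct { major: u32, minor: u32, patch: u32 };

fn newer(a: Version, b: Version) bool {
    if (a.major != b.major) return a.major > b.major;
    if (a.minor != b.minor) return a.minor > b.minor;
    return a.patch > b.patch;
}

// MVS: take the highest *declared* minimum, never "latest upstream".
fn select(declared_minimums: []const Version) Version {
    var chosen = declared_minimums[0];
    for (declared_minimums[1..]) |v| {
        if (newer(v, chosen)) chosen = v;
    }
    return chosen;
}

test "minimal version selection" {
    // A was tested against C 1.2.0 and B against C 1.4.0; C 1.9.0 exists
    // upstream, but since nobody declared it, it is never chosen.
    const mins = [_]Version{
        .{ .major = 1, .minor = 2, .patch = 0 },
        .{ .major = 1, .minor = 4, .patch = 0 },
    };
    assert(select(&mins).minor == 4);
}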

@BraedonWooding

Contributor

BraedonWooding commented Apr 23, 2018

My top three things are:

  • No lockfiles
  • KISS. I don't want to fight with the package manager; it should be fully integrated into build.zig, with no external programs.
  • Allow each dependency to carry a file that lets it download its own dependencies; this file should also maintain a changelog.

A good package manager can make or break a language; it's one of the reasons why Go has ditched at least one of its official package managers and completely redone it (it may even be two, I haven't kept up to date with that scene).

@andrewrk

Member

andrewrk commented Apr 23, 2018

The first thing I'm going to explore is a decentralized solution. For example, this is what package dependencies might look like:

const Builder = @import("std").build.Builder;
const builtin = @import("builtin");

pub fn build(b: &Builder) void {
    const mode = b.standardReleaseOptions();

    var exe = b.addExecutable("tetris", "src/main.zig");
    exe.setBuildMode(mode);

    exe.addGitPackage("clap", "https://github.com/Hejsil/zig-clap",
        "0.2.0", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");
    exe.addUrlPackage("png", "http://example.com/zig-png.tar.gz",
        "00e27a29ead4267e3de8111fcaa59b132d0533cdfdbdddf4b0604279acbcf4f4");

    b.default_step.dependOn(&exe.step);
}

Here we provide a mapping of a name and a way for zig to download or otherwise acquire the source files of the package to depend on.

Since the build system is declarative, zig can run it and query the set of build artifacts and their dependencies, and then fetch them in parallel.

Dependencies are even stricter than version locking - they are source-locked. In both examples we provide a SHA-256 hash, so that even a compromised third party provider cannot compromise your build.

When you depend on a package, you trust it. Zig will run zig build on the dependency to recursively find all of its dependencies, and so on. However, by providing a hash, you trust only the version you intend to; if the author updates the code and you want the updates, you will have to update the hash and potentially the URL.

Running zig build on dependencies is desirable because it provides a package the ability to query the system, depend on installed system libraries, and potentially run the C/C++ compiler. This would allow us to create Zig package wrappers for C projects, such as ffmpeg. You would even potentially use this feature for a purely C project - a build tool that downloads and builds dependencies for you.
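
As a sketch of that recursion (a hypothetical build.zig inside the zig-clap package, reusing the proposed addUrlPackage API from above; the dependency and hash are placeholders):

// Hypothetical build.zig inside zig-clap itself; zig would run this
// recursively to discover clap's own dependencies.
const Builder = @import("std").build.Builder;

pub fn build(b: &Builder) void {
    const lib = b.addStaticLibrary("clap", "src/index.zig");

    // invented example dependency; the hash is a placeholder
    lib.addUrlPackage("wyhash", "http://example.com/zig-wyhash.tar.gz",
        "0000000000000000000000000000000000000000000000000000000000000000");

    b.default_step.dependOn(&lib.step);
}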

@ghost

ghost commented Apr 23, 2018

and potentially run the C/C++ compiler
cmd/go: arbitrary code execution during “go get” #23672

although you might argue

Dependencies are even stricter than version locking - they are source-locked. In both examples we provide a SHA-256 hash, so that even a compromised third party provider cannot compromise your build.

in that case you'd have to check all the deps of all your deps recursively (manually?) on each hash change though to be really sure

@andrewrk

Member

andrewrk commented Apr 23, 2018

in that case you'd have to check all the deps of all your deps recursively (manually?) on each hash change though to be really sure

This is already true about all software dependencies.

@BraedonWooding

Contributor

BraedonWooding commented Apr 26, 2018

I've been considering how one could do this for the past few days; here is what I generally came up with (based on @andrewrk's idea). I've left out hashes to keep it simple, and I'm talking more about architecture than implementation details here:

  • No lock files, purely source driven
  • Have a central repository of all downloaded files (like under /usr/local/zig-vendor) with a builtin to access them, like @vImport("BraedonWooding/ZigJSON"); or have a unique vendor location for each project's build file, in which case we autogenerate a nice index.zig file for you to access like const vendor = @import("vendor/index.zig"); const json = vendor.ZigJSON.
  • Utilise them during building, we can go and download any new dependencies.
  • Automatically download the latest dependency as per the user requirements that is either (for structure x.y.z);
    • Keep major version i.e. upgrade y and z but not x
    • Keep minor and major i.e. upgrade all 'z' changes.
    • Keep version explicitly stated
    • Always upgrade to latest

This would also address security fixes: most users would keep the second option, which is intended for small bug fixes that don't introduce anything new, whereas the major version is for breaking changes and the minor is for new, typically non-breaking changes.

Your build file would have something like this in your 'build' function:

...
builder.addDependency(builder.Dependency.Git, "github.com.au", "BraedonWooding", "ZigJSON", builder.Versions.NonMajor);
// Or maybe
builder.addDependency(builder.Dependency.Git, "github.com.au/BraedonWooding/ZigJSON", builder.Versions.NonMajor);
// Or just
builder.addGitDependency("github.com.au/BraedonWooding/ZigJSON", builder.Versions.NonMajor);
...

Keep in mind that svn and mercurial (as well as plenty more) are also used quite a bit :). We could either use a folder naming scheme to detect what we have downloaded, or keep a simple file recording what has been downloaded (note: NOT a lock file, just a record of what has been downloaded). We would use tags to determine versions, but could also have a simple central repository of versions linking to locations, like I believe other ecosystems have.

@isaachier

Contributor

isaachier commented May 2, 2018

How would you handle multiple definitions of the same function? I find this to be the most difficult part of C/C++ package management. Or does Zig use some sort of package name prefixing?

@BraedonWooding

Contributor

BraedonWooding commented May 2, 2018

@isaachier Well, you can't have multiple definitions of a function in Zig; function overloads aren't a thing (intentionally).

You would import a package like;

const Json = @import("JSON/index.zig");

pub fn main() void {
    Json.parse(...);
    // And whatever
}

When you 'include' things in your source Zig file, they exist under a variable, kinda like a namespace (but simpler); this means that you should generally never run into multiple definitions :). If you want to 'use' an import like using in C++, you can write use Json;, which lets you use the contents without having to refer to Json: in the above example it would just be parse(...) instead of Json.parse(...). You still can't use private functions, however.

If for some reason you 'use' two 'libraries' that define the same function, you'll get an error and will most likely have to put one under a namespace/variable; very rarely should you use use :).
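
A short sketch of the collision case (hypothetical imports, using the use keyword as it existed at the time):

// Both hypothetical files export a public fn `parse`.
const Json = @import("json/index.zig");
const Xml = @import("xml/index.zig");

use Json;
use Xml;

fn demo() void {
    // parse(...);      // error: ambiguous reference (provided by both `use`s)
    // Json.parse(...); // OK: qualify through the namespace instead
}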

@isaachier

Contributor

isaachier commented May 2, 2018

I don't expect a clash in the language necessarily, but in the linker aren't there duplicate definitions for parse if multiple packages define it? Or is it automatically made into Json_parse?

@Hejsil

Member

Hejsil commented May 2, 2018

@isaachier If you don't define your functions as export fn a() void, then Zig is allowed to rename the functions to avoid collisions.
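
In other words (a minimal sketch): only export pins the symbol name at the object-file level, so identically named non-exported functions in different packages cannot collide:

// The linker-level symbol must be named exactly "parse"; two packages
// both exporting this would collide at link time.
export fn parse(len: usize) usize {
    return len;
}

// Not exported: the compiler is free to rename the symbol internally,
// so another package's private `parse` never clashes with this one.
fn parseInternal() void {}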

@isaachier

Contributor

isaachier commented May 3, 2018

OK that makes sense. About package managers, I'm sure I'm dealing with experts here 😄, but wanted to make sure a few points are addressed for completeness.

  • Should multiple versions of the same library be allowed? This can occur when library A relies on libraries B and C. A needs C version 2 and B needs C version 1. How do you handle that scenario? I'm not sure about the symbol exports, but that might be an issue if you intend to link in both versions.
  • Are the packages downloaded independently for each project, or cached on the local disk (like Maven and Hunter)? In the latter case, you have to consider the use of build flags and their effect on the shared build.
@andrewrk

Member

andrewrk commented May 3, 2018

These are important questions.

The first question brings up an even more fundamental question which we have to ask ourselves if we go down the decentralized package route: how do you even know that a given package is the same one as another version?

For example, if the FancyPantsJson library is mirrored on GitHub and BitBucket, and you have this:

// in main package
exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");

// in a nested package
exe.addGitPackage("fancypantsjson", "https://bitbucket.org/mirrors-r-us/zig-fancypants.git",
    "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");

Here, we know that the library is the same because the sha-256 matches, and that means we can use the same code for both dependencies. However, consider if one was on a slightly newer version:

// in main package
exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.2", "dea956b9f5f44e38342ee1dff85fb5fc8c7a604a7143521f3130a6337ed90708");

// in a nested package
exe.addGitPackage("fancypantsjson", "https://bitbucket.org/mirrors-r-us/zig-fancypants.git",
    "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");

Because this is decentralized, the name "fancypantsjson" does not uniquely identify the package. It's just a name mapped to code so that you can do @import("fancypantsjson") inside the package that depends on it.

But we want to know if this situation occurs. Here's my proposal for how this will work:

comptime {
    // these are random bytes to uniquely identify this package
    // developers compute these once when they create a new package and then
    // never change it
    const package_id = "\xfb\xaf\x7f\x45\x86\x08\x10\xec\xdb\x3c\xea\xb4\xb3\x66\xf9\x47";

    const package_info = @declarePackage(package_id, builtin.SemVer {
        .major = 1,
        .minor = 0,
        .revision = 1,
    });

    // these are the other packages that were not even analyzed because they
    // called @declarePackage with an older, but API-compatible version number.
    for (package_info.superseded) |ver| {
        @compileLog("using 1.0.1 instead of", ver.major, ver.minor, ver.revision);
    }

    // these are the other packages that have matching package ids, but
    // will additionally be compiled in because they do not have compatible
    // APIs according to semver
    for (package_info.coexisting) |pkg| {
        @compileLog("in addition to 1.0.1 this version is present",
            pkg.sem_ver.major, pkg.sem_ver.minor, pkg.sem_ver.revision);
    }
}

The prototype of this function would be:

// these structs are declared in @import("builtin");
pub const SemVer = struct {
    major: @typeOf(1),
    minor: @typeOf(1),
    revision: @typeOf(1),
};
const Namespace = @typeOf(this);
pub const Package = struct {
    namespace: Namespace,
    sem_ver: SemVer,
};
pub const PackageUsage = struct {
    /// This is the list of packages that have declared an older,
    /// but API-compatible version number. So zig stopped analyzing
    /// these versions when it hit the @declarePackage.
    superseded: []SemVer,

    /// This is the list of packages that share a package id, but
    /// due to incompatible versions, will coexist with this version.
    coexisting: []Package,
};

@declarePackage(comptime package_id: [16]u8, comptime version: &const SemVer) PackageUsage

Packages would be free to omit a package declaration. In this case, multiple copies of the
package would always coexist, and zig package manager would be providing no more than
automatic downloading of a resource, verification of its checksum, and caching.

Multiple package declarations would be a compile error, as well as @declarePackage somewhere
other than the first Top Level Declaration in a Namespace.

Let us consider for a moment that one programmer could use someone else's package id, and then
use a minor version greater than the existing one. Via indirect dependency, they could "hijack"
the other package because theirs would supersede it.

At first this may seem like a problem, but consider:

  • Most importantly, when a Zig programmer adds a dependency on a package and verifies the checksum,
    they are trusting that version of that package. So the hijacker has been approved.
  • If the hijacker provides a compatible API, they are intentionally trying to create a drop-in replacement
    of the package, which may be a reasonable thing to do in some cases. An open source maintainer
    can essentially take over maintenance of an abandoned project, without permission, and offer
    an upgrade path to downstream users as simple as changing their dependency URL (and checksum).
  • In practice, if they fail to provide a compatible API, runtime errors and compile time errors
    will point back to the hijacker's code, not the superseded code.

Really, I think this is a benefit of a decentralized approach.

Going back to the API of @declarePackage, here's an example of the power this proposal gives you:

const encoding_table = blk: {
    const package_id = "\xfb\xaf\x7f\x45\x86\x08\x10\xec\xdb\x3c\xea\xb4\xb3\x66\xf9\x47";

    const package_info = @declarePackage(package_id, builtin.SemVer {
        .major = 2,
        .minor = 0,
        .revision = 0,
    });

    for (package_info.coexisting) |pkg| {
        if (pkg.sem_ver.major == 1) {
            break :blk pkg.namespace.FLAC_ENCODING_TABLE;
        }
    }

    break :blk @import("flac.zig").ENCODING_TABLE;
};

// ...

pub fn lookup(i: usize) u32 {
    return encoding_table[i];
}

Here, even though we have bumped the major version of this package from 1 to 2, we know that the FLAC ENCODING TABLE is unchanged, and perhaps it is 32 MB of data, so best to not duplicate it unnecessarily. Now even versions 1 and 2 which coexist, at least share this table.

You could also use this to do something such as:

if (package_info.coexisting.len != 0) {
    @compileError("this package does not support coexisting with other versions of itself");
}

And then users would be forced to upgrade some of their dependencies until they could all agree on a compatible version.

However, for this particular use case it would usually be recommended not to do this, since there would be a general Zig command-line option to make all coexisting libraries a compile error, for those who want a squeaky-clean dependency chain. ReleaseSmall would probably turn this flag on by default.


As for your second question,

Are the packages downloaded independently for each project, or cached on the local disk (like Maven and Hunter)? In the latter case, you have to consider the use of build flags and their effect on the shared build.

Package caching will happen like this:

  • Caching of the source download (e.g. this .tar.gz has this sha-256, so we can skip downloading it if we already have it)
  • The same binary caching strategy that zig uses for every file in every project

Caching is an important topic in the near future of zig, but it does not yet exist in any form. Rest assured that we will not get caching wrong. My goal is: 0 bugs filed in the lifetime of zig's existence where the cause was a false positive cache usage.
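
A sketch of what the first bullet implies (invented helper name; assumed layout): the cache would be keyed by the declared SHA-256, so the same bytes fetched from any URL land in the same place:

const std = @import("std");

// Invented helper: the cache directory is derived from the content hash
// alone, never from the URL the bytes happened to come from, so switching
// mirrors never invalidates an already-verified download.
fn cachePath(allocator: std.mem.Allocator, cache_root: []const u8, sha256_hex: []const u8) ![]u8 {
    return std.fs.path.join(allocator, &.{ cache_root, "pkg", sha256_hex });
}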

@andrewrk andrewrk added the proposal label May 3, 2018

@andrewrk

Member

andrewrk commented May 3, 2018

One more note I want to make:

In the example above I have:

exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.2", "dea956b9f5f44e38342ee1dff85fb5fc8c7a604a7143521f3130a6337ed90708");

Note however that the "1.0.2" only tells Zig how to download from a git repository ("download the commit referenced by 1.0.2"). The actual version you are depending on is the one that is set with @declarePackage in the code that matches the SHA-256.

So the package dependency can be satisfied by any semver-compatible version indirectly or directly depended on.

With that in mind, this decentralized strategy with @declarePackage even works if you do any of the following things:

  • use a git submodule for the package
    • you can use addDirPackage("name", "/path/to/dir", "a3951217c609a5a9c5a100e5f3c37a4e8b14796642138ee613db46daca7d43c7").
  • just copy+paste the package into your own source code
    • same thing, use addDirPackage

You can also force your dependency's dependency's dependency (and so on) to upgrade, simply by adding a direct dependency on the same package id with a minor or revision bump.
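
For example (hypothetical newer version, placeholder hash), the root build.zig could pin the newer release directly, and it would supersede whatever older 1.x your dependencies asked for:

// Direct dependency in the root package; via @declarePackage semantics this
// supersedes any older API-compatible 1.x deeper in the dependency graph.
exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson",
    "1.0.3", "<placeholder: sha-256 of the 1.0.3 sources>");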

And to top it off you can purposefully inject code into your dependency's dependency's dependency (and so on), by:

  • forking or otherwise copy pasting the package in question
  • bumping the minor version
  • adding a direct dependency on your fork
  • making your code edits in the fork

This strategy could be used, for example, to add @optimizeFor(.Debug) in some tricky areas you're trying to troubleshoot in a third party library, or perhaps you found a bottleneck in a third party library and you want to add @optimizeFor(.ReleaseFast) to disable safety in the bottleneck. Or maybe you want to apply a patch while you're waiting for upstream to review and accept it, or a patch that will be coming out in the next version but isn't released yet.

@andrewrk

Member

andrewrk commented May 4, 2018

Another note: this proposal does not actually depend on the self hosted compiler. There is nothing big blocking us from starting to implement it. It looks like:

  • Implementation of @declarePackage in the C++ compiler. At this point you could test package management with the CLI.
  • Add the API to zig build system, e.g. addDirPackage. At this point we have a working package manager that you could use with git submodule or copy+pasting the package you want to depend on into your codebase. (But read on - this can be further improved)
  • Networking in the standard library. Probably best to use coroutines and async I/O for this. Related: #910. I also need to document coroutines (#367)
  • Add support for .tar.gz, .tar.xz, .zip, etc. to the standard library
  • Now we can implement addUrlPackage
  • Add support for downloading a particular revision via git / svn
  • Now we can implement addGitPackage / addSvnPackage
@clownpriest

Contributor

clownpriest commented Jun 7, 2018

maybe worth considering p2p distribution and content addressing with ipfs?

see https://github.com/whyrusleeping/gx for example

just a thought

@costincaraivan

costincaraivan commented Jun 7, 2018

One important thing to note, especially for adoption by larger organizations: think about a packaging format and a repo structure that is proxy/caching/mirroring friendly and that also allows an offline mode.

That way an organization can easily centralize its dependencies instead of having everyone go everywhere on the internet (a big no-no for places such as banks).

Play around a bit with Maven and Artifactory/Nexus if you haven't already 😉

@andrewrk

Member

andrewrk commented Jun 7, 2018

The decentralized proposal I made above is especially friendly to p2p distribution, ipfs, offline modes, mirroring, and all that stuff. The sha-256 hash ensures that software is built according to expectations, and the matter of where to fetch the resources can be provided by any number of "plugins" for how to download something:

  • http (don't even need https because of the sha-256)
  • git
  • svn
  • torrent
  • ipfs
  • nfs
  • some custom thing, it doesn't matter, as long as the bytes can be delivered
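
A purely illustrative sketch of such a plugin seam (none of these names exist in the standard library; modern Zig syntax):

const std = @import("std");

// Invented interface: anything that can turn (url, destination) into
// bytes on disk can serve as a fetch backend; the SHA-256 check happens
// afterwards regardless of which backend delivered the bytes.
const Fetcher = struct {
    scheme: []const u8, // "http", "git", "ipfs", ...
    fetchFn: *const fn (url: []const u8, dest_dir: []const u8) anyerror!void,
};

fn findFetcher(fetchers: []const Fetcher, scheme: []const u8) ?Fetcher {
    for (fetchers) |f| {
        if (std.mem.eql(u8, f.scheme, scheme)) return f;
    }
    return null;
}
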
@costincaraivan

costincaraivan commented Jun 8, 2018

Looks good but I'd have to try it out in practice before I can say for sure 😄

I'd have one suggestion: for naming purposes, maybe it would be a good idea to also have a "group" or "groupId" concept?

In many situations it's useful to see the umbrella organization from which the dependency comes. Made up Java examples:

  1. group: org.apache, name: httpclient.
  2. group: org.apache, name: regexutils.

Otherwise what happens is that people basically overload the name to include the group, everyone in their own way (apache-httpclient, regexutils-apache). Or they just don't include it and you end up with super generic names (httpclient).

It also prevents or minimizes "name squatting". I.e. the first comers get the best names and then they abandon them...

@isaachier

Contributor

isaachier commented Jun 8, 2018

Structs provide the encapsulation you are looking for, @costincaraivan. They act like namespaces do in C++.

@demircancelebi

demircancelebi commented Jun 8, 2018

I agree with @costincaraivan. npm has scoped packages for example: https://docs.npmjs.com/getting-started/scoped-packages.

In addition to minimizing name squatting and its practical usefulness (being able to more easily depend on a package if it comes from an established organization or a well-known developer), honoring the creators of a package alongside their creation sounds more respectful in general, and may incentivize people to publish more of their stuff :).

On the other hand, generic package names also come in handy because there is one less thing to remember when installing them.

@costincaraivan

costincaraivan commented Jun 8, 2018

I didn't want to clutter the issue any more, but just today I bumped into something that is, in my opinion, relevant to the part I posted about groups (or scoped packages in NPM parlance):

http://bitprophet.org/blog/2012/06/07/on-vendorizing/

Look at their dilemma regarding the options; one of the solutions is forking the library:

Fork and release our own package on PyPI as e.g. fluidity-invoke.

  • This works, but has many of the drawbacks of the vendorizing option and offers few of the benefits.
  • It also confuses things re: project ownership and who should receive/act on bug reports. Users new to the space might focus on your fork instead of upstream, forcing you to either handle their problems, or redirect them.

This would be easily solvable with another bit of metadata, the group. In the Java world, their issue would be solved by forking the library and then publishing it under a new group. Because of the group, it's immediately obvious that the library was forked. It's even easier to figure out in a repository browser of sorts, since the original would presumably have many versions while the fork will probably have 1 or 2.

@thejoshwolfe

Member

thejoshwolfe commented Jun 8, 2018

Importers provide the name they will use to import the package. It's OK to have everyone try to name their module httpclient. When you want to import the module, give it whatever identifier you want. There are no name collisions unless you do it to yourself.

Name squatting is not meaningful in a distributed package manager situation. There is no central registry of names. Even in an application, there's no central registry of names. Each package has its own registry of names that it has full control over.

The only collisions possible in this proposal are collisions on the package id, which is a large randomly generated number used to identify if one package dependency is an updated version of another. You can only get collisions on package id if someone deliberately does so.
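
Concretely (hypothetical URLs and placeholder hashes, reusing the proposed API): two upstream projects that both call themselves httpclient simply get different local names from the importer:

// The first argument is *my* name for the package; upstream's own name
// never enters any global namespace. URLs and hashes are placeholders.
exe.addGitPackage("apache_http", "https://example.com/org-apache/httpclient",
    "1.0.0", "0000000000000000000000000000000000000000000000000000000000000000");
exe.addGitPackage("tiny_http", "https://example.com/someone/httpclient",
    "0.3.0", "1111111111111111111111111111111111111111111111111111111111111111");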

@costincaraivan

costincaraivan commented Jun 8, 2018

A package manager cannot be detached from social issues. Yes, technically things would ideally be fully distributed, you would pull files from everywhere. But in real life, let's take the 3 most popular distributed protocols on the net:

  • email

  • Bittorrent

  • Git

All of them have a higher level that effectively centralizes them, or at least makes some nodes in this decentralized system much, much "stronger" than the average node, thereby centralizing the system to a great degree.

Email: Gmail, Microsoft, Yahoo. Probably 80+% of public mail goes through a handful of email hosters.

Bittorrent: torrent trackers, see the outcry when The Pirate Bay went down.

Git: GitHub 😃, GitLab, Bitbucket.

A package name tells me what the thing is. Generally it isn't unique, sometimes it's even non-descriptive (utils...). A hash is very precise, but far from human friendly. Any kind of other metadata I can get from the source is greatly appreciated.

What I'm saying is: make the package collection fully distributed but have provisions in the package format for centralization. It will happen anyway if the language becomes popular (Maven, npm.js, Pypi, etc.).

@thejoshwolfe

Member

thejoshwolfe commented Jun 8, 2018

make the package collection fully distributed but have provisions in the package format for centralization.

That's already in the proposal.

I'll work on some more clear documentation on how packages will work in Zig, because there seems to be a lot of confusion and misunderstanding here.

@magicgoose

magicgoose commented Jun 8, 2018

I think it could be a good thing to also support digital signatures in addition to hashes.
For example, Bob might trust Alice for some reason, but not have time to read every diff of Alice's package; in this situation Bob may add a requirement that Alice's package must be signed with a key with a specific fingerprint.
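
Sketching the suggestion with an entirely hypothetical API, the build script could pin a signing-key fingerprint next to the hash:

// requireSignedBy is an invented call: in addition to the SHA-256 of the
// tarball, require that the release be signed by a key Bob already trusts.
exe.addUrlPackage("alicelib", "http://example.com/alicelib.tar.gz",
    "0000000000000000000000000000000000000000000000000000000000000000"); // placeholder hash
exe.requireSignedBy("alicelib", "ABCD1234EF567890"); // hypothetical fingerprint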

@419928194516

419928194516 commented Jun 8, 2018

Hey @andrewrk, I just watched your localhost talk (and backed you, good luck!).
You centered the talk around a notion of making perfect software possible.
I agree with this sentiment.
However, relying on others' work in the way you propose (without additional constraints) leads away from that goal.
It seems you've focused primarily on the "how do we have a decentralized store of packages" part of packages,
and less on "what packages are", and what that means for creating stable software.

The assumption seems to be "semver and a competent maintainer will prevent incompatibilities from arising."
I am asserting that this is incorrect. Even the most judicious projects break downstream software with upgrades.
These "upgrades" that fail are a source of fear and frustration for programmers and laymen alike.
(See also: the phenomenon of single-file, no-dependency C libraries, and other languages' dependency-free libraries and projects.)

When you talk about a package:

exe.addGitPackage("fancypantsjson", "https://github.com/mrfancypants/zig-fancypantsjson", "1.0.1", "76c50794004b5300a620ed71ef58e4444455fd72e7f7e8f70b7d930a040210ff");
You have decided that the identity of a package is:
    ID == (name: str, url: url, version: semver, id: package_id, sha: hash) + other metadata

To the consumer of a package, its identity is relevant only for finding the package.
When working with the package, all that matters is the public API it exposes.
For example:

API|1.0.0: {
    const CONST: 1234
    frobnicate: fn(usize) -> !usize  // throws IO
    unused: fn(u8): u8
}

Let's imagine my project only relies on frobnicate and CONST.
It follows that I only care about those two declarations.
No other information about the version, url, or name matters in the slightest.
Whether an upgrade is safe can be determined by observing the signatures of the things that I rely on.
(Ideally we'd rely not on the signatures but on "the exact same behavior given the same inputs", but solving the halting problem is out of scope.)

Some time later, the author of the package releases:

API|1.1.0: {
    const CONST: 1234
    frobnicate: fn(usize) -> !usize // throws IO // now 2x faster
    unused: fn(u8): u8
}
API|1.2.0: { // oops breaking minor version bump, but nobody used frobnicate.. right?
    const CONST: 1234
    unused: fn(u8): u8
}
API|1.3.0: { // added frobnicate back
    const CONST: 1234
    frobnicate: fn(usize) -> !usize // throws IO + BlackMagicError
    unused: fn(u8): u8
}

I cannot safely use API 1.2.0 or API 1.3.0:
1.2.0 breaks the API contract by omitting frobnicate, and
1.3.0 breaks the contract by adding an error that my project (maybe) doesn't know it needs to handle.

Your note here:

// these are the other packages that have matching package ids, but
// will additionally be compiled in because they do not have compatible
// APIs according to semver

implies that I can trust the library author not to make mistakes when evaluating how their library upgrades will proceed in my project.
They cannot know that.
They should not have to know that.
It is the job of the compiler/package manager to understand the relationship between what a package provides and what a project requires.
API 1.2.0 and 1.3.0 might as well be completely alien packages from the perspective of frobnicate; the functions just happen to share a name.
However, if I only relied on CONST, all upgrades would have been safe.

What I am proposing is that package upgrading should be a deterministic process.
I should be able to ask the compiler: "Will this upgrade succeed without modification to my codebase".
I should also be able to ask the compiler: "What was incompatible" to be able to understand the impact of a breaking upgrade before biting the bullet.
The compiler needs to look at more than the pointer (id + metadata);
it must also look at the value of the package as determined by its API.
This is not the check that I want:

Author determined API 1.2.0 supersedes API 1.1.0:
    all OK using API 1.2.0

This is:

{CONST,frobnicate: fn(usize) -> !usize + throws IO} != {CONST,frobnicate: fn(usize) -> !usize + throws IO + BlackMagicError}
    API 1.3.0 returns a new unhandled error type BlackMagicError which results in (trace of problem), do you wish to proceed? (y/N)
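
In Zig terms the frobnicate breakage is mechanically detectable; a minimal sketch (modern syntax, invented names):

// 1.1.0 signature:
fn frobnicate(x: usize) error{Io}!usize {
    return x * 2;
}

// A caller that handles the 1.1.0 error set exhaustively:
fn caller() usize {
    return frobnicate(21) catch |err| switch (err) {
        error.Io => 0,
        // If 1.3.0 grows the set to error{Io, BlackMagic}, this switch
        // stops compiling: the new error is unhandled, so the incompatible
        // upgrade is caught at compile time rather than trusted to semver.
    };
}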

TL;DR:

  • Humans can't be trusted to do semver (I'm not even sure semver should be a thing, it doesn't solve the problem it's supposed to solve)
  • A package's API makes it what it is.
  • Zig users should be able to upgrade without fear.
@tiehuis

Member

tiehuis commented Jun 8, 2018

@419928194516 See also #404.

@thejoshwolfe

Member

thejoshwolfe commented Jun 8, 2018

If you're proposing only checking the subset of compatibility that is knowable at comptime, then that sounds like #404. If you're proposing a general distrust of software changes, you can control all the dependencies that go into your application, and only upgrade packages when you choose.

@419928194516

419928194516 commented Jun 9, 2018

@thejoshwolfe I'm basically proposing what Andrew mentioned on #404 an hour after your comment here.

for example when deciding to upgrade you could query the system to see what API changes - that you actually depend on - happened between your current version and the latest.

Major version bump enforcement just means that the API broke for somebody, maybe. It prevents a certain class of error, but crudely.
What's relevant to the consumer of the library is what changed for them.
And that is inexpressible as a version number, but could be part of a package management system.

Edit: I do mean that subset, and yes, I do generally distrust software and people, however well intentioned. If rules are not enforced, they will be broken, and their brokenness will become an unassailable part of the system, barring serious effort. Rust seems to be doing an OK job at undoing mistakes without major breakage, but most other projects and languages don't. See also: the Linux kernel's vow to never break userspace, with the attendant consequences (positive and negative).
Edit2: weird double paste of Edit1? removed.

@renatoathaydes

renatoathaydes commented Jun 9, 2018

I think I've seen this discussion before (just joking, but it's a little bit similar) here :D

Given Zig's goal of allowing programmers to write reliable software, I agree with @419928194516's thoughts... I wrote a little bit about the version problem myself, though my own thoughts were and still are rather unpolished, to be honest... anyway, it seems a lot of good ideas coming from many different people and communities are converging... especially the idea that a version number is really not a good way to handle software evolution (though it still makes sense from a pure "marketing" perspective). I would +1 a proposal to automatically handle version updates and have the compiler (or a compiler plugin?) check them automatically (like japicmp does for Java APIs). This, together with the hash checks, makes Zig capable of offering something quite unique: perfect software evolution ;)

@binary132

binary132 commented Jun 18, 2018

In case this has not been mentioned yet, I strongly recommend reading this blog series on a better dependency management algorithm for Go.

@isaachier

Contributor

isaachier commented Jul 15, 2018

Related to @binary132's earlier post, one of the Go package manager developers posted on Medium about his advice for implementing a package manager: https://medium.com/@sdboyer/so-you-want-to-write-a-package-manager-4ae9c17d9527. Old article, but still has some interesting insights.

@ghost

ghost commented Jul 15, 2018

As far as I followed the discussion, sdboyer is actually the developer of dep (which is not the official Go package manager), and if you look at some really long threads he disagrees with the now-accepted vgo and minimal version selection from Russ, which is becoming the official Go dependency management in Go 1.11.

Anyway, it's probably worth seeing both sdboyer's and Russ's arguments https://sdboyer.io/blog/vgo-and-dep/ although I found sdboyer's hard to follow at times.

@xtian

Contributor

xtian commented Sep 18, 2018

Is there an idea of how package discovery would work with this decentralized model? One of the benefits of a centralized system is having a single source for searching packages, accessing docs, etc.

@andrewrk

Member

andrewrk commented Sep 18, 2018

Is there an idea of how package discovery would work with this decentralized model? One of the benefits of a centralized system is having a single source for searching packages, accessing docs, etc.

I agree that this is the biggest downside of a decentralized system. It becomes a community effort to solve the problem of package discovery. But maybe it's not so bad. If there becomes a popular package discovery solution made by the community, people will use it, but won't be locked in to it.

I can imagine, for example, one such implementation where each package is audited by hand for security issues and quality assurance. So you know you're getting a certain level of quality if you search this third party index. At the same time, maybe there's another third party package repository that accepts anything, and so it's a good place to look for more obscure implementations of things.

And you could include dependencies from both at the same time in your zig project, no problem.

Anyway, I at least think it's worth exploring a decentralized model.

@binary132

binary132 commented Sep 20, 2018

I don't think a centralized model is a good idea. Imagine if C had implemented a centralized model in the 1970s or 1980s.

@rishavs

rishavs commented Sep 25, 2018

One suggestion: the compiler itself should be a package in the repository, so that updating the language is as simple as zig update zig.
Haxe does this and I love their implementation.
