Add new spec for go package URLs #338

Open: wants to merge 2 commits into base: main

@maceonthompson commented Nov 4, 2024

The current PURL specification for Go was created before Go 1.11 modules and thus has namespace inconsistencies and lacks semantic versioning.

Although in many cases a module path corresponds directly to the URL of the hosting repository, that is not always true. The URL formed from the module path may be an endpoint that serves a redirect to the true host. This indirection protects projects that for whatever reason must change their hosting provider: their module names will continue to work. Consequently, it is undesirable to encode any aspect of the underlying hosting system as part of the PURL.

In essence, all Go modules form a single namespace. Since it is used by the majority of Go programmers, we propose to represent this namespace by the empty string. Though not included in this commit, other namespaces could be possible and would represent package managers and/or build tools that are alternatives to the go command.

The go type proposed here fixes the current issues by removing the namespace, using valid Go module versions (including pseudo-versions), and adding some extra functionality to encode optional information about specific builds (GOOS, GOARCH, etc.).
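The proposed mapping can be sketched in Go as follows. This is a minimal illustration assuming the field layout described above; the `GoPURL` type and its `String` method are hypothetical and not part of any existing library. Because the namespace is empty, the whole module path is the name and its slashes are percent-encoded, while the package path goes into the subpath.

```go
package main

import (
	"net/url"
	"sort"
	"strings"
)

// GoPURL is a hypothetical representation of the proposed pkg:go type.
type GoPURL struct {
	ModulePath  string            // full, case-sensitive module path, e.g. "golang.org/x/vuln"
	Version     string            // module version or pseudo-version; empty when unknown
	Qualifiers  map[string]string // optional build information, e.g. "goversion"
	PackagePath string            // package path relative to the module root; empty for the module itself
}

// escapeQualifierValue percent-encodes a qualifier value. url.QueryEscape
// writes spaces as "+", so they are rewritten to "%20" as PURL expects.
func escapeQualifierValue(v string) string {
	return strings.ReplaceAll(url.QueryEscape(v), "+", "%20")
}

// String renders the PURL. The namespace is empty, so the module path is the
// name and url.PathEscape encodes its slashes as %2F.
func (p GoPURL) String() string {
	var b strings.Builder
	b.WriteString("pkg:go/")
	b.WriteString(url.PathEscape(p.ModulePath))
	if p.Version != "" {
		b.WriteString("@" + url.PathEscape(p.Version))
	}
	if len(p.Qualifiers) > 0 {
		keys := make([]string, 0, len(p.Qualifiers))
		for k := range p.Qualifiers {
			keys = append(keys, k)
		}
		sort.Strings(keys) // canonical PURLs list qualifiers sorted by key
		pairs := make([]string, len(keys))
		for i, k := range keys {
			pairs[i] = k + "=" + escapeQualifierValue(p.Qualifiers[k])
		}
		b.WriteString("?" + strings.Join(pairs, "&"))
	}
	if p.PackagePath != "" {
		// Subpath segments are kept literal; this sketch assumes they
		// contain no characters that need escaping.
		b.WriteString("#" + p.PackagePath)
	}
	return b.String()
}
```

For instance, module `golang.org/x/vuln` at `v1.1.3`, built with Go 1.23.2, with package `cmd/govulncheck`, renders as `pkg:go/golang.org%2Fx%2Fvuln@v1.1.3?goversion=1.23.2#cmd/govulncheck`.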

If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages.


@matt-phylum (Contributor) left a comment

See also #196 #294 #308

This is a breaking change that affects all software utilizing PURL for Go. Personally, I don't think there's anything fundamentally wrong with pkg:golang except that the description is outdated, and I'm sure it can be fixed without making this level of breaking change. Maintaining the separation of namespace and name and putting the entire Go package ID into the PURL name makes PURLs difficult for human users to work with.

PURL-TYPES.rst Outdated
------
``go`` for Go modules:

- The ``namespace`` field is empty and implies the go mod proxy.
Contributor

Is the field empty or does it imply the go mod proxy? It can't be both.


Should be done now, see the new commit (sorry for that).

PURL-TYPES.rst Outdated
- The ``name`` will be the full module path.
- The ``subpath`` will represent the package path within a module.
- The ``version`` will be a valid go version or pseudoversion, or empty.
- Additional Build information for binaries can be included as ``qualifiers`` (i.e VCS info, go version info, GoArch/GoOS info etc)
Contributor

The additional information should be explicitly defined here.

Member

Exactly. Be specific in the spec, so we are all on the same page.


PTAL in the new commit (sorry for that).

PURL-TYPES.rst Outdated
``go`` for Go modules:

- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
Contributor

This should probably specify that it is case sensitive. pkg:golang incorrectly states that it is not case sensitive and must be lowercased.

Member

Exactly. This is what #308 is all about.
Please don't repeat the mistakes of the past.


Should be done now, see the new commit (sorry for that).

@maceonthompson (Author)

See also #196 #294 #308

Thanks for pointing these out! This is essentially a combination of #196 and #308 (with the addition of qualifiers for build info). They go into more detail than this proposal, but especially in the case of namespaces, #63 (comment) is a good example of why dropping name in favor of an entirely coded namespace would be more useful. I understand that having a bunch of %2F in the PURL is ugly for humans, but we feel it is necessary to ensure that Go PURLs are consistent, which is to say that each Go module maps to exactly one PURL: a Go module cannot be represented by different PURLs.

Say you have a module with the path host.com/maybeuser/module.
With the current type definition, both pkg:golang/host.com/maybeuser/module and pkg:golang/host.com/maybeuser%2Fmodule could represent that module. In order for PURLs to canonically and uniquely define Go modules in the way that they are defined on pkg.go.dev or the Go module proxy, they must be unique as well.
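To make the ambiguity concrete: a reader that percent-decodes each namespace and name segment and joins them with "/" recovers the same module path from both strings. The function below is a hypothetical, deliberately simplified reader (it ignores version, qualifiers, and subpath), not the reference PURL parser.

```go
package main

import (
	"net/url"
	"strings"
)

// modulePathFromGolangPURL is a hypothetical reader for bare pkg:golang PURLs
// (no version, qualifiers, or subpath). It percent-decodes every
// "/"-separated segment and joins the results to form a Go module path.
func modulePathFromGolangPURL(purl string) (string, bool) {
	rest, ok := strings.CutPrefix(purl, "pkg:golang/")
	if !ok {
		return "", false
	}
	segments := strings.Split(rest, "/")
	for i, s := range segments {
		decoded, err := url.PathUnescape(s)
		if err != nil {
			return "", false
		}
		segments[i] = decoded
	}
	return strings.Join(segments, "/"), true
}
```

A stricter reader could instead reject a name that contains an encoded "/", which is the approach @matt-phylum argues for below.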

@matt-phylum (Contributor)

Say you have a module with the path host.com/maybeuser/module.
With the current type definition, both pkg:golang/host.com/maybeuser/module and pkg:golang/host.com/maybeuser%2Fmodule could represent that module. In order for PURLs to canonically and uniquely define Go modules in the way that they are defined on pkg.go.dev or the Go module proxy, they must be unique as well.

I think the better solution to this problem is that pkg:golang/host.com/maybeuser%2Fmodule stays illegal. It'd be better if the documentation explicitly stated it were illegal, but based on the examples and test cases the correct form is pkg:golang/host.com/maybeuser/module, and based on the reference parsing and formatting algorithms it's clear that these PURLs are distinct.

However, "a go module cannot be represented by different PURLs" is not generally the case:

  • The PURL spec describes a canonical format for PURLs, but users and even commonly used PURL implementations often get this wrong and produce non-canonical PURLs which must still be considered equal. For example, pkg:golang/host%2Ecom/maybeuser/module is a non-canonical, valid, PURL which refers to the same package.
  • A PURL may have qualifiers which may or may not be critical to the PURL. A PURL with a ?goarch is a different PURL which refers to the same module, but a PURL with a ?repository_url (or however the module proxy is specified) is a different PURL which may refer to a different module (probably more likely in other ecosystems).
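One way a consumer can cope with the first point is to canonicalize before comparing. The sketch below is hypothetical: it percent-decodes each segment of the namespace/name part of a pkg:golang PURL and re-encodes it with Go's `url.PathEscape`, whose escaping set only approximates the one in the PURL spec. Segment boundaries are preserved, so an encoded %2F stays encoded and the two forms discussed above remain distinct.

```go
package main

import (
	"net/url"
	"strings"
)

// canonicalizeGolangPath rewrites the namespace/name part of a pkg:golang
// PURL so that needlessly escaped characters (such as %2E for ".") are
// written literally, while characters that must be escaped (such as "/"
// inside a segment) stay escaped.
func canonicalizeGolangPath(path string) (string, error) {
	segments := strings.Split(path, "/")
	for i, s := range segments {
		decoded, err := url.PathUnescape(s)
		if err != nil {
			return "", err
		}
		segments[i] = url.PathEscape(decoded)
	}
	return strings.Join(segments, "/"), nil
}
```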

@jkowalleck added the labels Proposed new type and type: golang (Proposed new type as well as component discussions) on Nov 8, 2024
@jkowalleck (Member) commented Nov 8, 2024

This is a breaking change that affects all software utilizing PURL for Go.

I'd disagree. In fact, it is non-breaking, as it adds a completely new purl type. Therefore, no breaking changes are introduced.

@matt-phylum (Contributor)

It is breaking because no existing PURL software expects pkg:go, and new PURL software will not expect pkg:golang. This creates a compatibility problem where either the PURL is rejected as an unrecognized type or software on different sides of the breakage don't understand each other. If this is merged, all software that works with Go PURLs will need to be updated to accept both types of Go PURL and convert before they interoperate again.

@jkowalleck (Member)

It is breaking because no existing PURL software expects pkg:go [...]

This is true of every newly proposed PURL type :-)
And none of them is a breaking change, neither in the spec nor in behaviour.

This PR is trying to add a new type, go. The existing golang type is not touched at all.

@matt-phylum (Contributor)

The problem is that this is not a new type. The go type is intended to replace golang.

@jkowalleck (Member)

The problem is that this is not a new type.

It is not? Could you point me to the existing go type?

The go type is intended to replace golang.

I wonder how you came to this conclusion.
This very PR adds a new type; it neither obsoletes nor deprecates the existing golang type.

@matt-phylum (Contributor)

I wonder how you came to this conclusion.
This very PR adds a new type; it neither obsoletes nor deprecates the existing golang type.

From the PR description:

If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages

golang is the type currently used for Go modules and packages. For example: https://github.com/anchore/syft/blob/3c070e0ad9d69c0f2191be52e2f2fb4904bcd558/syft/pkg/cataloger/golang/package_test.go#L24 . This PR is introducing a second, more preferred type for the same purpose.

@jkowalleck (Member)

I wonder how you came to this conclusion.
This very PR adds a new type; it neither obsoletes nor deprecates the existing golang type.

From the PR description:

If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages

Which is a behavioural change in a downstream application. That is out of the scope of this spec and not in our hands at all: we have no authority there.

golang is the type currently used for Go modules and packages. For example: https://github.com/anchore/syft/blob/3c070e0ad9d69c0f2191be52e2f2fb4904bcd558/syft/pkg/cataloger/golang/package_test.go#L24 . This PR is introducing a second, more preferred type for the same purpose.

Exactly this paragraph makes it clear: this is a non-breaking change.

Causing no breaking change is the whole point of introducing a new purl type instead of modifying an existing one.

@jkowalleck (Member) commented Nov 8, 2024

I'm sure it can be fixed without making this level of breaking change.

I don't think so. #308 makes this clear: the existing spec has flaws that require breaking changes to fix them.

The only ways to fix golang are:

  • a) introduce breaking changes in the existing purl-type << undesired!!!
  • b) introduce a new purl-type << feasible
  • c)
    1. have the PURL spec modified to allow versioning of purl-types << bureaucratic effort that might lead to nothing
    2. if c)1. succeeds: craft version 2 of the golang purl-type
    3. otherwise, fall back to a) or b)

@matt-phylum (Contributor)

Introducing a new type for an existing type is a breaking change to the PURL ecosystem. Implementations that use golang can continue to use golang and their golang PURLs will still be golang PURLs, but PURL has no negotiation mechanism where all the software that's going to read the PURLs agrees with the software that writes the PURLs on whether to use go or golang to describe Go dependencies.

If you start writing SBOMs that have go, they will be processed incorrectly by software that doesn't support go. If you continue writing SBOMs that have golang, they will be processed incorrectly by software that doesn't support golang. If you combine SBOMs using software that doesn't understand that go and golang are really the same type, the dependencies will be duplicated in the output. If you query go or golang packages against a vulnerability database, you have a 50/50 chance of finding the vulnerabilities unless the database understands both and converts golang to go.

Keeping golang is incompatible with the "a go module cannot be represented by different PURLs" goal of this PR.

You cannot just fix a PURL type by introducing a new type. Even if PURL libraries are updated to support transparently upgrading the old type into the new type on read, any software that is comparing pre-canonicalized PURL strings will need updates.

the existing spec has flaws that require breaking changes to fix them

What are the flaws that require breaking changes? #308 is about the path being incorrectly converted to lowercase, which is much more easily fixed by just not doing that.

@jkowalleck (Member) commented Nov 8, 2024

Introducing a new type for an existing type is a breaking change to the PURL ecosystem.

How?

If a tool that produces purls changed its behaviour to use the new purl-type where it used the other one before, that would be a breaking change in that very tool.
This is out of the scope of the purl spec; we have no authority there.

Implementations that use golang can continue to use golang and their golang PURLs will still be golang PURLs, but PURL has no negotiation mechanism where all the software that's going to read the PURLs agrees with the software that writes the PURLs on whether to use go or golang to describe Go dependencies.

So?
This is true of every purl type that is added over time.
An implementation written 2 years ago might not know the purl type that was defined yesterday.
This is by design and was never an issue. It is out of the scope of the purl spec; we have no authority there.

Keeping golang is incompatible with the "a go module cannot be represented by different PURLs" goal of this PR.

A PR tells a story, and the effective patch gets updated along with the discussion on the PR.
The initial PR description is usually not updated to match the effective patch.

(PS: I review the content of the PR, and at the time of review I saw no breaking change.
I started the "breaking" discussion expecting that you'd agree it is no longer a breaking change, based on the current state of the PR.
I am happy we are discussing the topic anyway; I might be wrong, and I am still learning.)

You cannot just fix a PURL type by introducing a new type. Even if PURL libraries are updated to support transparently upgrading the old type into the new type on read, any software that is comparing pre-canonicalized PURL strings will need updates.

How so?

the existing spec has flaws that require breaking changes to fix them

What are the flaws that require breaking changes? #308 is about the path being incorrectly converted to lowercase, which is much more easily fixed by just not doing that.

The current golang spec says: the path MUST be lowercased.
This is wrong in terms of actual Go dependency management: the path MUST NOT be lowercased.
Changing MUST to MUST NOT in the golang purl-type is a breaking change of the specification.

PURL-TYPES.rst Outdated
``go`` for Go modules:

- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
@jkowalleck (Member) Nov 8, 2024

Suggested change
- The ``name`` will be the full module path.
- The ``name`` is the full module path. It MUST be unmodified, and follow the `Go Module Reference <https://go.dev/ref/mod#go-mod-file-ident>`_.

This change would close #308.

@matt-phylum (Contributor) Nov 8, 2024

- - The ``name`` will be the full module path.
+ - The ``name`` is the full module path. In case of a URL: protocol MUST be lowercased; host-part MUST be lowercased; path-part MUST be unmodified, as it is case-sensitive.

This change would close #308.

I don't think this is correct.

  1. I don't think it's legal to include a protocol in the module path. Go makes some HTTPS requests to resolve a VCS URL to download the package from (usually this is delegated to the proxy).
  2. The host part is also part of the case-sensitive module path. It should not be lowercased. Uppercase characters are currently forbidden by Go in the leading element of a module path. I don't think it's worthwhile, or really correct, for the PURL spec to specify how to convert an invalid module path into a valid one, nor to specify how to validate Go module paths: this doesn't cover all the restrictions, and it may cause problems if Go ever changes the restrictions for some reason.

Member

Re 1: I see. I was wrong there. I adjusted my suggestion regarding the protocol.
Re 2: per the URL spec, the host part is case-insensitive and is normalized to lowercase.

Contributor

As far as Go is concerned, it's usually a host-part but it has additional restrictions and it is case sensitive: https://go.dev/ref/mod#go-mod-file-ident

Member

I see. I will modify my change suggestion accordingly. Does it fit better now?

PURL-TYPES.rst Outdated

- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
- The ``subpath`` will represent the package path within a module.
Member

Suggested change
- The ``subpath`` will represent the package path within a module.
- The ``subpath`` is the unmodified package path within a module.

PURL-TYPES.rst Outdated
- The ``namespace`` field is empty and implies the go mod proxy.
- The ``name`` will be the full module path.
- The ``subpath`` will represent the package path within a module.
- The ``version`` will be a valid go version or pseudoversion, or empty.
Member

Suggested change
- The ``version`` will be a valid go version or pseudoversion, or empty.
- The ``version`` may be a valid go version or pseudoversion, omitted when empty.


Why "may" here?

Member

Because version is optional.


Should be done now, see the new commit (sorry for that).

@matt-phylum (Contributor)

Adding a new type for a new type is much different than adding a new type for an existing type. An old tool not recognizing a truly new type is expected, but an old tool not recognizing Go PURLs anymore because a tool producing the data says that golang is now spelled go is a breaking change. You can argue that this isn't a breaking change in the PURL spec itself because it doesn't change golang, but it necessitates a breaking change in every current implementation of Go PURLs and complicates implementations of Go PURL consuming software as long as there are both go and golang PURLs going around.

Changing "MUST be lowercased" to "MUST NOT be lowercased" is a much less impactful change than this. From what I've seen, names with uppercase characters are uncommon, and an outdated implementation that incorrectly lowercases still works correctly for all names that contain no uppercase characters. I would even say that on a larger scale it is not a breaking change because:

  • An outdated PURL producer that incorrectly lowercases an ID containing capitals produces the wrong PURL, but today those producers are producing exactly the same PURL and calling it correct despite referring to the wrong package.
  • An outdated PURL consumer that incorrectly lowercases an ID containing capitals reads the wrong ID, but today those consumers are already reading exactly the same ID and calling it correct despite referring to the wrong package.

In both cases, the PURL is still parsed successfully and the meaning of the PURL is unchanged with respect to the current "MUST be lowercased" spec. The only differences would be that the canonical form changes¹ and a new consumer receiving a PURL from an old producer might be more likely to expect that the ID refers to the correct package, but since there is no good way for an outdated consumer to recover the correct ID after an outdated producer lowercases it, any consumer that relies on getting the correct ID (eg to resolve the package files) is likely already broken and not lowercasing the name can only improve the behavior in that situation.

This causes the same alignment problems as introducing a go type, except that if the correct ID is lowercase, no problem occurs because lowercasing is already producing the correct PURL.

¹ Due to underspecification in the text and tests, I wouldn't trust incoming PURLs to be in the canonical form as my implementation understands it. There are numerous minor differences in which characters are escaped when (and sometimes how), so if you're accepting PURLs from an external source, even if you don't expect user-entered, non-canonical PURLs in that source, you should be canonicalizing those PURLs yourself if your application depends on them all being canonical for the same definition of canonical.

@matt-phylum (Contributor)

Go isn't the only ecosystem that has this problem of incorrect name normalization rules in this repo. I'm also aware of:

@zpavlinovic commented Nov 14, 2024

Introducing a new type for an existing type is a breaking change to the PURL ecosystem.

If this is indeed true, then there is something really wrong with PURL: it does not allow for evolution. On the one hand, we cannot add modifications to the existing specification that could introduce breaking changes. On the other hand, we cannot introduce a new type because somehow that is a breaking change as well. So one is pretty much stuck with slight variations of the initial spec. Specs should be allowed to evolve just the way the software does.

There should really be a way to add versioning on top of PURL itself. What is being proposed here might in essence be just that for the go spec.

@pombredanne (Member)

@maceonthompson Thanks for putting this together! This makes a lot of sense, and we do have an issue with Go. Let me look at the comments in detail and come back with my 2 cents!

@pombredanne (Member)

@matt-phylum re:

Introducing a new type for an existing type is a breaking change to the PURL ecosystem.

I am not sure that's the case, but a new type vs. updating the existing type demands some careful thinking :)

PURL-TYPES.rst Outdated

pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations
pkg:go/github.com%2Fjmorion%2Fsqlx@v1.1.2#api
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck
Member

There is a likely problem with the use of subpath: there is no way to determine where the module ends and the package starts in the general case, is there?
For instance, in the path google.golang.org/genproto/googleapis/api/annotations how can I determine safely that google.golang.org/genproto is a module and that googleapis/api/annotations is a package inside this module? I need either a go proxy lookup or a full filesystem to locate a go.mod/go.sum file, right?

Contributor

There is a way, if the module's code is available to you, to determine from a package import path where the module path ends and the package path begins, by making HTTP requests.

I think the use of the subpath here is good because it puts the burden of determining this on whatever generates the PURL, which is likely aware of Go and either has the module paths or is most likely to be able to find the module path from the full package path. Then if you want to use a tool that checks PURLs against a database of information about modules (eg vulnerabilities), the tool already has all the information it needs. Otherwise, either the tool would need to make external API calls to figure out the module path of the PURL or the database would need to have an entry for every package in the module.


Just to add to @matt-phylum's comment: if a tool is producing a PURL for a Go artifact, then it can use go version, debug.BuildInfo, or packages.Load to get information about the package and its corresponding module. The encoding proposed here then makes it clear what the modules and packages are.


I don't believe this is true for the general use case of PURLs. For example, we do static analysis of binaries, and while we can get information about linked packages, there's no indication of which part of the paths corresponds to modules.


Right, if you are looking at a Go symbol from the symbol table, you can get its package. You can get the module correctly by prefix-matching the package path against the module information in the binary's debug.BuildInfo, unless there are several modules that are prefixes of the package. My inclination is that this should not affect what is proposed here. (Arguably, there should be a way to get module information for a symbol in the binary, just as one can for source analysis.)
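The prefix-matching approach can be sketched as follows. `moduleForPackage` is a hypothetical helper that assumes the caller has already collected the module paths of the main module and its dependencies (for example from the `Main` and `Deps` fields of `debug.BuildInfo`). It picks the longest module path that matches on a "/" boundary, which handles nested modules but remains a heuristic for the ambiguous case mentioned above.

```go
package main

import "strings"

// moduleForPackage returns the longest module path that equals pkg or is a
// prefix of pkg ending at a "/" boundary, plus the remaining package subpath.
// Requiring the boundary keeps "example.com/a" from matching "example.com/ab/x".
func moduleForPackage(pkg string, modules []string) (module, subpath string, ok bool) {
	for _, m := range modules {
		if pkg == m || strings.HasPrefix(pkg, m+"/") {
			// Prefer the longest match so nested modules win over parents.
			if len(m) > len(module) {
				module = m
				ok = true
			}
		}
	}
	if !ok {
		return "", "", false
	}
	subpath = strings.TrimPrefix(strings.TrimPrefix(pkg, module), "/")
	return module, subpath, true
}
```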

@pombredanne (Member)

BTW, an elephant in the room is whether the distinction between a namespace and name makes sense not only here, but also in the whole spec, globally.

I found myself using a variable with a "namespace/name" substring more often than not.
Then, how to split this in optional namespace and name could become a type-specific distinction, but the general concept would be that of "namespace/name", which could look like:

  • pkg:golang/google.golang.org/genproto/googleapis/api/annotations@v1.2.1

With this the whole google.golang.org/genproto/googleapis/api/annotations would be the namespace/name and would not have a specific split in Go, all would be in the name?
(and the same could apply where relevant to other package types)

It could have a minimal impact on the spec.

PURL-TYPES.rst Outdated

pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations
pkg:go/github.com%2Fjmorion%2Fsqlx@v1.1.2#api
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck
Member

Is the plan to include all the buildinfo structure as qualifiers?
If so, this would only apply in a built binary?

@jkowalleck (Member) Nov 18, 2024

Good point.
If so, all the qualifiers MUST be documented in the type spec.

Currently it reads:

Additional Build information for binaries can be included as qualifiers (i.e VCS info, go version info, GoArch/GoOS info etc)

I am afraid this documentation is insufficient.


We will expand on this.

PURL-TYPES.rst Outdated
pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations
pkg:go/github.com%2Fjmorion%2Fsqlx@v1.1.2#api
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck
pkg:go/golang.org%2Fx%2Fvuln@v1.1.3?goversion=1.23.2#cmd/govulncheck
Member

Are the Go module versions always to be prefixed with a v?

Member

A version identifies an immutable snapshot of a module, which may be either a release or a pre-release. Each version starts with the letter v, followed by a semantic version.
-- https://go.dev/ref/mod#versions

Version could also be a pseudo-version: a git tag, a git commit hash, or something like this.

Contributor

A pseudoversion is a special kind of version that also starts with a v: https://go.dev/doc/modules/version-numbers#pseudo-version-number

I think for Go modules, including when using the Go module system to refer to something that predates modules, the version always starts with a v. In which case, versions that don't start with v would only be used with older tools like Dep?


If a version exists, it should be a valid Go module version. It should start with a v.

Note that hashes should not be permitted, they are not a valid Go version (resolution of hash commits in go tooling is a convenience feature).
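A rough check along these lines can be sketched with a regular expression: "v" followed by a full semantic version. Pseudo-versions pass because they are semver pre-release versions, while bare commit hashes and versions without the "v" prefix are rejected. This is a simplification (for instance, it does not reject leading zeros in numeric pre-release identifiers); `golang.org/x/mod/semver` and `golang.org/x/mod/module.IsPseudoVersion` are the authoritative checks.

```go
package main

import "regexp"

// goVersionRE approximates a canonical Go module version: "v", then
// MAJOR.MINOR.PATCH, an optional pre-release and optional build metadata
// (which covers "+incompatible"). Shorthand forms such as "v1.2" are
// rejected because module versions must be complete.
var goVersionRE = regexp.MustCompile(
	`^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)` +
		`(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?` +
		`(\+[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$`)

// isGoModuleVersion reports whether v looks like a Go module version.
func isGoModuleVersion(v string) bool {
	return goVersionRE.MatchString(v)
}
```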

@pombredanne (Member)

@matt-phylum you wrote

See also:

This is a breaking change that affects all software utilizing PURL for Go. Personally, I don't think there's anything fundamentally wrong with pkg:golang except that the description is outdated, and I'm sure it can be fixed without making this level of breaking change.

Thanks for the links! I tend to think along the same lines, and we can likely salvage the golang type.

Maintaining the separation of namespace and name and putting the entire Go package ID into the PURL name makes PURLs difficult for human users to work with.

I need to ponder this. See my other comment wrt. the namespace/name above in #338 (comment).

@zpavlinovic commented Nov 18, 2024

However, "a go module cannot be represented by different PURLs" is not generally the case:

  • The PURL spec describes a canonical format for PURLs, but users and even commonly used PURL implementations often get this wrong and produce non-canonical PURLs which must still be considered equal. For example, pkg:golang/host%2Ecom/maybeuser/module is a non-canonical, valid, PURL which refers to the same package.
  • A PURL may have qualifiers which may or may not be critical to the PURL. A PURL with a ?goarch is a different PURL which refers to the same module, but a PURL with a ?repository_url (or however the module proxy is specified) is a different PURL which may refer to a different module (probably more likely in other ecosystems).

It is fine that the PURL spec allows for more flexibility, but there should be only one way the Go module and package information is encoded. This simplifies the work for clients. It is easy to drop qualifiers from a PURL. It is annoying to generate multiple module+package encodings to see if the incoming PURL applies to your code.
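Dropping qualifiers really is a few lines of string handling, as this minimal sketch shows. It assumes a syntactically valid PURL and uses a hypothetical stripQualifiers helper; the goarch/goos qualifier names in the example are illustrative only. As noted above, blindly dropping a qualifier such as repository_url can conflate different packages, so a real consumer would strip only qualifiers it knows are non-identifying.

```go
package main

import (
	"fmt"
	"strings"
)

// stripQualifiers (hypothetical) removes the optional "?qualifiers" and
// "#subpath" components, leaving "pkg:type/namespace/name@version".
// The subpath always follows the qualifiers, so cutting at "#" first and
// then at "?" handles every combination.
func stripQualifiers(purl string) string {
	if i := strings.IndexByte(purl, '#'); i >= 0 {
		purl = purl[:i]
	}
	if i := strings.IndexByte(purl, '?'); i >= 0 {
		purl = purl[:i]
	}
	return purl
}

func main() {
	fmt.Println(stripQualifiers(
		"pkg:golang/github.com/AdguardTeam/AdGuardHome@1.107.52?goarch=amd64&goos=linux"))
	// pkg:golang/github.com/AdguardTeam/AdGuardHome@1.107.52
}
```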

In general, this proposal tries to make it simple and clear to generate and accurately check against PURLs. It might not be the most user-friendly solution, but tools that render PURLs can easily prettify the output. We believe this is worth the sacrifice.

@rhalar

rhalar commented Nov 19, 2024

Could it also be clarified how standard library packages are to be represented?

Go has special handling for these, and the 'module' is never explicitly required when using them. But the module does exist for std and cmd:
https://github.com/golang/go/blob/master/src/go.mod#L1
https://github.com/golang/go/blob/master/src/cmd/go.mod#L1

Go uses stdlib when reporting vulnerabilities, though:
https://vuln.go.dev/ID/GO-2024-3105.json

but we believe the exact module name would make more sense.
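If the exact module name were used, a tool would need to map a standard-library import path to std or cmd. A minimal sketch, assuming the usual convention that standard-library import paths have no dot in their first element and that toolchain commands live under cmd/. The helper name moduleForImportPath is hypothetical, and the heuristic misfires for dotless non-standard paths (for example, local module names like "example/foo").

```go
package main

import (
	"fmt"
	"strings"
)

// moduleForImportPath (hypothetical) guesses which Go distribution module
// provides an import path: "cmd" for paths under cmd/, "std" for other
// dotless paths, and "" for anything else, whose module cannot be derived
// from the import path alone.
func moduleForImportPath(importPath string) string {
	first, _, _ := strings.Cut(importPath, "/")
	if strings.Contains(first, ".") {
		return "" // not part of the Go distribution
	}
	if first == "cmd" {
		return "cmd"
	}
	return "std"
}

func main() {
	for _, p := range []string{"net/http", "cmd/go/internal/modload", "github.com/AdguardTeam/AdGuardHome"} {
		fmt.Printf("%s -> %q\n", p, moduleForImportPath(p))
	}
}
```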

@jkowalleck
Member

jkowalleck commented Mar 19, 2025

@puerco, please read and understand all previously made comments.


repeating myself and others:

we are here in the domain of specification.
we don't care if a change in the specification breaks downstream implementations due to poor design choices, since we are not in their domain.
we only care that a change does not break the existing specification, so that existing downstream works even after our changes are effective.

Yes backporting the go improvements to the original golang is a breaking change, but it only breaks purls which are already broken.

[...] is a breaking change [...]

you see, these changes to go would be breaking the existing spec. so this is why a new type was introduced - to not break the spec.

[...] but it only breaks purls which are already broken.

This is a misconception - there are no "broken" purls.
There are purls that don't reflect the capabilities and needs of nowadays. this is why this PR exists, this is why the feature requests #196 #294 #308 exist ;-)
And you're actually proposing to solve these feature requests by introducing breaking changes?

The biggest problem I see (which is what I'm trying to avoid here) is adding the new type.

You expressed you've read all comments before, maybe read again.
You expressed coming from a tools-builder's background. Then this is for you: #338 (comment)

@matt-phylum
Contributor

poor design choices

How could an implementation have been written to avoid the replacement of golang with go being a breaking change?

This proposal, maybe unintentionally, does replace golang with go. Any software that accepts golang PURLs and either understands them to be Go and does something with that or compares them to golang PURLs coming from another source will be broken. It's not a matter of bad design decisions in those implementations. The input to those implementations will some day no longer contain pkg:golang and pkg:go will have no meaning. AFAIK these interactions are normally unidirectional and there's no way for the consuming software to indicate to the producing software that it does or does not support pkg:go. At least all the cases I know of are unidirectional and the representation of Go packages must be agreed upon ahead of time.

The only way I can think of that implementations would not be broken would be if the input data contained both pkg:golang and pkg:go PURLs for each Go package, represented in such a way that the software knows both PURLs belong to the same package. However, I've never seen any schema where a package is specified by multiple PURLs, and it seems counter to the idea of PURL for a package to be intentionally identified by multiple PURLs in the same document. Maybe it's actually a good idea because it could allow for PURL spec evolution or cases where a library is aliased as part of a rename or something, but such schemas are outside the scope of PURL so at best PURL could only recommend doing so.

For example, this usecase from the CycloneDX site will not work until tools and/or databases are updated because the component purl field will change to something meaningless to existing software: https://cyclonedx.org/use-cases/identify-known-vulnerabilities/

The OSV database already uses the corrected pkg:golang with uppercase variant for Go: https://osv.dev/vulnerability/GHSA-9cp9-8gw2-8v7m . I'm not sure this is really a great example because I think I remember their PURL implementation having some other, less beneficial non-conformities which could cause vulnerability lookups to fail in other cases, but any similar software, no matter how well PURL support is implemented, will be broken if pkg:golang is replaced by pkg:go.

@jkowalleck
Member

jkowalleck commented Mar 19, 2025

This proposal, maybe unintentionally, does replace golang with go. Any software that accepts golang PURLs and either understands them to be Go and does something with that or compares them to golang PURLs coming from another source will be broken.

again?! Okay, let's play this again...

nope. any implementation that supports golang will not be affected: golang is still around, and is still working as before - and that is due to this pull request not changing golang.
How will a software be broken after no changes to golang ever happened?
How will this spec be broken, if it adds a new feature?

I will not go into the details of the other points, as they all rely on this fundamental misconception.

PS: please understand that all the other proposed solutions, like changing golang, would be breaking changes, and your downstream implementations definitely will break; this PR's approach effectively prevents breaking changes in the spec (the only domain that matters), and therefore downstream is not affected.
PPS: you asked which poor design choices could lead to such a thing downstream? if you don't check the purl scheme, but simply assume your input is a purl for golang, that's such a thing ... your argumentation almost suggests that this is exactly the poor design that you base your points on, or does it?

@matt-phylum
Contributor

You keep denying this and calling it a fundamental misconception, but you clearly do not understand what it is @puerco and I are concerned about. It doesn't matter if pkg:golang is still around. The problem is created by somebody starting to use pkg:go while any of the software that currently supports pkg:golang exists. If the plan wasn't to replace pkg:golang with pkg:go in at least some contexts then there would not be a PR to introduce a pkg:go that must never be used.

I know that fixing golang does not necessarily break all downstream applications, because I am aware of multiple applications that will not break. In fact, I am not aware of any applications that will break in a way that is worse than how they are already broken for the PURLs that would be affected by fixing pkg:golang. I'm not 100% sure that it won't cause new problems in some cases, but I still don't understand how it could cause more problems than introducing pkg:go PURLs into the input of software that only understands pkg:golang PURLs. And without changing the schemas of documents that contain PURLs, I don't see how pkg:go could be introduced to the PURL spec without pkg:go PURLs reaching software that only understands pkg:golang, short of either a years-long transition period where implementations are expected to support pkg:go but only output pkg:golang, or leaving it to users to figure out when they need to normalize all Go PURLs in a file to pkg:go or pkg:golang.

PPS: you asked which poor design choices could lead to such a thing downstream? if you don't check the purl scheme, but simply assume your input is a purl for golang, that's such a thing ... your argumentation almost suggests that this is exactly the poor design that you base your points on, or does it?

This does not make sense to me. I never said anything about not checking whether the PURL is a Go PURL. If the input has pkg:golang, then it is a Go PURL, and you can resolve it to a package file or look up vulnerabilities or other information about it by using that knowledge. If the input has pkg:go, then it is currently not a Go PURL, and implementations know nothing about it and can do nothing with it. If you are doing the correct thing and applying Go semantics only to pkg:golang PURLs, then your implementation will not work when it starts receiving pkg:go PURLs.

@zpavlinovic

You keep denying this and calling it a fundamental misconception, but you clearly do not understand what it is @puerco and I are concerned about. It doesn't matter if pkg:golang is still around. The problem is created by somebody starting to use pkg:go while any of the software that currently supports pkg:golang exists. If the plan wasn't to replace pkg:golang with pkg:go in at least some contexts then there would not be a PR to introduce a pkg:go that must never be used.

I think it would be very useful if you could create some scenarios/examples that are representative of the concerns you have. We could then discuss them. I am not trying to diminish your concerns, I simply don't understand them; they are quite abstract to me at this point. It feels like we are running in circles, so I believe working out concrete scenarios might help here.

@matt-phylum
Contributor

Here is an example of the way pkg:go can go wrong.

The user is using an SBOM generation tool to analyze their source code and produce an SBOM. They take that SBOM and feed it into a vulnerability scanner so they can be aware of vulnerabilities in their software. The vulnerability scanner, either directly or indirectly, needs to recognize the Go modules being used. In practice this will be something like reading a CycloneDX file, extracting component PURLs, and either converting those PURLs into ecosystem-native identifiers that can be looked up in an advisory database (eg pkg:golang/github.com/AdguardTeam/AdGuardHome@1.107.52 -> {"ecosystem":"Go", "name": "github.com/AdguardTeam/AdGuardHome", "version": "1.107.52"}) or the other way around, using the PURL to query a database where the advisory database entries have already been mapped to PURLs.
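The PURL-to-native-identifier step described here can be sketched as follows. This is a minimal, hypothetical converter for unqualified pkg:golang PURLs that produces the {"ecosystem","name","version"} shape from the example; it is not OSV's exact request format and not any particular scanner's code. Note that it returns an error for a pkg:go PURL instead of guessing, which is exactly where today's consumers would diverge.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// osvPackage mirrors the {"ecosystem","name","version"} shape above.
type osvPackage struct {
	Ecosystem string `json:"ecosystem"`
	Name      string `json:"name"`
	Version   string `json:"version"`
}

// fromGolangPURL (hypothetical) converts a pkg:golang PURL without
// qualifiers or subpath. It does not handle type case-folding.
func fromGolangPURL(purl string) (osvPackage, error) {
	rest, ok := strings.CutPrefix(purl, "pkg:golang/")
	if !ok {
		return osvPackage{}, errors.New("not a pkg:golang PURL: " + purl)
	}
	// "@" cannot appear unencoded in namespace/name, so the first "@"
	// starts the version.
	path, version, _ := strings.Cut(rest, "@")
	name, err := url.PathUnescape(path)
	if err != nil {
		return osvPackage{}, err
	}
	return osvPackage{Ecosystem: "Go", Name: name, Version: version}, nil
}

func main() {
	pkg, err := fromGolangPURL("pkg:golang/github.com/AdguardTeam/AdGuardHome@1.107.52")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.Marshal(pkg)
	fmt.Println(string(out))
}
```

This prints {"ecosystem":"Go","name":"github.com/AdguardTeam/AdGuardHome","version":"1.107.52"}.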

If the user updates the SBOM generation tool to a version that outputs pkg:go PURLs without updating their vulnerability scanner or its database first, either they will get an error that pkg:go isn't supported, and maybe a hint that they should upgrade the service, or they will just get no results for the new PURL pkg:go/github.com%2FAdguardTeam%2FAdGuardHome@1.107.52 and not know that they are vulnerable, a critical failure that can result in a surprise compromise. The second case is what will happen if the vulnerability matching is being offloaded to today's api.osv.dev.

$ curl -d '{"package":{"purl":"pkg:golang/github.com/AdguardTeam/AdGuardHome@1.107.52"}}' "https://api.osv.dev/v1/query"
{"vulns":[{"id":"GO-2024-2924","summary":"AdGuardHome privilege escalation vulnerability in github.com/AdguardTeam/AdGuardHome","details":"AdGuardHome privilege escalation vulnerability in github.com/AdguardTeam/AdGuardHome.\n\nNOTE: The source advisory for this report contains additional versions that could not be automatically mapped to standard Go module versions.\n\n(If this is causing false-positive reports from vulnerability scanners, please suggest an edit to the report.)\n\nThe additional affected modules and versions are: .","aliases":["CVE-2024-36586","GHSA-7jp9-vgmq-c8r5"],"modified":"2024-09-06T20:44:16Z","published":"2024-06-28T15:28:30Z","database_specific":{"url":"https://pkg.go.dev/vuln/GO-2024-2924","review_status":"UNREVIEWED"},"references":[{"type":"ADVISORY","url":"https://github.com/advisories/GHSA-7jp9-vgmq-c8r5"},{"type":"ADVISORY","url":"https://nvd.nist.gov/vuln/detail/CVE-2024-36586"},{"type":"WEB","url":"https://github.com/go-compile/security-advisories/blob/master/vulns/CVE-2024-36586.md"}],"affected":[{"package":{"name":"github.com/AdguardTeam/AdGuardHome","ecosystem":"Go","purl":"pkg:golang/github.com/AdguardTeam/AdGuardHome"},"ranges":[{"type":"SEMVER","events":[{"introduced":"0"}]}],"ecosystem_specific":{"custom_ranges":[{"type":"ECOSYSTEM","events":[{"introduced":"0.93.0"}]}]},"database_specific":{"source":"https://vuln.go.dev/ID/GO-2024-2924.json"}}],"schema_version":"1.6.0"}]}
$ curl -d '{"package":{"purl":"pkg:go/github.com%2FAdguardTeam%2FAdGuardHome@1.107.52"}}' "https://api.osv.dev/v1/query"
{}

This seems like a fairly likely case to me. SBOM generation can happen in CI close to developers while vulnerability scanning may happen in some software like Dependency-Track hosted by IT or a dedicated security team. Even if the software vendors are keeping up and have added support for pkg:go, it's much easier for a developer to update a GitHub action (it could be as easy as clicking "merge" on an unassuming automated dependency update PR) than for somebody to update the deployed version of the service being used to monitor for security vulnerabilities in released products and go through any revalidation processes associated with changing a service involved in safeguarding customer data. Companies may be running very old services on internal networks if they don't perceive a need to upgrade them.

@zpavlinovic

Here is an example of the way pkg:go can go wrong.

Thanks for the example, this is really helpful.

The user is using an SBOM generation tool to analyze their source code and produce an SBOM. They take that SBOM and feed it into a vulnerability scanner so they can be aware of vulnerabilities in their software. The vulnerability scanner, either directly or indirectly, needs to recognize the Go modules being used. In practice this will be something like reading a CycloneDX file, extracting component PURLs, and either converting those PURLs into ecosystem-native identifiers that can be looked up in an advisory database (eg pkg:golang/github.com/AdguardTeam/AdGuardHome@1.107.52 -> {"ecosystem":"Go", "name": "github.com/AdguardTeam/AdGuardHome", "version": "1.107.52"}) or the other way around, using the PURL to query a database where the advisory database entries have already been mapped to PURLs.

I believe I understand the scenario, sorry if I don't. It seems to me that the concern you described here, again, can apply to any new PURL package. If the SBOM tool outputs a pkg:cool-new-lang PURL and the vulnerability scanner does not recognize it, the user can be in a proverbial pickle.

If the user updates the SBOM generation tool to a version that outputs pkg:go PURLs without updating their vulnerability scanner or its database first, either they will get an error that pkg:go isn't supported, and maybe a hint that they should upgrade the service, or they will just get no results for the new PURL pkg:go/github.com%2FAdguardTeam%2FAdGuardHome@1.107.52 and not know that they are vulnerable, a critical failure that can result in a surprise compromise. The second case is what will happen if the vulnerability matching is being offloaded to today's api.osv.dev.

They should be told that pkg:cool-new-lang, or pkg:go for that matter, is not supported, and they should then decide whether to use the vulnerability scanner or call it differently. If the scanner does not issue an "unsupported" or "update me" warning, then this vulnerability scanner has some major issues. I mean, if the scanner just eats the unsupported PURL, proceeds to produce results pretending like nothing happened, and that results in missed vulnerabilities, then this is definitely not the scanner one should be using.


This seems like a fairly likely case to me. SBOM generation can happen in CI close to developers while vulnerability scanning may happen in some software like Dependency-Track hosted by IT or a dedicated security team. Even if the software vendors are keeping up and have added support for pkg:go, it's much easier for a developer to update a GitHub action (it could be as easy as clicking "merge" on an unassuming automated dependency update PR) than for somebody to update the deployed version of the service being used to monitor for security vulnerabilities in released products and go through any revalidation processes associated with changing a service involved in safeguarding customer data. Companies may be running very old services on internal networks if they don't perceive a need to upgrade them.

I agree that this is a likely case. When a new package type is added, I believe it is a reasonable expectation that the users will check if their scanners support it or the scanners will yell that the PURLs are not supported. Perhaps I am expecting too much. Either way, it looks like this problem, if it exists, exists for both pkg:go and pkg:cool-new-lang.

@puerco

puerco commented Mar 19, 2025

It would be very useful if you could create some scenarios/examples that are representative of the concerns you have

Happy to. I think the point missing in all of this is that purls are embedded in all sorts of documents and databases across the supply chain. It is not just one tool's concern. A software supply chain tool will ingest multitudes of data from different generators and databases.

Some practical examples:

In general, component matching becomes a mess because you need to build (for Go specifically) a compatibility layer whenever you want to match components, and matching is done all the time when working with SBOMs, attestations, and databases. The fact that the new type does not break the other does not mean that supply chain security tools don't need to ingest both.

SBOM Enrichment or Augmentation:

One SBOM generator reads component data. Another produces licensing data. You need to merge the output of both. But now you can't. The inputs to both are the same: "A Go module". If one speaks golang: and the other speaks go: you need to translate the whole set of purls. As soon as purls in the new type start showing up, all the tools that augment and enrich will break until they add a compatibility layer.

CycloneDX

CycloneDX has one field, and one field only, for a purl (which IMHO is the way it should be). When generating an SBOM you need to choose one schema to use. This means that your SBOM needs to pick one type and hope that everything downstream can handle both. If a tool downstream can't translate, you just lost supply chain data.

Asset Management Systems:

Asset management systems cataloging components from SBOMs will need to support both. Imagine when the new log4shell hits a go module and you need to find it:

  • Did the advisory use go: or golang:? You better understand both, or you are screwed.
  • Once read, in which one do you store the data?
  • And then check your asset databases for variants of the same module in go: and golang:. Again if your system understands just one and a supplier handed you an SBOM in the other, you're toast.

Vulnerability Advisories and VEX

Both advisories and VEX documents use the same input: "A go module". If advisories are published in go: but all the VEX tooling understands golang, then you cannot match the component data between the advisory and the VEX doc. This cannot be worked around in either CDX (see my point above) or OpenVEX (it is also designed to handle just one purl). The only way to bridge this is hacking a compatibility layer into the ingestion logic.

Deduplication of Component Data

Say you merge two SBOMs and want to de-duplicate component data. Today you can just recurse the SBOM data and match, but now you would need to (only for Go) add another compatibility normalization before deduping components.

Anyway, I could go on and on...

@jkowalleck
Member

jkowalleck commented Mar 19, 2025

So we are again where we were three months ago? (And you claimed you've read all comments?!)

Then read this again: #338 (comment)

TL;DR: downstream implementations are not our domain. Their poor decisions are not our concern, and there are solutions for that.

@puerco

puerco commented Mar 19, 2025

@jkowalleck :

we are here in the domain of specification.
we don't care if a change in the specification breaks downstream implementations due to poor design choices, since we are not in their domain.

downstream implementations are not our domain. Their poor decisions are not our concern.

I think the opposite is true. Steering a specification is a great responsibility. This is why, in general, steering members are elected from senior, experienced community members who grasp the consequences of their every move. You are in charge of fostering the adoption of the spec and ensuring decisions are made to keep a healthy community and ecosystem. In other words, you are responsible for poor implementations just as much as for those you may consider good ones. This change will break the data exchange for both.

Again, note that this is not a single tool's concern. It's about wrecking an ecosystem already exchanging data.

Regardless of what you think of the user's design, the new type will introduce a barrier that will blind tools using one type from the data already produced using the other and in the process, it will make naming go modules unreliable, possibly for years. We've done our part by providing feedback as adopters to fix the current type and not break the whole purl/go ecosystem. Feel free to take it or not.

@jkowalleck
Member

(I am so fed up trying to understand your points.)
What is your solution, then? Your last one was introducing breaking changes into an existing spec. Still up?

@matt-phylum
Contributor

I agree that this is a likely case. When a new package type is added, I believe it is a reasonable expectation that the users will check if their scanners support it or the scanners will yell that the PURLs are not supported. Perhaps I am expecting too much. Either way, it looks like this problem, if it exists, exists for both pkg:go and pkg:cool-new-lang.

Yes, except that the user wasn't getting advisories for pkg:cool-new-lang before it existed and they will continue not getting advisories for pkg:cool-new-lang after its introduction. Users that are today getting advisories for pkg:golang may stop getting advisories for pkg:golang if their SBOM generation process switches to pkg:go without starting to get advisories for pkg:go. You could have installed everything and even tested to ensure that you receive notifications about vulnerabilities--you could even still be testing by periodically submitting a vulnerable SBOM to a test project--and this change could catch you by surprise.

For the software I work on, this would just be annoying. I can safely address it because it's a cloud-hosted solution and I am aware that the change might be happening and I can ensure that pkg:go is handled appropriately if it happens. I'm worried about other cloud solutions that are not watching this repository and particularly self-hosted solutions where it's unlikely anyone is watching or even aware of this repository.

TL;DR: downstream implementations are not our domain. Their poor decisions are not our concern, and there are solutions for that.

This comment is confusing to me considering the #purl channel is on the OWASP CycloneDX Slack instance.

Is it a poor decision that CycloneDX doesn't provide a way to indicate that a pkg:go package may also be known as a pkg:golang package? Does CycloneDX have a different way to provide that kind of backwards compatibility? Admittedly, I don't know very much about CycloneDX and what it can or can't do, but the component model has room for one PURL and has no alias fields that I can see. If there's no way for CycloneDX to represent that the package could be either PURL, introducing a new PURL for an existing package will cause interoperability problems.

Is it a poor decision that Dependency-Track, also an OWASP project, identifies components by a single PURL and has Go-specific behavior that activates for pkg:golang but not the pkg:go that may come to exist in the future? Does it reject components if it doesn't understand the PURL package type? I haven't tried giving it unknown PURLs to see what it looks like to a user. If it accepts unknown PURLs, the interoperability problems caused by introducing new PURL for an existing package may be difficult for a user to notice.

What is a good decision where this problem doesn't occur and how is that communicated to people designing systems that use PURL so this problem does not affect users? This PR is the most active, but there have been other requests to introduce a new package type for representing packages that are already supported by an existing type and any such PR creates the same potential for costly surprises.

@jkowalleck
Member

I don't see much sense in all this back-and-forth, we are just repeating.

You're very much invited to join the fortnightly PURL community meeting to discuss your points: #377 - Otherwise, I don't see a reason to give them any attention.

@idunbarh

The concerns raised by @matt-phylum resonate with me. Users not receiving vulnerability notifications is a significant potential issue.

We don’t care if a change in the specification breaks downstream implementations due to poor design choices, since we are not in their domain.

The specification changes will impact the community. While downstream implementations are not the responsibility of the specification, serious consideration should be given to those most at risk from vulnerabilities (e.g., users who aren’t patching regularly and are waiting for alerts).

@jkowalleck @zpavlinovic, do you have any thoughts on what a good mitigation strategy would be for transitioning from golang to go for producers and consumers of SBOMs?

@jkowalleck
Member

jkowalleck commented Mar 20, 2025

do you have any thoughts on what a good mitigation strategy would be for transitioning from golang to go for producers and consumers of SBOMs?

just repeating #338 (comment)

let's play out this whole evolution for arbitrary SBOM generators in combination with DependencyTrack (DT) and all its related systems:

  • if it applies: OSS indexes would add yet another package identifier to their list for each existing go package. They already have this for SWID, SWHID and PURL, so adding another purl is possible.
  • SBOM generators would either switch from golang to go - and call this a breaking change in their domain, or use feature flags to use one or the other in a non-breaking way.
    Until the SBOM-ingesting tools support the new go spec, users would not use the breaking version and would stick with the old one, or they would simply not enable the new feature flag, whichever applies.
    As soon as SBOM-ingesting tools support go, users could switch to the new behavior.
    So as long as DependencyTrack(DT) does not know go, users would generate the SBOMs like before and "ignore" the new feature, until DT supports it.
    Users that depend on other tools might use a different behaviour, the one that suits their needs.
  • the PURL-libraries would add support for parsing and canonicalizing go purls, just like they did for any other ecosystem.
    Eventually, the Java library is able to parse a go purl into the parts(namespace, name, ...), and craft a canonicalized purl from these parts, just like it can do for golang - so migrating from one to the other is no issue.
  • SBOM ingesting tools would simply add the capability to understand the new go and act on it in their needed way.
    For DependencyTrack(DT) this would depend on the PURL-Java library. DT crafts and parses purls, and uses them to match with existing OSS indexes.
    Until the OSS indexes support the new go purl, DT would convert the purls from go to golang when matching purls against these indexes.
  • Eventually, we have the whole chain of tools and services capable of ingesting the new go purl.
    And at that point, every purl generator could default to go. No more feature flags or hold-backs needed.
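The go-to-golang conversion mentioned for DT could be sketched as a small compatibility shim. This is a minimal sketch, not DT or packageurl-java code. It assumes the proposed pkg:go layout is an empty namespace with the whole module path percent-encoded into the name, as in the pkg:go example earlier in the thread; goToGolang is a hypothetical name, and the goarch qualifier in the example is illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// goToGolang (hypothetical) rewrites an assumed-layout pkg:go PURL into
// the existing pkg:golang layout, where the module path is split on "/"
// into namespace and name. Version, qualifiers and subpath are carried
// over unchanged.
func goToGolang(purl string) (string, error) {
	rest, ok := strings.CutPrefix(purl, "pkg:go/")
	if !ok {
		return "", errors.New("not a pkg:go PURL: " + purl)
	}
	// Everything from the first unencoded "@", "?" or "#" is suffix.
	end := strings.IndexAny(rest, "@?#")
	if end < 0 {
		end = len(rest)
	}
	name, suffix := rest[:end], rest[end:]
	module, err := url.PathUnescape(name)
	if err != nil {
		return "", err
	}
	// Re-encode each module path element as its own PURL segment.
	segs := strings.Split(module, "/")
	for i, s := range segs {
		segs[i] = url.PathEscape(s)
	}
	return "pkg:golang/" + strings.Join(segs, "/") + suffix, nil
}

func main() {
	golang, err := goToGolang("pkg:go/github.com%2FAdguardTeam%2FAdGuardHome@1.107.52?goarch=amd64")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(golang)
}
```

A shim like this only works if the mapping between the two types is lossless and agreed upon, which is one reason an "official" mapping came up later in the discussion.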

You see, none of these steps required breaking changes, none of these steps are illusory, and all of these steps are how feature development in a stack of independent software has worked for ages.
Are these steps not common sense, don't they come to you naturally? I think they do, and i am certain that every maintainer in that chain comes to the same idea, since they deal with change management all the time - they know their peers.

Most importantly, for users that don't patch anything, nothing will change. They will still use the old stack and everything will work like before.
But don't underestimate users. They are capable of reading docs, man pages, and change logs, and they are able to ask for help, file tickets, etc.
I am sure users will find out when and how to transition from one tool to the other.

P.S.: I noticed in earlier comments that people seem fully aware of how a proper transition from a no-feature state to a fully rolled-out feature would function in their respective fields. Are you concerned that others might not understand this? With all due respect, you're not the only one with insight, so perhaps it's fair to trust that others are equally capable of understanding this.

@idunbarh

I'm specifically saying that there should be thought and communication with tool developers to address @matt-phylum's concerns about the transition from a golang to a go type and the impact that would have on vulnerability discovery.

E.g. push the messaging that, while most tools produce multiple IDs for components, it would be even better to use multiple PURLs to identify components in SBOMs (yes, I know that is not always supported in SBOM formats; we're a 100% CDX shop).

  • This enables a "grace period" for tool developers to transition.

Another example is to also include an "official" mapping from golang to go types to support conversions in tooling that has yet to migrate.

With all due respect, you're not the only one with insight, so perhaps it's fair to trust that others are equally capable of understanding this.

Do not mistake my concern for improving the user experience, and for finding a path forward for your proposal, for belittling you or end users. What I do see is a lack of response to the concern raised by @matt-phylum. Adding contingencies to your PR that address community concerns will help drive agreement.

@jkowalleck
Member

jkowalleck commented Mar 21, 2025

Regarding communication: this is open source. You do not know all your downstream users, but they all know you.
The best solution I can think of is creating a "go" help section in the GitHub Discussions and linking it from the type-spec sections for go and golang. There, we could kick off some expected questions and then wait for community participation.

I would not want to suggest or dictate how the transition or adoption is to be made. It should be a natural process driven by the community. They will figure it out.

For example, when a new version of our favorite BOM standard is released, I implement it in libraries in a non-breaking fashion, and I implement it in BOM generators and other tools behind feature flags that default to off/false.
After a year or so, when I see that BOM-ingesting tools and services have adopted the new BOM features, I change the tools' feature flags to default to on/true, release a new major version of the tools, and describe the breaking changes in the release notes.
If I can be responsible with my feature adoption and releases, I think others can be too.

With all due respect, you're not the only one with insight, so perhaps it's fair to trust that others are equally capable of understanding this.

Do not mistake my concern for improving the user experience, and for finding a path forward for your proposal, for belittling you or end users. What I do see is a lack of response to the concern raised by @matt-phylum. Adding contingencies to your PR that address community concerns will help drive agreement.

Sorry, this was not directed at you, @idunbarh; it was directed at all those naysayers who claimed to know how standards work and insisted that this one would definitely break the community because people would not know how to deal with change and adoption.

@jkowalleck jkowalleck requested a review from a team March 21, 2025 08:24
@puerco

puerco commented Mar 21, 2025

Are these steps not common sense, don't they come to you naturally?

OK, no. They don't. But now I see where you're coming from:

So as long as DependencyTrack(DT) does not know go, users would generate the SBOMs like before and "ignore" the new feature, until DT supports it.

The problem you are not seeing is that not everything works in this simplistic way, where one person generates an SBOM and feeds it to DT (or similar).

Things are much more complex than that. SBOMs will be generated and provided by suppliers, and you have no control over the software they use to generate them; then you often need to use a mix of tools to produce a document with the data you want. Also, as noted before, purls live in lots of other places beyond SBOMs, such as databases, vulnerability scan results, and other supply chain technologies and formats (attestations, advisories, VEX, and so on), which will take years to move, if they ever do.

So there is no magic feature flag that you can just flick to move everything over to the new schema.

@jkowalleck
Member

SBOMs will be generated and provided by suppliers.
Suppliers you might have a contract with, which specifies what they deliver and how, including the BOM, right?

Also, as noted before, purls are in lots of other places [...] which will take years to move, if they ever do it.

True, and fully understood. Who said that things should change overnight? The more complex the landscape, the more management and time a change takes. Been there, done that: I've worked in domains where I accompanied change processes that took around 5 years to become fully effective.
But things will never change for the better if there is no spec or option to do so. Let's remember why the community came up with the new go type-spec in the first place, and what users gain from it.

So there is no magic feature flag that you can just flick to move everything over to the new schema.

Oh, there is a "feature flag" for almost everything. Let's sketch one for you:
Let's say you have a contract with suppliers A and B, both ship some software/hardware/whatever, and part of that contract is to provide a BOM with golang purls for it. You let A and B know that you want to change from golang purls to go purls some time in the future, and eventually, when the parties are ready, you contract them to supply BOMs with go purls.
There is no need to ask for go purls before you are ready, so this change is under your control; no breaking changes to expect here. The same holds for all your other tools and processes that issue or ingest purls: they are not going to change by themselves, so there is no reason to panic.
It just needs communication and planning ahead. Embrace the change and give people options; they will find out how to transition if they benefit from it.

This situation would be no different if the breaking changes you suggested were implemented.
Then you would need to change all your processes to support the new golang purl and the old golang purl in parallel until you could fully transition.
This proposal here at least gives you a notice: a different purl type go is kind of a feature flag itself :-)

Anyway, the new go purl type is a non-breaking enhancement of the type-spec.
It introduces a superset of the existing golang type-spec: golang purls can be losslessly migrated to go purls.

@matt-phylum
Contributor

I doubt anyone is going to write into a contract that SBOMs use pkg:golang or pkg:go when talking about Go. The receiving company would need to be aware of the problem ahead of time and you'd need to go through legal and contract negotiations to get it fixed.

It'd be much easier in the case where you know you need one form or the other to apply a transform that rewrites the PURLs to all be the required form.
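For the case where you know which form you need, such a transform could be a small pre-processing step. The Go sketch below rewrites either a `pkg:golang` or a `pkg:go` PURL into the `pkg:golang` form that existing consumers already understand. The helper name `toGolangPURL` is hypothetical, not part of any PURL library. It assumes that, in both types, the percent-decoded namespace and name joined with `/` give the module or package path, and it accepts that path either with literal slashes or with `%2F`-encoded slashes, since the final encoding of the new go type is up to the accepted spec. It carries the version over verbatim and drops qualifiers and subpath, so build details such as GOOS/GOARCH qualifiers are lost.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// toGolangPURL is a hypothetical helper that rewrites a pkg:go or pkg:golang
// PURL into pkg:golang/<path>@<version>. Assumptions: the path is the
// percent-decoded namespace and name joined with "/", the version is kept
// verbatim, and qualifiers/subpath are dropped. Requires Go 1.20+.
func toGolangPURL(purl string) (string, error) {
	rest, ok := strings.CutPrefix(purl, "pkg:")
	if !ok {
		return "", errors.New("missing pkg: scheme")
	}
	// Drop the subpath (after '#') and the qualifiers (after '?').
	if i := strings.IndexByte(rest, '#'); i >= 0 {
		rest = rest[:i]
	}
	if i := strings.IndexByte(rest, '?'); i >= 0 {
		rest = rest[:i]
	}
	// Split off the version, if present.
	version := ""
	if i := strings.LastIndexByte(rest, '@'); i >= 0 {
		version, rest = rest[i+1:], rest[:i]
	}
	typ, path, ok := strings.Cut(rest, "/")
	if !ok || path == "" {
		return "", errors.New("missing name")
	}
	if t := strings.ToLower(typ); t != "go" && t != "golang" {
		return "", fmt.Errorf("not a Go PURL: type %q", typ)
	}
	// Decode first, then re-split, so "a%2Fb" and "a/b" yield the same path.
	decoded, err := url.PathUnescape(path)
	if err != nil {
		return "", err
	}
	var segs []string
	for _, s := range strings.Split(strings.Trim(decoded, "/"), "/") {
		segs = append(segs, url.PathEscape(s))
	}
	out := "pkg:golang/" + strings.Join(segs, "/")
	if version != "" {
		out += "@" + version
	}
	return out, nil
}
```

Rewriting in the other direction is deliberately not sketched, because it depends on details of the go type (for example, how versions and build qualifiers are encoded) that this thread does not settle.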

@puerco

puerco commented Mar 24, 2025

I doubt anyone is going to write into a contract that SBOMs use pkg:golang or pkg:go when talking about Go.

I agree. Also, a sound technical proposal should not rely on legal assumptions. "Supplier" in a document exchange can mean any third party producing SBOMs, not necessarily one bound by a contractual obligation. This is especially true for open source, where there are no obligations whatsoever.

@zpavlinovic

It would be very useful if you could create some scenarios/examples that are representative of the concerns you have

Happy to. I think the point missing in all of this is to understand purls are embedded in all sorts of documents and databases across the supply chain. It is not just one tool's concern. A software supply chain tool will ingest multitudes of data from different generators and databases.

I understand, but I also do think that the point missing right now is that the system is fundamentally broken.

Some practical examples:

In general, component matching becomes a mess as you need to build - for go specifically - a compatibility layer whenever you want to match components, this is done all the time when working with SBOMs, attestations, databases. The fact that the new type does not break the other does not mean that supply chain security tools don't need to ingest both.

This is already a huge mess with the current PURL. It allows for different modules to have the same PURL and for the same module to have multiple different PURLs.

SBOM Enrichment or Augmentation:

One SBOM generator reads component data. Another produces licensing data. You need to merge the output of both. But now you can't. The inputs to both are the same: "A Go module". If one speaks golang: and the other speaks go: you need to translate the whole set of purls. As soon as purls in the new type start showing up, all the tools that augment and enrich will break until they add a compatibility layer.

Again, the "translation" already needs to exist due to the mess the current Go PURL is creating. Correct matching is already a mess (theoretically impossible).

CycloneDX

CycloneDX has one field, and one field only, for a purl (which IMHO it's the way it should be). When generating an SBOM you need to choose one schema to use. This means that your SBOM needs to pick one type and hope that everything downstream can handle both. If a tool downstream can't translate, you just lost supply chain data.

I understand, but I still maintain that this will also happen when a PURL type is introduced for, say, a completely new language.

Asset Management Systems:

Asset management systems cataloging components from SBOMs will need to support both. Imagine when the new log4shell hits a go module and you need to find it:

  • Did the advisory use go: or golang:? You better understand both, or you are screwed.
  • Once read, in which one do you store the data?
  • And then check your asset databases for variants of the same module in go: and golang:. Again if your system understands just one and a supplier handed you an SBOM in the other, you're toast.

Vulnerability Advisories and VEX

Both advisories and VEX documents use the same input: "A go module". If advisories are published in go: but all the VEX tooling understands golang, then you cannot match the component data between the advisory and the VEX doc. This cannot be worked around in either CDX (see my point above) or OpenVEX (it is also designed to handle just one purl). The only way to bridge this is hacking a compatibility layer into the ingestion logic.

Deduplication of Component Data

Say you merge two SBOMs and want to de-duplicate component data. Today you can just recurse the SBOM data and match, but now you would need to (only for Go) add another compatibility normalization before deduping components.

Anyway I could go on and on..

See my previous comments on matching.
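To illustrate how small that normalization step before deduplication can be, here is a Go sketch that collapses `pkg:go` and `pkg:golang` spellings of the same component. The names `goComponentKey` and `dedupeGoComponents` are hypothetical. The sketch assumes that, in both types, the percent-decoded namespace and name joined with `/` give the path; it ignores qualifiers and subpath and compares versions exactly as written. As a consequence, a golang PURL that uses a bare commit hash will not match a go PURL that uses a pseudo-version, and a PURL naming a package will not match one naming its module; those cases need information that is not in the PURL itself.

```go
package main

import (
	"net/url"
	"strings"
)

// goComponentKey returns a hypothetical type-independent key for a pkg:go or
// pkg:golang PURL: the decoded path plus the version as written. ok is false
// for anything else. Qualifiers and subpath are ignored. Requires Go 1.20+.
func goComponentKey(purl string) (key string, ok bool) {
	rest, found := strings.CutPrefix(purl, "pkg:")
	if !found {
		return "", false
	}
	// Cut at the first '?' or '#' to drop qualifiers and subpath.
	if i := strings.IndexAny(rest, "?#"); i >= 0 {
		rest = rest[:i]
	}
	version := ""
	if i := strings.LastIndexByte(rest, '@'); i >= 0 {
		version, rest = rest[i+1:], rest[:i]
	}
	typ, path, found := strings.Cut(rest, "/")
	if t := strings.ToLower(typ); !found || (t != "go" && t != "golang") {
		return "", false
	}
	decoded, err := url.PathUnescape(path)
	if err != nil {
		return "", false
	}
	return strings.Trim(decoded, "/") + "@" + version, true
}

// dedupeGoComponents keeps the first PURL seen for each key, so go and golang
// spellings of the same module version collapse into one entry. Non-Go
// PURLs are compared verbatim.
func dedupeGoComponents(purls []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, p := range purls {
		key, ok := goComponentKey(p)
		if !ok {
			key = p
		}
		if !seen[key] {
			seen[key] = true
			out = append(out, p)
		}
	}
	return out
}
```

Keeping the first spelling seen, rather than rewriting everything to one type, leaves the original PURLs intact for downstream consumers that only understand one of the two types.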

@zpavlinovic

Yes, except that the user wasn't getting advisories for pkg:cool-new-lang before it existed and they will continue not getting advisories for pkg:cool-new-lang after its introduction.

I am not sure I understand the second part of the sentence. How would they continue not getting advisories?

@matt-phylum
Contributor

If their software doesn't already support pkg:cool-new-lang, introducing pkg:cool-new-lang in this repository won't make their software pkg:cool-new-lang aware.

@zpavlinovic

The concerns raised by @matt-phylum resonate with me. Users not receiving vulnerability notifications is a significant potential issue.

We don’t care if a change in the specification breaks downstream implementations due to poor design choices, since we are not in their domain.

The specification changes will impact the community. While downstream implementations are not the responsibility of the specification, serious consideration should be given to those most at risk from vulnerabilities (e.g., users who aren’t patching regularly and are waiting for alerts).

@jkowalleck @zpavlinovic, do you have any thoughts on what a good mitigation strategy would be for transitioning from golang to go for producers and consumers of SBOMs?

Here is how I see the situation. There are really three options going forward.

  1. Do nothing.
  2. Change the existing specification.
  3. Introduce a new type as in here.

I think option 1 is the worst one. I personally don't mind option 2 (if the definition of golang is what is currently proposed for go), but I also do believe that option is worse than 3. I believe that pretty much all concrete concerns people have here with option 3 (e.g., SBOM enrichment and augmentation example of #338 (comment)) will also manifest with 2. Things will start failing and then it will be even harder to detect how and why they are failing. Option 3 at least makes it clear what is failing: a new type is introduced that is not being recognized.

I would have to think more about the strategy, but here are a few thoughts.

  1. Producers should announce that they will start switching to go. After the grace period is finished, the default output will become go and golang output could be obtained via a flag or configuration.
  2. Consumers should be able to ingest both go and golang.

@zpavlinovic

If their software doesn't already support pkg:cool-new-lang, introducing pkg:cool-new-lang in this repository won't make their software pkg:cool-new-lang aware.

Hm, it may also have taken time for the PURL specification to cover existing software. Out of curiosity, how long did it take for Rust to be supported? I can see this still being a problem, but I also see how it might be a bigger problem with a new type for an existing language.

@johnmhoran johnmhoran added this to the 1.0-draft milestone Apr 4, 2025
Labels
Proposed new type PURL encoding type: golang Proposed new type as well as component discussions
Projects
None yet
9 participants