Use SPDX license mapping for SPDX format #281

Open
nishakm opened this issue May 22, 2019 · 7 comments
Labels
feature new feature spdx Issues related to the SPDX formatting of Tern reports technical-debt Technical Debt - we should have addressed this right away but for "reasons" we deferred

Comments

@nishakm
Contributor

nishakm commented May 22, 2019

Describe the Feature
Currently the default way to list the licenses is using the LicenseID and adding some text to it.
This can be augmented by using a mapping of known interpretations of these license strings.
This issue in spdx-tools-python enables this: spdx/tools-python#106

Once resolved, use this module to map out license strings to SPDX license formats.
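A minimal sketch of what such a mapping lookup might look like. The table name `LICENSE_MAP`, the function `to_spdx`, and the entries themselves are hypothetical and purely illustrative; they are not taken from spdx/tools-python or from Tern.

```python
# Hypothetical mapping from distro license strings to SPDX identifiers.
# The entries are illustrative only, not an authoritative table.
LICENSE_MAP = {
    "gplv2": "GPL-2.0-only",
    "gplv2+": "GPL-2.0-or-later",
    "mit": "MIT",
    "isc": "ISC",
    "asl 2.0": "Apache-2.0",
}


def to_spdx(license_string):
    """Return the SPDX identifier for a license string, or None if unknown."""
    # Normalize case and collapse runs of whitespace before the lookup.
    key = " ".join(license_string.lower().split())
    return LICENSE_MAP.get(key)


print(to_spdx("GPLv2+"))
print(to_spdx("some custom license"))
```

Returning None for unknown strings leaves the caller free to fall back to the existing LicenseRef output.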

@nishakm nishakm added feature new feature technical-debt Technical Debt - we should have addressed this right away but for "reasons" we deferred labels May 22, 2019
@nishakm nishakm added this to the Distant Future milestone May 22, 2019
@nishakm
Contributor Author

nishakm commented Aug 13, 2019

Tabling until we can figure out if tern can aggregate SPDX documents produced by other tools.

@makefu

makefu commented Dec 9, 2019

I am currently having the same problem: I want to import Tern output into dependency-track, but most licenses are not detected correctly. I have started building a license map which maps each LicenseID to the appropriate SPDX license field.

@nishakm
Contributor Author

nishakm commented Dec 9, 2019

@makefu This project was created to address this issue: https://github.com/spdx/package-licenses-mapping. It's going to take a little while to create the mappings to all known licenses. PRs welcome :)

@makefu

makefu commented Dec 9, 2019

@nishakm One issue I encountered when I started my own mapping is that a couple of LicenseIDs are not accurate enough to map to a single SPDX identifier (e.g. "gpl", "lgpl+", "openldap", "cc-by" or even "gplv2 with exceptions", as there are different exceptions possible). At least this is what I encountered in rpm-based containers.
In addition to trying to map legacy entries, it may be a good idea to contact the distributions and ask them to clean up their databases to use SPDX in the first place.
Another option could be a license database that maps packages and versions to their current SPDX license.

As soon as there is some content in the repository I will consider creating PRs to add my findings 👍
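One way to keep such ambiguous strings from being silently mis-mapped is to give the lookup three outcomes instead of two. This is only a sketch: the names `EXACT`, `AMBIGUOUS`, and `classify` are hypothetical, the exact entries are illustrative, and the ambiguous set simply lists the strings mentioned above.

```python
# Illustrative entries only; not a complete or authoritative table.
EXACT = {"gplv2": "GPL-2.0-only", "isc": "ISC", "mit": "MIT"}

# Strings named above as too vague to map to a single SPDX identifier.
AMBIGUOUS = {"gpl", "lgpl+", "openldap", "cc-by", "gplv2 with exceptions"}


def classify(license_id):
    """Return (status, spdx_id); status is 'exact', 'ambiguous' or 'unknown'."""
    key = license_id.strip().lower()
    if key in EXACT:
        return ("exact", EXACT[key])
    if key in AMBIGUOUS:
        # Known string, but it does not identify one license version or exception.
        return ("ambiguous", None)
    return ("unknown", None)
```

Separating "ambiguous" from "unknown" makes it possible to report which entries need a human decision and which are simply missing from the map.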

@timovandeput

The licenses in the SPDX tag-value SBOM output currently still use custom licenses that reference a definition like these examples:

LicenseID: LicenseRef-c66410f
ExtractedText: <text>Original license: GPL-2.0-only</text>
LicenseID: LicenseRef-1eaea05
ExtractedText: <text>Original license: ISC</text>
LicenseID: LicenseRef-f266d93
ExtractedText: <text>Original license: BSD</text>

This makes it quite a challenge for automated tools to interpret package licenses from this SBOM format, although the information appears to be available in the default Tern JSON output.

Are there any plans to address this, or perhaps make life easier for automated tools that rely on SPDX input by providing the approximate SPDX license identifier as "licenseName" field?
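Until that changes, a consuming tool could recover the original license strings from the tag-value output itself. The sketch below assumes the LicenseID and ExtractedText lines always appear as adjacent pairs in exactly the form shown in the excerpt above; the names `PAIR` and `original_licenses` are hypothetical.

```python
import re

SAMPLE = """\
LicenseID: LicenseRef-c66410f
ExtractedText: <text>Original license: GPL-2.0-only</text>
LicenseID: LicenseRef-1eaea05
ExtractedText: <text>Original license: ISC</text>
LicenseID: LicenseRef-f266d93
ExtractedText: <text>Original license: BSD</text>
"""

# Assumes each LicenseID line is immediately followed by its ExtractedText line.
PAIR = re.compile(
    r"LicenseID:\s*(LicenseRef-\S+)\s*\n"
    r"ExtractedText:\s*<text>Original license:\s*(.*?)</text>",
    re.DOTALL,
)


def original_licenses(tag_value):
    """Map each LicenseRef to the license string recorded in its ExtractedText."""
    return {ref: text.strip() for ref, text in PAIR.findall(tag_value)}


for ref, name in original_licenses(SAMPLE).items():
    print(ref, name)
```

The recovered strings still need to be checked against the SPDX license list, since a value such as "BSD" does not name a specific license.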

@rnjudge
Contributor

rnjudge commented Apr 1, 2021

@timovandeput

The licenses in the SPDX tag-value SBOM output currently still use custom licenses that reference a definition like these examples:

LicenseID: LicenseRef-c66410f
ExtractedText: <text>Original license: GPL-2.0-only</text>
LicenseID: LicenseRef-1eaea05
ExtractedText: <text>Original license: ISC</text>
LicenseID: LicenseRef-f266d93
ExtractedText: <text>Original license: BSD</text>

This makes it quite a challenge for automated tools to interpret package licenses from this SBOM format, although the information appears to be available in the default Tern JSON output.

Sorry, can you clarify what information "appears to be available in the default Tern JSON output" that's not available in the SPDX reports?

Are there any plans to address this, or perhaps make life easier for automated tools that rely on SPDX input by providing the approximate SPDX license identifier as "licenseName" field?

See discussion above for challenges surrounding this. A mapping needs to exist before Tern can draw conclusions about what SPDX license might correspond to the custom licenses found.

Is the example you provided an excerpt of licenses from a Debian-based image, by chance? Debian images get their license info from the debian-inspector library, which parses Debian copyright files, because Debian package licenses cannot be collected through the package manager. Because these licenses are parsed from copyright text, it's not always straightforward to translate them to SPDX licenses, and this is where we see the most variance between the license text and what the corresponding SPDX license might be. However, this is also true for other base images.

Looking at the licenseName field more in the SPDX spec, it seems like this field is appropriate "if license is not on the SPDX license list" which doesn't seem right for the examples you provided because they are all licenses on the SPDX license list (with the exception of BSD, which doesn't specify a version). I think PackageLicenseDeclared is what we would aim for.
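As a rough sketch of that idea: when the collected string is already an identifier on the SPDX license list it could be declared directly, and only otherwise fall back to a LicenseRef. The names `SPDX_IDS` and `license_lines` are hypothetical, the identifier set is a tiny illustrative subset, and the hash-based LicenseRef scheme is an assumption rather than a description of what Tern does. The placement of lines within a real SPDX document is also simplified here.

```python
import hashlib

# Tiny illustrative subset; a real tool would load the full SPDX license list.
SPDX_IDS = {"GPL-2.0-only", "ISC", "MIT", "Apache-2.0"}


def license_lines(license_string):
    """Return SPDX tag-value lines declaring one package license."""
    if license_string in SPDX_IDS:
        # Already a listed identifier: declare it directly.
        return [f"PackageLicenseDeclared: {license_string}"]
    # Assumed scheme: derive a stable LicenseRef from a hash of the string.
    digest = hashlib.sha256(license_string.encode("utf-8")).hexdigest()[:7]
    ref = "LicenseRef-" + digest
    return [
        f"PackageLicenseDeclared: {ref}",
        f"LicenseID: {ref}",
        f"ExtractedText: <text>Original license: {license_string}</text>",
    ]


print("\n".join(license_lines("ISC")))
print("\n".join(license_lines("BSD")))
```

With this approach "GPL-2.0-only" and "ISC" from the excerpt would be declared as-is, while "BSD" would keep its custom reference.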

Perhaps @pombredanne can weigh in if there's plans/it's possible to map debian licenses found via debian-inspector to SPDX licenses?

UPDATE: Looks like there's been lots of discussion on this already here: spdx/package-licenses-mapping#1

@pombredanne

Perhaps @pombredanne can weigh in if there's plans/it's possible to map debian licenses found via debian-inspector to SPDX licenses?

In the end we dropped most of the mappings we were using in ScanCode, as in most cases they are not enough.
This is particularly true for Debian copyright files, where mappings are mostly incorrect because of the nature of these files.

See: nexB/scancode-toolkit#1895 (comment) which I am repasting partially here:

  • Debian copyright files: there, the declared license codes have no global meaning, so a mapping has no value. MIT may mean X11 in one copyright file, MIT/Expat in another file, or some old-style MIT in yet another copyright file. Therefore the only practical solution is more involved than a mapping: it requires parsing, coupled with detection and a fine understanding of the structure of these files, and this has been implemented in packagedcode/debian_copyright.py ... the only mapping is for the 10 or so common licenses and this is the most trivial part of getting things correct.
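To illustrate why a global mapping breaks down, the sketch below reads the stand-alone License paragraphs of two machine-readable copyright files and shows that the same short name resolves to different text in each. It is a heavily simplified, hypothetical parser (`license_definitions` is not part of debian-inspector or ScanCode), and the license bodies are placeholders rather than real license text.

```python
def license_definitions(copyright_text):
    """Map each short license name to the text defined for it in this one file."""
    definitions = {}
    for paragraph in copyright_text.strip().split("\n\n"):
        lines = paragraph.splitlines()
        if not lines or not lines[0].startswith("License:"):
            continue  # skip header and Files paragraphs
        name = lines[0][len("License:"):].strip()
        # Continuation lines are indented; join them into one string.
        definitions[name] = " ".join(line.strip() for line in lines[1:])
    return definitions


# Placeholder bodies stand in for the actual license texts.
FILE_A = """Files: *
Copyright: 2019 Example Author
License: MIT

License: MIT
 (Expat-style license text would appear here)
"""

FILE_B = """Files: *
Copyright: 2019 Another Author
License: MIT

License: MIT
 (X11-style license text would appear here)
"""

defs_a = license_definitions(FILE_A)
defs_b = license_definitions(FILE_B)
print(defs_a["MIT"] == defs_b["MIT"])
```

Because the name is only meaningful within its own file, any conversion to SPDX has to look at the associated text rather than at the name alone.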
