# Third-party licenses and SBOM

## Table of contents (ToC)<a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Generating the license details</a>
    * <a href="#bullet2x1">2.1 - Installing pip-licenses</a>
    * <a href="#bullet2x2">2.2 - Create an overview of licenses</a> 
    * <a href="#bullet2x3">2.3 - Create the full SBOM</a>
    * <a href="#bullet2x4">2.4 - Make SBOM based on requirements.txt</a>
    * <a href="#bullet2x5">2.5 - Verify the dependency graph</a>
* <a href="#bullet4">4 - Attribution and footnotes</a>
* <a href="#bullet5">5 - Required libraries</a>
* <a href="#bullet6">6 - Notebook version</a>


#  1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

This notebook automates the collection of license details for the third-party libraries used by Morphkit.

# 2 - Generating the license details<a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)

## 2.1 - Installing pip-licenses <a class="anchor" id="bullet2x1"></a>

In this notebook I rely on the metadata used by `pip` as being authoritative. To access it in an automated manner, the package `pip-licenses` needs to be installed in the active environment. The installation only needs to be run once per Conda environment.
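
A minimal sketch of such an installation cell, assuming the package is installed from PyPI under the name `pip-licenses` and that calling `pip` through `sys.executable` (the same pattern used for `cyclonedx-bom` later in this notebook) is acceptable in the Conda environment. The helper name `pip_install_command` is hypothetical, and the actual installation is left commented out because it needs network access.

```python
import subprocess
import sys

def pip_install_command(package: str) -> list[str]:
    """Build a pip command that targets the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", "--quiet", package]

cmd = pip_install_command("pip-licenses")
print("Would run:", " ".join(cmd))

# Uncomment to perform the installation (requires network access):
# subprocess.run(cmd, check=True)
```

Using `sys.executable` rather than a bare `pip` keeps the installation inside the environment the kernel is actually running in.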

## 2.2 - Create an overview of licenses <a class="anchor" id="bullet2x2"></a>

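This section loops over the third-party packages with `importlib.metadata.distribution()` and labels standard-library modules as “Python Software License” (PSF License). This provides one record per package with its name, version, and license.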

In [7]:
# Modules of interest, grouped by origin
stdlib = {"typing", "re", "pprint", "copy", "time", "urllib.parse"}
third_party = {"betacode", "requests"}

import sys, importlib.metadata as im

def is_stdlib(module_name):
    # sys.stdlib_module_names is available from Python 3.10 onwards
    return module_name in sys.stdlib_module_names

def record_for(name):
    if is_stdlib(name.split('.')[0]):                 # handle sub-modules
        return {"name": name, "version": sys.version.split()[0],
                "license": "Python Software License"}
    try:
        dist = im.distribution(name.replace('-', '_'))
        return {"name": dist.metadata["Name"],
                "version": dist.version,
                "license": dist.metadata.get("License", "")}
    except im.PackageNotFoundError:
        return {"name": name, "error": "not installed"}
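
Some distributions leave the `License` metadata field empty or set it to the placeholder `UNKNOWN`, in which case the trove classifiers can be more informative. The following is a minimal sketch of such a fallback, written as a pure function so it can be tried without any package installed. The name `pick_license` and the sample values are hypothetical.

```python
def pick_license(license_field, classifiers):
    """Prefer the License field; fall back to 'License ::' trove classifiers."""
    value = (license_field or "").strip()
    if value and value.upper() != "UNKNOWN":
        return value
    # Keep only the last classifier segment, e.g. "MIT License"
    from_classifiers = [c.split("::")[-1].strip()
                        for c in classifiers if c.startswith("License ::")]
    return "; ".join(from_classifiers) or "unknown"

print(pick_license("UNKNOWN", ["License :: OSI Approved :: MIT License"]))  # → MIT License
print(pick_license("Apache-2.0", []))                                       # → Apache-2.0
print(pick_license(None, ["Programming Language :: Python :: 3"]))          # → unknown
```

When neither source yields a value, the bundled license file remains the most reliable evidence.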


In [1]:
# This is the list of packages that I am interested in.
packages = [
    "beta-code",         # external
    "urllib.parse",      # std-lib sub-module
    "requests",          # external
    "typing", "re",      # std-lib
    "pprint", "copy", "time"
]

import sys, json, textwrap
from importlib import metadata
from pathlib import Path

# ---------------------------------------------------------------------------
def is_stdlib(modname: str) -> bool:
    """True if 'modname' is a standard-library module that ships with CPython.
    Relies on sys.stdlib_module_names, which is built into sys (Python 3.10+).
    The stdlibs project (https://stdlibs.omnilib.dev/en/stable/index.html)
    provides a comparable list as a separate package.
    """
    return modname in sys.stdlib_module_names

def read_license_file(dist: metadata.Distribution) -> str:
    """Return the first bundled LICENSE / COPYING / NOTICE text, or ''."""
    for f in dist.files or []:       # PackagePath objects; files may be None
        if f.name.lower().startswith(("license", "copying", "notice")):
            try:
                return dist.locate_file(f).read_text(encoding="utf-8")
            except (OSError, UnicodeDecodeError):
                continue             # unreadable file: try the next candidate
    return ""

records = []

for raw_name in packages:
    top = raw_name.split(".")[0].replace("-", "_")   # strip sub-module & dashes

    # ---------- standard library ------------------------------------------------
    if is_stdlib(top):
        records.append({
            "name":   raw_name,
            "kind":   "stdlib",
            "version": sys.version.split()[0],
            "license": "Python Software License"
        })
        continue

    # ---------- third-party distribution ----------------------------------------
    try:
        dist = metadata.distribution(top)
        meta = dist.metadata
        records.append({
            "name":    meta["Name"],
            "kind":    "third-party",
            "version": dist.version,
            "summary": meta.get("Summary", ""),
            "home_page": meta.get("Home-page", ""),
            "license":  meta.get("License", "") or
                        "; ".join(c for c in meta.get_all("Classifier", [])
                                   if "License ::" in c),
            "license_text": read_license_file(dist)
        })
    except metadata.PackageNotFoundError:
        records.append({
            "name": raw_name,
            "kind": "missing",
            "error": "package not installed in this environment"
        })

# ---------------------------------------------------------------------------
# Save & display
out_path = Path("package_info.json").resolve()
out_path.write_text(json.dumps(records, indent=2, ensure_ascii=False), "utf-8")

print(json.dumps(records, indent=2, ensure_ascii=False))
print(f"\n wrote full JSON to {out_path}")


[
  {
    "name": "beta-code",
    "kind": "third-party",
    "version": "1.1.0",
    "summary": "Converts Greek Beta Code to Greek characters and vice versa",
    "home_page": "https://github.com/perseids-tools/beta-code-py",
    "license": "UNKNOWN",
    "license_text": "The MIT License (MIT)\n\nCopyright (c) 2018\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT 

## 2.3 - Create the full SBOM <a class="anchor" id="bullet2x3"></a>

The following produces the SBOM (**S**oftware **B**ill **o**f **M**aterials) for the complete Conda environment this notebook is running in. It therefore also includes all Jupyter-related packages.

In [3]:
import subprocess, sys, pathlib, datetime as dt, textwrap, importlib.metadata as im

# Choose what to scan (“env” = everything installed in this interpreter)
SUBCOMMAND = "env"          # or "venv", "requirements", "poetry", …

ts        = dt.datetime.now().strftime("%Y%m%d-%H%M%S")
sbom_path = pathlib.Path.cwd() / f"morphkit_{ts}.cdx.json"

# First make sure that cyclonedx-bom is installed and up to date
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--upgrade", "--quiet", "cyclonedx-bom"],
    check=True
)
print("cyclonedx-bom", im.version("cyclonedx-bom"), "ready\n")

# Now we can run the CLI:  python -m cyclonedx_py  <subcommand>  -o  <file>
cmd = [sys.executable, "-m", "cyclonedx_py", SUBCOMMAND, "-o", str(sbom_path)]
print("Running:", " ".join(cmd), "\n")

result = subprocess.run(cmd, capture_output=True, text=True)

if result.returncode == 0:
    print("SBOM written to", sbom_path.resolve(), "\n")
    head = sbom_path.read_text(encoding="utf-8").splitlines()[:20]
    print("— first 20 lines —")
    print("\n".join(textwrap.shorten(l, width=120) for l in head))
else:
    print(f"cyclonedx-py failed (exit {result.returncode})")
    print("\nSTDERR:\n", result.stderr or "(empty)")


cyclonedx-bom 6.1.1 ready

Running: C:\Users\tonyj\anaconda3\envs\Text-Fabric\python.exe -m cyclonedx_py env -o C:\Users\tonyj\OneDrive\Documents\GitHub\morphkit\morphkit_20250603-131540.cdx.json 

SBOM written to C:\Users\tonyj\OneDrive\Documents\GitHub\morphkit\morphkit_20250603-131540.cdx.json 

— first 20 lines —
{
"components": [
{
"bom-ref": "Babel==2.14.0",
"description": "Internationalization utilities",
"externalReferences": [
{
"comment": "PackageSource: Local",
"type": "distribution",
"url": "file:///home/conda/feedstock_root/build_artifacts/babel_1702422572539/work"
},
{
"comment": "from packaging metadata Project-URL: Source",
"type": "other",
"url": "https://github.com/python-babel/babel"
},
{
"comment": "from packaging metadata: Home-page",
"type": "website",
"url": "https://babel.pocoo.org/"


It is doubtful whether this environment-wide SBOM is relevant enough to add to the repository, so I leave it out.
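
One way to judge how large an environment-wide SBOM is compared to the project's own dependency list is to count its components. The following is a minimal sketch, assuming only that the file follows the CycloneDX JSON layout with a top-level `components` array, as in the output above. The helper `summarize_bom` and the sample data are hypothetical.

```python
import json
from collections import Counter

def summarize_bom(bom: dict) -> dict:
    """Count components and tally their 'type' field in a CycloneDX JSON document."""
    components = bom.get("components", [])
    return {
        "total": len(components),
        "by_type": dict(Counter(c.get("type", "unknown") for c in components)),
    }

# Tiny hand-written stand-in for a real SBOM (hypothetical data)
sample = json.loads("""
{"bomFormat": "CycloneDX",
 "components": [{"name": "Babel", "version": "2.14.0", "type": "library"},
                {"name": "requests", "type": "library"}]}
""")
summary = summarize_bom(sample)
print(summary)   # → {'total': 2, 'by_type': {'library': 2}}
```

For a real file, the dictionary would come from `json.loads(path.read_text(encoding="utf-8"))` instead of the inline sample.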

## 2.4 - Make SBOM based on requirements.txt <a class="anchor" id="bullet2x4"></a>

The cell below creates an SBOM driven by `requirements.txt`, which is much shorter and probably makes more sense.

In [4]:
import subprocess, sys, pathlib, datetime as dt, textwrap

req_file  = pathlib.Path("requirements.txt")          # runtime deps only
ts        = dt.datetime.now().strftime("%Y%m%d-%H%M%S")
sbom_file = pathlib.Path(f"morphkit_{ts}.cdx.json").resolve()

# (re-)install or upgrade cyclonedx-bom in THIS kernel’s env
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--upgrade", "--quiet", "cyclonedx-bom"],
    check=True
)

cmd = [
    sys.executable, "-m", "cyclonedx_py",
    "requirements", str(req_file),        # ① sub-command + input
    "--of", "json",                       # ② output-format
    "-o", str(sbom_file)                  # ③ output file
]

print("Running:", " ".join(cmd))
result = subprocess.run(cmd, capture_output=True, text=True)

if result.returncode == 0:
    print("CycloneDX SBOM written to", sbom_file)
    print("--- first 15 lines ---")
    print("\n".join(textwrap.shorten(l, width=120)
                    for l in sbom_file.read_text(encoding="utf-8").splitlines()[:15]))
else:   # printing stderr is really useful for debugging
    print("cyclonedx-py failed →", result.stderr or "(no stderr)")


Running: C:\Users\tonyj\anaconda3\envs\Text-Fabric\python.exe -m cyclonedx_py requirements requirements.txt --of json -o C:\Users\tonyj\OneDrive\Documents\GitHub\morphkit\morphkit_20250603-131544.cdx.json
CycloneDX SBOM written to C:\Users\tonyj\OneDrive\Documents\GitHub\morphkit\morphkit_20250603-131544.cdx.json
--- first 15 lines ---
{
"components": [
{
"bom-ref": "requirements-L1",
"description": "requirements line 1: beta-code>=1.1.1",
"externalReferences": [
{
"comment": "implicit dist url",
"type": "distribution",
"url": "https://pypi.org/simple/beta-code/"
}
],
"name": "beta-code",
"purl": "pkg:pypi/beta-code",
"type": "library"


## 2.5 - Verify the dependency graph <a class="anchor" id="bullet2x5"></a>

After generation of the SBOM we can open it and check for dependencies:

In [6]:
import json, pathlib, textwrap

sbom_file = pathlib.Path("morphkit_20250603-122958.cdx.json")   # adjust to the SBOM to inspect
bom       = json.loads(sbom_file.read_text(encoding="utf-8"))

# index for quick lookup
by_ref = {c["bom-ref"]: c for c in bom.get("components", [])}

print(f"Components in {sbom_file.name}:")
for ref, comp in by_ref.items():
    print(f"- {comp.get('name', ref)}  {comp.get('version', '')}")
    print(f"  id:  {ref}")
    if comp.get("licenses"):
        entry = comp["licenses"][0]      # either {"license": {...}} or {"expression": "..."}
        lic = entry.get("license", {})
        print("  license:", lic.get("id") or lic.get("name") or entry.get("expression"))
    print()
    
print("Dependency graph:")
for edge in bom.get("dependencies", []):
    parent = by_ref.get(edge["ref"], {"name": edge["ref"]})
    print(f"{parent['name']}:")
    for child_ref in edge.get("dependsOn", []):
        child = by_ref.get(child_ref, {"name": child_ref})
        print(f" └─ {child['name']} {child.get('version','')}")


Components in morphkit_20250603-122958.cdx.json:
- beta-code  
  id:  requirements-L1

- pprint  
  id:  requirements-L3

- requests  
  id:  requirements-L2

Dependency graph:
beta-code:
requests:
pprint:
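
The file name in the cell above is hard-coded. The following is a minimal sketch for picking up the most recent SBOM instead, assuming the `morphkit_<timestamp>.cdx.json` naming used in the earlier cells. The helper `latest_sbom` is hypothetical, and the demonstration uses empty stand-in files in a temporary directory.

```python
import pathlib
import tempfile

def latest_sbom(folder: pathlib.Path, prefix: str = "morphkit_"):
    """Return the newest '<prefix><timestamp>.cdx.json' in folder, or None."""
    # Timestamps formatted as %Y%m%d-%H%M%S sort chronologically as plain strings
    candidates = sorted(folder.glob(f"{prefix}*.cdx.json"))
    return candidates[-1] if candidates else None

# Demonstration in a throw-away directory with two empty stand-in files
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)
    for stamp in ("20250603-122958", "20250603-131544"):
        (tmp_path / f"morphkit_{stamp}.cdx.json").write_text("{}", encoding="utf-8")
    newest_name = latest_sbom(tmp_path).name
    print(newest_name)   # → morphkit_20250603-131544.cdx.json
```

In the notebook, `latest_sbom(pathlib.Path.cwd())` would return the file written by the most recent generation cell, or `None` if no SBOM exists yet.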


# 4 - Attribution and footnotes <a class="anchor" id="bullet4"></a>
##### [Back to ToC](#TOC)

- [stdlibs: Simple list of top-level packages in Python’s stdlib](https://stdlibs.omnilib.dev/en/stable/index.html)
- [CycloneDX: The International Standard for Bill of Materials (ECMA-424)](https://cyclonedx.org/)

# 5 - Required libraries<a class="anchor" id="bullet5"></a>
##### [Back to ToC](#TOC)

The scripts in this notebook depend on the following libraries installed in the environment:

    cyclonedx_py
    importlib
    pip-licenses
    
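You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`. Note that `cyclonedx_py` is installed through the `cyclonedx-bom` distribution, and `importlib` is part of the Python standard library.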

# 6 - Notebook version<a class="anchor" id="bullet6"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.0</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>2 June 2025</td>
    </tr>
  </table>
</div>