## GAP Data Analytics, Distribution Monitoring

This Jupyter Notebook extracts metrics for monitoring purposes, based mainly on the PackageDistro repository of the gap-system organisation on GitHub. As the package distribution for GAP is managed through this repository, pulling and analysing data from its current release workflows gives an overview of what is distributed now and what may be redistributed in the next release. The PackageDistro repository is a convenient source for this, as it contains the metadata (a `meta.json` file) of every GAP package it distributes.

```python
# Import required modules and libraries
import os
import sys
import json
import requests

# Get current working directory and append parent directory for module imports
cwd = os.getcwd()
parent_dir = os.path.dirname(cwd)
sys.path.append(parent_dir)

# Import modules from other project scripts
# data_constants is expected to provide ORG_NAME_SYSTEM, DISTRO_REPO and the GitHub client g used below
from data_constants import *
```

### Updates to Package Distribution

To check for detailed updates to packages distributed with GAP, the PackageDistro repository scans for updates to GAP packages hosted on GitHub. Extracting, analysing and combining this information therefore provides data on which packages could, and should, be considered for a new release of the system. By comparing the package versions in the latest PackageDistro release with those on the 'main' branch and in open, labelled pull requests, the user can estimate what the next released GAP version will contain. Running the script exports the results to a 'distro_data.json' file in the 'collected_data' folder, grouped per package into three lists: packages whose version differs between the latest release and 'main', packages from labelled open pull requests that were also in the latest release, and all packages from those pull requests.
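The version comparison described above can be sketched offline on two plain `{package: version}` mappings. This is a minimal sketch: `diff_versions` is a hypothetical helper, the sample data is made up, and reporting added and removed packages goes beyond the notebook, which only reports packages present in both with differing versions.

```python
# Hypothetical helper: compare two {package: version} mappings, for example
# dict(get_meta(version_commit)) and dict(get_meta("main")).
def diff_versions(released, main):
    """Return (changed, added, removed) between a release and the main branch."""
    shared = released.keys() & main.keys()
    # Packages in both, with different version strings: name -> (old, new)
    changed = {name: (released[name], main[name])
               for name in sorted(shared) if released[name] != main[name]}
    added = sorted(main.keys() - released.keys())      # only on main
    removed = sorted(released.keys() - main.keys())    # only in the release
    return changed, added, removed


# Made-up sample data, not real package versions
released = {"pkg_a": "1.0", "pkg_b": "2.1", "pkg_old": "0.9"}
main = {"pkg_a": "1.0", "pkg_b": "2.2", "pkg_new": "0.1"}
changed, added, removed = diff_versions(released, main)
print(changed)  # {'pkg_b': ('2.1', '2.2')}
print(added)    # ['pkg_new']
print(removed)  # ['pkg_old']
```

Using dictionaries keeps each lookup constant-time instead of scanning the whole list for every package.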

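```python
# Define global variables for the Jupyter Notebook
# (org and repo are PyGithub-style objects; the functions below query the REST API directly with requests)
org = g.get_organization(ORG_NAME_SYSTEM)
repo = org.get_repo(DISTRO_REPO)
# Pull request labels that mark release-related PRs in PackageDistro
labels = ["automatic pr", "new package", "update package"]
```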
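The request helpers in this notebook call the GitHub REST API through `requests` without credentials, so they fall under the unauthenticated limit of 60 requests per hour. A minimal sketch of authenticated headers follows; `github_headers` is a hypothetical helper, and reading the token from a `GITHUB_TOKEN` environment variable is an assumption, not something the notebook defines.

```python
import os


def github_headers(token=None):
    """Build request headers for the GitHub REST API.

    Falls back to a GITHUB_TOKEN environment variable (an assumed location
    for the token); without any token the headers stay anonymous.
    """
    headers = {"Accept": "application/vnd.github+json"}
    token = token or os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


print(github_headers("example-token")["Authorization"])  # Bearer example-token
```

The helpers could then pass it along, e.g. `requests.get(api_url, params=params, headers=github_headers())`.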

##### Functions to Retrieve Monitoring Metrics

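```python
# Function to get the latest release of GAP and a git ref for this version
def get_latest_release():
    repo_url = "https://api.github.com/repos/gap-system/PackageDistro/releases/latest"
    response = requests.get(repo_url)
    response.raise_for_status()  # fail early on HTTP errors such as rate limiting
    latest_release = response.json()
    latest_version = latest_release.get("name")
    # target_commitish can be a branch name such as "main" rather than a fixed commit,
    # so prefer the release's tag name as the ref for its contents
    version_commit = latest_release.get("tag_name") or latest_release.get("target_commitish")
    return latest_version, version_commit
```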
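GitHub documents a release's `target_commitish` as the branch or commit the tag was created from, so it can be a moving branch name rather than a fixed snapshot. The ref selection can be isolated and tested offline; `pick_release_ref` is a hypothetical helper and the payloads are made up, shaped like the "latest release" response.

```python
# Hypothetical helper: choose a git ref that pins the contents of a release.
def pick_release_ref(release):
    # Prefer the tag, which points at a fixed commit; fall back to target_commitish
    return release.get("tag_name") or release.get("target_commitish")


# Made-up payloads reduced to the fields used here
sample = {"name": "Example release", "tag_name": "v-example", "target_commitish": "main"}
print(pick_release_ref(sample))                         # v-example
print(pick_release_ref({"target_commitish": "abc123"}))  # abc123
```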

```python
# Function to get the listed version from a meta.json file
def get_version_from_meta(meta_json_url):
    response = requests.get(meta_json_url)
    response.raise_for_status()  # a missing meta.json raises instead of failing in .json()
    meta_json = response.json()
    version = meta_json.get("Version")
    return version
```

```python
# Function to get all meta.json files and versions based on the branch (or other git ref)
def get_meta(branch):
    api_url = f"https://api.github.com/repos/gap-system/PackageDistro/contents/packages?ref={branch}"
    response = requests.get(api_url)
    response.raise_for_status()
    package_folders = response.json()

    # Each package lives in its own folder: packages/<package_name>/meta.json
    meta_json_data = []
    for folder in package_folders:
        if folder.get("type") == "dir":
            package_name = folder.get("name")
            meta_json_url = f"https://raw.githubusercontent.com/gap-system/PackageDistro/{branch}/packages/{package_name}/meta.json"
            version = get_version_from_meta(meta_json_url)
            meta_json_data.append((package_name, version))

    return meta_json_data
```
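`get_meta()` downloads one `meta.json` per package, one after another, so it is likely the slowest step when many packages are listed. A minimal sketch of fetching versions concurrently with a thread pool follows; `fetch_versions` is hypothetical, and the fetcher is passed in so the sketch can run offline with a stand-in instead of HTTP requests.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_versions(package_names, fetch_version, max_workers=8):
    """Return (package_name, version) tuples, running up to max_workers fetches at once.

    fetch_version is any callable mapping a package name to its version,
    e.g. a wrapper around get_version_from_meta() for a fixed branch.
    """
    names = list(package_names)  # materialise so names can be zipped after mapping
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order, so results line up with names
        versions = list(executor.map(fetch_version, names))
    return list(zip(names, versions))


# Offline usage with a stand-in fetcher instead of real HTTP requests
fake_versions = {"pkg_a": "1.0", "pkg_b": "2.3"}
print(fetch_versions(["pkg_a", "pkg_b"], fake_versions.get))
# [('pkg_a', '1.0'), ('pkg_b', '2.3')]
```

In the notebook the fetcher could be a lambda that builds the `raw.githubusercontent.com` URL for a package and calls `get_version_from_meta()`; the pool size is an arbitrary choice.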

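```python
# Function to retrieve open pull requests carrying at least one of the specified labels
def get_open_pull_requests(labels):
    api_url = "https://api.github.com/repos/gap-system/PackageDistro/pulls"

    # The pulls endpoint has no "labels" filter (that parameter belongs to the issues
    # endpoint), so fetch the open PRs and filter on their labels here
    params = {
        "state": "open",
        "per_page": 100  # maximum page size; only the first page is fetched
    }

    response = requests.get(api_url, params=params)
    response.raise_for_status()
    wanted = set(labels)
    pull_requests = [
        pr for pr in response.json()
        if wanted & {label["name"] for label in pr.get("labels", [])}
    ]
    return pull_requests
```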
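GitHub list endpoints such as the pulls endpoint return results in pages, and `requests` exposes the parsed `Link` header as `response.links`. A minimal sketch of following the `rel="next"` links until the last page; `iter_paginated` is a hypothetical helper, and the `get` callable is injected so it can be tested with fake responses.

```python
def iter_paginated(url, get, params=None):
    """Yield items from every page of a GitHub list endpoint.

    get is a callable like requests.get; pagination follows the rel="next"
    entry that requests parses from the Link header into response.links.
    """
    while url:
        response = get(url, params=params)
        response.raise_for_status()
        yield from response.json()
        url = response.links.get("next", {}).get("url")
        params = None  # the "next" URL already carries the query string


# Offline check with fake responses standing in for requests.Response
class FakeResponse:
    def __init__(self, items, next_url=None):
        self._items = items
        self.links = {"next": {"url": next_url}} if next_url else {}

    def raise_for_status(self):
        pass

    def json(self):
        return self._items


pages = {"page1": FakeResponse([1, 2], "page2"), "page2": FakeResponse([3])}
print(list(iter_paginated("page1", lambda url, params=None: pages[url])))  # [1, 2, 3]
```

With real requests this would be `iter_paginated(api_url, requests.get, {"state": "open", "per_page": 100})`.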

##### Get and Display Monitoring Metrics

```python
# Get the latest release of GAP and the ref for this version
latest_gap_release, version_commit = get_latest_release()
print(f"The latest version of GAP is: {latest_gap_release}, with ref {version_commit}")
```

```python
# Get meta.json files and versions for the latest release (reusing the ref fetched above)
latest_meta = get_meta(version_commit)

# Get meta.json files and versions for the main branch
main_meta = get_meta("main")

# Collect packages present in both whose versions differ
# For these packages, the version on the main branch is expected to be the one in the next release
main_versions = dict(main_meta)
packages_with_different_versions = []
for latest_package, latest_version in latest_meta:
    if latest_package in main_versions and latest_version != main_versions[latest_package]:
        packages_with_different_versions.append({
            'package_name': latest_package,
            'latest_version': latest_version,
            'main_branch_version': main_versions[latest_package]
        })
```
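The comparison above only detects that two version strings differ; it does not check that the one on `main` is newer, and plain string comparison gets numeric order wrong ("1.10" sorts before "1.9"). A minimal sketch of a numeric sort key follows; `version_key` is hypothetical and deliberately simplified: it keeps only the leading dot-separated integers and ignores suffixes, which is not the same as GAP's own version comparison rules.

```python
import re


def version_key(version):
    """Hypothetical sort key: leading dot-separated integers, e.g. "1.10.2" -> (1, 10, 2).

    Any non-numeric suffix (such as "dev") is ignored; a missing or
    unparseable version yields an empty tuple.
    """
    match = re.match(r"\d+(?:\.\d+)*", version or "")
    return tuple(int(part) for part in match.group().split(".")) if match else ()


print(version_key("1.10.2") > version_key("1.9.5"))  # True
print("1.10.2" > "1.9.5")                            # False: plain string comparison
```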

```python
# Find the packages in unmerged PRs, as these may be in the next release but have not yet been merged
# Only retrieve PRs with the specified labels, as these labels mark release-related PRs
pull_requests = get_open_pull_requests(labels)
# Branch names are assumed to look like "<prefix>/<package>"; refs without "/" are skipped
package_names = {pr["head"]["ref"].split("/")[1] for pr in pull_requests if "/" in pr["head"]["ref"]}

# Get the packages that were in the latest release and might also be in the next, based on PR logic
latest_versions = dict(latest_meta)
in_latest_release_and_maybe_next = [
    {'package': package_name, 'latest_version': latest_versions[package_name]}
    for package_name in package_names
    if package_name in latest_versions
]
```
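Deriving package names from branch names depends on a naming convention. An alternative, sketched here under assumptions, is to look at the files each pull request changes: GitHub's `GET /repos/{owner}/{repo}/pulls/{pull_number}/files` endpoint returns entries with a `filename` field, and package changes are assumed to touch `packages/<name>/...` as in `get_meta()`. `packages_from_paths` is a hypothetical helper that only does the path parsing, so it runs offline; this approach costs one extra API request per pull request.

```python
def packages_from_paths(paths):
    """Collect package folder names from paths like 'packages/<name>/meta.json'."""
    names = set()
    for path in paths:
        parts = path.split("/")
        # Keep only files inside a package folder under packages/
        if len(parts) >= 3 and parts[0] == "packages" and parts[1]:
            names.add(parts[1])
    return names


# Made-up file list, shaped like the "filename" values of the PR files endpoint
changed_files = ["packages/pkg_a/meta.json", "README.md", "packages/pkg_b/meta.json"]
print(sorted(packages_from_paths(changed_files)))  # ['pkg_a', 'pkg_b']
```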

```python
# List every package from the open labelled PRs, whether or not it was in the latest release
all_maybe_next = [{"package": package_name} for package_name in package_names]
```

```python
# Export collected data to a JSON file to store it for later use and a better overview
data_folder = "collected_data"
data = {
    'packages_with_different_versions': packages_with_different_versions,
    'previous_and_maybe_next_labels': in_latest_release_and_maybe_next,
    'all_previous_and_maybe_next': all_maybe_next
}

# Create the data folder if needed and build the path for the JSON file
os.makedirs(data_folder, exist_ok=True)
file_path = os.path.join(data_folder, "distro_data.json")

# Write the data to the JSON file
with open(file_path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=4)

print("Distro data has been exported to the 'distro_data.json' file in the 'collected_data' folder.")
```
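Once exported, the file can be read back for a quick overview. A minimal sketch that counts the entries stored under each top-level key; `summarize_distro_data` is a hypothetical helper, and the round trip uses a temporary folder with made-up content rather than the real 'collected_data' output.

```python
import json
import os
import tempfile


def summarize_distro_data(file_path):
    """Return the number of entries stored under each top-level key."""
    with open(file_path, encoding="utf-8") as f:
        data = json.load(f)
    return {key: len(entries) for key, entries in data.items()}


# Round trip with a temporary file and made-up content
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "distro_data.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"packages_with_different_versions": [{"package_name": "pkg_a"}],
                   "all_previous_and_maybe_next": []}, f)
    summary = summarize_distro_data(path)

print(summary)  # {'packages_with_different_versions': 1, 'all_previous_and_maybe_next': 0}
```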