## GAP Data Analytics, Distribution Monitoring

This Jupyter Notebook monitors a range of metrics for the GAP packages hosted on GitHub, in order to ease and automate the supervision of the packages and their development. The analytical framework is intended primarily to support oversight ahead of redistribution, and it is based on key metrics of the current release workflow. It automatically pulls the metadata of all GAP packages from the GAP package distribution repository on GitHub.

In [None]:
# Import required libraries and packages
import os
import sys
import json
import requests

# Get current working directory and append parent directory for module imports
cwd = os.getcwd()
parent_dir = os.path.dirname(cwd)
sys.path.append(parent_dir)

# Import modules from other project scripts
from data_constants import *

### Updates to Package Distribution

To check in more detail for updates to the packages distributed with GAP, the gap-system organisation on GitHub maintains a repository that scans the GitHub-hosted GAP packages for updates. This information can be used to decide which packages should be considered for a new release of the system: comparing the versions in the latest release with those currently in the repository indicates what the next release will contain.

In [None]:
# Get the package distribution repository from the gap-system organisation on GitHub
# (g is assumed to be a GitHub API client made available by data_constants)
org = g.get_organization(ORG_NAME_SYSTEM)
repo = org.get_repo(DISTRO_REPO)
labels = ["automatic pr", "new package", "update package"]

In [None]:
# Function to get the latest release of GAP and the commit for this version
def get_latest_release():
    repo_url = "https://api.github.com/repos/gap-system/PackageDistro/releases/latest"
    response = requests.get(repo_url)
    latest_release = response.json()
    latest_version = latest_release.get("name")
    version_commit = latest_release.get("target_commitish")
    return latest_version, version_commit

latest_gap_release, version_commit = get_latest_release()
print(f"The latest version of GAP is: {latest_gap_release} and it has the commit {version_commit}")

In [None]:
# Function to get the version from a meta.json file
def get_version_from_meta(meta_json_url):
    response = requests.get(meta_json_url)
    meta_json = response.json()
    version = meta_json.get("Version")
    return version

In [None]:
# Function to get all meta.json files and versions based on the branch
def get_meta(branch):
    api_url = f"https://api.github.com/repos/gap-system/PackageDistro/contents/packages?ref={branch}"
    response = requests.get(api_url)
    package_folders = response.json()

    meta_json_data = []
    for folder in package_folders:
        if folder.get("type") == "dir":
            package_name = folder.get("name")
            meta_json_url = f"https://raw.githubusercontent.com/gap-system/PackageDistro/{branch}/packages/{package_name}/meta.json"
            version = get_version_from_meta(meta_json_url)
            meta_json_data.append((package_name, version))

    return meta_json_data

In [None]:
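# Function to retrieve open pull requests carrying at least one of the specified labels
def get_open_pull_requests(labels):
    api_url = "https://api.github.com/repos/gap-system/PackageDistro/pulls"

    # The pulls endpoint has no "labels" filter, so request the open pull requests
    # (first page only, up to 100) and filter on their labels locally
    params = {
        "state": "open",
        "per_page": 100
    }

    response = requests.get(api_url, params=params)
    response.raise_for_status()
    wanted = set(labels)
    pull_requests = [
        pr for pr in response.json()
        if wanted & {label["name"] for label in pr.get("labels", [])}
    ]
    return pull_requests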

In [None]:
# Get meta.json files and versions for the latest release
# (reuses version_commit from the cell above instead of querying the API again)
latest_meta = get_meta(version_commit)

# Get meta.json files and versions for the main branch
main_meta = get_meta("main")

# Compare versions and print package names if they are different
# For the packages with different version, the package in the main branch will be the new version in the next release
packages_with_different_versions = []
for latest_package, latest_version in latest_meta:
    for main_package, main_version in main_meta:
        if latest_package == main_package and latest_version != main_version:
            packages_with_different_versions.append({
                'package_name': latest_package,
                'latest_version': latest_version,
                'main_branch_version': main_version
            })
            # print(f"Package: {latest_package}")
            # print(f"Latest Version: {latest_version}")
            # print(f"Main Branch Version: {main_version}")
            # print("---")
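The nested loop above compares every released package with every package on the main branch. As a minimal sketch of the same comparison using a dictionary lookup, the hypothetical helper `diff_versions` below takes two lists of `(name, version)` tuples in the shape returned by `get_meta`. The sample data is made up for illustration, and unlike the loop above the sketch skips packages whose version could not be read on the main branch.

```python
def diff_versions(released, upcoming):
    """Return packages whose version differs between two (name, version) lists."""
    upcoming_by_name = dict(upcoming)  # name -> version on the development branch
    changed = []
    for name, released_version in released:
        new_version = upcoming_by_name.get(name)
        # Skip packages that are missing (or have no version) on the development branch
        if new_version is not None and new_version != released_version:
            changed.append({
                "package_name": name,
                "latest_version": released_version,
                "main_branch_version": new_version,
            })
    return changed


# Made-up sample data, for illustration only
sample_released = [("alpha", "1.0.0"), ("beta", "2.3.1"), ("gamma", "0.9")]
sample_main = [("alpha", "1.1.0"), ("beta", "2.3.1"), ("delta", "0.1")]
sample_changed = diff_versions(sample_released, sample_main)
print(sample_changed)
```

With a few hundred packages the difference in run time is small, so the main benefit of the dictionary is readability rather than speed.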

In [None]:
# Retrieve pull requests with specified labels and extract the package names from pull requests
pull_requests = get_open_pull_requests(labels)
package_names = {pr["head"]["ref"].split("/")[1] for pr in pull_requests}
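The set comprehension above assumes that every branch name has the form `<prefix>/<package>` and raises an `IndexError` for a branch without a slash. The following is a sketch of a more tolerant extraction; the helper name `package_from_ref` and the sample branch names are hypothetical and not taken from the repository.

```python
def package_from_ref(ref):
    """Return the second '/'-separated segment of a branch name, or None if absent."""
    parts = ref.split("/")
    if len(parts) > 1 and parts[1]:
        return parts[1]
    return None


# Hypothetical branch names, for illustration only
sample_refs = ["automatic/alpha", "main", "automatic/beta/extra", "automatic/"]
sample_names = {
    name
    for name in (package_from_ref(ref) for ref in sample_refs)
    if name is not None
}
print(sorted(sample_names))
```

Branches that do not match the expected form are silently dropped here; logging them instead may be preferable if unexpected branch names should be noticed.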

In [None]:
# Packages with open (unmerged) pull requests that were also part of the latest release,
# i.e. packages from the latest release that might be updated in the next one
in_latest_release_and_maybe_next = []
for package_name in package_names:
    for latest_package, latest_version in latest_meta:
        if package_name == latest_package:
            in_latest_release_and_maybe_next.append({
                'package': package_name,
                'latest_version': latest_version,
            })

In [None]:
# All packages that might be in the next release based on open PRs
all_maybe_next = []
print("Packages in unmerged pull requests:")
for package_name in package_names:
    print(f"- {package_name}")
    all_maybe_next.append({
        "package": package_name,
    })

In [None]:
# Display alternative 1: Export collected data to JSON file to store them for later use and better overview
# This file will also be used to extract information later on in the analysis process

# Define the data folder and collect the data to export
data_folder = "collected_data"
data = {
    'packages_with_different_versions': packages_with_different_versions,
    'in_previous_and_maybe_next': in_latest_release_and_maybe_next,
    'all_maybe_next': all_maybe_next
}

# Create the data folder if it does not exist yet, then build the path for the JSON file
os.makedirs(data_folder, exist_ok=True)
file_path = os.path.join(data_folder, "distro_data.json")

# Write the data to the JSON file
with open(file_path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=4)

print("Distro data has been exported to the 'collected_data' folder.")
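Since the exported file is meant to be read again later in the analysis, the sketch below shows one way it could be loaded and summarised. The helper name `summarise_distro_data` is hypothetical, and the example writes made-up sample data to a temporary folder so that it does not touch the real `collected_data` folder.

```python
import json
import os
import tempfile


def summarise_distro_data(path):
    """Load an exported distro JSON file and count the entries under each key."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {key: len(entries) for key, entries in data.items()}


# Made-up sample data with the same top-level keys as the export above
sample = {
    "packages_with_different_versions": [
        {"package_name": "alpha", "latest_version": "1.0.0", "main_branch_version": "1.1.0"}
    ],
    "in_previous_and_maybe_next": [],
    "all_maybe_next": [{"package": "alpha"}, {"package": "delta"}],
}

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "distro_data.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False, indent=4)
    summary = summarise_distro_data(sample_path)

print(summary)
```

For the real export, the same helper would be called with the path built above, for example `summarise_distro_data(file_path)`.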