## GAP Data Analytics, Data Retrieval

This Jupyter Notebook monitors various metrics for the GAP packages available on GitHub, to ease and automate the supervision of the packages and their development. The analytical framework is intended primarily to support oversight of packages for redistribution.

In [3]:
# Import required libraries and packages
import os
import sys
import pandas as pd

# Get current working directory and append parent directory for module imports
cwd = os.getcwd()
parent_dir = os.path.dirname(cwd)
sys.path.append(parent_dir)

# Import modules from other project scripts
from data_constants import *

### Package Monitoring for Release Purposes

Several metrics are useful to track when evaluating the state of GAP packages from a redistribution perspective. Some indicate whether a package is close to release, while others give insight into the general state of the package. For example, issues, labels, and keywords can be compared and analysed, while open pull requests may indicate that a repository is unlikely to be released in the immediate future.

In [2]:
# Get the public repositories of the gap-packages organisation on GitHub
# (`g` is assumed to be the GitHub API client made available by data_constants)
org = g.get_organization(ORG_NAME_PACKAGES)
repos = org.get_repos(type="public")

In [None]:
# Function to get information on bugs and enhancement opportunities for the packages
def check_release_status(repo):
    """Return (name, open issue count, open bug count, open enhancement count)."""
    # Note: the GitHub issues API also returns pull requests, so they are included in these counts
    open_issues = repo.get_issues(state='open')

    open_issues_count = open_issues.totalCount
    bug_count = 0
    enhancement_count = 0

    # Count the open issues that carry the 'bug' or 'enhancement' label
    for issue in open_issues:
        labels = [label.name for label in issue.labels]
        if 'bug' in labels:
            bug_count += 1
        if 'enhancement' in labels:
            enhancement_count += 1

    # Both counts start at 0, so no special case is needed when nothing matches
    return repo.name, open_issues_count, bug_count, enhancement_count
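
The label-counting step can also be tried without calling the GitHub API. The following is a minimal sketch, assuming the label names of each issue have already been collected into plain lists. The helper `tally_labels`, its `tracked` parameter, and the sample data are hypothetical and not part of the original notebook.

```python
from collections import Counter


def tally_labels(issue_labels, tracked=("bug", "enhancement")):
    """Count how many issues carry each tracked label (hypothetical helper).

    issue_labels: iterable of label-name lists, one list per issue.
    """
    counts = Counter()
    for labels in issue_labels:
        # Use sets so an issue is counted at most once per tracked label
        for name in set(labels) & set(tracked):
            counts[name] += 1
    # Report every tracked label, including those with zero matches
    return {name: counts[name] for name in tracked}


# Made-up label lists standing in for [label.name for label in issue.labels]
sample_issue_labels = [
    ["bug"],
    ["enhancement", "help wanted"],
    ["bug", "enhancement"],
    [],
]
label_counts = tally_labels(sample_issue_labels)
print(label_counts)
```

Passing the labels to track as a parameter would make it possible to follow other labels later without adding a new counter variable for each one.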

In [None]:
# Function to get the total number of Pull Requests (PRs), as well as the numbers of open and closed PRs
def pull_request_status(repo):
    """Return (name, total PR count, open PR count, closed PR count)."""
    total_pull_requests = repo.get_pulls(state='all').totalCount
    open_pull_requests = repo.get_pulls(state='open').totalCount
    # Every PR is either open or closed (merged PRs count as closed), so a third API call is not needed
    closed_pull_requests = total_pull_requests - open_pull_requests

    return repo.name, total_pull_requests, open_pull_requests, closed_pull_requests

In [None]:
# Display alternative 1: Printing out information
# Generate monitoring information for all repositories managed by the gap-packages organisation on GitHub
for repo in repos:
    # Call function for total issues information
    repo_name, open_issues_count, bug_count, enhancement_count = check_release_status(repo)
    if bug_count > 0 or enhancement_count > 0:
        print(f"The repository {repo_name} has open bug or enhancement issues.")
        print(f"Total open issues: {open_issues_count}")
        print(f"Open bug issues: {bug_count}")
        print(f"Open enhancement issues: {enhancement_count}")
    else:
        print(f"The repository {repo_name} has no open bug or enhancement issues.")
        print(f"Total open issues: {open_issues_count}")

    # Call function for total PRs information
    repo_name, total_pull_requests, open_pull_requests, closed_pull_requests = pull_request_status(repo)
    print(f"Total Pull Requests for {repo_name}: {total_pull_requests}")
    print(f"Open Pull Requests for {repo_name}: {open_pull_requests}")
    print(f"Closed Pull Requests for {repo_name}: {closed_pull_requests}")

In [None]:
# Display alternative 2: Using a Pandas DataFrame for more user-friendly formatting
# Generate monitoring information for all repositories managed by the gap-packages organisation on GitHub
repo_info = []

for repo in repos:
    repo_dict = {}
    repo_dict['Name'] = repo.name
    
    # Get the issue counts and store them in the row
    repo_name, open_issues_count, bug_count, enhancement_count = check_release_status(repo)
    repo_dict['Open Issues'] = open_issues_count
    repo_dict['Bugs'] = bug_count
    repo_dict['Enhancements'] = enhancement_count

    # Get the PR counts and store them in the row
    repo_name, total_pull_requests, open_pull_requests, closed_pull_requests = pull_request_status(repo)
    repo_dict['Total Pull Requests'] = total_pull_requests
    repo_dict['Open Pull Requests'] = open_pull_requests
    repo_dict['Closed Pull Requests'] = closed_pull_requests
 
    # Add the current repo info to the list of repo info
    repo_info.append(repo_dict)

df = pd.DataFrame(repo_info)
df = df.sort_values(by='Open Issues', ascending=False)

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

print(df)
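
Once the metrics are in a DataFrame, they can be filtered to shortlist packages that look ready for release. The following is a minimal sketch, assuming a simple rule of "no open bugs and no open pull requests". The rule, the helper `release_candidates`, and the sample rows are hypothetical illustrations, not the project's actual release criteria or real data. Only the column names are taken from the DataFrame built above.

```python
import pandas as pd

# Made-up rows for illustration only; these are not real gap-packages figures
sample = pd.DataFrame([
    {"Name": "pkg-a", "Open Issues": 0, "Bugs": 0, "Open Pull Requests": 0},
    {"Name": "pkg-b", "Open Issues": 3, "Bugs": 1, "Open Pull Requests": 0},
    {"Name": "pkg-c", "Open Issues": 2, "Bugs": 0, "Open Pull Requests": 2},
])


def release_candidates(frame):
    """Return rows with no open bugs and no open pull requests (hypothetical rule)."""
    mask = (frame["Bugs"] == 0) & (frame["Open Pull Requests"] == 0)
    return frame[mask]


candidates = release_candidates(sample)
print(candidates["Name"].tolist())
```

The same function could be applied to `df` from the cell above, since it only relies on the `Bugs` and `Open Pull Requests` columns.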