### GAP Data Analytics, Data Retrieval

This Jupyter Notebook automates the extraction of GitHub statistics relevant to the redistribution of the GAP programming language. The data is retrieved from the GitHub REST API through PyGithub, a Python wrapper for that API, so the PyGithub library must be installed first.

In [None]:
# Use sys.executable so that pip installs into the environment of the running kernel
# Filter "already satisfied" lines out of the pip output with grep
import sys
!{sys.executable} -m pip install numpy pandas matplotlib seaborn PyGithub requests beautifulsoup4 | grep -v 'already satisfied'

# Import required libraries and packages
import requests
from datetime import datetime
from github import Github
from bs4 import BeautifulSoup

# Import modules from other project scripts
from utils import get_github_token

### Managing GitHub API Connection

The connection to GitHub is authenticated with a personal access token, which is stored as an environment variable so that it is not exposed in the notebook. The function that reads the token is imported from the utils file in the project. For authenticated requests the API has a limit of 5000 calls per hour, so the number of used and remaining calls needs to be tracked.
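
The `get_github_token` helper itself is not shown in this notebook. A minimal sketch of what such a helper might look like follows, assuming the token is stored in an environment variable named `GITHUB_TOKEN`. That variable name is an assumption for illustration and is not taken from the project.

```python
import os


def get_github_token(env_var="GITHUB_TOKEN"):
    """Return the GitHub access token from the environment, or None if unset.

    The default variable name GITHUB_TOKEN is an assumption for illustration;
    the project's utils file may use a different name.
    """
    # Treat a missing, empty or whitespace-only value as "no token"
    token = os.environ.get(env_var, "").strip()
    return token or None
```

Returning `None` rather than raising leaves it to the caller to decide how to handle a missing token.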

In [None]:
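# Get the GitHub access token and create an instance of the Github class
# Fail early if no token is available, since every later cell depends on g
github_token = get_github_token()
if not github_token:
    raise RuntimeError("No GitHub access token found; set it as an environment variable first")
g = Github(github_token)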

In [None]:
# Track the rate limit for GitHub compared to calls used, and see when the limit will reset
remaining_requests, request_limit = g.rate_limiting
print(f"Request Limit for API Calls: {request_limit}")
print(f"Remaining Requests for API Calls: {remaining_requests}")

limit_reset_time = g.rate_limiting_resettime
reset_time = datetime.fromtimestamp(limit_reset_time).strftime('%Y-%m-%d %H:%M:%S')
print(f"Reset Time for API Calls: {reset_time}")
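
One way to act on these numbers is to pause before a large batch of calls when the remaining quota is too low. The sketch below is an illustration and not part of the original notebook. The helper name `seconds_to_wait` and the `needed_calls` estimate are hypothetical. It takes the values returned by `g.rate_limiting` and `g.rate_limiting_resettime` as plain arguments, so it can be tried without any API calls.

```python
import time


def seconds_to_wait(remaining, reset_timestamp, needed_calls, now=None):
    """Return how many seconds to sleep before making `needed_calls` API calls.

    remaining       -- calls left in the current window (first value of g.rate_limiting)
    reset_timestamp -- Unix timestamp of the next reset (g.rate_limiting_resettime)
    needed_calls    -- hypothetical estimate of the calls the next step will make
    now             -- current Unix time; defaults to time.time()
    """
    # Enough quota left: no need to wait
    if remaining >= needed_calls:
        return 0.0
    if now is None:
        now = time.time()
    # Never return a negative wait if the reset time has already passed
    return max(0.0, reset_timestamp - now)
```

In the notebook this could be used as `time.sleep(seconds_to_wait(remaining_requests, limit_reset_time, needed_calls=200))`, where 200 is an arbitrary example value.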

### Generating GitHub GAP Statistics

Core statistics relevant to the management of GAP on GitHub are computed below. These numbers give a foundational understanding of the current situation of the programming language in terms of development, distribution and redistribution.

In [None]:
# Number of GAP packages hosted in the gap-packages organisation on GitHub
org_name = "gap-packages"
org = g.get_organization(org_name)

# Get the number of repositories that are public
repos = org.get_repos(type="public")
total_packages = repos.totalCount
print(f"Number of GAP packages in the {org_name} organisation: {total_packages}")

In [None]:
# Number of GAP packages hosted elsewhere on GitHub
# The information is gathered by scraping the package overview page with Beautiful Soup
# NB: The number is only indicative, because of how the page lists packages: only parent list items are counted
url = "https://gap-packages.github.io/"
response = requests.get(url)
# Stop here if the page could not be fetched, instead of parsing an error page
response.raise_for_status()

# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.text, "html.parser")

# Find the section of the webpage with the packages stored elsewhere on GitHub
section = soup.find("section", id="main-content")
heading = section.find(id="packages-hosted-elsewhere-on-github")
ul = heading.find_next("ul")

# Count only the direct li children, so that nested ul and li elements do not inflate the count
packages = ul.find_all("li", recursive=False)
count = len(packages)

print(f"Number of GAP packages hosted elsewhere on GitHub: {count}")

In [None]:
# Number of releases per repository managed by the gap-packages organisation on GitHub
for repo in repos:
    total_releases = repo.get_releases().totalCount
    print(f"Total Releases for {repo.name}: {total_releases}")


In [None]:
# Age per repository managed by the gap-packages organisation on GitHub
# The current date is the same for every repository, so compute it once
current_date = datetime.now().date()
for repo in repos:
    creation_date = repo.created_at.date()
    age = current_date - creation_date
    print(f"Repository: {repo.name}, Created: {creation_date}, Age: {age.days} days")
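
Depending on the installed PyGithub version, `repo.created_at` may be a naive or a timezone-aware datetime. The age calculation can be isolated in a small helper that handles both cases and can be checked without API calls. This is a sketch only: the name `repo_age_days` is hypothetical, and naive values are assumed to be in UTC.

```python
from datetime import datetime, timezone


def repo_age_days(created_at, now=None):
    """Return the number of full 24-hour days between created_at and now.

    Naive datetimes are assumed to be in UTC (an assumption of this sketch).
    """
    if created_at.tzinfo is None:
        created_at = created_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    elif now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)
    # timedelta.days drops the partial day, so this counts completed days only
    return (now - created_at).days
```

Inside the loop above it would be called as `repo_age_days(repo.created_at)`. Because it counts completed 24-hour periods, the result can differ by one day from a subtraction of calendar dates.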