## GAP Data Analytics, Community Study

This Jupyter Notebook studies the community behind the GAP packages distributed through GitHub: the members who develop, release, and collaborate on those packages. The goal is to identify trends and patterns in how they collaborate. To protect privacy, contributor usernames are hashed as soon as they are extracted, and only the hash values are used in the statistical analysis.

In [None]:
# Import required modules and libraries
import os
import sys
import hashlib

# Get current working directory and append parent directory for module imports
cwd = os.getcwd()
parent_dir = os.path.dirname(cwd)
sys.path.append(parent_dir)

# Import modules from other project scripts
from data_constants import *


### Studying the community

Several variables related to authors and collaborations can show how the community behind GAP functions and what dependencies might exist. Investigating how often people contribute, who contributes to which packages, and where connections form helps to understand who the people behind the GAP packages are, how they collaborate, and what the trends point to.

In [None]:
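# Define global variables for the Jupyter Notebook
# NOTE: `g` and ORG_NAME_PACKAGES are not defined in this notebook; they are assumed
# to be provided by the wildcard import from data_constants
org = g.get_organization(ORG_NAME_PACKAGES)
repos = org.get_repos(type="public")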


##### Functions to Retrieve Community Metrics

In [None]:
def hash_author_name(author_name: str) -> str:
    """Hashes the author name upon retrieval, using the SHA-256 algorithm.

    Args:
        author_name (str): The author name to be hashed.

    Returns:
        str: The hash value of the author name.
    """
    return hashlib.sha256(author_name.encode()).hexdigest()
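
As a quick illustration of how the hashing behaves, the sketch below re-declares the function so that it runs on its own and applies it to a made-up username (`example-user` is hypothetical, not a real contributor). SHA-256 is deterministic, so the same username always maps to the same 64-character hexadecimal digest, which is what allows the hash to stand in for the username when counting contributions.

```python
import hashlib


def hash_author_name(author_name: str) -> str:
    """Return the SHA-256 hex digest of the author name (same logic as above)."""
    return hashlib.sha256(author_name.encode()).hexdigest()


# Hypothetical usernames, used only for illustration
first = hash_author_name("example-user")
second = hash_author_name("example-user")
other = hash_author_name("Example-User")

print(len(first))       # SHA-256 hex digests are 64 characters long
print(first == second)  # identical input gives an identical hash
print(first == other)   # hashing is case-sensitive, so this differs
```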


##### Get and Display Community Metrics

In [None]:
# Get information on collaborators that contributed to more than one repository
# Set of hashed usernames seen across all repositories (a set avoids duplicates)
contributors = set()
# Maps each hashed username to the number of repositories it contributed to
repository_counts = {}

for repo in repos:
    # Deduplicate logins so each contributor is counted at most once per repository
    repo_contributors = {contributor.login for contributor in repo.get_contributors()}
    for username in repo_contributors:
        hashed_name = hash_author_name(username)
        contributors.add(hashed_name)
        repository_counts[hashed_name] = repository_counts.get(hashed_name, 0) + 1

# Count the hashed usernames that appear in more than one repository
count = sum(n > 1 for n in repository_counts.values())

print(f"Number of collaborators who contributed to more than one repository: {count}")
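
The counting logic in the cell above can be tried without calling the GitHub API. The sketch below is a minimal, self-contained version under stated assumptions: `sample_repos` and its usernames are invented stand-ins for the data returned by `repo.get_contributors()`, and `collections.Counter` replaces the plain dictionary purely for brevity.

```python
import hashlib
from collections import Counter


def hash_author_name(author_name: str) -> str:
    """Return the SHA-256 hex digest of the author name."""
    return hashlib.sha256(author_name.encode()).hexdigest()


# Hypothetical data: repository name -> contributor logins (duplicates on purpose)
sample_repos = {
    "pkg-a": ["alice", "bob", "alice"],
    "pkg-b": ["bob", "carol"],
    "pkg-c": ["bob", "alice"],
}

repository_counts = Counter()
for logins in sample_repos.values():
    # Deduplicate per repository so each contributor counts once per repo
    repository_counts.update(hash_author_name(login) for login in set(logins))

# alice appears in 2 repositories, bob in 3, carol in 1
multi_repo = sum(n > 1 for n in repository_counts.values())
print(multi_repo)  # 2
```

Only the hashed values are ever stored as keys, so the result can be reported without exposing the underlying usernames.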
