## GAP Data Analytics, Community Study

This Jupyter Notebook examines the community that develops, releases and collaborates on GAP packages on GitHub. The goal is to surface collaboration trends and patterns, and to indicate which areas merit further investigation.

In [None]:
# Import required libraries and packages
import os
import sys
import hashlib

# Get current working directory and append parent directory for module imports
cwd = os.getcwd()
parent_dir = os.path.dirname(cwd)
sys.path.append(parent_dir)

# Import modules from other project scripts
from data_constants import *

### Studying the community

Several variables related to authors and collaborations can show how the community behind GAP functions, and what dependencies might exist. To preserve the privacy of the authors in the GAP community, and because this notebook aims to report statistics on trends rather than point to individual authors, the usernames associated with GAP packages on GitHub are hashed upon retrieval.

In [None]:
# Get the public repositories of the gap-packages organisation on GitHub
# (`g`, the GitHub client, and ORG_NAME_PACKAGES are assumed to be provided by data_constants)
org = g.get_organization(ORG_NAME_PACKAGES)
repos = org.get_repos(type="public")

In [None]:
# Function to hash author names upon retrieval
def hash_author_name(author_name):
    return hashlib.sha256(author_name.encode()).hexdigest()
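The check below is a minimal sketch of why hashing still allows counting across repositories: the same username always maps to the same digest, so a contributor can be recognised as "the same person" without storing the name. The usernames are hypothetical and used for illustration only.

```python
import hashlib


def hash_author_name(author_name):
    """Return the SHA-256 hex digest of a username (same logic as the helper above)."""
    return hashlib.sha256(author_name.encode()).hexdigest()


# Hypothetical usernames, not real contributors
first = hash_author_name("example-user")
second = hash_author_name("example-user")
other = hash_author_name("another-user")

same_digest = first == second   # identical input gives an identical digest
distinct = first != other       # different inputs give different digests
digest_length = len(first)      # a SHA-256 hex digest has 64 characters

print(same_digest, distinct, digest_length)
```

Note that GitHub usernames are public, so an unsalted hash is pseudonymisation rather than full anonymisation: anyone who hashes a known username can look it up in the results. That is likely acceptable for aggregate statistics, but it is worth keeping in mind if the hashed values are ever published.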

In [None]:
# Count, for each collaborator, the number of repositories they contributed to
# Keys are hashed usernames, so no plain-text names are stored
repository_counts = {}

for repo in repos:
    # Use a set so each contributor is counted at most once per repository
    repo_contributors = {contributor.login for contributor in repo.get_contributors()}
    for username in repo_contributors:
        hashed_name = hash_author_name(username)
        repository_counts[hashed_name] = repository_counts.get(hashed_name, 0) + 1

# Number of collaborators that appear in more than one repository
multi_repo_count = sum(n > 1 for n in repository_counts.values())

print(f"Number of collaborators who contributed to more than one repository: {multi_repo_count}")
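The counting logic above can be tried offline, without GitHub API access. The sketch below assumes a small, made-up mapping of repositories to contributor logins (`sample_repos` is hypothetical) and uses `collections.Counter` in place of the manual dictionary update; it is an illustration of the approach, not the notebook's actual data.

```python
from collections import Counter
import hashlib


def hash_author_name(author_name):
    """Return the SHA-256 hex digest of a username."""
    return hashlib.sha256(author_name.encode()).hexdigest()


# Hypothetical data: repository name -> contributor logins (made up for illustration)
sample_repos = {
    "repo-a": ["alice", "bob"],
    "repo-b": ["alice", "carol"],
    "repo-c": ["alice", "bob", "dave"],
}

repository_counts = Counter()
for logins in sample_repos.values():
    # set() guarantees each contributor is counted once per repository
    repository_counts.update(hash_author_name(login) for login in set(logins))

# Contributors that appear in more than one repository
multi_repo_count = sum(n > 1 for n in repository_counts.values())
print(f"Contributors in more than one repository: {multi_repo_count}")
```

In this sample, "alice" and "bob" each appear in more than one repository, so the reported count is 2.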