# GitHub API Tutorial

**Overview:**
In this notebook you'll learn how to:
- Connect to the GitHub API using a Python client.
- Retrieve repository insights such as commit history, pull request statistics, and contributor details.
- Perform analytics on repository activity over a given time frame.

**Why Use This Notebook?**
- Automate repository monitoring for contributions and updates.
- Gain insights into open, closed, and unmerged pull requests.
- Track commit frequency and user contributions.

**Requirements:**
- A valid GitHub Personal Access Token (PAT) with read access to the repositories you want to analyze.

## Setup

Before proceeding with API calls, ensure that your environment is correctly set up.

### 1. Install Dependencies
Install `PyGithub` to interact with the GitHub API and `tqdm` to display progress bars.

In [None]:
%pip install PyGithub tqdm

### 2. Import Required Modules
Import the necessary libraries:

In [None]:
import os
import logging
# `utils` is the local helper module that accompanies this notebook (not a PyPI package).
import utils
import pandas as pd
from github import Github
from datetime import datetime, timedelta

# Enable logging.
logging.basicConfig(level=logging.INFO)
_LOG = logging.getLogger(__name__)

### 3. Set Up GitHub Authentication
Store your **GitHub Personal Access Token (PAT)** as an environment variable for security. You can do this in your terminal:

```sh
export GITHUB_ACCESS_TOKEN="your_personal_access_token"
```

Alternatively, you can set it within the notebook:

In [None]:
# Set your GitHub access token here; `setdefault` keeps a token already exported in the shell.
os.environ.setdefault("GITHUB_ACCESS_TOKEN", "your_personal_access_token")

# Retrieve it when needed.
access_token = os.getenv("GITHUB_ACCESS_TOKEN")

# Ensure the token is set and is not still the placeholder.
if not access_token or access_token == "your_personal_access_token":
    raise ValueError("GitHub Access Token is not set. Please configure it before proceeding.")
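If you would rather not type the token into a cell at all, a small helper can read it from the environment and fall back to a hidden prompt. This is a minimal sketch: `load_token` is a hypothetical helper (not part of `utils`), and it relies on the standard-library `getpass` module, which Jupyter displays as a masked input field.

```python
import getpass
import os
from typing import Callable


def load_token(
    env_var: str = "GITHUB_ACCESS_TOKEN",
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the token stored in `env_var`, prompting for it if it is unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fall back to an interactive prompt; `getpass` does not echo the input.
        token = prompt(f"Enter {env_var}: ").strip()
    if not token:
        raise ValueError(f"{env_var} is not set. Please configure it before proceeding.")
    return token
```

With this helper, `access_token = load_token()` could replace the cell above, so the token never appears in the notebook source.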

Now, you're ready to interact with the GitHub API!

## Define Config
Here we define all parameters in a single `config` dictionary.
You can easily modify:
- The `org_name` to analyze a different GitHub organization.
- The `start_date` and `end_date` to change the timeframe.

In [None]:
# Define the configuration settings.
config = {
    # Replace with actual GitHub organization or username.
    "org_name": "causify-ai",
    "start_date": datetime(2025, 1, 20),
    "end_date": datetime(2025, 2, 25),
    # Load from environment variable.
    "access_token": access_token,
}
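Instead of fixed dates, you may want a rolling window such as "the last 30 days". The sketch below uses `timedelta` (already imported above) to build the `(start_date, end_date)` pair. `last_n_days` is a hypothetical helper, and it returns naive datetimes to match the fixed dates in `config`.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def last_n_days(n: int, now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return a (start_date, end_date) tuple spanning the `n` days that end at `now`."""
    # Naive datetimes, matching the fixed dates used in `config`.
    end = now if now is not None else datetime.now()
    return end - timedelta(days=n), end
```

For example, `last_n_days(36, now=datetime(2025, 2, 25))` reproduces the period configured above, and `config["start_date"], config["end_date"] = last_n_days(30)` switches to a rolling 30-day window.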

## Initialize GitHub Client

In [None]:
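from github import Auth

# Initialize the GitHub client using the access token from the config.
# Recent PyGithub versions expect the token via `auth=Auth.Token(...)`.
client = Github(auth=Auth.Token(config["access_token"]))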

# Verify authentication by retrieving the authenticated user.
try:
    authenticated_user = client.get_user().login
    print(f"Successfully authenticated as: {authenticated_user}")
except Exception as e:
    print(f"Authentication failed: {e}")

Successfully authenticated as: Prahar08modi
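The pull request cells below take roughly 30 seconds per repository, which hints at many API requests. Authenticated REST requests are rate limited (typically 5,000 per hour for a personal access token), so it can help to check your budget before a long run. PyGithub exposes `client.rate_limiting` as a `(remaining, limit)` tuple and `client.rate_limiting_resettime` as a Unix timestamp; `seconds_until_reset` below is a hypothetical helper that turns that timestamp into a wait time.

```python
import time
from typing import Optional


def seconds_until_reset(reset_epoch: int, now_epoch: Optional[float] = None) -> int:
    """Return the whole seconds left until the rate limit resets (never negative)."""
    now = time.time() if now_epoch is None else now_epoch
    return max(0, int(reset_epoch - now))
```

In the notebook, `remaining, limit = client.rate_limiting` followed by `seconds_until_reset(client.rate_limiting_resettime)` would report how many requests are left and when the quota refills.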


## Fetch Repositories for the Organization

The `get_repo_names` function retrieves all repositories within a specified GitHub organization. This helps identify the available repositories before analyzing commits or pull requests.

In [None]:
repos_info = utils.get_repo_names(client, config["org_name"])
repos_info

{'owner': 'causify-ai',
 'repositories': ['dev_tools', 'cmamp', 'kaizenflow', 'helpers', 'tutorials']}
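The internals of `utils.get_repo_names` are not shown here. A minimal sketch of how an equivalent could be written with PyGithub's `Github.get_organization()` and `Organization.get_repos()` follows; `list_org_repo_names` is a hypothetical name, and the sketch only handles organizations (for a personal account, `client.get_user(name).get_repos()` would be the analogue).

```python
from typing import Any, Dict, List


def list_org_repo_names(client: Any, org_name: str) -> Dict[str, Any]:
    """Return the organization's repository names in the same shape as the output above."""
    org = client.get_organization(org_name)
    # `get_repos()` returns a paginated list; iterating over it fetches every page.
    names: List[str] = [repo.name for repo in org.get_repos()]
    return {"owner": org_name, "repositories": names}
```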

## Fetch Commit Statistics

The `get_total_commits` function counts the commits made in each repository of a specified GitHub organization and returns the total along with a per-repository breakdown.

### **Usage**
- You can fetch **all commits** made during a specific time range.
- Additionally, you can **filter commits by specific users** to analyze individual contributions.

### **Parameters**
- `client` (*Github*): The authenticated GitHub API client.
- `org_name` (*str*): The GitHub organization name.
- `period` (*Optional[Tuple[datetime, datetime]]*): A tuple containing `start_date` and `end_date`.
- `github_names` (*Optional[List[str]]*): A list of GitHub usernames to filter commits by specific users.
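How `utils.get_total_commits` counts commits is not shown. One plausible approach, sketched here under that assumption, asks each repository for its commits in the period via PyGithub's `Repository.get_commits(since=..., until=..., author=...)` and reads `totalCount` from the paginated result. `count_commits_sketch` is a hypothetical helper. Without a `sha` argument, the GitHub API lists commits on the default branch only, so work on other branches is not counted.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple


def count_commits_sketch(
    client: Any,
    org_name: str,
    period: Tuple[datetime, datetime],
    github_names: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Count default-branch commits per repository in `period`, optionally per author."""
    start, end = period
    per_repo: Dict[str, int] = {}
    for repo in client.get_organization(org_name).get_repos():
        if github_names:
            # One query per author; `author` accepts a GitHub login or an email address.
            count = sum(
                repo.get_commits(since=start, until=end, author=name).totalCount
                for name in github_names
            )
        else:
            count = repo.get_commits(since=start, until=end).totalCount
        per_repo[repo.name] = count
    return {"total_commits": sum(per_repo.values()), "commits_per_repository": per_repo}
```

Empty repositories make `get_commits` raise a `GithubException`, so a real implementation would likely need to catch that.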

In [None]:
commit_stats = utils.get_total_commits(
    client,
    config["org_name"],
    period=(config["start_date"], config["end_date"]),
)
commit_stats

{'total_commits': 137,
 'period': '2025-01-20 00:00:00 to 2025-02-25 00:00:00',
 'commits_per_repository': {'dev_tools': 0,
  'cmamp': 85,
  'kaizenflow': 0,
  'helpers': 36,
  'tutorials': 16}}

In [None]:
commit_stats_filtered = utils.get_total_commits(
    client, 
    config["org_name"], 
    period=(config["start_date"], config["end_date"]),
    # Replace with actual GitHub usernames.
    github_names=["heanhsok"]  
)
commit_stats_filtered

{'total_commits': 33,
 'period': '2025-01-20 00:00:00 to 2025-02-25 00:00:00',
 'commits_per_repository': {'dev_tools': 0,
  'cmamp': 18,
  'kaizenflow': 0,
  'helpers': 11,
  'tutorials': 4}}
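`pandas` is imported above but not used in these cells; one natural use is to put the two commit results side by side. The hypothetical `commits_share_frame` helper below assumes both inputs have the `commits_per_repository` shape shown in the outputs above and computes the filtered users' share of each repository's commits.

```python
from typing import Any, Dict

import pandas as pd


def commits_share_frame(total: Dict[str, Any], filtered: Dict[str, Any]) -> pd.DataFrame:
    """Tabulate per-repository commits: all authors vs. the filtered authors."""
    df = pd.DataFrame(
        {
            "all": pd.Series(total["commits_per_repository"]),
            "filtered": pd.Series(filtered["commits_per_repository"]),
        }
    ).fillna(0).astype(int)
    # Filtered authors' share of each repository's commits; 0.0 where a repo has no commits.
    df["share"] = (df["filtered"] / df["all"].where(df["all"] > 0)).fillna(0.0).round(2)
    return df
```

Calling `commits_share_frame(commit_stats, commit_stats_filtered)` on the outputs above would show, for instance, that `heanhsok` authored 18 of the 85 `cmamp` commits (a share of 0.21).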

## Fetch Pull Request Statistics

The `get_total_prs` function retrieves the number of pull requests (PRs) made within the repositories of a specified GitHub organization. This function allows filtering PRs by state, author, and time period.

### **Parameters**
- `client` (*Github*): The authenticated GitHub API client.
- `org_name` (*str*): The name of the GitHub organization.
- `github_names` (*Optional[List[str]]*): A list of GitHub usernames to filter PRs. If `None`, fetches PRs from all users.
- `period` (*Optional[Tuple[datetime, datetime]]*): A tuple containing `start_date` and `end_date` to filter PRs within a time range.
- `state` (*str*, default=`'open'`): The state of the pull requests to fetch. Can be:
  - `'open'`: Fetch only open PRs.
  - `'closed'`: Fetch only closed PRs.
  - `'all'`: Fetch all PRs.

In [None]:
pr_stats = utils.get_total_prs(
    client, 
    config["org_name"], 
    period=(config["start_date"], config["end_date"])
)
pr_stats

Processing repositories: 100%|████████████████████████████████████████████████████████████████████████████████████████| 5/5 [02:33<00:00, 30.67s/repo]


{'total_prs': 148,
 'period': '2025-01-20 00:00:00+00:00 to 2025-02-25 00:00:00+00:00',
 'prs_per_repository': {'dev_tools': 0,
  'cmamp': 101,
  'kaizenflow': 1,
  'helpers': 36,
  'tutorials': 10}}
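The PR results report the period as `2025-01-20 00:00:00+00:00`, while the commit results show naive timestamps, which suggests that `get_total_prs` converts the bounds to UTC internally. If you compare PR dates yourself, a small hypothetical helper like `to_utc` keeps the bounds consistent; it assumes naive datetimes, such as those in `config`, are already in UTC.

```python
from datetime import datetime, timezone


def to_utc(dt: datetime) -> datetime:
    """Return `dt` as a timezone-aware UTC datetime."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes (like those in `config`) already represent UTC.
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

For example, `str(to_utc(config["start_date"]))` yields the same `2025-01-20 00:00:00+00:00` string shown in the output above.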

### Fetching Only Closed PRs

In [None]:
pr_stats_closed = utils.get_total_prs(
    client, 
    config["org_name"], 
    period=(config["start_date"], config["end_date"]),
    state="closed"
)
pr_stats_closed

Processing repositories: 100%|████████████████████████████████████████████████████████████████████████████████████████| 5/5 [02:38<00:00, 31.76s/repo]


{'total_prs': 116,
 'period': '2025-01-20 00:00:00+00:00 to 2025-02-25 00:00:00+00:00',
 'prs_per_repository': {'dev_tools': 0,
  'cmamp': 79,
  'kaizenflow': 1,
  'helpers': 29,
  'tutorials': 7}}

### Filtering PRs by Specific Users

In [None]:
pr_stats_filtered = utils.get_total_prs(
    client, 
    config["org_name"], 
    period=(config["start_date"], config["end_date"]),
    # Replace with actual GitHub usernames.
    github_names=["heanhsok"],  
    state="closed"
)
pr_stats_filtered

Processing repositories: 100%|████████████████████████████████████████████████████████████████████████████████████████| 5/5 [02:34<00:00, 30.92s/repo]


{'total_prs': 32,
 'period': '2025-01-20 00:00:00+00:00 to 2025-02-25 00:00:00+00:00',
 'prs_per_repository': {'dev_tools': 0,
  'cmamp': 20,
  'kaizenflow': 1,
  'helpers': 9,
  'tutorials': 2}}
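When `github_names` lists several users, these functions return one combined total. A per-user breakdown can be obtained by calling the function once per name. The sketch below is hypothetical and takes the fetch function as a parameter, so it works with `utils.get_total_commits` or `utils.get_total_prs` alike. Each call rescans every repository (about 2.5 minutes per PR call in the runs above), so runtime grows linearly with the number of users.

```python
from typing import Any, Callable, Dict, List


def per_user_totals(
    fetch: Callable[[List[str]], Dict[str, Any]],
    github_names: List[str],
    total_key: str,
) -> Dict[str, int]:
    """Call `fetch` with one username at a time; return totals sorted from highest to lowest."""
    totals = {name: fetch([name])[total_key] for name in github_names}
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

For example, `per_user_totals(lambda names: utils.get_total_prs(client, config["org_name"], period=(config["start_date"], config["end_date"]), github_names=names, state="closed"), ["heanhsok"], "total_prs")` would return closed-PR counts keyed by username.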

## Fetch Unmerged Pull Requests

The `get_prs_not_merged` function retrieves the count of **closed but unmerged** pull requests (PRs) within the repositories of a specified GitHub organization. This helps identify PRs that were closed without being merged, which could indicate rejected changes or abandoned contributions.

### **Parameters**
- `client` (*Github*): The authenticated GitHub API client.
- `org_name` (*str*): The name of the GitHub organization.
- `github_names` (*Optional[List[str]]*): A list of GitHub usernames to filter PRs. If `None`, fetches PRs from all users.
- `period` (*Optional[Tuple[datetime, datetime]]*): A tuple containing `start_date` and `end_date` to filter PRs within a time range.

In [None]:
unmerged_prs = utils.get_prs_not_merged(
    client, 
    config["org_name"], 
    period=(config["start_date"], config["end_date"])
)
unmerged_prs

### Filtering by Specific Users

In [None]:
unmerged_prs_filtered = utils.get_prs_not_merged(
    client, 
    config["org_name"], 
    period=(config["start_date"], config["end_date"]),
    # Replace with actual GitHub usernames.
    github_names=["heanhsok"]  
)
unmerged_prs_filtered
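Closed PRs are either merged or closed without merging, so per repository the merge rate is (closed - unmerged) / closed. The hypothetical `merge_rates` helper below assumes both inputs are per-repository count dictionaries, like `pr_stats_closed["prs_per_repository"]`. The output format of `get_prs_not_merged` is not shown above, so adapt the second argument accordingly, and check that both functions filter the period by the same date before trusting the ratio.

```python
from typing import Dict


def merge_rates(closed: Dict[str, int], unmerged: Dict[str, int]) -> Dict[str, float]:
    """Per repository, the fraction of closed PRs that were merged.

    Repositories with no closed PRs are skipped to avoid dividing by zero.
    """
    return {
        repo: round((n_closed - unmerged.get(repo, 0)) / n_closed, 2)
        for repo, n_closed in closed.items()
        if n_closed > 0
    }
```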