## Setting up the environment

In [1]:
import sys
import os

# Add the parent directory to the system path
parent_path = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.insert(0, parent_path)

import productivity_analytics

# Load the environment variables (pip install python-dotenv)
from dotenv import load_dotenv

load_dotenv(os.path.join(parent_path, '.env.dev'))
repo_owner = os.getenv('repo_owner')
repo_name = os.getenv('repo_name')
token = os.getenv('token')
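`os.getenv` returns `None` for any variable missing from `.env.dev`, and the mistake then only shows up later as a failed API call. Here is a minimal sketch of an early check. It assumes all three variable names used above are required. The helper `load_required_env` is hypothetical and not part of `productivity_analytics`.

```python
import os

# Variable names read in the cell above; assumed to be mandatory.
REQUIRED_VARS = ("repo_owner", "repo_name", "token")


def load_required_env(names=REQUIRED_VARS):
    """Return the required variables as a dict, failing early if any is unset or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values
```

Calling `config = load_required_env()` right after `load_dotenv(...)` turns a silent `None` into an immediate, named error.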

## Background

There are two ways to collect data: `get` and `update`. As their names suggest, `get` starts from scratch and fetches all the data, while `update` brings previously collected data up to date. The data is initially built as follows: 

1) `get_pr_numbers` fetches all PR numbers for the repository given by `repo_owner` and `repo_name`. In the examples below, this is `world-federation-of-advertisers/cross-media-measurement`.
2) Passing the output of `get_pr_numbers` to `build_pr_dataframe` yields a dataframe with data for each PR number. The raw JSON for an individual PR number is available from `get_pr_data`.
3) Similarly, passing the output of `get_pr_numbers` to `build_review_dataframe` yields a dataframe with all the review comments for the given PR numbers. `get_review_data` returns the raw JSON for a single PR number.

These steps take roughly 2 seconds per PR, so 2,000 PRs take a little over an hour (2,000 × 2 s = 4,000 s ≈ 67 minutes). Because the data is already available in `../data`, there is no need to repeat them. Instead, the existing datasets can simply be updated, as follows: 

1) `get_pr_numbers` fetches all PR numbers based on `repo_owner` and `repo_name`.
2) `update_pr_data` returns an updated PR dataframe. In the cells below, it takes only `repo_owner`, `repo_name` and `token`, not the PR list.
3) Similarly, `update_review_data` returns an updated review dataframe.
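One plausible way for an update to avoid refetching everything is to fetch only two groups of PRs: those not yet stored, and those stored as `open`, since open PRs may have been merged, closed or edited since the last run. This sketch assumes the saved PR table keeps the `state` column shown in the output at the end of this notebook. `prs_to_fetch` is a hypothetical helper; the source does not say how `update_pr_data` decides what to refresh.

```python
from typing import Dict, Iterable, List


def prs_to_fetch(all_numbers: Iterable[int], stored_states: Dict[int, str]) -> List[int]:
    """Pick the PR numbers an incremental update would need to (re)fetch.

    all_numbers: PR numbers currently on GitHub (e.g. from get_pr_numbers).
    stored_states: PR number -> 'open' / 'closed' from the previously saved data.
    """
    # PRs created since the last run.
    new = {n for n in all_numbers if n not in stored_states}
    # Stored open PRs may have changed state since they were saved.
    still_open = {n for n, state in stored_states.items() if state == "open"}
    # Newest (highest-numbered) PRs first.
    return sorted(new | still_open, reverse=True)
```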

# 1. Build data from scratch, i.e. `get`

## 1.1. Get a list of PR numbers
Fetches a list of all PR numbers for the repo

In [None]:
from productivity_analytics import get_pr_numbers

pr_numbers = get_pr_numbers(repo_owner, repo_name, token)
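GitHub's REST endpoint for listing pull requests, `GET /repos/{owner}/{repo}/pulls`, returns at most 100 PRs per page (`per_page=100`) and only open PRs unless `state=all` is passed. Listing every PR in a large repository therefore takes several requests. The source does not show the internals of `get_pr_numbers`. Assuming it pages through that endpoint, the loop looks roughly like this sketch. `fetch_page` is a hypothetical callback standing in for the HTTP request.

```python
from typing import Callable, Dict, List


def collect_pr_numbers(fetch_page: Callable[[int], List[Dict]], max_pages: int = 1000) -> List[int]:
    """Collect PR numbers from a paginated listing.

    fetch_page(page) is assumed to return one 1-based page of PR objects,
    each with a "number" key, and an empty list once past the last page.
    """
    numbers: List[int] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # an empty page means everything has been read
            break
        numbers.extend(pr["number"] for pr in batch)
    return numbers
```

Injecting `fetch_page` keeps the paging logic testable without network access.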

## 1.2. Build PR data table
Builds a PR dataframe from the PR numbers (the example below uses only the first 10 to keep the run short)

In [None]:
from productivity_analytics import build_pr_dataframe

pr_df = build_pr_dataframe(pr_numbers[0:10], repo_owner, repo_name, token)
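The slice `pr_numbers[0:10]` keeps this example quick. The Background section gives roughly 2 seconds per PR, an average rather than a guarantee, so the cost of a full build scales linearly with the number of PRs. The helper `estimate_minutes` below is hypothetical.

```python
def estimate_minutes(n_prs: int, seconds_per_pr: float = 2.0) -> float:
    """Rough wall-clock estimate for building the PR table, in minutes."""
    return n_prs * seconds_per_pr / 60


print(f"{estimate_minutes(10):.1f} min")    # → 0.3 min (the 10-PR sample above)
print(f"{estimate_minutes(2000):.1f} min")  # → 66.7 min (2,000 PRs)
```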

### 1.2.1. Get raw PR data for a single PR number
Fetches the raw JSON data for a single PR number

In [None]:
from productivity_analytics import get_pr_data

pr_data = get_pr_data(pr_numbers[0], repo_owner, repo_name, token)
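`get_pr_data` returns raw, nested JSON, while the PR dataframe has flat columns such as `user_login` and `base_repo_name`. This is a sketch of that flattening. It assumes each column comes from the GitHub pull-request field of the same name, with the nested `user.login`, `user.id` and `base.repo.name` joined by underscores. The mapping is inferred from the column names in the output at the end of this notebook, not taken from the package.

```python
from typing import Any, Dict, Tuple

# Column name -> path of keys into the raw pull-request JSON (assumed mapping).
PR_COLUMNS: Dict[str, Tuple[str, ...]] = {
    "id": ("id",),
    "number": ("number",),
    "html_url": ("html_url",),
    "state": ("state",),
    "merged": ("merged",),
    "title": ("title",),
    "user_login": ("user", "login"),
    "user_id": ("user", "id"),
    "body": ("body",),
    "created_at": ("created_at",),
    "closed_at": ("closed_at",),
    "merged_at": ("merged_at",),
    "comments": ("comments",),
    "review_comments": ("review_comments",),
    "base_repo_name": ("base", "repo", "name"),
    "commits": ("commits",),
    "additions": ("additions",),
    "deletions": ("deletions",),
    "changed_files": ("changed_files",),
}


def pr_to_row(pr: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one raw PR JSON object into a flat row; missing keys become None."""
    row: Dict[str, Any] = {}
    for column, path in PR_COLUMNS.items():
        value: Any = pr
        for key in path:
            # Walk one level deeper, or give up with None if the path breaks.
            value = value.get(key) if isinstance(value, dict) else None
        row[column] = value
    return row
```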

## 1.3. Build Review data table
Builds a review dataframe based on all the PR numbers

In [None]:
from productivity_analytics import build_review_dataframe

review_df = build_review_dataframe(pr_numbers,
                                   repo_owner,
                                   repo_name,
                                   token,
                                   save_to_file='../data/review_data.csv')
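Building the review table means turning one list of reviews per PR into a single flat table. Each review is tagged with its PR number so the rows can later be joined back to the PR table. This is a minimal standard-library sketch. It assumes the raw data follows GitHub's pull-request review objects (`id`, `user.login`, `state`, `submitted_at`). The field selection and the helpers `reviews_to_rows` and `write_rows` are hypothetical and may not match what `build_review_dataframe` writes to `../data/review_data.csv`.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

REVIEW_FIELDS = ["pr_number", "review_id", "user_login", "state", "submitted_at"]


def reviews_to_rows(pr_number: int, reviews: Iterable[Dict]) -> List[Dict]:
    """Tag each raw review object with the PR it belongs to and keep a few fields."""
    return [
        {
            "pr_number": pr_number,
            "review_id": r.get("id"),
            "user_login": (r.get("user") or {}).get("login"),
            "state": r.get("state"),
            "submitted_at": r.get("submitted_at"),
        }
        for r in reviews
    ]


def write_rows(rows: List[Dict], handle: TextIO) -> None:
    """Write rows as CSV with a header and no index column (None is written as an empty cell)."""
    writer = csv.DictWriter(handle, fieldnames=REVIEW_FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample review, written to an in-memory buffer.
buf = io.StringIO()
write_rows(reviews_to_rows(1983, [{"id": 1, "user": {"login": "a"}, "state": "APPROVED"}]), buf)
print(buf.getvalue())
```

When writing to a real file instead of a buffer, open it with `newline=''`, as the `csv` module documentation recommends.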

### 1.3.1. Get raw review data for a single PR number
Fetches the raw JSON review data for a single PR number

In [None]:
from productivity_analytics import get_review_data

review_data = get_review_data(pr_numbers[20], repo_owner, repo_name, token)
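This sketch shows the kind of request `get_review_data` presumably sends. It assumes the function calls GitHub's REST endpoint for pull-request reviews, `GET /repos/{owner}/{repo}/pulls/{pull_number}/reviews`. It may instead use the review-comments endpoint, `GET /repos/{owner}/{repo}/pulls/{pull_number}/comments`. The hypothetical helper `review_request` only builds the request, so it can be inspected without network access.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def review_request(owner: str, repo: str, pr_number: int, token: str,
                   page: int = 1) -> urllib.request.Request:
    """Build (but do not send) a GET request for one page of a PR's reviews."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/pulls/{pr_number}/reviews?per_page=100&page={page}"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

To send the request, pass it to `urllib.request.urlopen(req)` and read the response with `json.load`. This returns the list of review objects on that page.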

# 2. Update existing data, i.e. `update`

## 2.1. Get a list of PR numbers
Fetches a list of all PR numbers for the repo

In [None]:
from productivity_analytics import get_pr_numbers

pr_numbers = get_pr_numbers(repo_owner, repo_name, token)

## 2.2. Update PR data table
Builds a PR dataframe based on existing and new PR numbers

In [None]:
from productivity_analytics import update_pr_data

pr_df = update_pr_data(repo_owner, repo_name, token)
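An update also has to fold refreshed rows into the saved table without duplicating PRs. This is a minimal sketch of that merge, keyed on the `number` column. It assumes a refreshed row should replace the stored one, and it is not necessarily how `update_pr_data` does it. With pandas, `pd.concat([old, new])` followed by `drop_duplicates(subset='number', keep='last')` is one equivalent. `merge_pr_rows` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def merge_pr_rows(existing: Iterable[Dict], fresh: Iterable[Dict]) -> List[Dict]:
    """Combine stored rows with newly fetched ones, keyed on PR number.

    A fresh row replaces any stored row with the same number; the result is
    sorted newest (highest-numbered) PR first, like the saved table.
    """
    by_number = {row["number"]: row for row in existing}
    by_number.update({row["number"]: row for row in fresh})
    return sorted(by_number.values(), key=lambda row: row["number"], reverse=True)
```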

## 2.3. Update review data table
Builds a review dataframe based on existing and new PR numbers

In [2]:
from productivity_analytics import update_review_data

update_review_data(repo_owner, repo_name, token, save_to_file=False)

| Unnamed: 0 | id | number | html_url | state | merged | title | user_login | user_id | body | created_at | closed_at | merged_at | comments | review_comments | base_repo_name | commits | additions | deletions | changed_files |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2253759183 | 1983 | https://github.com/world-federation-of-adverti... | open | False | test: noise correction statistics. | ple13 | 15066084 |  | 2024-12-27 21:27:17+00:00 |  |  | 1 | 0 | cross-media-measurement | 24 | 896 | 102 | 12 |
| 1 | 2247574277 | 1982 | https://github.com/world-federation-of-adverti... | open | False | feat!: Deploy Access as part of Reporting | SanjayVas | 1747892 |  | 2024-12-21 02:09:44+00:00 |  |  | 1 | 0 | cross-media-measurement | 1 | 569 | 137 | 21 |
| 2 | 2246918549 | 1981 | https://github.com/world-federation-of-adverti... | open | False | style: Address lint errors in metric and metri... | kungfucraig | 700015 |  | 2024-12-20 15:37:22+00:00 |  |  | 1 | 0 | cross-media-measurement | 4 | 103 | 43 | 3 |
| 3 | 2245830434 | 1980 | https://github.com/world-federation-of-adverti... | open | False | feat: implement Roles public api | roaminggypsy | 37773368 |  | 2024-12-20 02:02:10+00:00 |  |  | 1 | 0 | cross-media-measurement | 2 | 740 | 0 | 6 |
| 4 | 2245654089 | 1979 | https://github.com/world-federation-of-adverti... | open | False | build: Update common-jvm dep to 0.99.1 | SanjayVas | 1747892 |  | 2024-12-19 22:48:46+00:00 |  |  | 1 | 0 | cross-media-measurement | 1 | 6 | 45 | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1740 | 604965024 | 6 | https://github.com/world-federation-of-adverti... | closed | False | Add EDP libraries for Requisition fulfillment | efoxepstein | 127482 | This introduces a number of interfaces in ../d... | 2021-03-31 04:21:56+00:00 | 2021-08-24 16:53:42+00:00 |  | 2 | 5 | cross-media-measurement | 1 | 1088 | 5 | 25 |
| 1741 | 604192038 | 5 | https://github.com/world-federation-of-adverti... | closed | True | Implement skeletons of Panel Match public APIs. | efoxepstein | 127482 | This just adds the basic Kotlin classes and so... | 2021-03-30 18:09:53+00:00 | 2021-04-01 17:54:24+00:00 | 2021-04-01 17:54:24+00:00 | 0 | 0 | cross-media-measurement | 1 | 404 | 0 | 8 |
| 1742 | 604043801 | 4 | https://github.com/world-federation-of-adverti... | closed | True | Add Panel Match tables to Kingdom. | efoxepstein | 127482 | This is to support the centralized APIs for pa... | 2021-03-30 16:28:03+00:00 | 2021-04-07 16:13:03+00:00 | 2021-04-07 16:13:03+00:00 | 0 | 0 | cross-media-measurement | 1 | 81 | 3 | 1 |
| 1743 | 603246119 | 2 | https://github.com/world-federation-of-adverti... | closed | True | Use non-experimental WFA workspace targets. | SanjayVas | 1747892 | This also uses the canonical reproducible form... | 2021-03-29 23:59:42+00:00 | 2021-03-30 16:21:21+00:00 | 2021-03-30 16:21:21+00:00 | 0 | 0 | cross-media-measurement | 1 | 79 | 95 | 4 |
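The `Unnamed: 0` column in this output is the usual sign of a CSV that was written together with its pandas index and then read back without `index_col`. The source does not show the save/load code. Assuming that is what happens when the data in `../data` is reloaded, this sketch reproduces the effect and the two standard fixes:

```python
import io

import pandas as pd

df = pd.DataFrame({"number": [1983, 1982], "state": ["open", "open"]})

# By default, to_csv also writes the index, as an unnamed first column...
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
print(list(pd.read_csv(buf).columns))  # → ['Unnamed: 0', 'number', 'state']

# ...so either read that column back as the index...
buf.seek(0)
print(list(pd.read_csv(buf, index_col=0).columns))  # → ['number', 'state']

# ...or do not write the index in the first place.
print(df.to_csv(index=False).splitlines()[0])  # → number,state
```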
