## Setting up the environment

In [None]:
import sys
import os

# Add the parent directory to the system path
parent_path = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.insert(0, parent_path)

import productivity_analytics

# Load the environment variables
from dotenv import load_dotenv # pip install python-dotenv

load_dotenv(os.path.join(parent_path, '.env.dev'))
repo_owner = os.getenv('repo_owner')
repo_name = os.getenv('repo_name')
token = os.getenv('token')
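
`os.getenv` returns `None` when a variable is not set, so a missing entry in `.env.dev` would only surface later as a failed request. A minimal sketch of an early check is shown below; `require_env` is a hypothetical helper, not part of `productivity_analytics`, and the example values are placeholders.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for the given names, or raise if any is missing or empty.

    Hypothetical helper for illustration; `environ` defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Placeholder mapping standing in for the variables loaded from .env.dev.
example_env = {"repo_owner": "some-owner", "repo_name": "some-repo", "token": "placeholder"}
settings = require_env(["repo_owner", "repo_name", "token"], example_env)
print(sorted(settings))
```

In the notebook itself, calling `require_env(['repo_owner', 'repo_name', 'token'])` right after `load_dotenv` would stop the run immediately if the file was not found.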

## Background

There are two ways to collect data: `get` and `update`. As the names suggest, `get` starts from scratch and fetches all data, while `update` refreshes data that has already been collected. The initial build works as follows:

1) `get_pr_numbers` fetches all PR numbers based on `repo_owner` and `repo_name`. The examples below use `world-federation-of-advertisers/cross-media-measurement`.
2) Using the output of `get_pr_numbers` as the input for `build_pr_dataframe` yields a dataframe with data for each PR number. It's also possible to get the raw JSON for an individual PR number using `get_pr_data`.
3) Similar to the above step, using the output of `get_pr_numbers` as the input for `build_review_dataframe` yields a dataframe with data for all the review comments for the given PR numbers.

The above steps take roughly 2 seconds per PR, so 2,000 PRs take about an hour to process. Because the data is already available in `../data`, there is no need to repeat these steps; the existing datasets can simply be updated as follows:

1) `get_pr_numbers` fetches all PR numbers based on `repo_owner` and `repo_name`.
2) Using the output of `get_pr_numbers` as the input for `update_pr_data` yields an updated dataframe.
3) Similar to the above step, using the output of `get_pr_numbers` as the input for `update_review_data` yields an updated dataframe.
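
The runtime figure quoted above (roughly 2 seconds per PR) can be turned into a quick estimate before starting a full build. This is only a back-of-the-envelope sketch: `estimate_build_minutes` is a hypothetical helper, and the 2-second default is the rough figure from this notebook, not a measured constant.

```python
def estimate_build_minutes(pr_count, seconds_per_pr=2.0):
    """Estimate how many minutes a from-scratch build takes for `pr_count` PRs."""
    return pr_count * seconds_per_pr / 60


# 2,000 PRs at about 2 seconds each.
print(round(estimate_build_minutes(2000)))  # 67
```

So 2,000 PRs come to about 67 minutes, which matches the "about an hour" figure.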

# 1. Build data from scratch i.e. `get`

## 1.1. Get a list of PR numbers
Fetches a list of all PR numbers for the repo. 

**NOTE:** This takes anywhere from 10 to 20 seconds.

In [None]:
from productivity_analytics import get_pr_numbers

pr_numbers = get_pr_numbers(repo_owner, repo_name, token)
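
The 10 to 20 seconds are plausibly spent paging through the list of pull requests. The sketch below shows that general pattern only; it is not the implementation of `get_pr_numbers`. It assumes a GitHub-style response in which each item is a dict with a `number` key, and `fetch_page` is a hypothetical callable, replaced here by a fake so the example runs offline.

```python
def collect_pr_numbers(fetch_page, per_page=100):
    """Collect PR numbers page by page until a short or empty page comes back.

    `fetch_page(page, per_page)` is a hypothetical callable returning a list
    of dicts that each contain a "number" key.
    """
    numbers = []
    page = 1
    while True:
        items = fetch_page(page, per_page)
        numbers.extend(item["number"] for item in items)
        if len(items) < per_page:
            break  # last page reached
        page += 1
    return numbers


# Stand-in for a real HTTP call: 250 made-up PRs, newest first.
fake_prs = [{"number": n} for n in range(250, 0, -1)]


def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return fake_prs[start:start + per_page]


pr_numbers_example = collect_pr_numbers(fake_fetch_page)
print(len(pr_numbers_example))  # 250
```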

## 1.2. Build PR data table
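Builds a PR dataframe from a list of PR numbers. The call below passes only the first 10 (`pr_numbers[0:10]`) to keep the run short; pass `pr_numbers` to process all of them.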

In [None]:
from productivity_analytics import build_pr_dataframe

pr_df = build_pr_dataframe(pr_numbers[0:10],
                           repo_owner,
                           repo_name,
                           token)

## 1.2.1. Get PR data for a PR number
Fetches the raw JSON data for a single PR number.

In [None]:
from productivity_analytics import get_pr_data

pr_data = get_pr_data(pr_numbers[0],
                      repo_owner,
                      repo_name,
                      token)
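
To give a sense of how raw PR JSON can become one row of a table, the sketch below flattens a single record and derives a time-to-merge value. It assumes the JSON follows GitHub's pull request format (`number`, `state`, `user.login`, `created_at`, `merged_at`); `summarize_pr` is a hypothetical helper, `sample_pr` is made-up data, and the columns actually produced by `build_pr_dataframe` may differ.

```python
from datetime import datetime

# ISO 8601 UTC timestamps such as "2023-01-02T10:00:00Z".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def summarize_pr(pr_json):
    """Flatten one raw PR record into a small dict (hypothetical helper)."""
    created = datetime.strptime(pr_json["created_at"], TIMESTAMP_FORMAT)
    merged_raw = pr_json.get("merged_at")
    hours_to_merge = None  # stays None for PRs that were never merged
    if merged_raw:
        merged = datetime.strptime(merged_raw, TIMESTAMP_FORMAT)
        hours_to_merge = (merged - created).total_seconds() / 3600
    return {
        "number": pr_json["number"],
        "author": pr_json["user"]["login"],
        "state": pr_json["state"],
        "hours_to_merge": hours_to_merge,
    }


# Made-up sample record, not real repository data.
sample_pr = {
    "number": 1,
    "state": "closed",
    "user": {"login": "example-author"},
    "created_at": "2023-01-02T10:00:00Z",
    "merged_at": "2023-01-03T16:30:00Z",
}
print(summarize_pr(sample_pr)["hours_to_merge"])  # 30.5
```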

## 1.3. Build Review data table
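Builds a review dataframe from a list of PR numbers. The call below passes only the first 10 (`pr_numbers[0:10]`) to keep the run short; pass `pr_numbers` to process all of them.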

In [None]:
from productivity_analytics import build_review_dataframe

review_df = build_review_dataframe(pr_numbers[0:10],
                                   repo_owner,
                                   repo_name,
                                   token)

## 1.3.1. Get review data for a PR number
Fetches the raw JSON review data for a single PR number.

In [None]:
from productivity_analytics import get_review_data

review_data = get_review_data(pr_numbers[20],
                              repo_owner,
                              repo_name,
                              token)
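
As an illustration of what can be done with raw review JSON, the sketch below counts comments per reviewer. It assumes the data is a list of GitHub-style comment objects, each with a `user.login` field; `count_comments_by_reviewer` is a hypothetical helper and `sample_reviews` is made-up data, so the structure returned by `get_review_data` should be checked before reusing this.

```python
from collections import Counter


def count_comments_by_reviewer(review_json):
    """Count how many review comments each login left (hypothetical helper)."""
    return Counter(item["user"]["login"] for item in review_json)


# Made-up sample records, not real repository data.
sample_reviews = [
    {"user": {"login": "reviewer-a"}, "body": "Please add a test."},
    {"user": {"login": "reviewer-b"}, "body": "Looks good."},
    {"user": {"login": "reviewer-a"}, "body": "Thanks, approved."},
]
counts = count_comments_by_reviewer(sample_reviews)
print(counts.most_common(1))  # [('reviewer-a', 2)]
```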

# 2. Update already built data i.e. `update`

## 2.1. Get a list of PR numbers
Fetches a list of all PR numbers for the repo.

**NOTE:** This is the same function as above, provided here for completeness.

In [None]:
from productivity_analytics import get_pr_numbers

pr_numbers = get_pr_numbers(repo_owner, repo_name, token)

## 2.2. Update PR data table
Builds an updated PR dataframe from the existing data plus any new PR numbers.

In [None]:
from productivity_analytics import update_pr_data

pr_df = update_pr_data(repo_owner, repo_name, token, save_to_file=False)
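
Conceptually, an update only needs to fetch the PRs that are not yet in the stored dataset. The sketch below shows that idea with plain lists; it assumes new PRs are detected by comparing PR numbers, which may not be exactly how `update_pr_data` works, and `find_new_pr_numbers` is a hypothetical helper.

```python
def find_new_pr_numbers(all_pr_numbers, known_pr_numbers):
    """Return the PR numbers in `all_pr_numbers` that are not already known.

    The order of `all_pr_numbers` is preserved.
    """
    known = set(known_pr_numbers)
    return [number for number in all_pr_numbers if number not in known]


# Made-up numbers: the repo has PRs 1-5, the stored data covers 1-3.
print(find_new_pr_numbers([5, 4, 3, 2, 1], [1, 2, 3]))  # [5, 4]
```

With roughly 2 seconds per PR, fetching only the difference keeps an update much shorter than a full rebuild.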

## 2.3. Update review data table
Builds an updated review dataframe from the existing data plus any new PR numbers.

In [None]:
from productivity_analytics import update_review_data

update_review_data(repo_owner, repo_name, token, save_to_file=False)