## microtask-2

Create a Python script to execute Perceval via its Python interface using the GitLab and GitHub backends.

> Perceval is a Python module for retrieving data from various data sources such as Git repositories, GitHub and GitLab projects, Discourse, and StackOverflow. You can learn more about the project at [chaoss/grimoirelab-perceval](https://github.com/chaoss/grimoirelab-perceval).

In this notebook, we will use the GitHub and GitLab backend modules to extract information from a selected repository. The documentation for the Perceval package can be found at https://perceval.readthedocs.io/en/latest/perceval.html

In [1]:
from datetime import datetime
import json
from pprint import pprint

## GitHub Backend

The GitHub repository we are going to target is [amfoss/gitlit](https://github.com/amfoss/gitlit).

> The documentation for the GitHub Backend can be found [here](https://perceval.readthedocs.io/en/latest/perceval.backends.core.html#module-perceval.backends.core.github).

We need a personal access token so that Perceval's GitHub backend can make authenticated requests, which are subject to a much higher GitHub API rate limit than anonymous ones. You can generate a new token [here](https://github.com/settings/tokens/new). If you need help, follow the steps in [GitHub: PAT Help](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line).

One of the latest features added to the GitHub backend is the possibility of passing a list of tokens; see [chaoss/grimoirelab-perceval#546](https://github.com/chaoss/grimoirelab-perceval/issues/546).

In [2]:
API_TOKENS = ["12345678abcdefgh", "12345678abcdefgh"]
# your GitHub API tokens go here

OWNER = "amfoss"
REPOSITORY = "gitlit"

from_date = datetime(2018,1,1)
to_date = datetime(2020,1,1)
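
Hard-coding tokens in a notebook makes them easy to leak. A minimal sketch of an alternative, assuming the tokens are stored comma-separated in an environment variable; the variable name `GITHUB_API_TOKENS` and the helper `load_tokens` are hypothetical and not part of Perceval:

```python
import os


def load_tokens(env_var="GITHUB_API_TOKENS", environ=None):
    """Return a list of tokens read from a comma-separated variable.

    `env_var` is a hypothetical name chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(env_var, "")
    # Drop empty entries and surrounding whitespace.
    return [token.strip() for token in raw.split(",") if token.strip()]


# Example with an explicit mapping instead of the real environment.
example_tokens = load_tokens(environ={"GITHUB_API_TOKENS": "token-a, token-b"})
print(example_tokens)
```

The resulting list could then be used in place of the hard-coded `API_TOKENS` value.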

In [3]:
from perceval.backends.core.github import GitHub

In [4]:
github_backend = GitHub(owner = OWNER,
                        repository = REPOSITORY,
                        api_token = API_TOKENS,
                        sleep_for_rate = True)

Let's print out some basic information.

In [5]:
print(github_backend.owner)
print(github_backend.repository)
print(github_backend.origin)

amfoss
gitlit
https://github.com/amfoss/gitlit


In [6]:
print(github_backend.categories)

['issue', 'pull_request', 'repository']
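
Since the backend exposes its categories as a list, one possible pattern is to loop over them and count what each one returns. The sketch below relies only on the `categories` attribute and the `fetch(category=...)` call used in this notebook; `FakeBackend` is a hypothetical stand-in so the example runs without network access or tokens:

```python
def count_items_per_category(backend, **fetch_kwargs):
    """Fetch every category the backend exposes and count the items."""
    counts = {}
    for category in backend.categories:
        items = backend.fetch(category=category, **fetch_kwargs)
        counts[category] = sum(1 for _ in items)
    return counts


class FakeBackend:
    """Hypothetical stand-in that mimics only `categories` and `fetch`."""

    categories = ['issue', 'pull_request']

    def __init__(self, items):
        self._items = items

    def fetch(self, category, **kwargs):
        # Yield only the items that belong to the requested category.
        return (item for item in self._items if item['category'] == category)


fake = FakeBackend([
    {'category': 'issue'},
    {'category': 'issue'},
    {'category': 'pull_request'},
])
print(count_items_per_category(fake))
```

With a real backend, each category triggers its own round of API requests, so this loop can take a while on large repositories.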


### Fetching Issues from GitHub Backend

The Issues of a project can be fetched using GitHub Backend. [Reference](https://perceval.readthedocs.io/en/latest/perceval.backends.core.html#perceval.backends.core.github.GitHub.CATEGORIES).


We can use the fetch function with the argument `category = 'issue'`.

In [7]:
issues = github_backend.fetch(
    category = 'issue', 
    from_date = from_date,
    to_date = to_date
    )

issues_list = list(issues)
print("ISSUES COUNT:", len(issues_list))

ISSUES COUNT: 34


> In GitHub, every pull request is an issue, but not every issue is a pull request. For this reason, "shared" actions for both features, like manipulating assignees, labels and milestones, are provided within the Issues API.

https://developer.github.com/v3/pulls/#labels-assignees-and-milestones
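
Because every pull request is also an issue, the `issue` category may return both kinds of item. A minimal sketch for separating them, assuming that items backed by a pull request carry a `pull_request` key inside `data`, as the GitHub Issues API does; the sample items are made up for illustration:

```python
def split_issues_and_prs(items):
    """Separate plain issues from issues that are really pull requests."""
    plain_issues, pull_requests = [], []
    for item in items:
        # Assumption: PR-backed issues expose a 'pull_request' key in 'data'.
        if 'pull_request' in item['data']:
            pull_requests.append(item)
        else:
            plain_issues.append(item)
    return plain_issues, pull_requests


# Made-up items that mimic only the fields this sketch needs.
sample_items = [
    {'data': {'number': 1, 'title': 'A plain issue'}},
    {'data': {'number': 2, 'title': 'A pull request',
              'pull_request': {'url': '...'}}},
]
plain, prs_only = split_issues_and_prs(sample_items)
print(len(plain), len(prs_only))
```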

Let's have a look at some of the information that can be drawn from the data fetched by Perceval.

In [8]:
issue = issues_list[-2]

# pprint(issue)

print(issue.keys())
print()
print(issue['data'].keys())
print()
print(issue['data']['user'].keys())

dict_keys(['backend_name', 'backend_version', 'perceval_version', 'timestamp', 'origin', 'uuid', 'updated_on', 'classified_fields_filtered', 'category', 'search_fields', 'tag', 'data'])

dict_keys(['url', 'repository_url', 'labels_url', 'comments_url', 'events_url', 'html_url', 'id', 'node_id', 'number', 'title', 'user', 'labels', 'state', 'locked', 'assignee', 'assignees', 'milestone', 'comments', 'created_at', 'updated_at', 'closed_at', 'author_association', 'body', 'reactions', 'user_data', 'assignee_data', 'assignees_data', 'comments_data', 'reactions_data'])

dict_keys(['login', 'id', 'node_id', 'avatar_url', 'gravatar_id', 'url', 'html_url', 'followers_url', 'following_url', 'gists_url', 'starred_url', 'subscriptions_url', 'organizations_url', 'repos_url', 'events_url', 'received_events_url', 'type', 'site_admin'])


In [9]:
print("The title of the issue:", issue['data']['title'])
print("The issue is opened by:", issue['data']['user']['login'])
print("The issue is created at:", issue['data']['created_at'])
print("The status of the issue is:", issue['data']['state'])
print("The URL of the issue is:", issue['data']['html_url'])

The title of the issue: Add Logo to NavBar
The issue is opened by: harshithpabbati
The issue is created at: 2019-10-09T08:31:06Z
The status of the issue is: open
The URL of the issue is: https://github.com/amfoss/GitLit/issues/34
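
Beyond inspecting a single issue, the same fields can be aggregated over the whole list. A small sketch, assuming each item has the `data['state']` and `data['user']['login']` fields shown above; the helper name `summarize_issues` and the sample items are hypothetical:

```python
from collections import Counter


def summarize_issues(items):
    """Count issues per state and per author login."""
    states = Counter(item['data']['state'] for item in items)
    authors = Counter(item['data']['user']['login'] for item in items)
    return states, authors


# Made-up items; with real data, pass `issues_list` instead.
sample_issues = [
    {'data': {'state': 'open', 'user': {'login': 'alice'}}},
    {'data': {'state': 'closed', 'user': {'login': 'bob'}}},
    {'data': {'state': 'open', 'user': {'login': 'alice'}}},
]
states, authors = summarize_issues(sample_issues)
print(states.most_common())
print(authors.most_common(1))
```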


### Fetching Pull Requests from GitHub Backend

The PRs of a project can be fetched using GitHub Backend. [Reference](https://perceval.readthedocs.io/en/latest/perceval.backends.core.html#perceval.backends.core.github.GitHub.CATEGORIES).

We can use the fetch function with the argument `category = 'pull_request'`.

In [10]:
prs = github_backend.fetch(
    category = 'pull_request', 
    from_date = from_date,
    to_date = to_date
    )

prs_list = list(prs)
print("PRS COUNT:", len(prs_list))

PRS COUNT: 33
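
Fetching can be slow on larger repositories, so it may be worth saving the items to disk once they are in memory. A minimal sketch using the `json` module imported at the top of the notebook, assuming each item is a plain dictionary of JSON-serializable values; `write_jsonl` is a hypothetical helper and the sample items are made up:

```python
import io
import json


def write_jsonl(items, stream):
    """Write one JSON document per line and return how many were written."""
    count = 0
    for item in items:
        stream.write(json.dumps(item, sort_keys=True) + "\n")
        count += 1
    return count


# An in-memory buffer stands in for a file opened with open(path, "w").
buffer = io.StringIO()
written = write_jsonl(
    [{'uuid': 'a', 'category': 'pull_request'},
     {'uuid': 'b', 'category': 'pull_request'}],
    buffer,
)
print(written)
```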


Let's have a look at some of the information that can be drawn from the data fetched by Perceval.

In [11]:
pr = prs_list[1]

# pprint(pr)

print(pr.keys())
print()
print(pr['data'].keys())
print()
print(pr['data']['user'].keys())

dict_keys(['backend_name', 'backend_version', 'perceval_version', 'timestamp', 'origin', 'uuid', 'updated_on', 'classified_fields_filtered', 'category', 'search_fields', 'tag', 'data'])

dict_keys(['url', 'id', 'node_id', 'html_url', 'diff_url', 'patch_url', 'issue_url', 'number', 'state', 'locked', 'title', 'user', 'body', 'created_at', 'updated_at', 'closed_at', 'merged_at', 'merge_commit_sha', 'assignee', 'assignees', 'requested_reviewers', 'requested_teams', 'labels', 'milestone', 'commits_url', 'review_comments_url', 'review_comment_url', 'comments_url', 'statuses_url', 'head', 'base', '_links', 'author_association', 'merged', 'mergeable', 'rebaseable', 'mergeable_state', 'merged_by', 'comments', 'review_comments', 'maintainer_can_modify', 'commits', 'additions', 'deletions', 'changed_files', 'user_data', 'review_comments_data', 'reviews_data', 'requested_reviewers_data', 'merged_by_data', 'commits_data'])

dict_keys(['login', 'id', 'node_id', 'avatar_url', 'gravatar_id', 'url', '

In [12]:
print("The title of pull request:", pr['data']['title'])
print("The pull request is opened by:", pr['data']['user']['login'])
print("The pull request is created at:", pr['data']['created_at'])
print("The status of the pull request is:", pr['data']['state'])
print("Additions:", pr['data']['additions'] , "\nDeletions:", pr['data']['deletions'])
print("The URL of the pull request is:", pr['data']['html_url'])

The title of pull request: Update README.md
The pull request is opened by: akhilam512
The pull request is created at: 2018-12-15T14:00:01Z
The status of the pull request is: closed
Additions: 4 
Deletions: 1
The URL of the pull request is: https://github.com/amfoss/GitLit/pull/2
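
The `merged`, `additions` and `deletions` fields listed in the keys above can also be totalled across all pull requests. A sketch under the assumption that those fields hold a boolean and two integers (or `None`); `summarize_pull_requests` and the sample items are hypothetical:

```python
def summarize_pull_requests(items):
    """Total the merged count and line changes over a list of PR items."""
    summary = {'total': 0, 'merged': 0, 'additions': 0, 'deletions': 0}
    for item in items:
        data = item['data']
        summary['total'] += 1
        summary['merged'] += 1 if data.get('merged') else 0
        # Treat missing or None values as zero.
        summary['additions'] += data.get('additions') or 0
        summary['deletions'] += data.get('deletions') or 0
    return summary


# Made-up items; with real data, pass `prs_list` instead.
sample_prs = [
    {'data': {'merged': True, 'additions': 4, 'deletions': 1}},
    {'data': {'merged': False, 'additions': 10, 'deletions': 0}},
]
pr_summary = summarize_pull_requests(sample_prs)
print(pr_summary)
```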


## GitLab Backend

The GitLab project we are going to target is [amfoss/gitlit](https://gitlab.com/amfoss/gitlit).

> The documentation for the GitLab Backend can be found [here](https://perceval.readthedocs.io/en/latest/perceval.backends.core.html#module-perceval.backends.core.gitlab).

We need a personal access token to use Perceval's GitLab backend without running into problems with the GitLab API. You can generate a new token [here](https://gitlab.com/profile/personal_access_tokens). If you need help, follow the steps in [GitLab: PAT Docs](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html).

In [13]:
API_TOKEN = "12345678abcdefgh"
# your gitlab api token goes here

OWNER = "amfoss"
REPOSITORY = "gitlit"

from_date = datetime(2018,1,1)

In [14]:
from perceval.backends.core.gitlab import GitLab

In [15]:
gitlab_backend = GitLab(owner = OWNER,
                        repository = REPOSITORY,
                        api_token = API_TOKEN,
                        sleep_for_rate = True)

Let's print out some basic information.

In [16]:
print(gitlab_backend.owner)
print(gitlab_backend.repository)
print(gitlab_backend.origin)

amfoss
gitlit
https://gitlab.com/amfoss/gitlit


In [17]:
print(gitlab_backend.categories)

['issue', 'merge_request']


### Fetching Issues from GitLab Backend

The Issues of a project can be fetched using GitLab Backend. [Reference](https://perceval.readthedocs.io/en/latest/perceval.backends.core.html#perceval.backends.core.gitlab.GitLab.CATEGORIES).


We can use the fetch function with the argument `category = 'issue'`.

In [18]:
issues = gitlab_backend.fetch(
    category = 'issue', 
    from_date = from_date
    )

issues_list = list(issues)
print("ISSUES COUNT:", len(issues_list))

ISSUES COUNT: 1


Let's have a look at some of the information that can be drawn from the data fetched by Perceval.


In [19]:
issue = issues_list[0]

# pprint(issue)

print(issue.keys())
print()
print(issue['data'].keys())
print()
print(issue['data']['author'].keys())

dict_keys(['backend_name', 'backend_version', 'perceval_version', 'timestamp', 'origin', 'uuid', 'updated_on', 'classified_fields_filtered', 'category', 'search_fields', 'tag', 'data'])

dict_keys(['id', 'iid', 'project_id', 'title', 'description', 'state', 'created_at', 'updated_at', 'closed_at', 'closed_by', 'labels', 'milestone', 'assignees', 'author', 'assignee', 'user_notes_count', 'merge_requests_count', 'upvotes', 'downvotes', 'due_date', 'confidential', 'discussion_locked', 'web_url', 'time_stats', 'task_completion_status', 'has_tasks', '_links', 'references', 'moved_to_id', 'weight', 'epic_iid', 'epic', 'notes_data', 'award_emoji_data'])

dict_keys(['id', 'name', 'username', 'state', 'avatar_url', 'web_url'])


In [20]:
print("The title of the issue:", issue['data']['title'])
print("The issue is opened by:", issue['data']['author']['username'])
print("The issue is created at:", issue['data']['created_at'])
print("The status of the issue is:", issue['data']['state'])
print("The URL of the issue is:", issue['data']['web_url'])

The title of the issue: Add Logo to NavBar
The issue is opened by: amfoss_in
The issue is created at: 2019-10-09T08:31:06.000Z
The status of the issue is: opened
The URL of the issue is: https://gitlab.com/amfoss/GitLit/issues/34
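
The two backends name equivalent fields differently: GitHub uses `user`/`login`, `html_url` and the state `open`, while GitLab uses `author`/`username`, `web_url` and `opened`. A minimal sketch of mapping both onto one shape, based only on the fields printed in this notebook; `normalize_issue` is a hypothetical helper, and detecting the backend by the presence of an `author` key is an assumption made for this sketch:

```python
# GitLab reports 'opened' where GitHub reports 'open'.
STATE_ALIASES = {'opened': 'open'}


def normalize_issue(item):
    """Map a GitHub or GitLab issue item onto a common dictionary."""
    data = item['data']
    if 'author' in data:  # assumption: GitLab-style item
        author = data['author']['username']
        url = data['web_url']
    else:  # assumption: GitHub-style item
        author = data['user']['login']
        url = data['html_url']
    state = STATE_ALIASES.get(data['state'], data['state'])
    return {'title': data['title'], 'author': author,
            'state': state, 'url': url}


github_item = {'data': {
    'title': 'Add Logo to NavBar', 'state': 'open',
    'user': {'login': 'harshithpabbati'},
    'html_url': 'https://github.com/amfoss/GitLit/issues/34'}}
gitlab_item = {'data': {
    'title': 'Add Logo to NavBar', 'state': 'opened',
    'author': {'username': 'amfoss_in'},
    'web_url': 'https://gitlab.com/amfoss/GitLit/issues/34'}}

normalized = [normalize_issue(github_item), normalize_issue(gitlab_item)]
for row in normalized:
    print(row)
```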


### Fetching Merge Requests from GitLab Backend

The MRs of a project can be fetched using GitLab Backend. [Reference](https://perceval.readthedocs.io/en/latest/perceval.backends.core.html#perceval.backends.core.gitlab.GitLab.CATEGORIES).


We can use the fetch function with the argument `category = 'merge_request'`.

In [21]:
mrs = gitlab_backend.fetch(
    category = 'merge_request', 
    from_date = from_date
    )

mrs_list = list(mrs)
print("MRS COUNT:", len(mrs_list))

MRS COUNT: 33


Let's have a look at some of the information that can be drawn from the data fetched by Perceval.

In [22]:
mr = mrs_list[1]

# pprint(mr)

print(mr.keys())
print()
print(mr['data'].keys())
print()
print(mr['data']['author'].keys())

dict_keys(['backend_name', 'backend_version', 'perceval_version', 'timestamp', 'origin', 'uuid', 'updated_on', 'classified_fields_filtered', 'category', 'search_fields', 'tag', 'data'])

dict_keys(['id', 'iid', 'project_id', 'title', 'description', 'state', 'created_at', 'updated_at', 'merged_by', 'merged_at', 'closed_by', 'closed_at', 'target_branch', 'source_branch', 'user_notes_count', 'upvotes', 'downvotes', 'assignee', 'author', 'assignees', 'source_project_id', 'target_project_id', 'labels', 'work_in_progress', 'milestone', 'merge_when_pipeline_succeeds', 'merge_status', 'sha', 'merge_commit_sha', 'squash_commit_sha', 'discussion_locked', 'should_remove_source_branch', 'force_remove_source_branch', 'reference', 'references', 'web_url', 'time_stats', 'squash', 'task_completion_status', 'has_conflicts', 'blocking_discussions_resolved', 'approvals_before_merge', 'subscribed', 'changes_count', 'latest_build_started_at', 'latest_build_finished_at', 'first_deployed_to_production_at', '

In [23]:
print("The title of merge request:", mr['data']['title'])
print("The merge request is opened by:", mr['data']['author']['username'])
print("The merge request is created at:", mr['data']['created_at'])
print("The status of the merge request is:", mr['data']['state'])
print("The URL of the merge request is:", mr['data']['web_url'])

The title of merge request: Update README.md
The merge request is opened by: akhilkg
The merge request is created at: 2018-12-15T14:00:01.000Z
The status of the merge request is: merged
The URL of the merge request is: https://gitlab.com/amfoss/GitLit/-/merge_requests/2
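
The `created_at` and `merged_at` fields can be combined to estimate how long a merge request stayed open before being merged. A sketch that accepts both timestamp styles seen in this notebook (with and without milliseconds); the helpers `parse_timestamp` and `time_to_merge` are hypothetical, and the `merged_at` value in the sample is made up:

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse '2018-12-15T14:00:01Z' or '2018-12-15T14:00:01.000Z'.

    Returns a naive datetime; the trailing 'Z' means the value is in UTC.
    """
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("Unrecognized timestamp: {!r}".format(value))


def time_to_merge(item):
    """Return the timedelta between creation and merge, or None."""
    data = item['data']
    if not data.get('merged_at'):
        return None  # not merged (yet)
    return parse_timestamp(data['merged_at']) - parse_timestamp(data['created_at'])


# The 'merged_at' value below is invented for illustration.
sample_mr = {'data': {'created_at': '2018-12-15T14:00:01.000Z',
                      'merged_at': '2018-12-16T15:30:01.000Z'}}
print(time_to_merge(sample_mr))
```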
