# Microtask 4
> Produce a listing of repositories, as a table and as CSV file, with the number of commits authored, issues opened, and pull/merge requests opened, during the last three months, ordered by the total number (commits plus issues plus pull requests). Use plain Python3 (eg, no Pandas) for this.

I am using the data source files of five FOSSASIA repositories:
- [badgeyay](https://github.com/fossasia/badgeyay) 
- [open-event-server](https://github.com/fossasia/open-event-server) 
- [phimpme-android](https://github.com/fossasia/phimpme-android) 
- [susi_android](https://github.com/fossasia/susi_android) 
- [susi_server](https://github.com/fossasia/susi_server) 

All the data source files are located in the `data/` folder of the repository.

In [1]:
!pip install prettytable
!pip install perceval

grimoirelab-toolkit 0.1.9 has requirement python-dateutil>=2.8.0, but you'll have python-dateutil 2.7.3 which is incompatible.
You are using pip version 10.0.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


# Retrieving the data

You can also retrieve the data source files from within the Jupyter notebook. Provide your `github_token` (a GitHub personal access token), then uncomment and run the code in the cell below.

In [2]:
github_token = "" # Please enter your github token here
owner = "fossasia"
repos = ["badgeyay", "open-event-server","phimpme-android","susi_android","susi_server"]
repos_url = ["https://github.com/" + owner + "/" + repo for repo in repos]
files = [repo+".json" for repo in repos] # one JSON-lines file per repository, written to ../data/

# every command appends to the same file, so commits, pull requests and issues of a repository end up together
#for repo, repo_url, file in zip(repos, repos_url, files):
#    print(repo, repo_url, file)
#    !perceval git --json-line $repo_url >> ../data/$file
#    !perceval github --json-line --sleep-for-rate -t $github_token --category pull_request $owner $repo >> ../data/$file
#    !perceval github --json-line --sleep-for-rate -t $github_token --category issue $owner $repo >> ../data/$file

In [3]:
# json is used to parse the data source files retrieved by perceval (one JSON document per line)
import json 
# csv is used to write the result file
import csv  

# datetime helpers to parse dates such as the 'created_at' of an issue or pr
from datetime import datetime, date, timedelta

# defaultdict groups items under a key without checking whether the key already exists
from collections import defaultdict  

# prettytable renders the csv file as a text table
from prettytable import from_csv

In [4]:
# function to get the required details of a commit
# (a commit has a different json structure than an issue/pr)

def details_commit(commit):
    """
    Get the contents of the commit.
    
    Given one line of the data source, 
    return a summary of the commit.
    
    :param commit: line json data of the commit
    :return: dict with the hash, author and creation date of the commit
    """
    # the commit fields are stored under the 'data' key
    data = commit['data']
    # pick out the required fields
    content = {
            # get the hash of the commit
            'hash': data['commit'],
            # get the author_name
            'author': data['Author'],  
            # get the commit date (perceval also provides 'AuthorDate', the date the change was authored)
            'created_date': datetime.strptime(data['CommitDate'],"%a %b %d %H:%M:%S %Y %z")  
    }
    return content
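
To make the date format used in `details_commit` concrete, here is a minimal sketch of parsing a Git-style date string and normalising it to UTC. The sample string is made up for illustration and is not taken from the data files.

```python
from datetime import datetime, timezone

# Hypothetical sample in the Git date format (not from the data set)
sample = "Tue Aug 14 14:30:13 2018 -0700"

# %z keeps the UTC offset, so the result is a timezone-aware datetime
parsed = datetime.strptime(sample, "%a %b %d %H:%M:%S %Y %z")

# Converting to UTC makes dates with different offsets directly comparable
parsed_utc = parsed.astimezone(timezone.utc)

print(parsed.isoformat())      # 2018-08-14T14:30:13-07:00
print(parsed_utc.isoformat())  # 2018-08-14T21:30:13+00:00
```

Because the parsed value carries its offset, it can be compared safely with any other timezone-aware datetime.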

In [5]:
# function to get the required details of an issue or pull request
# issues and prs have the same json structure in the data source retrieved by perceval,
# so a single function handles both

def details_ipr(item):
    """
    Get the contents of the issue/pr.
    
    Given one line of the data source, 
    return a summary of the issue/pr.
    
    :param item: line json data of the issue/pr
    
    :return: dict with the id, author and creation date of the issue/pr
    """
    # the issue/pr fields are stored under the 'data' key
    data = item['data']
    # pick out the required fields
    content = {
            # get the id of the issue/pr (stored under 'hash' to mirror the commit summary)
            'hash': data['id'],
            # get the author_name
            'author': data['user']['login'],  
            # get the date at which the issue/pr was created (UTC, parsed as a naive datetime)
            'created_date': datetime.strptime(data['created_at'],"%Y-%m-%dT%H:%M:%SZ")  
    }
    return content 
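
One caveat worth checking: the GitHub API also reports pull requests through its issues endpoint, so items in the `issue` category may include pull requests. As far as I know, such items carry a `pull_request` key in their `data`. The sketch below uses hand-written, hypothetical items to show how they could be told apart; whether this affects the files used here would need to be verified against the data.

```python
def is_pull_request(item):
    """Return True if an 'issue' item actually describes a pull request."""
    # Assumption: issues backed by a pull request carry a 'pull_request' key
    return 'pull_request' in item['data']

# Hypothetical items, reduced to the fields used in this example
items = [
    {'category': 'issue', 'data': {'id': 1, 'pull_request': {'url': '...'}}},
    {'category': 'issue', 'data': {'id': 2}},
]

# Keep only the items that are issues in the strict sense
true_issues = [item for item in items if not is_pull_request(item)]
print(len(true_issues))
```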

In [6]:
def get_contents(repo):
    """
    Get the contents of the project.
    
    Read the data retrieved by perceval for a repository 
    and group its items by category.
    
    :param repo: name of the repository
    
    :return: defaultdict mapping each category to a list of item summaries
    """
    # a defaultdict of lists stores the summaries grouped by category (key)
    contents = defaultdict(list)

    # split the data source into commit, issue and pr details and store them separately in the dict
    with open('../data/%s.json'%repo) as datasrc:
        for line in datasrc:
            # parse the json line
            line = json.loads(line)
            # if it is a commit, get the details of the commit
            if line['category'] == 'commit':    
                content = details_commit(line) 
            # issues and prs share the same structure
            elif line['category'] in ('issue', 'pull_request'):    
                content = details_ipr(line)
            else:
                # skip any other category instead of reusing a stale `content`
                continue
            # add the summary to the list of its category
            contents[line['category']].append(content)
    # return the contents
    return contents
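
The grouping step can be tried without any data file. This is a minimal sketch that feeds a hypothetical three-line sample through `io.StringIO`; the helper name `group_by_category` is made up for this example and stores the raw records rather than the summaries built above.

```python
import io
import json
from collections import defaultdict

def group_by_category(lines):
    """Group JSON-lines records by their 'category' field."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            # skip blank lines
            continue
        record = json.loads(line)
        grouped[record['category']].append(record)
    return grouped

# Hypothetical sample standing in for a data source file
sample = io.StringIO(
    '{"category": "commit", "data": {}}\n'
    '{"category": "issue", "data": {}}\n'
    '{"category": "commit", "data": {}}\n'
)

grouped = group_by_category(sample)
print({category: len(records) for category, records in grouped.items()})
# {'commit': 2, 'issue': 1}
```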

In [7]:
# names of the repositories
repos = ['badgeyay','phimpme-android','susi_server','susi_android','open-event-server']

# contribution types
ctypes = ('commit','pull_request','issue')

# date three months ago, at midnight (three months approximated as 3*365/12 days)
initial_date = datetime.combine(date.today() - timedelta(3*365/12), datetime.min.time())
# REFERENCE: Stack Overflow https://stackoverflow.com/a/546356/8268998

# to store the counts per contribution type, one entry per repository
repodata = defaultdict(list)

# iterating through the repos
for repo in repos:
    # getting the contents of the repo by calling the function
    repocontents = get_contents(repo)
    # total count initialized
    total = 0
    # iterating through the contribution types
    for ctype in ctypes:
        # initialize the ctype count
        count = 0
        # iterating through the repocontents of a particular type
        for item in repocontents[ctype]:
            # checking if the item was created within the last three months
            if item['created_date'].replace(tzinfo=None) >= initial_date:
                # if yes, increase the count
                count += 1
        # total count
        total += count
        # append the ctype count to the repodata
        repodata[ctype].append(count)
    # append the total count
    repodata['total'].append(total)
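
Two details of the cell above are approximations: `timedelta(3*365/12)` treats three months as roughly 91 days, and `replace(tzinfo=None)` drops the UTC offset without converting the time. A minimal sketch of a stricter variant follows. The helpers `months_ago` and `to_utc` and all dates are made up for illustration, and naive datetimes are assumed to be in UTC already (as the `Z` suffix of the issue dates suggests).

```python
import calendar
from datetime import datetime, timedelta, timezone

def months_ago(moment, months):
    """Shift `moment` back by whole calendar months (hypothetical helper)."""
    # Work on a zero-based month index so the year rolls over correctly
    index = moment.year * 12 + (moment.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    # Clamp the day for shorter target months (e.g. 31 May -> 28 Feb)
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))

def to_utc(moment):
    """Return an aware UTC datetime; naive values are assumed to be UTC."""
    if moment.tzinfo is None:
        return moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc)

# Fixed, made-up reference date so the example is reproducible
now = datetime(2019, 3, 31, tzinfo=timezone.utc)
cutoff = months_ago(now, 3)  # 2018-12-31 00:00 UTC

created = [
    # 2018-12-31 23:00 UTC, inside the window
    datetime(2019, 1, 1, 1, 0, tzinfo=timezone(timedelta(hours=2))),
    # 2018-12-31 04:00 UTC, inside the window only after conversion to UTC
    datetime(2018, 12, 30, 23, 0, tzinfo=timezone(timedelta(hours=-5))),
    # naive value treated as UTC, outside the window
    datetime(2018, 12, 1, 12, 0),
]

recent = [moment for moment in created if to_utc(moment) >= cutoff]
print(len(recent))  # 2
```

The second date shows the difference: with the offset dropped it would read as 30 December and fall outside the window, while its actual UTC time is 31 December.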

In [8]:
print("Repositories Details in the past three months\n")
for ctype, counts in repodata.items():
    # print the counts of each contribution type, one value per repository (in the order of `repos`)
    print(ctype, counts)

Repositories Details in the past three months

commit [100, 308, 37, 78, 301]
pull_request [0, 0, 16, 140, 121]
issue [137, 345, 30, 262, 228]
total [237, 653, 83, 480, 650]


In [9]:
# header row of the csv file 
header = ['Repository','# Commits','# PullRequests','# Issues','# Total']
# combine the parallel lists into one row per repository
rows = zip(repos,repodata['commit'],repodata['pull_request'],repodata['issue'],repodata['total'])
# order the rows by the total count (last column), largest first, as the task requires
rows = sorted(rows, key=lambda row: row[-1], reverse=True)
# opening a new csv to write the data into it (newline='' is what the csv module expects)
with open('result.csv', 'w', newline='') as file:
    # initialize the writer object
    writer = csv.writer(file)
    # writing the header first
    writer.writerow(header)
    # writing all the rows at a time
    writer.writerows(rows)

In [10]:
# to show the output in the form of a table
# load the csv file into an object
with open("result.csv", "r") as csvfile: 
    # using from_csv method from prettytable module
    csvtable = from_csv(csvfile)
    
# print the prettified table
print(csvtable)

+-------------------+-----------+----------------+----------+---------+
|     Repository    | # Commits | # PullRequests | # Issues | # Total |
+-------------------+-----------+----------------+----------+---------+
|  phimpme-android  |    308    |       0        |   345    |   653   |
| open-event-server |    301    |      121       |   228    |   650   |
|    susi_android   |     78    |      140       |   262    |   480   |
|      badgeyay     |    100    |       0        |   137    |   237   |
|    susi_server    |     37    |       16       |    30    |    83   |
+-------------------+-----------+----------------+----------+---------+
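
Since the task asks for plain Python 3, the table can also be rendered without `prettytable`. This is a minimal sketch that reuses the counts reported above and a hypothetical helper `render_table`; the rows are sorted by the total in descending order before printing.

```python
def render_table(header, rows):
    """Render rows as a fixed-width text table (hypothetical helper)."""
    cells = [[str(value) for value in row] for row in [header] + list(rows)]
    # each column is as wide as its longest cell
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]
    rule = '+' + '+'.join('-' * (width + 2) for width in widths) + '+'
    lines = [rule]
    for index, row in enumerate(cells):
        padded = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append('| ' + ' | '.join(padded) + ' |')
        if index == 0:
            # separate the header from the data rows
            lines.append(rule)
    lines.append(rule)
    return '\n'.join(lines)

header = ['Repository', '# Commits', '# PullRequests', '# Issues', '# Total']
# counts copied from the result above
rows = [
    ('badgeyay', 100, 0, 137, 237),
    ('phimpme-android', 308, 0, 345, 653),
    ('susi_server', 37, 16, 30, 83),
    ('susi_android', 78, 140, 262, 480),
    ('open-event-server', 301, 121, 228, 650),
]
# order by the total (last column), largest first
rows.sort(key=lambda row: row[-1], reverse=True)

table = render_table(header, rows)
print(table)
```

Left-aligned cells keep the helper short; centring, as `prettytable` does, would only change the padding.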
