# Microtask-1

> Produce a notebook showing (and producing) a list with the activity per quarter: number of new committers, submitters of issues, and submitters of pull/merge requests, number of items (commits, issues, pull/merge requests), number of repositories with new items (all of this per quarter) as a table and as a CSV file using plain python3 (no pandas).


I am using the same data source file which is used in the [microtask-0](https://github.com/vchrombie/chaoss-microtasks/blob/master/microtask-0/microtask-0.ipynb) i.e, [elasticsearch-py](https://github.com/elastic/elasticsearch-py) project which is located in the `data/` folder of the repository.

In [1]:
# while running this in mybiner notebooks if you are facing 
# dependency errors, you need to uncomment the below lines.

#!pip install prettytable
#!pip install pandas
#!pip install perceval
#!pip install regex
#!pip install matplotlib

# Retrieving the data

You can also retrieve the data source files from the jupyter notebook itself. Just provide your `github_token` (github personal access token) and uncomment the code and run the code in the below cell.

In [2]:
# Please enter your github token here
github_token = "" 
owner = "elastic"
repos = ["elasticsearch-py"]
repos_url = ["https://github.com/" + owner + "/" + repo for repo in repos]
# file to which perceval stores data source
files = [repo+".json" for repo in repos] 
ctypes = ('commit','issue','pull_request')

#for repo, repo_url, file in zip(repos, repos_url, files):
#    print(repo, repo_url, file)
#    !perceval git --json-line $repo_url >> ../$file
#    !perceval github --json-line --sleep-for-rate -t $github_token --category pull_request $owner $repo >> ../data/$file
#    !perceval github --json-line --sleep-for-rate -t $github_token --category issue $owner $repo >> ../data/$file

## Defining Quaters

The calendar year can be divided into four quarters, often abbreviated as Q1, Q2, Q3, and Q4.
- First quarter, Q1: 1 January – 31 March (90 days or 91 days in leap years)
- Second quarter, Q2: 1 April – 30 June (91 days)
- Third quarter, Q3: 1 July – 30 September (92 days)
- Fourth quarter, Q4: 1 October – 31 December (92 days)

Reference: https://en.wikipedia.org/wiki/Calendar_year

Each Quater is represented as **Qi yyyy** where *i* is the quater number and _yyyy_ is the year.

In [3]:
import json 
import csv  
import requests 
import regex as re

from datetime import datetime 
from collections import defaultdict  
from prettytable import from_csv

In [4]:
class Activity_Quarter:
    """
    Class for Activity_Quarter for Git repositories
    
    Objects are instantiated by specifying a file with the
    commits, issues, pull_requests obtained by Perceval
    from a set of repositories.
    
    :param path: Path to file with one Perceval JSON document per line
    """
    
    def __init__(self, path):      
        """
        Initilizes self.comments, self.quarters, self.activities,
        self.newcontributors, self.oldcontributors.
        """
        self.contents = self.get_contents("%s"%path)

        self.quarters = []
        self.activities = defaultdict(list)
        self.newcontributors = defaultdict(list)
        self.oldcontributors = defaultdict(set)

    def summerize_quarterwise(self):
        """
        """
        created, present = self.get_dates()
        for year,quarter,start,end in self.quarterwise(created, present):
            # add `Qi yyyy`  format as a quater in the quaters list
            self.quarters.append(r"Q%d %d"%(quarter+1,year))
            for ctype in ctypes:
                activity =  newcontributor =  0 
                for item in self.contents[ctype]:
                    # checking if the date of contribtion (commit/issue/pr) created is in between start & end
                    if start<=item['created_date'].replace(tzinfo=None)<=end:
                        activity+=1
                        if item['author'] not in self.oldcontributors[ctype]:
                            newcontributor+=1
                            self.oldcontributors[ctype].add(item['author'])
                self.activities[ctype].append(activity)
                # newcontributors, either through commit/issue/pr are added to the dict as (ctype, value) in list
                self.newcontributors[ctype].append(newcontributor)

                
    def get_contents(self, path):
        """Get the contents of the project"""
        
        contents = defaultdict(list)
        with open('%s'%path) as datasrc:
            for line in datasrc:
                line = json.loads(line)
                if line['category'] == 'commit':    
                    content = self.summary_commit(line) 
                elif line['category'] == 'issue':    
                    content = self.summary_issue(line)
                elif line['category'] == 'pull_request':    
                    content = self.summary_pr(line) 
                contents[line['category']].append(content)
        return contents
        

    def summary_commit(self, commit):
        """Compute a summary of a commit, suitable as a row"""
        
        repo = commit['origin']
        data = commit['data']
        summary ={
                'repo': repo,
                'hash': data['commit'],
                'author': data['Author'],

                'created_date': datetime.strptime(data['CommitDate'],
                                                          "%a %b %d %H:%M:%S %Y %z")
        }
        return summary

    def summary_issue(self, issue):
        """Compute a summary of a issue, suitable as a row"""

        repo = issue['origin']
        data = issue['data']
        summary ={
                'repo': repo,
                'hash': data['id'],
                'author': data['user']['login'],
                'created_date': datetime.strptime(data['created_at'],
                                                  "%Y-%m-%dT%H:%M:%SZ")
        }
        return summary

    def summary_pr(self, pr):
        """Compute a summary of a pull_request, suitable as a row"""

        repo = pr['origin']
        data = pr['data']
        summary ={
                'repo': repo,
                'hash': data['id'],
                'author': data['user']['login'],
                'created_date': datetime.strptime(data['created_at'],
                                                  "%Y-%m-%dT%H:%M:%SZ")
        }  
        return summary
    
    
    # I found it difficult to scrape through the data retrieved by the perceval and find the
    # dates so I had to find them through this hack

    def get_dates(self):
        """Get the project created date and present date"""
        
        repo = self.repo_name()
        repodata =json.loads(requests.get("https://api.github.com/repos/%s/%s"%(owner,repo)).text)

        created =datetime.strptime(repodata['created_at'][:10], "%Y-%m-%d").year
        present =datetime.strptime(repodata['updated_at'][:10], "%Y-%m-%d").year
        return created, present
        
        
    def define_quarters(self):
        """Define the quaters of the year"""
        
        QUARTERS = (
            ({'month':1,'day':1},  {'month':3,'day':31}),
            ({'month':4,'day':1},  {'month':6,'day':30}),
            ({'month':7,'day':1},  {'month':9,'day':30}),
            ({'month':10,'day':1}, {'month':12,'day':31}),
        )
        return QUARTERS
    
    
    def quarterwise(self, first_year, last_year):
        """Divides the contributions quaterly based on QUATERS"""
        
        QUARTERS = self.define_quarters()
        for year in range(first_year, last_year+1):
            for quarter,(start,end) in enumerate(QUARTERS):
                start = datetime(year,**start)
                end = datetime(year,**end)
                yield year,quarter,start,end  
                
                
    def repo_name(self):
        """Get the name of the repository"""
        
        content = self.contents
        repourl = "%s"%content['commit'][0]['repo']
        reponame = re.split('/', repourl)
        return reponame[-1]
                
        
    def show_total_activity(self):
        """Prints the total activity quaterly"""
        
        print("\n%s Quaterwise Total Activity"%self.repo_name())
        for item in dict(self.activities):
            print (item, dict(self.activities)[item])  
            
    def show_new_contributors(self):
        """Prints the number of new contributors quaterly."""    
        
        print("\n%s Quaterwise New Contributors Activity"%self.repo_name())
        for item in dict(self.newcontributors):
            print (item, dict(self.newcontributors)[item])  
            
            
    def create_output_csv(self):
        """Creates a CSV file with the summary"""
        
        header = ['Quarter','# Commits','# PullRequests','# Issues',
                  '# NewCommitters','# NewIssueSubmitters','# NewPRSubmitters' ]
        with open('csv_files/%s.csv'%self.repo_name(), 'w') as file:
            writer = csv.writer(file)
            writer.writerow(header)
            # to map the similar index of multiple containers so that they can be added in single entity i.e, rows
            rows = zip(self.quarters,self.activities['commit'],self.activities['pull_request'],
                       self.activities['issue'],self.newcontributors['commit'],
                       self.newcontributors['pull_request'],self.newcontributors['issue'])
            writer.writerows(rows)
        
    def show_as_table(self):
        """Creates a table from CSV file data."""
        
        print("\n%s Quaterwise Activity"%self.repo_name())
        with open("csv_files/%s.csv"%self.repo_name(), "r") as csvfile: 
            csvtable = from_csv(csvfile)
        print(csvtable)

## Summary of the Microtask-1

In [5]:
for repo in repos:
    print("_____Summary of %s_____"%repo)
    act_quar = Activity_Quarter("../data/%s.json"%repo)
    act_quar.summerize_quarterwise()
    act_quar.show_total_activity()
    act_quar.show_new_contributors()
    act_quar.create_output_csv()
    act_quar.show_as_table()
    print("\n\n")

_____Summary of elasticsearch-py_____

elasticsearch-py Quaterwise Total Activity
commit [0, 78, 154, 94, 126, 33, 30, 38, 45, 37, 27, 80, 28, 29, 21, 34, 27, 28, 31, 57, 44, 26, 10, 37, 4, 0, 0, 0]
issue [0, 0, 6, 24, 39, 29, 33, 39, 41, 33, 32, 50, 63, 26, 41, 52, 51, 47, 40, 39, 65, 54, 35, 39, 19, 0, 0, 0]
pull_request [0, 0, 3, 14, 16, 18, 8, 14, 13, 9, 6, 14, 11, 4, 7, 17, 15, 14, 13, 20, 24, 17, 7, 13, 8, 0, 0, 0]

elasticsearch-py Quaterwise New Contributors Activity
commit [0, 1, 2, 9, 6, 11, 3, 3, 8, 4, 5, 9, 7, 2, 0, 6, 7, 4, 5, 5, 10, 8, 3, 7, 4, 0, 0, 0]
issue [0, 0, 5, 16, 32, 25, 25, 30, 28, 31, 25, 38, 46, 20, 36, 42, 36, 33, 31, 25, 37, 38, 27, 26, 13, 0, 0, 0]
pull_request [0, 0, 2, 9, 16, 15, 8, 11, 12, 7, 6, 12, 10, 3, 5, 14, 12, 6, 8, 9, 10, 9, 4, 8, 8, 0, 0, 0]

elasticsearch-py Quaterwise New Activity
+---------+-----------+----------------+----------+-----------------+----------------------+-------------------+
| Quarter | # Commits | # PullRequests | # Issues |