Contains the complete step-wise filter, pull, and SonarQube analysis pipeline for the provided git repositories.  
The expected input is a list of repositories whose data was acquired through the GitHub web API.

Make sure that a SonarQube instance is running before you run this script and that the credentials used in the analysis phase are set correctly.
Be aware that this script takes quite a while to complete.
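
The first precondition can be checked from code before the pipeline starts. The following is a minimal sketch, assuming the server exposes the `api/system/status` endpoint and reports `UP` once it is ready; the helper names `parse_status` and `sonarqube_is_up` are illustrative and not part of the pipeline below. It uses only the standard library.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def parse_status(body: str) -> bool:
    """Return True if the JSON body of the status endpoint reports status UP."""
    try:
        return json.loads(body).get("status") == "UP"
    except (ValueError, AttributeError):
        # Not JSON at all, or JSON that is not an object.
        return False


def sonarqube_is_up(server_url: str = "http://localhost:9000",
                    timeout: float = 5.0) -> bool:
    """Query the server and report whether it is ready to accept analyses."""
    try:
        with urlopen(f"{server_url}/api/system/status", timeout=timeout) as response:
            return parse_status(response.read().decode("utf-8"))
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure: treat all as "not up".
        return False
```

Calling `sonarqube_is_up()` at the top of the notebook and stopping when it returns `False` avoids cloning a large repository only to fail at the project-creation step.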

- SonarQube logs are written to the `./output` folder; use them to figure out why SonarQube fails when analyzing certain projects.
- The results are written to the `./results` folder.
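
To narrow down a failed analysis, it can help to pull only the error lines out of a log. This is a small sketch, assuming the scanner marks such lines with the word `ERROR` (the exact log format depends on the scanner version); `error_lines` and `error_lines_in_file` are hypothetical helper names.

```python
from pathlib import Path
from typing import List


def error_lines(log_text: str, marker: str = "ERROR") -> List[str]:
    """Return the lines of a scanner log that contain the marker."""
    return [line for line in log_text.splitlines() if marker in line]


def error_lines_in_file(log_path: str, marker: str = "ERROR") -> List[str]:
    """Same as error_lines, but reads the log from disk first."""
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    return error_lines(text, marker)
```

For the repository used in this notebook, the call would be `error_lines_in_file("./output/trafficserver-sonarqube-output.log")`.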

If you interrupt this script halfway through, or it crashes at some point, the next run might not work.
Restarting Jupyter might help here, but also check whether the access tokens were revoked correctly.


In [103]:
import array
from git import Repo, GitCommandError
import os
import requests
import subprocess
from typing import Tuple
import shutil

with open("./data/test_repositories.csv", "r") as data_file:
    data = [entry.strip().split(",") for entry in data_file.readlines()[1:]]

print(f'loaded data:\n{data}')


loaded data:
[['trafficserver', 'https://github.com/apache/trafficserver', '132898', 'master', 'C++']]
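
Splitting each line on `,` works for the entry shown above, but it would break on a field that contains a quoted comma. A minimal sketch using the standard `csv` module instead; `load_entries` is a hypothetical name, and the header and row in the sample are made up for illustration since the real header is not shown in this notebook.

```python
import csv
import io
from typing import IO, List


def load_entries(data_file: IO[str]) -> List[List[str]]:
    """Parse CSV rows, skipping the header row and any empty lines."""
    reader = csv.reader(data_file)
    next(reader, None)  # skip the header row, if there is one
    return [row for row in reader if row]


# Hypothetical sample with a quoted comma in the first field.
sample = io.StringIO(
    'name,url,size,branch,language\n'
    '"a,b",https://example.org/a-b,1,main,C\n'
)
print(load_entries(sample))
```

With the real input this would be used as `with open("./data/test_repositories.csv", newline="") as data_file: data = load_entries(data_file)`.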


Step 1 of the lifecycle: all methods used to statically filter the repositories.


In [104]:
PROJECT_TYPE = 4


def is_not_java(entry: array) -> bool:
    return entry[PROJECT_TYPE] != "Java"


def is_considered(entry: array) -> bool:
    """Returns True if the project should be analyzed (currently: any non-Java project)"""

    # return is_java(entry) and is_maven(entry)
    return is_not_java(entry)
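
As a quick illustration of how the filter behaves, the sketch below applies it to two entries. The second entry is hypothetical and only exists to show a project being filtered out; the filter itself is restated so the example runs on its own.

```python
from typing import List

PROJECT_TYPE = 4  # index of the language column, as in the cell above


def is_considered(entry: List[str]) -> bool:
    """Mirror of the notebook's filter: keep every non-Java project."""
    return entry[PROJECT_TYPE] != "Java"


entries = [
    ['trafficserver', 'https://github.com/apache/trafficserver', '132898', 'master', 'C++'],
    # Hypothetical entry, only here to show a project being filtered out.
    ['example-java', 'https://example.org/example-java', '1', 'main', 'Java'],
]

considered = [entry for entry in entries if is_considered(entry)]
print([entry[0] for entry in considered])
```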


Step 2 of the lifecycle: cloning a repository.


In [105]:
PROJECT_NAME = 0
PROJECT_REPO_URL = 1


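def clone_repository(entry: array) -> Tuple[int, str]:
    """
    Clones the repository into the ./repos folder and
    returns the status code and the repository's folder.
    """

    url = entry[PROJECT_REPO_URL]
    dir = os.path.join("./repos", entry[PROJECT_NAME])

    # git exits with status 128 for any fatal error, not only for an existing
    # target folder, so check for an existing clone explicitly.
    if os.path.isdir(os.path.join(dir, ".git")):
        print(f'repository for {entry[PROJECT_NAME]} is already cloned')
        return 0, dir

    try:
        Repo.clone_from(url, dir)
    except GitCommandError as e:
        # Report the failure to the caller instead of returning from a
        # finally block, which would silently swallow other exceptions.
        return e.status, dir

    return 0, dir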
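
Cloning the full history of a large repository accounts for part of the long runtime. If only the latest state of one branch is needed, a shallow clone may be enough. The sketch below builds the equivalent `git` command line; it assumes that the fourth CSV column holds the branch name, and `shallow_clone_args` and `shallow_clone` are hypothetical helpers. GitPython's `Repo.clone_from` forwards extra keyword arguments to `git clone`, so passing `depth=1` there should have the same effect. Note that a shallow clone drops history, which may affect any analysis that relies on blame information.

```python
import subprocess
from typing import List

PROJECT_NAME = 0
PROJECT_REPO_URL = 1
PROJECT_BRANCH = 3  # assumption: the fourth CSV column holds the branch name


def shallow_clone_args(entry: List[str], target_dir: str) -> List[str]:
    """Build a git command that fetches only the latest commit of one branch."""
    return [
        'git', 'clone',
        '--depth', '1',
        '--branch', entry[PROJECT_BRANCH],
        entry[PROJECT_REPO_URL],
        target_dir,
    ]


def shallow_clone(entry: List[str], target_dir: str) -> int:
    """Run the clone and return git's exit status (0 on success)."""
    return subprocess.run(shallow_clone_args(entry, target_dir)).returncode
```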


Implements steps 3 to 5 of the lifecycle, the SonarQube steps.


In [106]:
server_url = "http://localhost:9000"
auth = ('admin', 'password')


def create_sonarqube_project(entry: array) -> int:
    """Creates a SonarQube project and returns the HTTP status code (non-200 if creation fails, e.g. because the project already exists)"""

    name = entry[PROJECT_NAME]

    url = f"{server_url}/api/projects/create"
    data = {'name': name, 'project': name, 'visibility': 'public'}
    c_res = requests.post(url=url, data=data, auth=auth)

    return c_res.status_code


def create_sonarqube_token(entry: array) -> Tuple[int, str]:
    """Generates a new SonarQube token; the token is empty if the request failed"""

    name = entry[PROJECT_NAME]

    url = f"{server_url}/api/user_tokens/generate"
    data = {'name': name}
    t_res = requests.post(url=url, data=data, auth=auth)
    # Error responses carry no "token" field, so only read it on success.
    token = t_res.json().get("token", "") if t_res.ok else ""

    return t_res.status_code, token


def perform_sonarqube_analysis(entry: array, dir: str, token: str) -> int:
    """Executes the SonarQube analysis and sends it to the server"""

    name = entry[PROJECT_NAME]

    args = [
        'sonar-scanner',
        '-Dsonar.sources=.',
        f'-Dsonar.projectKey={name}',
        f'-Dsonar.host.url={server_url}',
        f'-Dsonar.login={token}',
        '-Dsonar.coverage.exclusions=/**.java',
        '-Dsonar.test.exclusions=/**.java',
        '-Dsonar.exclusions=/**.java'
    ]

    # Run the scanner inside the repository through cwd instead of os.chdir,
    # so the notebook's working directory stays untouched even if the scanner
    # crashes, and no assumption about the folder depth is needed.
    with open(f"./output/{name}-sonarqube-output.log", "w") as output_file:
        res = subprocess.run(args, stdout=output_file, cwd=dir)

    return res.returncode


def export_sonarqube_issues(entry: array):
    """Exports the generated issues through the web API"""

    name = entry[PROJECT_NAME]

    url = f"{server_url}/api/issues/search"
    params = {'componentKeys': name}
    # Query parameters of a GET request belong in params, not in the body (data).
    res = requests.get(url=url, params=params, auth=auth)

    with open(f"./results/issues-{name}.json", "w") as results_file:
        results_file.write(res.text)
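
A single call to `api/issues/search` returns only one page of issues. The sketch below shows one way to collect all pages, assuming the endpoint accepts the paging parameters `p` (1-based page index) and `ps` (page size, at most 500), reports the total under `paging.total`, and serves no more than 10,000 results per query. `collect_issues` and `fetch_page` are hypothetical names; the HTTP call is injected so the loop can be tried without a running server.

```python
from typing import Callable, Dict, List


def collect_issues(fetch_page: Callable[[int, int], Dict],
                   page_size: int = 500,
                   max_results: int = 10000) -> List[Dict]:
    """Request page after page until every reported issue has been collected."""
    issues: List[Dict] = []
    page = 1
    while len(issues) < max_results:
        payload = fetch_page(page, page_size)
        batch = payload.get("issues", [])
        issues.extend(batch)
        # Fall back to the current count if the server reports no total.
        total = payload.get("paging", {}).get("total", len(issues))
        if not batch or len(issues) >= total:
            break
        page += 1
    return issues
```

With `requests`, `fetch_page` could be `lambda p, ps: requests.get(url, params={'componentKeys': name, 'p': p, 'ps': ps}, auth=auth).json()`, and the combined list can then be written to the results file with `json.dump`.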


Step 6 of the lifecycle: cleanup methods.


In [107]:
def revoke_sonarqube_token(entry: array):
    """Revokes the sonarqube access token"""

    name = entry[PROJECT_NAME]

    url = f'{server_url}/api/user_tokens/revoke'
    data = {'name': name}
    requests.post(url=url, data=data, auth=auth)


def delete_sonarqube_project(entry: array):
    """Deletes sonarqube project """

    name = entry[PROJECT_NAME]

    url = f'{server_url}/api/projects/delete'
    data = {'project': name}
    requests.post(url=url, data=data, auth=auth)


def delete_repository(entry, dir):
    """Deletes repository"""

    shutil.rmtree(dir)
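
After an interrupted run, a token may be left behind, and generating a token with the same name will then fail. The sketch below finds such leftovers in the response of `api/user_tokens/search`, assuming that response lists the tokens under a `userTokens` key with a `name` field each. `leftover_tokens` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def leftover_tokens(search_payload: Dict, project_names: Iterable[str]) -> List[str]:
    """Return the token names in the payload that match one of the project names."""
    wanted = set(project_names)
    tokens = search_payload.get("userTokens", [])
    return [token["name"] for token in tokens if token.get("name") in wanted]
```

The payload could be fetched with `requests.get(f"{server_url}/api/user_tokens/search", auth=auth).json()`, and each returned name can then be revoked through the same `api/user_tokens/revoke` call that `revoke_sonarqube_token` uses.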


Implements the pipeline lifecycle.

1. filtering repositories
2. cloning repositories
3. creating SonarQube project
4. analyzing repository
5. exporting results
6. cleaning up: revoking the access token, deleting the repository and (optionally) the SonarQube project


In [108]:
def perform_lifecycle(entry: array):

    name = entry[PROJECT_NAME]

    # step 1: filtering
    if not is_considered(entry):
        print(f"filtered out {name}.")
        return

    # step 2: cloning repositories.
    print(f'retrieving repository of {name}')
    status, dir = clone_repository(entry)
    if status != 0:
        print(f'cloning repository failed for {name} with status {status}')
        return

    # step 3: creating sonarqube project
    print(f'creating project for {name}')
    status = create_sonarqube_project(entry)
    if status != 200: 
        print(f'creating project failed for {name}')
        return 

    # step 3a: creating access token
    print(f'creating access token for {name}')
    status, token = create_sonarqube_token(entry)
    if status != 200: 
        print(f'creating token failed for {name}')
        return 

    try:
        # step 4: running SonarQube (the token is a secret, so it is not printed)
        print(f'running sonarqube on {name}')
        status = perform_sonarqube_analysis(entry, dir, token)
        if status != 0:
            print(f'sonarqube analysis failed for {name} with status {status}')
            return

        # step 5: extracting sonarqube data
        print(f'exporting results of {name}')
        export_sonarqube_issues(entry)

    finally:
        # step 6: always revoke the token, even if the analysis failed or
        # raised, so the next run can generate a token with the same name
        print(f'cleaning up after {name}')
        revoke_sonarqube_token(entry)

    # delete_sonarqube_project(entry)
    delete_repository(entry, dir)

    print(f'completed analysis on {name}')


for entry in data:
    perform_lifecycle(entry)


retrieving repository of trafficserver
creating project for trafficserver
creating project failed for trafficserver


ChildProcessError: [Errno 10] No child processes