Contains the complete step-wise filter, pull, and SonarQube analysis pipeline for the provided git repositories.  
The expected input is a list of repositories whose data was acquired through the GitHub web API.

Make sure that a SonarQube instance is running before you run this script and that the credentials used in the analysis phase are set correctly.
Note that the script takes quite a while to complete.
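
One way to check that the instance is ready before starting is to query SonarQube's `api/system/status` endpoint, which reports a `status` field. The following is a minimal sketch, not part of the original pipeline: the helper names `fetch_status` and `sonarqube_is_up` are hypothetical, and the standard library is used here instead of `requests` so that the check can be exercised without a server.

```python
import json
from typing import Callable
from urllib.request import urlopen


def fetch_status(server_url: str) -> str:
    """Reads the raw body of GET <server_url>/api/system/status."""
    with urlopen(f"{server_url}/api/system/status", timeout=10) as response:
        return response.read().decode("utf-8")


def sonarqube_is_up(server_url: str,
                    fetch: Callable[[str], str] = fetch_status) -> bool:
    """Returns True only if the server reports the status "UP".

    `fetch` is a hypothetical hook so the check can be tried offline.
    """
    try:
        return json.loads(fetch(server_url)).get("status") == "UP"
    except (OSError, ValueError, AttributeError):
        # Connection problems and malformed responses count as "not ready".
        return False
```

Calling `sonarqube_is_up("http://sonarqube:9000")` before the main loop would then let the notebook stop early instead of failing on the first project.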

- SonarQube logs are put in the `./output` folder; use them to figure out why SonarQube fails when analyzing certain projects.
- The results are put in the `./results` folder.

If you interrupt this script halfway through, or it crashes at some point, the next run might not work.
Restarting Jupyter might help here, but also check whether the access tokens were revoked correctly.
Additionally, SonarQube might at some point run out of heap memory, which is why this script is executed in batches, each with its own SonarQube instance.
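
The batching itself is not shown in this notebook; the input file name suggests that each batch lives in its own CSV file. As a rough sketch of how a repository list could be split into batches, assuming a hypothetical helper `batches` and an arbitrarily chosen batch size:

```python
from typing import Iterator, List, Sequence


def batches(entries: Sequence[List[str]], size: int) -> Iterator[Sequence[List[str]]]:
    """Yields consecutive slices of `entries` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(entries), size):
        yield entries[start:start + size]
```

Each yielded slice could then be written to its own batch file and analyzed against a fresh SonarQube instance.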

This script uses bash calls to run SonarQube, so on Windows these commands probably have to be changed somewhat.
It is far from perfect either; it will quite likely crash a few times while running.


In [1]:
import array
import os
import requests
from git import Repo, GitCommandError
import subprocess
from typing import Tuple
import shutil
from subprocess import Popen
import time

with open("./data/all_repositories_batch_2.csv", "r") as data_file:
    data = [entry.strip().split(",") for entry in data_file.readlines()[1:]]

print(f'loaded data:\n{data}')


loaded data:
[['zookeeper', 'https://github.com/apache/zookeeper', '128397', 'master', 'Java'], ['camel', 'https://github.com/apache/camel', '572222', 'main', 'Java'], ['nutch', 'https://github.com/apache/nutch', '136780', 'master', 'Java'], ['commons-lang', 'https://github.com/apache/commons-lang', '26071', 'master', 'Java'], ['activemq', 'https://github.com/apache/activemq', '58435', 'main', 'Java'], ['hive', 'https://github.com/apache/hive', '561127', 'master', 'Java'], ['trafficserver', 'https://github.com/apache/trafficserver', '132898', 'master', 'C++'], ['jmeter', 'https://github.com/apache/jmeter', '91357', 'master', 'Java'], ['karaf', 'https://github.com/apache/karaf', '67084', 'main', 'Java'], ['bookkeeper', 'https://github.com/apache/bookkeeper', '58252', 'master', 'Java'], ['bigtop', 'https://github.com/apache/bigtop', '63745', 'master', 'Groovy'], ['kafka', 'https://github.com/apache/kafka', '132860', 'trunk', 'Java'], ['ambari', 'https://github.com/apache/ambari', '365428
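
Splitting each line on `,` works as long as no field contains a comma. A minimal sketch of the same loading step with the standard `csv` module, which also handles quoted fields, is given below. The function name `parse_repositories` and the header names in the usage note are assumptions; the actual header of the input file is not shown here.

```python
import csv
from typing import Iterable, List


def parse_repositories(lines: Iterable[str]) -> List[List[str]]:
    """Parses CSV lines into lists of fields, skipping the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, if there is one
    # Empty rows (e.g. a trailing blank line) are dropped.
    return [[field.strip() for field in row] for row in reader if row]
```

For example, `parse_repositories(["name,url,size,branch,language", "zookeeper,https://github.com/apache/zookeeper,128397,master,Java"])` yields one entry in the same list-of-fields shape used by the rest of the notebook. An open file object can be passed in directly as `lines`.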

Step 1 of the lifecycle: all methods used to statically filter the repositories.


In [2]:
PROJECT_TYPE = 4


def is_not_java(entry: array) -> bool:
    return entry[PROJECT_TYPE] != "Java"


def is_considered(entry: array) -> bool:
    """Returns True if the project should be analyzed, i.e. it is not a Java project."""

    # return is_java(entry) and is_maven(entry)
    return is_not_java(entry)
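
To illustrate the effect of the filter, the sketch below applies the same predicate to three entries taken from the loaded data above. It is a standalone illustration; `sample` and `kept` are names introduced only for this example.

```python
PROJECT_NAME = 0
PROJECT_TYPE = 4


def is_not_java(entry):
    """Same predicate as in the cell above: keeps everything except Java."""
    return entry[PROJECT_TYPE] != "Java"


# Three entries copied from the loaded data.
sample = [
    ['zookeeper', 'https://github.com/apache/zookeeper', '128397', 'master', 'Java'],
    ['trafficserver', 'https://github.com/apache/trafficserver', '132898', 'master', 'C++'],
    ['bigtop', 'https://github.com/apache/bigtop', '63745', 'master', 'Groovy'],
]

kept = [entry[PROJECT_NAME] for entry in sample if is_not_java(entry)]
print(kept)  # ['trafficserver', 'bigtop']
```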


Step 2 of the lifecycle: cloning a repository and checking it out at a fixed date.


In [3]:
PROJECT_NAME = 0
PROJECT_REPO_URL = 1
DEFAULT_BRANCH = 3

date = "May 7, 2020"


def clone_repository(entry: array) -> Tuple[int, str]:
    """
    Clones repository to the ./repos folder and
    returns the status code and repo's folder.
    """

    url = entry[PROJECT_REPO_URL]
    dir = os.path.join("./repos", entry[PROJECT_NAME])
    status = 0

    try:
        Repo.clone_from(url, dir)

    except GitCommandError as e:
        # git exits with 128 on fatal errors, which includes a non-empty
        # target folder; this is treated here as "already cloned".
        if e.status != 128:
            status = e.status
        else:
            print(f'repository for {entry[PROJECT_NAME]} is already cloned')

    # Returning outside a `finally` block lets unexpected exceptions propagate.
    return status, dir


def checkout_date(dir: str, entry: array):
    """Checks out the last commit on the default branch made before `date`."""

    # Find the last commit before the date; strip the trailing newline
    # so that the hash can be passed on to `git checkout`.
    rev_list = subprocess.run(
        ['git', 'rev-list', '-1', f'--before={date}', entry[DEFAULT_BRANCH]],
        cwd=dir, stdout=subprocess.PIPE)
    commit = rev_list.stdout.decode('UTF-8').strip()

    # Passing cwd=dir avoids changing the notebook's working directory, and
    # subprocess.run waits for git to finish, so the return code is set.
    checkout = subprocess.run(['git', 'checkout', commit],
                              cwd=dir, stdout=subprocess.PIPE)

    return checkout.returncode
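
Where a step really has to change the working directory rather than pass `cwd` to `subprocess`, the directory should be restored even if the step raises. A minimal sketch of a context manager that does this follows; the name `working_directory` is hypothetical and not part of the original notebook.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def working_directory(path: str) -> Iterator[None]:
    """Temporarily switches to `path` and always switches back afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, so the notebook never
        # ends up stranded inside a repository folder.
        os.chdir(previous)
```

Used as `with working_directory(dir): ...`, this removes the need to hard-code how many levels to climb back up.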


Implements steps 3 to 5 of the lifecycle, the SonarQube steps.


In [4]:
server_url = "http://sonarqube:9000"
auth = ('admin', 'password')


def create_sonarqube_project(entry: array) -> int:
    """Creates SonarQube project if none exists yet"""

    name = entry[PROJECT_NAME]

    url = f"{server_url}/api/projects/create"
    data = {'name': name, 'project': name, 'visibility': 'public'}
    c_res = requests.post(url=url, data=data, auth=auth)

    return c_res.status_code


def perform_sonarqube_analysis(entry: array, dir: str) -> int:
    """Executes sonarqube analysis and sends it to the server"""

    name = entry[PROJECT_NAME]

    # All java files are ignored as they require compiling and
    # will cause the analysis to fail.
    args = [
        'sonar-scanner',
        '-Dsonar.sources=.',
        f'-Dsonar.projectKey={name}',
        f'-Dsonar.host.url={server_url}',
        f'-Dsonar.login={auth[0]}',
        f'-Dsonar.password={auth[1]}',
        '-Dsonar.coverage.exclusions=/**.java',
        '-Dsonar.test.exclusions=/**.java',
        '-Dsonar.exclusions=/**.java'
    ]

    # Running the scanner with cwd=dir leaves the notebook's working
    # directory untouched, whatever the depth of the repository folder.
    with open(f"./output/{name}-sonarqube-output.log", "w") as output_file:
        res = subprocess.run(args, stdout=output_file, cwd=dir)

    return res.returncode


def export_sonarqube_issues(entry: array):
    """Exports the generated issues through the web API"""

    name = entry[PROJECT_NAME]

    url = f"{server_url}/api/issues/search"
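    # Parameters of a GET request belong in the query string,
    # hence `params` rather than `data`.
    params = {'componentKeys': name}
    res = requests.get(url=url, params=params, auth=auth)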

    with open(f"./results/issues-{name}.json", "w") as results_file:
        results_file.write(res.text)
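
Note that `api/issues/search` returns its results in pages, so a single request only exports the first page. The sketch below shows one way to collect all pages. It assumes that the response carries an `issues` list and a `paging.total` count and that the page is selected with the `p` and `ps` parameters; check this against the Web API documentation of your SonarQube version. The names `collect_issues` and `fetch_page` are hypothetical.

```python
from typing import Callable, Dict, List


def collect_issues(fetch_page: Callable[[int, int], Dict],
                   page_size: int = 500) -> List[Dict]:
    """Calls fetch_page(page, page_size) until all reported issues are read.

    In this notebook, fetch_page could wrap a call such as (assumption):
        requests.get(url, params={'componentKeys': name,
                                  'p': page, 'ps': page_size},
                     auth=auth).json()
    """
    issues: List[Dict] = []
    page = 1
    while True:
        body = fetch_page(page, page_size)
        batch = body.get("issues", [])
        issues.extend(batch)
        # Without paging information, stop after the first page.
        total = body.get("paging", {}).get("total", len(issues))
        if not batch or len(issues) >= total:
            return issues
        page += 1
```

SonarQube may also cap how many results a single search can reach, so very large projects might need the query to be split further, for instance per severity or per rule.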


Step 6 of the lifecycle: clean-up methods.


In [5]:
def delete_sonarqube_project(entry: array):
    """Deletes the SonarQube project."""

    name = entry[PROJECT_NAME]

    url = f'{server_url}/api/projects/delete'
    data = {'project': name}
    requests.post(url=url, data=data, auth=auth)


def delete_repository(entry, dir):
    """Deletes repository"""

    shutil.rmtree(dir)


Implements the pipeline lifecycle.

1. filtering repositories
2. cloning repositories and checking them out at the given date
3. creating SonarQube project
4. analyzing repository
5. exporting results
6. (optional) deleting SonarQube project and cloned repository


In [6]:
def perform_lifecycle(entry: array):

    name = entry[PROJECT_NAME]

    # step 1: filtering
    if not is_considered(entry):
        print(f"filtered out {name}.")
        return

    # step 2: cloning repositories.
    print(f'retrieving repository of {name}')
    status, dir = clone_repository(entry)
    if status != 0:
        print(f'cloning repository failed for {name} with status {status}')
        return

    time.sleep(1)
    # step 2a: checking out at date.
    print(f'checking out at date for {name}')
    checkout_date(dir, entry)

    # step 3: creating sonarqube project
    print(f'creating project for {name}')
    status = create_sonarqube_project(entry)
    if status != 200:
        print(f'creating project failed for {name}')
        return

    # step 4: running SonarQube
    print(f'running sonarqube on {name}')
    status = perform_sonarqube_analysis(entry, dir)
    if status != 0:
        print(f'sonarqube analysis failed for {name} with status {status}')
    else:
        # step 5: extracting sonarqube data
        print(f'exporting results of {name}')
        export_sonarqube_issues(entry)

    # step 6: deleting sonarqube project and repository.
    # Cleanup is currently disabled; uncomment the calls below to enable it.
    print(f'cleaning up after {name}')
    # delete_sonarqube_project(entry)
    # delete_repository(entry, dir)

    print(f'completed analysis on {name}')


for entry in data:
    perform_lifecycle(entry)


filtered out zookeeper.
filtered out camel.
filtered out nutch.
filtered out commons-lang.
filtered out activemq.
filtered out hive.
retrieving repository of trafficserver
checking out at date for trafficserver


Note: switching to 'ad89f720fa1bc769db5f0aac9897ff2689c74051'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at ad89f720f Fix typos in comments


creating project for trafficserver
running sonarqube on trafficserver
exporting results of trafficserver
cleaning up after trafficserver
completed analysis on trafficserver
filtered out jmeter.
filtered out karaf.
filtered out bookkeeper.
retrieving repository of bigtop
checking out at date for bigtop
creating project for bigtop


Note: switching to '5784cad1f3a32dcbe3001f1f1db55eac2c947a3f'.


HEAD is now at 5784cad1 BIGTOP-3353: hive.hwi.war.file configuration is no longer needed (#638)


running sonarqube on bigtop
exporting results of bigtop
cleaning up after bigtop
completed analysis on bigtop
filtered out kafka.
filtered out ambari.
filtered out accumulo.
filtered out jackrabbit-oak.
filtered out dubbo.
filtered out drill.
filtered out druid.
filtered out jena.
filtered out tomee.
retrieving repository of echarts
checking out at date for echarts


Note: switching to '1368d2c03f37ddc1ca0783ebaf30b50ff26b0a15'.


HEAD is now at 1368d2c03 feat(bmap): add `mapOptions` for bmap extension.


creating project for echarts
running sonarqube on echarts
exporting results of echarts
cleaning up after echarts
completed analysis on echarts
filtered out cloudstack.
retrieving repository of cordova-plugin-camera
checking out at date for cordova-plugin-camera
creating project for cordova-plugin-camera


Note: switching to 'ba4f77468ff83377607aab775ebdb3faaff6a8f9'.


HEAD is now at ba4f774 chore(release): 4.2.0-dev


running sonarqube on cordova-plugin-camera
exporting results of cordova-plugin-camera
cleaning up after cordova-plugin-camera
completed analysis on cordova-plugin-camera
retrieving repository of cordova-plugin-file
checking out at date for cordova-plugin-file
creating project for cordova-plugin-file


Note: switching to '7d76943c376da404e8fd797ec687c1138c63796c'.


HEAD is now at 7d76943 chore(asf): update git notification settings


running sonarqube on cordova-plugin-file
exporting results of cordova-plugin-file
cleaning up after cordova-plugin-file
completed analysis on cordova-plugin-file
retrieving repository of cordova-plugin-inappbrowser
checking out at date for cordova-plugin-inappbrowser
creating project for cordova-plugin-inappbrowser


Note: switching to '2793e16ab433a413580e2a3a5ccbc40eda46c3b6'.


HEAD is now at 2793e16 fix(ios): prevent statusbar rotation after closing InAppBrowser (#672)


running sonarqube on cordova-plugin-inappbrowser
exporting results of cordova-plugin-inappbrowser
cleaning up after cordova-plugin-inappbrowser
completed analysis on cordova-plugin-inappbrowser
retrieving repository of cordova-plugin-media
checking out at date for cordova-plugin-media
creating project for cordova-plugin-media


Note: switching to 'cd2cdacb39a1d98d64113c4790124e2dd024b472'.


HEAD is now at cd2cdac chore(asf): update git notification settings


running sonarqube on cordova-plugin-media
exporting results of cordova-plugin-media
cleaning up after cordova-plugin-media
completed analysis on cordova-plugin-media
filtered out storm.
filtered out usergrid.
filtered out helix.
filtered out jackrabbit-filevault.
retrieving repository of qpid-dispatch
checking out at date for qpid-dispatch
creating project for qpid-dispatch


Note: switching to '936d679f754639895dccccf85b665651ed4d73f3'.


HEAD is now at 936d679f DISPATCH-1637: Added a hashtable to store names of link routes, auto links and address configs. This closes #732.


running sonarqube on qpid-dispatch
exporting results of qpid-dispatch
cleaning up after qpid-dispatch
completed analysis on qpid-dispatch
filtered out cxf.
retrieving repository of spark
checking out at date for spark


Note: switching to '272d229005b7166ab83bbb8f44a4d5e9d89424a1'.


HEAD is now at 272d229005 [SPARK-31361][SQL][TESTS][FOLLOWUP] Check non-vectorized Parquet reader while date/timestamp rebasing


creating project for spark
running sonarqube on spark
exporting results of spark
cleaning up after spark
completed analysis on spark
filtered out streams.
filtered out tajo.
filtered out incubator-brooklyn.
filtered out pinot.
filtered out hbase.
filtered out stratos.
filtered out phoenix.
filtered out flink.
retrieving repository of parquet-cpp
checking out at date for parquet-cpp
creating project for parquet-cpp


Note: switching to '642da055adf009652689b20e68a198cffb857651'.


HEAD is now at 642da05 Incrementing snapshot version to 1.5.1-SNAPSHOT.


running sonarqube on parquet-cpp
exporting results of parquet-cpp
cleaning up after parquet-cpp
completed analysis on parquet-cpp
filtered out parquet-mr.
filtered out calcite.
filtered out hadoop.
filtered out reef.
filtered out karaf-decanter.
filtered out commons-text.
retrieving repository of infrastructure-puppet
checking out at date for infrastructure-puppet


Note: switching to '725ae7cd1a5fe7a886935990e3170fc7ffde1ad1'.


HEAD is now at 725ae7cd1 recrypt eyaml


creating project for infrastructure-puppet
running sonarqube on infrastructure-puppet


ERROR: Unable to parse file: modules/qmail_asf/files/apmail/bin/generate-qmail-pmcs.py
ERROR: Parse error at line 46 column 22:

    9: import glob
   10: import logging
   11: import os
   12: import subprocess
   13: import urllib
   14: import json
   15: 
   16: execfile("common.conf")
   17: 
   18: URL = "https://whimsy.apache.org/public/committee-info.json"
   19: 
   20: LISTDIR = "APMAIL_HOME/lists'
   21: TARGET = 'APMAIL_HOME/.qmail-pmcs'
   22: 
   23: logging.getLogger().setLevel(logging.INFO)
   24: 
   25: def main():
   26:     pmcs = []
   27:     cttees = json.load(urllib.urlopen(URL),'utf-8')['committees']
   28:     for j in cttees:
   29:         cttee = cttees[j]
   30:         if cttee['pmc']:
   31:             pmcs += [cttee['mail_list']]
   32: 
   33:     fd = open(TARGET+'.t', 'w')
   34:     fd.write('| APMAIL_HOME/bin/ezmlm-filter-bcc pmcs@apache.org\n')
   35:     for p in sorted(set(pmcs)):
   36:         fd.write('private@%s.apache.org\n' % p)
   37:   

exporting results of infrastructure-puppet
cleaning up after infrastructure-puppet
completed analysis on infrastructure-puppet
filtered out gobblin.
filtered out nifi.
filtered out kylin.
filtered out tinkerpop.
filtered out ignite.
filtered out samza.
filtered out zeppelin.
retrieving repository of airflow
checking out at date for airflow


Note: switching to '723c52c942b49b0e8c8fa8667a4a6a45fa249498'.


HEAD is now at 723c52c94 Add documentation for SpannerDeployInstanceOperator (#8750)


creating project for airflow
running sonarqube on airflow
exporting results of airflow
cleaning up after airflow
completed analysis on airflow
filtered out groovy.
filtered out geode.
retrieving repository of incubator-mxnet
checking out at date for incubator-mxnet


Note: switching to '23df47fcd44891864c2bb4a7800e8faa30471ff2'.


HEAD is now at 23df47fcd Fix gluon link missing (#18243)


creating project for incubator-mxnet
running sonarqube on incubator-mxnet


ERROR: Unable to parse file: file:///workspace/scripts/compilation%20pipeline/repos/incubator-mxnet/docker/Dockerfiles/Dockerfile.in.scala. Parse error at position 1:0
ERROR: Cannot parse 'docker/Dockerfiles/Dockerfile.in.scala': Unable to parse file content.


exporting results of incubator-mxnet
cleaning up after incubator-mxnet
completed analysis on incubator-mxnet
retrieving repository of orc
checking out at date for orc


Note: switching to '31ed8b44e4889f15736ad926c8ce6f8286173a41'.


HEAD is now at 31ed8b44 Add ORC-1.5.10 and 1.6.3 to site.


creating project for orc
running sonarqube on orc


ERROR: Failed to parse file:///workspace/scripts/compilation%20pipeline/repos/orc/site/css/screen.scss, line 1, Unknown word


exporting results of orc
cleaning up after orc
completed analysis on orc
retrieving repository of flink-web
checking out at date for flink-web


Note: switching to 'e5b23762bfa48927f374c293306ac77e8e212c36'.


HEAD is now at e5b23762 Adding a new community update blogpost.


creating project for flink-web
running sonarqube on flink-web
exporting results of flink-web
cleaning up after flink-web
completed analysis on flink-web
filtered out activemq-artemis.
retrieving repository of trafodion
checking out at date for trafodion


Note: switching to '962d9f8cab74b910da77329d2b1e5ba03f89b456'.


HEAD is now at 962d9f8ca Merge pull request #1875 from selvaganesang/release2.4


creating project for trafodion
running sonarqube on trafodion


ERROR: Unable to parse file: docs/command_interface/src/resources/source/sample.py
ERROR: Parse error at line 1 column 0:

  -->  import os import sys
    2: 
    3: 
    4: sys.path.append("C:\\Program Files (x86)\\Apache Software Foundation\\Trafodion Command Interface\\lib\\python")
    5: import Session
    6: 
    7: 
    8: sess = Session.Session()
    9: 
   10: 
   11: x=sess. connect

ERROR: Unable to parse file: core/conn/trafci/lib/python/Session.py
ERROR: Parse error at line 23 column 39:

   22: import sys
  -->  sys.path +=                            
   24: from org.trafodion.ci import ScriptsInterface
   25: from java.lang import System
   26: from java.io import PrintStream
   27: from java.io import BufferedOutputStream

ERROR: Failed to parse file:///workspace/scripts/compilation%20pipeline/repos/trafodion/dcs/src/main/resources/dcs-webapps/master/datatables/css/jquery.dataTables_themeroller.css, line 101, Unknown word
ERROR: Failed to parse file:///workspace/scripts

exporting results of trafodion
cleaning up after trafodion
completed analysis on trafodion
filtered out apex-core.
filtered out apex-malhar.
