
Writing a Thesis with coala

Lasse Schuirmann edited this page Oct 9, 2015 · 11 revisions

Introduction

This document lists possible topics for theses related to the coala project. The theses listed below would be done together with our partner university.

Some proposals include a preparation task: a nontrivial task that familiarizes you with the component of the coala source code relevant to your goal. If you want to get to know coala or its community before working with us, feel free to join our public channel at https://gitter.im/coala-analyzer/coala and discuss any questions you have.

Benefits

Technical Benefits

In contrast to most other solutions, coala combines many of the features a static analyzer needs in a completely language independent way, including the ability to correct source code automatically. Using coala takes a heavy burden off the developer of an analysis routine. It allows rapid prototyping while the resulting code can be used in production with no overhead, which makes the result easy to reuse for further research.

Benefits for the Student

Any student successfully completing a thesis with coala is warmly invited to apply for the Google Summer of Code stipend under coala.

The coala community is available to give students practical help and to review the written code.

Requirements for Your Thesis

  • The thesis does not have to be written in English, although it is recommended.
  • Ideally the code will be tested and merged into coala before the end of the thesis.
  • Basic Python knowledge is recommended for all proposals.

Thesis Proposals

Develop and Implement Debugging API

Software development workflows can be greatly enhanced, especially in quality, using static code analysis. coala is a framework that allows developers to rapidly develop static code analysis routines, called Bears, for arbitrary target languages. coala allows modularizing Bears by chaining several of them, so that partial results can be reused for other purposes and/or the complexity of each Bear is reduced. To ease the task of developing Bears, coala already provides logging mechanisms that allow textual debugging. This is far from sufficient for many applications, including performance critical or graph based algorithms.

To solve this problem, higher level debugging mechanisms need to be introduced to coala. For performance critical applications, Python's built-in profiler can be used to generate useful statistics about function invocations and code execution times. To debug more complex data, Bears can be written that visualize data generated by other Bears, effectively making any raw data generated by chained Bears available to the developer. For this, the Bear API has to be changed so that Bears can be plugged, arbitrarily or within some constraints, on top of other Bears to debug their (partial) results. As graph based problems will be used to create language independent Bears in the future, being able to visualize graphs would be very beneficial.
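A minimal sketch of the profiling idea, using only Python's standard cProfile and pstats modules. The function count_long_lines is a hypothetical stand-in for an analysis routine and profile_call is an illustrative helper; neither is part of the coala API.

```python
import cProfile
import io
import pstats


def count_long_lines(lines, max_length=79):
    """Hypothetical analysis routine standing in for the work of a Bear."""
    return [number for number, line in enumerate(lines, start=1)
            if len(line.rstrip("\n")) > max_length]


def profile_call(function, *args, **kwargs):
    """Run function under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(function, *args, **kwargs)

    # Collect the statistics into a string instead of printing to stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


lines = ["short\n", "x" * 100 + "\n", "also short\n"]
result, report = profile_call(count_long_lines, lines)
```

The report lists call counts and cumulative times per function, which is the kind of statistic the preparation task would need to expose for each Bear.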

The tasks involved with this thesis can be summarized concretely:

  • Implement profiling for Bears to get started on the topic (Preparation task, also see https://github.com/coala-analyzer/coala/issues/565, prototype available)
  • Redesign the Bear API to allow arbitrarily pluggable debugger Bears.
  • Implement a debug Bear for graph data that allows browsing large graphs interactively.
  • Evaluate the usefulness of such a debugging interface for a graph based problem.
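One possible shape for a pluggable debugger, sketched under assumptions: the producer build_call_graph, the renderer to_dot and the wrapper plug_debugger are all hypothetical names and do not reflect the existing Bear API. The debugger sees the partial result of the producer and passes it through unchanged, here by rendering a graph as Graphviz DOT text.

```python
def build_call_graph(source_lines):
    """Hypothetical producer: turn 'caller -> callee' lines into a graph."""
    graph = {}
    for line in source_lines:
        caller, callee = (part.strip() for part in line.split("->"))
        graph.setdefault(caller, []).append(callee)
        graph.setdefault(callee, [])
    return graph


def to_dot(graph, name="partial_result"):
    """Hypothetical debug step: render an adjacency dict as DOT text."""
    lines = ["digraph {} {{".format(name)]
    for node in sorted(graph):
        lines.append('    "{}";'.format(node))
        for target in sorted(graph[node]):
            lines.append('    "{}" -> "{}";'.format(node, target))
    lines.append("}")
    return "\n".join(lines)


def plug_debugger(producer, debugger):
    """Wrap producer so that debugger inspects every result it returns."""
    def wrapped(*args, **kwargs):
        result = producer(*args, **kwargs)
        debugger(result)  # The debugger only observes the result.
        return result
    return wrapped


dumps = []
debugged = plug_debugger(build_call_graph,
                         lambda graph: dumps.append(to_dot(graph)))
graph = debugged(["main -> parse", "main -> report", "parse -> report"])
```

The DOT text in dumps could be handed to any Graphviz viewer; interactive browsing of large graphs, as the third task asks for, would need a dedicated front end.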

This topic may also be suitable as project work.

Implement Language Independent Program Transformations

Nowadays it is common to enhance software development workflows using static code analysis. Automatic program transformations, i.e. the ability to fix bugs automatically on request, are the next evolutionary step and are already supported by some language specific tools. However, most static code analysers are specific to one or only a few programming languages, so a lot of work is spent on reimplementing existing code analysis and transformation routines for each programming language.

coala tries to tackle this problem by providing a unified framework for arbitrary textual analysis and transformations. To actually reuse the same code to analyse and transform code written in multiple programming languages, code can be transformed into a language independent graph. One example is a graph depicting only the order of the different usages of all variables within certain scopes. Based on this graph, analysis algorithms such as dead code detection could be implemented easily in a language independent manner. Program transformations like dead code removal, or identifying and extracting larger clusters in functions, would then be easy to implement, and they would greatly enhance the workflow of developers by solving more complex problems language independently.
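As a deliberately simplified illustration of this split, the sketch below uses Python's standard ast module as a language specific front end and keeps the graph and the analysis free of any Python specifics. All function names are hypothetical, the "graph" is reduced to one usage chain per variable, and scopes, function parameters and control flow are ignored.

```python
import ast
from collections import defaultdict


def python_usages(source):
    """Python specific front end: yield (name, line, kind) usage events."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                yield node.id, node.lineno, "write"
            elif isinstance(node.ctx, ast.Load):
                yield node.id, node.lineno, "read"


def build_usage_graph(usages):
    """Language independent part: map each variable to its sorted usages."""
    graph = defaultdict(list)
    for name, line, kind in usages:
        graph[name].append((line, kind))
    return {name: sorted(events) for name, events in graph.items()}


def never_read(graph):
    """Language independent analysis: variables written but never read."""
    return sorted(name for name, events in graph.items()
                  if all(kind == "write" for _, kind in events))


SOURCE = """
total = 0
unused = 42
for item in [1, 2, 3]:
    total = total + item
print(total)
"""

graph = build_usage_graph(python_usages(SOURCE))
dead = never_read(graph)
```

Supporting another language would, under this assumption, only require a new front end that yields the same (name, line, kind) events.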

The tasks involved with this thesis can be summarized concretely:

  • Create a use/follow graph of source code for at least one particular language.
  • Add the ability to write back the graph to source code.
  • Write at least one analysis algorithm and program transformation using this graph. Examples:
    • Search and remove dead code.
    • Search for large functions, identify clusters and extract them into their own methods.
  • Evaluate this approach against existing solutions.
  • Evaluate for which analysis routines a graph based approach could be used, what would need to be changed and what other approaches there are.
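The write-back step for the simplest transformation, dead code removal, could look roughly like the following sketch. It assumes the analysis has already produced the 1-based line numbers to remove, and it presents the change as a unified diff using Python's standard difflib module so that it can be reviewed before it is applied. The helper names and the file name example.py are hypothetical.

```python
import difflib


def remove_lines(source_lines, dead_line_numbers):
    """Return a copy of source_lines without the 1-based dead lines."""
    return [line for number, line in enumerate(source_lines, start=1)
            if number not in dead_line_numbers]


def as_patch(original, transformed, filename="example.py"):
    """Render the transformation as a unified diff for review."""
    return "".join(difflib.unified_diff(
        original, transformed,
        fromfile="a/" + filename, tofile="b/" + filename))


original = ["total = 0\n", "unused = 42\n", "print(total)\n"]
transformed = remove_lines(original, {2})
patch = as_patch(original, transformed)
```

Removing whole lines is only safe when a line contains nothing but the dead statement; a real write-back from the graph would have to operate on finer-grained source ranges.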

It is recommended to take on this thesis only if "Develop and Implement Debugging API" has already been completed. Ideally one person completes both theses, although this is not mandatory.
