Travis Listener is an architecture that crawls TravisCI to collect builds and jobs. It currently contains two plugins. The first saves all builds and jobs in a MongoDB database. The second detects restarted builds and saves them.
Travis Listener was built for the paper "Empirical Study of Restarted and Flaky Builds on Travis CI". You can cite this paper using the following BibTeX entry:
@inproceedings{DurieuxTravis20,
title = {Empirical Study of Restarted and Flaky Builds on Travis CI},
author = {Durieux, Thomas and Le Goues, Claire and Hilton, Michael and Abreu, Rui},
year = 2020,
booktitle = {Proceedings of the 17th International Conference on Mining Software Repositories},
location = {Seoul, Republic of Korea},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
series = {MSR '20},
pages = {254–264},
doi = {10.1145/3379597.3387460},
isbn = 9781450375177,
url = {https://doi.org/10.1145/3379597.3387460},
numpages = 11
}
The scripts for the paper "An Analysis of 35+ Million Jobs of Travis CI" are available in this repository: https://github.com/tdurieux/travis-collector.
- Install Docker and Docker Compose
- Add a GitHub token to the file github/server.js
- Start the service
docker-compose up -d
- Go to http://localhost:5001
Travis Listener consists of seven modules: three services, two plugins, a dashboard, and a database. The infrastructure is built on top of Docker Compose v2.4. Docker Compose is straightforward to install, scalable, and resilient (given, e.g., its auto-restart capabilities). Each module of our infrastructure is thus a Docker image integrated into Docker Compose. The services, plugins, and the dashboard are implemented in JavaScript with Node.js v10.
The Dashboard provides a web interface to configure and monitor the state of the different modules of the system.
For the Database, we use MongoDB, which integrates well with Node.js and provides data compression by default. Data compression is a useful feature, since we collect millions of highly compressible log files.
The Log Parser Service is a service that is used to manipulate logs. The current version of the service provides the following features:
- Log minimization: removes the meaningless content such as progress bar status and log formatting.
- Log diff: produces minimized diffs between two logs by removing all random or time-based content, such as ids or dates.
- Data extraction: parses the log to extract failure reasons such as test failures, checkstyle warnings, compilation errors, or timeouts. We are currently using 93 regular expressions to extract failure reasons from logs.
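The three features above can be illustrated with a minimal sketch. The function names (`minimizeLog`, `diffLogs`, `extractFailures`) and the few regular expressions below are hypothetical and chosen for illustration only; they are not the 93 expressions used by the service, and the diff shown is a naive line-by-line comparison.

```javascript
// Hypothetical patterns for illustration only; not the service's actual rules.
const ANSI_ESCAPE = /\u001b\[[0-9;]*[A-Za-z]/g;
const TIMESTAMP = /\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?Z?/g;
const HEX_ID = /\b[0-9a-f]{7,40}\b/g;

const FAILURE_PATTERNS = [
  { type: "test", regex: /Tests run: \d+, Failures: [1-9]\d*/ },
  { type: "compilation", regex: /COMPILATION ERROR/ },
  { type: "timeout", regex: /No output has been received in the last \d+m/ },
];

// Log minimization: drop formatting codes, progress-bar frames, and empty lines.
function minimizeLog(log) {
  return log
    .replace(ANSI_ESCAPE, "")
    .split("\n")
    // A carriage return redraws the line, so keep only the last frame.
    .map((line) => line.replace(/\r+$/, "").split("\r").pop().trimEnd())
    .filter((line) => line.length > 0)
    .join("\n");
}

// Replace random or time-based content with stable placeholders.
function normalizeLine(line) {
  return line.replace(TIMESTAMP, "<date>").replace(HEX_ID, "<id>");
}

// Log diff: report lines that still differ once volatile content is normalized.
function diffLogs(logA, logB) {
  const a = minimizeLog(logA).split("\n").map(normalizeLine);
  const b = minimizeLog(logB).split("\n").map(normalizeLine);
  const changes = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    if (a[i] !== b[i]) changes.push({ line: i + 1, before: a[i], after: b[i] });
  }
  return changes;
}

// Data extraction: return the failure types whose pattern matches the log.
function extractFailures(log) {
  const text = minimizeLog(log);
  return FAILURE_PATTERNS.filter((p) => p.regex.test(text)).map((p) => p.type);
}
```

Normalizing before comparing means two logs that differ only by an id or a date produce an empty diff, so the remaining entries point at changes in behavior.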
The GitHub Service is a simple middleware component that manages GitHub API tokens. It simplifies the usage of the GitHub API within Travis Listener by centralizing identification and rate limiting.
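As a rough illustration of centralized rate limiting, the sketch below keeps track of the remaining quota per token and hands out a token that can still be used. The `TokenPool` class and its methods are hypothetical and are not taken from the service; the sketch only assumes the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers that the GitHub REST API returns.

```javascript
// Hypothetical token pool; the actual service may be organized differently.
class TokenPool {
  constructor(tokens) {
    // Until GitHub reports otherwise, assume every token is usable.
    this.state = tokens.map((token) => ({ token, remaining: Infinity, resetAt: 0 }));
  }

  // Return a token that still has quota, or null if all are exhausted.
  // `nowSeconds` is a Unix timestamp in seconds, like the reset header.
  acquire(nowSeconds = Math.floor(Date.now() / 1000)) {
    for (const entry of this.state) {
      // The quota is restored once the reset time has passed.
      if (entry.resetAt <= nowSeconds) entry.remaining = Infinity;
      if (entry.remaining > 0) return entry.token;
    }
    return null;
  }

  // Record the rate-limit headers received in a response for this token.
  update(token, headers) {
    const entry = this.state.find((e) => e.token === token);
    if (!entry) return;
    entry.remaining = Number(headers["x-ratelimit-remaining"]);
    entry.resetAt = Number(headers["x-ratelimit-reset"]);
  }
}
```

With this design, callers never see a token directly: they ask the pool before each request and report the response headers afterwards.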
The Travis Crawler Service extracts information from TravisCI. Its main purpose is to crawl TravisCI and detect, in real time, any newly triggered jobs and builds. The Travis Crawler Service exposes a WebSocket that all Travis Listener modules can listen to. The WebSocket provides live notifications for any new TravisCI jobs or builds.
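A module that listens to these notifications essentially needs to decode each message and route it to the right handler. The sketch below shows one way to do that; the message shape (`event` and `data` fields) and the `dispatch` function are assumptions made for illustration, and the real payload may differ.

```javascript
// Hypothetical message shape: { "event": "job" | "build", "data": { ... } }.
// The real payload emitted by the Travis Crawler Service may differ.
function dispatch(rawMessage, handlers) {
  let message;
  try {
    message = JSON.parse(rawMessage);
  } catch (err) {
    return false; // ignore malformed frames
  }
  if (message === null || typeof message !== "object") return false;

  // Only call handlers that were registered explicitly for this event.
  const known = Object.prototype.hasOwnProperty.call(handlers, message.event);
  if (!known || typeof handlers[message.event] !== "function") return false;

  handlers[message.event](message.data);
  return true;
}

// Example wiring, assuming a WebSocket client object named `socket`:
//   socket.on("message", (raw) => dispatch(String(raw), { build: saveBuild, job: saveJob }));
// `socket`, `saveBuild`, and `saveJob` are placeholders, not part of the project.
```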
The Build Saver Plugin listens to the Travis Crawler Service and saves all information to the database. We save the following information: TravisCI's job, TravisCI's build, commit information (not including the diff), repository information, and user information. The goal of this plugin is to track all changes, and provide statistics on who is using TravisCI.
The Restarted Build Plugin collects the information relevant to the present study.
Its goal is to detect restarted builds on TravisCI.
When a build is restarted by a developer, all the original information is overwritten. Tracking restarted builds thus requires live collection of build data (in our case, using the Build Saver Plugin).
To detect restarted builds, the Restarted Build Plugin crawls, once a day, the builds collected over the previous 30 days, comparing the build start timestamp provided by the TravisCI API with the start time saved by the Build Saver Plugin.
If the two times differ, the build was restarted.
For each restarted build, we collect the new TravisCI job information and execution logs.
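The timestamp comparison described above can be sketched as follows. This is a minimal illustration, not the plugin's code: the function name `findRestartedBuilds` and the field names `id` and `started_at` are assumptions, and both inputs are plain arrays rather than database or API results.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical field names: `id` and `started_at` (ISO 8601 string).
// `savedBuilds` stands for what the Build Saver Plugin stored, and
// `currentBuilds` for what the TravisCI API reports now.
function findRestartedBuilds(savedBuilds, currentBuilds, now = Date.now(), windowDays = 30) {
  const currentById = new Map(currentBuilds.map((b) => [b.id, b]));
  const cutoff = now - windowDays * DAY_MS;
  const restarted = [];

  for (const saved of savedBuilds) {
    const savedStart = Date.parse(saved.started_at);
    // Skip builds without a valid start time or older than the window.
    if (Number.isNaN(savedStart) || savedStart < cutoff) continue;

    const current = currentById.get(saved.id);
    if (!current) continue;

    const currentStart = Date.parse(current.started_at);
    // A different start time means the original run was overwritten.
    if (!Number.isNaN(currentStart) && currentStart !== savedStart) {
      restarted.push({
        id: saved.id,
        originalStart: saved.started_at,
        newStart: current.started_at,
      });
    }
  }
  return restarted;
}
```

Comparing parsed timestamps rather than raw strings avoids false positives when the same instant is formatted differently by the two sources.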