FLASH (FLaky ASsertion Handler) is a tool for detecting flaky tests in projects that use probabilistic programming systems or machine learning frameworks. It implements our paper, Detecting Flaky Tests in Probabilistic and Machine Learning Applications, published at ISSTA 2020.

FLASH focuses on tests that fail because each execution draws a different sequence of random numbers. It runs a test several times and monitors the actual and expected values used in the test's assertion. Finally, it reports any observed failures and returns each test's probability of failure, based on the collected samples.
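To make the idea concrete, here is a minimal sketch of estimating a failure probability from repeated seeded runs. It is not FLASH's implementation: `run_once`, the assertion `actual < threshold`, and the plain empirical fraction are all assumptions chosen for illustration.

```python
import random


def run_once(seed):
    """Hypothetical stand-in for one test execution.

    Returns the 'actual' value that the test's assertion would check.
    """
    rng = random.Random(seed)
    # The mean of 50 noisy draws varies from seed to seed.
    return sum(rng.gauss(0.0, 1.0) for _ in range(50)) / 50


def estimate_failure_probability(threshold, seeds):
    """Run the stand-in test once per seed and count assertion failures.

    The assumed assertion is `assert actual < threshold`.
    """
    samples = [run_once(seed) for seed in seeds]
    failures = sum(1 for value in samples if not (value < threshold))
    return samples, failures / len(samples)


samples, p_fail = estimate_failure_probability(threshold=0.2, seeds=range(200))
print(f"runs={len(samples)} estimated P(fail)={p_fail:.3f}")
```

Because every run is tied to a seed, any failing execution in this sketch can be replayed by rerunning with the same seed.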

Installing Dependencies

This installation requires Python>=3.6.

Run to install all python dependencies
Note: We have tested our system on Ubuntu 16.04 and 18.04. Mac/Windows users might need to make some adjustments to the scripts to make them run.

Setting up a target project

To run FLASH, we first need to set up a virtual environment for the target project. We will set up the HazyResearch/metal project as an example.

First, clone the project into the projects folder:

git clone

Then set up an Anaconda environment for this project. An example script is provided in scripts/. We also provide scripts to set up the other projects used in the paper.

./scripts/ [metal-install-directory]

Configure the library in file. Each project has the following options.

name : name of the project

conda_env: name of the conda environment where the project is installed

parallel: run the tests in parallel

path: path of the project

enabled: whether to run this project

deps : libraries with random number generators that the project depends on. Currently, pytorch, tensorflow, and numpy libraries are supported.
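As an illustration of how these options fit together, the sketch below models one project entry as a Python dataclass. The on-disk format of the configuration file is not shown here, so the class name `ProjectConfig`, the helper `unsupported_deps`, and the example values for `conda_env` and `path` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Libraries listed above as currently supported for seeding.
SUPPORTED_DEPS = {"pytorch", "tensorflow", "numpy"}


@dataclass
class ProjectConfig:
    """Hypothetical in-memory view of one project's options."""

    name: str                 # name of the project
    conda_env: str            # conda environment where the project is installed
    path: str                 # path of the project
    parallel: bool = False    # run the tests in parallel
    enabled: bool = True      # whether to run this project
    deps: List[str] = field(default_factory=list)  # libraries with RNGs

    def unsupported_deps(self):
        """Return the listed dependencies that are not in SUPPORTED_DEPS."""
        return [dep for dep in self.deps if dep not in SUPPORTED_DEPS]


# Example values are assumptions, not taken from the repository.
metal = ProjectConfig(
    name="metal",
    conda_env="metal",
    path="projects/metal",
    deps=["numpy", "pytorch"],
)
print(metal.unsupported_deps())
```

Checking `deps` early, as in `unsupported_deps`, is one way to notice a project that relies on a random number generator FLASH does not yet seed.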

Running FLASH

First, set up the configuration in src/

DEFAULT_ITERATIONS : minimum iterations to run

SUBSEQUENT_ITERATIONS: subsequent iterations to run after the first run

MAX_ITERATIONS : maximum iterations to run before convergence

MAX_DEVIATION_GEWEKE : threshold for convergence test

THREAD_COUNT : number of threads to run in parallel
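The sketch below shows one way the iteration settings could interact: collect a first batch of samples, then keep adding batches until a convergence check passes or the maximum is reached. It is an assumption-laden illustration, not FLASH's code. The constant values are made up, and `geweke_z` is a simplified Geweke-style score that uses plain sample variances rather than the spectral density estimates of the full diagnostic.

```python
import random
import statistics

# Illustrative values only; the real defaults live in the FLASH configuration.
DEFAULT_ITERATIONS = 30
SUBSEQUENT_ITERATIONS = 10
MAX_ITERATIONS = 500
MAX_DEVIATION_GEWEKE = 1.0


def geweke_z(samples, first=0.1, last=0.5):
    """Simplified Geweke-style z-score.

    Compares the mean of the first 10% of samples with the mean of the
    last 50%, scaled by their combined standard error.
    """
    n = len(samples)
    head = samples[: max(2, int(first * n))]
    tail = samples[n - max(2, int(last * n)):]
    var = (statistics.variance(head) / len(head)
           + statistics.variance(tail) / len(tail))
    diff = statistics.mean(head) - statistics.mean(tail)
    if var == 0.0:
        return 0.0 if diff == 0.0 else float("inf")
    return diff / var ** 0.5


def collect_until_converged(draw):
    """Call draw() in batches until the z-score is within the threshold."""
    samples = [draw() for _ in range(DEFAULT_ITERATIONS)]
    while len(samples) < MAX_ITERATIONS:
        if abs(geweke_z(samples)) <= MAX_DEVIATION_GEWEKE:
            return samples, True
        samples.extend(draw() for _ in range(SUBSEQUENT_ITERATIONS))
    return samples, abs(geweke_z(samples)) <= MAX_DEVIATION_GEWEKE


rng = random.Random(0)
samples, converged = collect_until_converged(lambda: rng.gauss(0.0, 1.0))
print(len(samples), converged)
```

In this sketch a smaller `MAX_DEVIATION_GEWEKE` demands closer agreement between the early and late samples, so it tends to require more iterations.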

Next, run FLASH using:


This will run FLASH on all the configured projects. FLASH performs the following steps:

  1. Collect all tests with approximate assertions, e.g. tests of the kind assert a >|>=|<|<= b, assert_allclose, etc. The full list can be found in our paper.

  2. Run each test several times with different seeds for each RNG until it converges.

  3. Report any failures and the probability of failure of each test.
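The first step can be sketched with Python's standard `ast` module. This is only an illustration of the idea, not FLASH's collector: the function name `find_approximate_assertions` is hypothetical, and the sketch recognizes just inequality asserts and calls named `assert_allclose`, whereas the paper lists more assertion kinds.

```python
import ast

# Only the call named in the text above; the paper lists further kinds.
APPROX_CALLS = {"assert_allclose"}
INEQUALITY_OPS = (ast.Lt, ast.LtE, ast.Gt, ast.GtE)


def find_approximate_assertions(source):
    """Return sorted (line number, kind) pairs for approximate assertions."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assert) and isinstance(node.test, ast.Compare):
            # Plain `assert a < b` style checks with an inequality operator.
            if any(isinstance(op, INEQUALITY_OPS) for op in node.test.ops):
                hits.append((node.lineno, "assert-compare"))
        elif isinstance(node, ast.Call):
            # Calls such as np.testing.assert_allclose(...) or assert_allclose(...).
            func = node.func
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            if name in APPROX_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


EXAMPLE = """\
import numpy as np

def test_mean():
    x = np.random.rand(100)
    assert x.mean() < 0.6
    np.testing.assert_allclose(x.mean(), 0.5, atol=0.1)
    assert len(x) == 100
"""

hits = find_approximate_assertions(EXAMPLE)
print(hits)
```

The source is parsed but never executed, so the example test does not need numpy installed. The exact-equality assert on the last line of the example is deliberately not reported.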


FLASH will produce the following output after running:

A logs/run_[runID]_[project] folder with all the results for the project that was run (metal in this case). It contains a folder for each assertion, named assert_[assertID], and a log.txt file with the logs for all assertions.

Each assert_[assertID] folder will contain the following files:

output_* : files with the output of each execution of the test and the seeds
samples.txt : Samples collected from each execution of the test
report.txt : Details of the assertion (file name, location in file, assertion string), Statistics of the samples, # of runs, # of passes/fails, Probability of failure of the test
test*.py: Instrumented test file
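Given this layout, a short script can gather the reports from one or more runs. The sketch below relies only on the folder and file names described above; the helper `find_reports` is hypothetical, and it does not parse report.txt because the file's exact format is not shown here. The demo builds a throwaway directory that mimics the layout so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def find_reports(logs_dir):
    """Return report.txt paths under logs_dir, grouped by run folder name."""
    reports = {}
    pattern = "run_*/assert_*/report.txt"
    for report in sorted(Path(logs_dir).glob(pattern)):
        run_name = report.parent.parent.name
        reports.setdefault(run_name, []).append(report)
    return reports


# Demo on a temporary directory with placeholder report files.
with tempfile.TemporaryDirectory() as tmp:
    for assert_id in (0, 1):
        folder = Path(tmp) / "run_1_metal" / f"assert_{assert_id}"
        folder.mkdir(parents=True)
        (folder / "report.txt").write_text("placeholder\n")
    found = find_reports(tmp)
    summary = {run: [p.parent.name for p in paths] for run, paths in found.items()}

print(summary)
```

To use it on real output, call `find_reports("logs")` from the directory where FLASH was run.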

New Bugs Found By FLASH

The file evaluation/newbugs.csv lists all the new bugs found using FLASH with their corresponding Pull Request and/or Issue Links.

Citing FLASH

Please cite us if you use our tool:

Saikat Dutta, August Shi, Rutvik Choudhary, Zhekun Zhang, Aryaman Jain, and Sasa Misailovic. Detecting Flaky Tests in Probabilistic and Machine Learning Applications. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2020).

