This is the artifact accompanying the submission "Efficient Generation of Error-Inducing Floating-Point Inputs via Symbolic Execution".
The original results are produced on a workstation with Intel(R) Xeon(R) Gold 6238 CPU (8 cores, 2.10GHz), 32GB RAM, and the operating system is Ubuntu 14.04.5 LTS. To reproduce the results, a machine with similar CPUs(~2.10GHz and at least 8 cores), 32GB or larger RAM is required. Running the artifact on a different machine could possibly diverge the execution and lead to different results. Moreover, all the experiments have to be run sequentially.
1. Pull the docker image of FPGen artifact and creat, run the FPGen container
$ docker pull ucdavisplse/fpgen-artifact:icse20 $ docker run -ti --name=FPGen --cpus=8 --memory=32g ucdavisplse/fpgen-artifact:icse20
when inside the FPGen container,
(1) to exit the FPGen docker container and terminate all running jobs inside the container, use:
(2) to detach from the running FPGen container without stopping it,
press CTRL+P followed by CTRL+Q
when outside the FPGen container and looking to start/attach to it, run:
$ docker start -ai FPGen
2. Clone FPGen github repo. and rename it
Inside FPGen container, navigate to home directory, clone this repo. and rename it,
$ cd $ git clone https://github.com/ucd-plse/FPGen.git $ mv FPGen FPTesting
Note: you can ignore the example and directly go to the next section if you are only looking to replicating/reproducing the results of FPGen.
navigate to example
cd /home/fptesting/example; ls
We use the recursive summation program that takes an input array of size 4 as an example. There are two c files needed to run FPGen:
- add-SYMBOLIC.c : the original program with error injections
- test/add-INOUT.c : the original program that is modified to read the input array from a file and write the result to the output file, and can be compiled to various precision (float, double, __float128) for error measurement.
(1) compile and symbolically run the test
$ ../scripts/compile.sh add-SYMBOLIC
This step enables KLEE to explore the error paths and it will take 2~3 minutes. When KLEE finishes, you should see the number of paths KLEE has reached,
KLEE: done: total instructions = 4133 KLEE: done: completed paths = 84 KLEE: done: generated tests = 84
(2) read the inputs generated by KLEE
(3) prepare the executables with multiple precisions
$ cd test $ make $ ls
There are 3 executables: add-32 with float precision, add-64 with double precision, _add-128_with __float128 precision. (4) Measure errors for the generated inputs
$ cd ../../example $ ../scripts/compute_error.sh "klee-last/test0000*-input" test/add-64 test/add128
It measures the errors between double and __float128, and prints out the top 10 error-inducing inputs and the relative error the input triggers like following,
Maximum relative error: 5.1607477707e-05 [klee-last/test000084-input]. ---top 10 inputs klee-last/test000084-input -- 5.1607477707e-05 klee-last/test000080-input -- 2.1288082770e-09 klee-last/test000064-input -- 5.0207933244e-16 klee-last/test000077-input -- 1.6766168035e-16 klee-last/test000072-input -- 1.1026481444e-16 klee-last/test000067-input -- 1.0747657680e-16 klee-last/test000079-input -- 1.0714443092e-16 klee-last/test000066-input -- 8.8272423124e-17 klee-last/test000082-input -- 4.1739994958e-17 klee-last/test000063-input -- 6.9408001417e-18
The maximum error triggered is
5.1607477707e-05 and the error-inducing input is
Reproduce the results of FPGen or baseline tools on selected benchmarks
navigate to benchmarks
$ cd /home/fptesting/FPTesting/benchmarks; ls
The benchmarks are divided into three parts:
(1) summations (3 summation tests):
recursive-summation pairwise-summation compensated-summation
(2) matrix (9 meschach tests):
sum 1norm 2norm dot conv mv mm lu qr
(3) gsl (15 gsl tests):
wmean wvariance wsd wvariance-w wsd-w wtss wtss-m wvariance-m wsd-m wabsdev wskew wkurtosis wabsdev-m wskew-m wkurtosis-m
In each part, there is a README file (click the link below) that describes the detailed steps to replicate the results of FPGen or the 3 baseline tools RANDOM, S3FP and KLEE-FLOAT for one, some or all tests in that part.
- summations https://github.com/ucd-plse/FPGen/tree/master/benchmarks/summations
- matrix https://github.com/ucd-plse/FPGen/tree/master/benchmarks/matrix
- gsl https://github.com/ucd-plse/FPGen/tree/master/benchmarks/gsl
Reproduce the results of FPGen or baseline tools on all benchmarks
navigate to benchmarks
$ cd /home/fptesting/FPTesting/benchmarks
To replicate the results of FPGen for all 27 tests at one time, use fpgen_all.sh, and the run will take approximately 54-60 hours:
$ nohup ./fpgen_all.sh &
The FPGen results will be collected in file result-fpgen.txt. You can either inspect the results manually by comparing them to reference/result-fpgen-all.txt or use our utility script to compare the results automatically by running:
$ ../scripts/cmp_to_ref.sh -a result-fpgen.txt reference/result-fpgen-all.txt
The comparison script will report "check pass" or "check failed" for each test.
To replicate the results of RANDOM/S3FP/KLEE-FLOAT for all tests at one time, use baselines_all.sh, and the run will take approximately 108-120 hours:
$ nohup ./baselines_all.sh &
The RANDOM/S3FP/KLEE-FLOAT results will be collected in file result-baselines.txt. Inspect the results by comparing them to reference/result-baselines-all.txt.
FPGen starts with a random search that provides the base values of the input data for the algorithm to further improve. The random search, however, is performed with a time threshold which we found is unstable across machines even with similar hardware specifications. To increase the chance of replication/reproduction, we provide the base values in the artifact and skip this step, which is obsereved to be helpful in replicating/reproducing the FPGen results in other machines. To ignore the base values we provide and start FPGen from scratch, use fpgen-from-scratch.sh for gsl and matrix tests, and the same fpgen.sh for summations tests.