Description
I noticed one particular example of misleading code coverage percentages: Xen. I describe this example below, but I believe the problem is general to all C/C++ projects in OSS-Fuzz.
As of 8 June 2025, the Xen page on Fuzz Introspector shows code coverage of 81% and static reachability of 86%.
These percentages are misleading because the code coverage is baselined on the single fuzzing target x86_instruction_emulator, which is built from a tiny subset of Xen functionality. See the OSS-Fuzz wrapper scripts and the actual fuzzing target in the Xen repo.
This happens because OSS-Fuzz collects coverage statistics something like this:
$ LLVM_PROFILE_FILE=fuzzer1.profraw ./x86_instruction_emulator /fuzzing/corpus/*
$ llvm-profdata merge -sparse *.profraw -o default.profdata
$ llvm-cov report -object x86_instruction_emulator --instr-profile=default.profdata
Note the last command, llvm-cov report. It takes only the fuzzing target itself (x86_instruction_emulator), which was built from a small subset of the sources that constitute Xen. The ./configure ... --disable-xen invocation also makes it clear that the Xen hypervisor is not built in this fuzzing harness.
Reproducing and fixing
In my poor-man's reproducer, I built this fuzzing target in an Ubuntu Docker container and ran libFuzzer for a minute. After collecting coverage, I got these percentages:
$ llvm-cov report -object x86_instruction_emulator --instr-profile=default.profdata
Filename Regions Missed Regions Cover Functions Missed Functions Executed Lines Missed Lines Cover Branches Missed Branches Cover
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
tools/fuzz/x86_instruction_emulator/cpuid.c 432 365 15.51% 12 9 25.00% 372 293 21.24% 228 189 17.11%
tools/fuzz/x86_instruction_emulator/fuzz-emul.c 387 99 74.42% 35 10 71.43% 430 132 69.30% 206 60 70.87%
tools/fuzz/x86_instruction_emulator/wrappers.c 10 3 70.00% 10 3 70.00% 73 24 67.12% 0 0 -
tools/fuzz/x86_instruction_emulator/x86-emulate.c 83 61 26.51% 11 6 45.45% 134 74 44.78% 62 49 20.97%
tools/fuzz/x86_instruction_emulator/x86-emulate.h 6 5 16.67% 3 2 33.33% 14 10 28.57% 4 4 0.00%
tools/fuzz/x86_instruction_emulator/x86_emulate/0f01.c 723 723 0.00% 1 1 0.00% 292 292 0.00% 378 378 0.00%
tools/fuzz/x86_instruction_emulator/x86_emulate/0fae.c 398 398 0.00% 1 1 0.00% 173 173 0.00% 198 198 0.00%
tools/fuzz/x86_instruction_emulator/x86_emulate/0fc7.c 168 168 0.00% 1 1 0.00% 148 148 0.00% 76 76 0.00%
tools/fuzz/x86_instruction_emulator/x86_emulate/decode.c 1559 1060 32.01% 7 4 42.86% 1092 707 35.26% 1078 670 37.85%
tools/fuzz/x86_instruction_emulator/x86_emulate/fpu.c 684 562 17.84% 2 0 100.00% 362 294 18.78% 378 318 15.87%
tools/fuzz/x86_instruction_emulator/x86_emulate/private.h 63 27 57.14% 10 3 70.00% 63 35 44.44% 30 20 33.33%
tools/fuzz/x86_instruction_emulator/x86_emulate/x86_emulate.c 23056 20250 12.17% 26 9 65.38% 6584 5350 18.74% 9092 6385 29.77%
tools/fuzz/x86_instruction_emulator/x86_emulate/x86_emulate.h 23 6 73.91% 10 4 60.00% 57 25 56.14% 4 0 100.00%
tools/include/xen/lib/x86/cpu-policy.h 4 1 75.00% 4 1 75.00% 22 4 81.82% 0 0 -
xen/lib/x86/private.h 1 1 0.00% 1 1 0.00% 4 4 0.00% 0 0 -
Files which contain no functions:
...
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL 27597 23729 14.02% 134 55 58.96% 9820 7565 22.96% 11734 8347 28.86%
My line coverage was 23%, far lower than the 81% reported by OSS-Fuzz, but I attribute that to the fact that I (1) had no seed corpus, (2) ran the fuzzer for only 1 minute, and (3) ran only libFuzzer. The exact percentage is not relevant for the purposes of this issue. What is important is that llvm-cov baselines itself on only a few source files of Xen. (Also note that the total number of lines is 9,820, which corresponds exactly to the OSS-Fuzz-reported number.)
What percentage would be correct? First, I built Xen (and Xen hypervisor only) with code coverage instrumentation, roughly like this:
$ ./configure --disable-seabios --disable-tools --disable-stubdom --disable-docs --with-system-qemu
$ make -C xen menuconfig
# in the config menu, find and uncheck "live patching" and then find and check "code coverage support"
$ make clang=y world
This produces xen/xen-syms, which is the recommended binary to use for llvm-cov.
Now it's enough to add this binary as another -object ... parameter, and we have a properly baselined coverage report and percentages:
$ llvm-cov report -object xen/xen-syms -object x86_instruction_emulator --instr-profile=default.profdata
Filename Regions Missed Regions Cover Functions Missed Functions Executed Lines Missed Lines Cover Branches Missed Branches Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
tools/fuzz/x86_instruction_emulator/cpuid.c 432 365 15.51% 12 9 25.00% 372 293 21.24% 228 189 17.11%
tools/fuzz/x86_instruction_emulator/fuzz-emul.c 387 99 74.42% 35 10 71.43% 430 132 69.30% 206 60 70.87%
tools/fuzz/x86_instruction_emulator/wrappers.c 10 3 70.00% 10 3 70.00% 73 24 67.12% 0 0 -
tools/fuzz/x86_instruction_emulator/x86-emulate.c 83 61 26.51% 11 6 45.45% 134 74 44.78% 62 49 20.97%
tools/fuzz/x86_instruction_emulator/x86-emulate.h 6 5 16.67% 3 2 33.33% 14 10 28.57% 4 4 0.00%
...
xen/common/bitmap.c 457 457 0.00% 21 21 0.00% 240 240 0.00% 142 142 0.00%
xen/common/bug.c 60 60 0.00% 1 1 0.00% 63 63 0.00% 38 38 0.00%
xen/common/compat/domain.c 136 136 0.00% 1 1 0.00% 79 79 0.00% 52 52 0.00%
...
xen/drivers/acpi/apei/apei-base.c 90 90 0.00% 14 14 0.00% 144 144 0.00% 48 48 0.00%
xen/drivers/acpi/apei/apei-internal.h 4 4 0.00% 4 4 0.00% 12 12 0.00% 0 0 -
xen/drivers/acpi/apei/apei-io.c 218 218 0.00% 12 12 0.00% 201 201 0.00% 88 88 0.00%
...
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL 296641 292636 1.35% 6943 6853 1.30% 157006 154635 1.51% 98436 94943 3.55%
This report shows that the real LoC number is ~157K, not 9.8K, and that the real line coverage achieved by fuzzing is ~1.5%.
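The effect of the denominator can be checked directly from the two TOTAL rows above. The following is a minimal Python sketch; the helper name line_coverage is my own (it is not part of llvm-cov or OSS-Fuzz), and it assumes the 12-column TOTAL layout printed in the reports above.

```python
def line_coverage(total_row: str) -> tuple[int, int, float]:
    """Return (total_lines, covered_lines, percent) from an llvm-cov TOTAL row.

    Assumes the column order shown in the reports above:
    regions (3 fields), functions (3), lines (3), branches (3).
    """
    fields = total_row.split()
    if len(fields) != 13 or fields[0] != "TOTAL":
        raise ValueError("not a TOTAL row in the expected layout")
    total_lines = int(fields[7])   # "Lines" column
    missed_lines = int(fields[8])  # "Missed Lines" column
    covered = total_lines - missed_lines
    return total_lines, covered, 100.0 * covered / total_lines


# TOTAL rows copied from the two reports in this issue.
fuzzer_only = "TOTAL 27597 23729 14.02% 134 55 58.96% 9820 7565 22.96% 11734 8347 28.86%"
with_xen_syms = "TOTAL 296641 292636 1.35% 6943 6853 1.30% 157006 154635 1.51% 98436 94943 3.55%"

for label, row in [("fuzzer only", fuzzer_only), ("with xen-syms", with_xen_syms)]:
    total, covered, pct = line_coverage(row)
    print(f"{label}: {covered}/{total} lines = {pct:.2f}%")
```

The number of covered lines barely changes between the two reports (2,255 vs. 2,371), while the denominator grows roughly 16 times, which is what moves the percentage from about 23% to about 1.5%.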
Potential impact
I did not closely analyze other OSS-Fuzz projects with suspiciously high code coverage. For unrelated reasons, I also looked into nginx, Apache httpd, and unbound. In those three cases, the OSS-Fuzz-reported code coverage looked reasonable.
A simple approach to find affected projects would be to compare OSS-Fuzz-reported LoC with some project-official LoC reports. If the two numbers differ significantly, then the project probably suffers from the described issue. I did not perform such analysis.
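As a rough illustration of that comparison, here is a minimal Python sketch. Everything in it is an assumption on my side: the function name, the 0.5 threshold, and the use of the ~157K figure from my instrumented hypervisor build as the "reference" LoC for Xen (it is not an official project number).

```python
def looks_underbaselined(reported_loc: int, reference_loc: int,
                         min_ratio: float = 0.5) -> bool:
    """Flag a project whose coverage report counts far fewer lines
    than an independent reference count for the same project.

    min_ratio is an arbitrary illustrative threshold, not a tuned value.
    """
    if reference_loc <= 0:
        raise ValueError("reference_loc must be positive")
    return reported_loc / reference_loc < min_ratio


# (reported LoC, reference LoC); the Xen numbers come from this issue.
projects = {"xen": (9820, 157006)}

for name, (reported, reference) in projects.items():
    if looks_underbaselined(reported, reference):
        print(f"{name}: report covers only {reported / reference:.1%} of reference LoC")
```

A real survey would need a trustworthy reference count per project, and the two counts would have to be measured the same way (for example, both excluding comments and blank lines), so any flagged project would still need a manual look.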
Proposed solutions
- Low effort: document this code coverage caveat, namely that only the subset of sources actually compiled into the fuzzing target is counted towards the total lines of code. Also make it explicit on the Fuzz Introspector web site, e.g., by adding asterisks (*) to the "Lines of code", "Lines covered" and "Code coverage" fields, and explaining the asterisks in the footer of the page.
- High effort: add "mandatory baselining binary" functionality to OSS-Fuzz's coverage scripts (perhaps through a new mandatory environment variable). Go through all projects onboarded to OSS-Fuzz and either add the baselining binaries yourselves or ask maintainers to add them, as described above. Incorporate this "mandatory baselining binary" rule into the PR review process.
The former solution has the downside that you keep misleading statistics visible (maybe remove them altogether then?). The latter solution requires a lot of time investment.
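To make the high-effort option more concrete, here is a minimal Python sketch of how a coverage script could assemble the llvm-cov invocation. The environment variable name COVERAGE_BASELINE_OBJECTS, its colon-separated format, and the function name are all hypothetical; nothing like this exists in OSS-Fuzz today as far as I know. Only the llvm-cov report, -object and --instr-profile parts are taken from the commands shown earlier in this issue.

```python
import os
import shlex


def llvm_cov_report_argv(fuzz_target: str, profdata: str, environ=None) -> list[str]:
    """Build an `llvm-cov report` command line that includes baselining binaries.

    COVERAGE_BASELINE_OBJECTS is a hypothetical, colon-separated list of
    instrumented binaries (e.g. xen/xen-syms) that define the full set of
    sources the coverage percentage should be measured against.
    """
    env = os.environ if environ is None else environ
    raw = env.get("COVERAGE_BASELINE_OBJECTS", "")
    baselines = [path for path in raw.split(":") if path]
    if not baselines:
        # "Mandatory" here means: refuse to produce an unbaselined report.
        raise RuntimeError("COVERAGE_BASELINE_OBJECTS is not set")

    argv = ["llvm-cov", "report"]
    for obj in baselines + [fuzz_target]:
        argv += ["-object", obj]
    argv.append(f"--instr-profile={profdata}")
    return argv


argv = llvm_cov_report_argv(
    "x86_instruction_emulator",
    "default.profdata",
    environ={"COVERAGE_BASELINE_OBJECTS": "xen/xen-syms"},
)
print(shlex.join(argv))
```

With the example environment above, this produces the same command I ran by hand for the second report.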