Detect fuzzing issues by considering past results #2054

Hello, as part of some research, we analyzed fuzzer performance degradation by investigating why fuzzing coverage decreases for C/C++ projects in OSS-Fuzz. We found several types of issues that are easier to detect by comparing against past reports.

**I would be happy to implement these metrics if you are interested.**

- Detecting coverage drops would be a generic way to detect degradation; this is already discussed in https://github.com/google/oss-fuzz/issues/11398. A threshold would need to be chosen, either as a percentage or as an absolute number of lines.
- A common reason for large coverage drops is the vendoring of third-party library code, and sometimes also of project-specific code. If you agree that library code should not be included in the coverage measurement, large changes should trigger an alert, and the vendored code should then be ignored. See [grpc-httpjson-transcoding](https://storage.googleapis.com/oss-fuzz-introspector/grpc-httpjson-transcoding/inspector-report/20250202/fuzz_report.html) as an example: its own code is only a few hundred lines with close to 100% coverage, but it vendors 100k lines of library code.
- Compare the set of fuzz targets over time. Sometimes a project develops a partial build failure that stops only one (or a few) fuzz targets from building, without necessarily causing a build-failure issue to be filed for the project. For example, this happened with curl: https://github.com/google/oss-fuzz/issues/11398#issuecomment-1867707444
- The number of corpus entries is normally quite stable. However, due to the way coverage is collected, it can fluctuate and drop to a fraction of the real size (see https://github.com/google/oss-fuzz/issues/12986 and https://github.com/google/oss-fuzz/issues/11935). This could be detected by comparing against past corpus sizes. However, if I understand correctly, the seed corpus is combined across fuzz targets? Alternatively, an expected number of corpus entries relative to the number of covered branches/lines could be defined. For example, covering 10k lines with only five corpus entries does not seem like effective fuzzing.
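To make the coverage-drop idea concrete, here is a minimal sketch that flags a drop if either a percentage-point threshold or an absolute-line threshold is exceeded. The `CoverageReport` type and both default thresholds are hypothetical placeholders (picking good values is the open question), not anything OSS-Fuzz or Fuzz Introspector currently provides.

```python
from dataclasses import dataclass


@dataclass
class CoverageReport:
    # Hypothetical summary of one daily coverage report.
    covered_lines: int
    total_lines: int

    @property
    def percent(self) -> float:
        # Line coverage in percent; 0.0 if nothing was instrumented.
        return 100.0 * self.covered_lines / self.total_lines if self.total_lines else 0.0


def coverage_dropped(previous: CoverageReport, current: CoverageReport,
                     max_pct_points: float = 5.0, max_lines: int = 1000) -> bool:
    """Flag a drop if coverage falls by more than either (placeholder) threshold."""
    pct_drop = previous.percent - current.percent
    line_drop = previous.covered_lines - current.covered_lines
    return pct_drop > max_pct_points or line_drop > max_lines
```

Checking both thresholds catches small projects (where a few hundred lines is a large percentage) as well as large ones (where thousands of lines can be lost while the percentage barely moves).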
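For the vendoring case, one possible sketch is to watch the total number of instrumented lines instead of the coverage percentage, and to list newly appearing directories that contribute many lines. The per-directory line counts, the function names, and the thresholds are all assumptions made for illustration.

```python
from statistics import median


def codebase_grew_suddenly(history_total_lines: list[int], current_total_lines: int,
                           max_growth_factor: float = 2.0) -> bool:
    """True if instrumented lines exceed the historical median by the given factor."""
    if not history_total_lines:
        return False
    return current_total_lines > max_growth_factor * median(history_total_lines)


def new_large_directories(previous: dict[str, int], current: dict[str, int],
                          min_lines: int = 10_000) -> list[str]:
    """Hypothetical helper: directories absent from the previous report that add many lines."""
    return sorted(d for d, lines in current.items()
                  if d not in previous and lines >= min_lines)
```

Comparing against the median of past reports rather than only the previous one keeps a single noisy report from hiding or triggering the alert; the listed directories would be candidates for exclusion from the coverage measurement.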
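Comparing fuzz targets over time could be as simple as a set difference. The sketch below only reports targets that were present in every recent report, so that a target which was already flaky does not raise an alert. The input format (one set of target names per past report) is assumed.

```python
def missing_fuzz_targets(history: list[set[str]], current: set[str]) -> set[str]:
    """Targets present in all recent reports but absent from the current one."""
    if not history:
        return set()
    # Targets that were consistently built in the recent past.
    stable = set.intersection(*history)
    return stable - current
```

For instance, `missing_fuzz_targets([{"fuzz_a", "fuzz_b"}, {"fuzz_a", "fuzz_b"}], {"fuzz_a"})` returns `{"fuzz_b"}`, which would point at a partial build failure even if the project build as a whole still succeeds. The target names here are hypothetical.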
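Both corpus checks can be sketched in a few lines, assuming per-target corpus sizes are available per report (which may not hold if the seed corpus is combined across fuzz targets). The "entries per 1,000 covered lines" ratio and both defaults are hypothetical placeholders, not established values.

```python
from statistics import median


def corpus_shrank(history_sizes: list[int], current_size: int,
                  min_fraction: float = 0.5) -> bool:
    """True if the corpus fell below a fraction of its historical median size."""
    if not history_sizes:
        return False
    return current_size < min_fraction * median(history_sizes)


def corpus_too_small(corpus_entries: int, covered_lines: int,
                     min_entries_per_kloc: float = 1.0) -> bool:
    """True if there are suspiciously few corpus entries for the covered code."""
    if covered_lines == 0:
        return False
    return corpus_entries < min_entries_per_kloc * covered_lines / 1000
```

With these defaults, the example above (10k covered lines with five corpus entries) is flagged by `corpus_too_small(5, 10_000)`, since at least 10 entries would be expected.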

This is also related to diffing runs: https://github.com/ossf/fuzz-introspector/issues/734

I can also provide more examples if you want; I just wanted to keep this short.
