kvnemesis: first step towards a fuzzed KVNemesis #148634

stevendanna · 2025-06-21T21:45:01Z

You can run this test with the go fuzzer with something like:

go test ./pkg/kv/kvnemesis/ -test.fuzz=FuzzKVNemesisSingleNode
-test.fuzzcachedir=_fuzzcache -v -test.run=^$
-tags crdb_test -timeout=300m -parallel=4

It can also be run under bazel, but I have not yet sorted out all of the flags needed to get a coverage enabled build and to ensure that the failing test cases get written somewhere that can be referenced on subsequent runs.

The idea here is that the fuzzer provides a []byte that then determines the output of all random decisions in KVNemesis. This doesn't account for metamorphic decisions made outside of KVNemesis.

KVNemesis is a rather heavyweight test which seemed to be a problem for running it reliably under go-fuzz; however, go-fuzz's poor diagnostics when the test worker crash has made it hard to determine the exact cause so far.

Epic: none
Release note: None

cockroach-teamcity · 2025-06-21T21:45:12Z

This change is

To run natively, ``` go build -buildmode=c-archive -gcflags=all=-d=libfuzzer -tags "crdb_test,gofuzz,gofuzz_libfuzzer,libfuzzer" -trimpath -gcflags syscall=-d=libfuzzer=0 -o kvnemesis_fuzz.a pkg/kv/kvnemesis/main/main.go clang -o kvnemesis_fuzz kvnemesis_fuzz.a /home/stan_cockroachlabs_com/go/src/github.com/cockroachdb/cockroach/bin/c-deps/archived_cdep_libjemalloc_linux/lib/libjemalloc.a /home/stan_cockroachlabs_com/go/src/github.com/cockroachdb/cockroach/bin/c-deps/archived_cdep_libproj_linux/lib/libproj.a -fsanitize=fuzzer ./kvnemesis_fuzz -max_len=8192 seeds >libfuzzer.out 2>&1 ``` To run in go with libfuzzer instrumentation, ``` bazel run --override_repository=io_bazel_rules_go=/home/stan_cockroachlabs_com/go/src/github.com/cockroachdb/rules_go --@io_bazel_rules_go//go/config:gc_goopts=-d=libfuzzer --crdb_test pkg/kv/kvnemesis:kvnemesis_test -- -test.run notests -test.fuzz FuzzKVNemesisSingleNode -test.fuzzcachedir /tmp/foobar/ -test.v >gofuzzer.out 2>&1 ``` [1] cockroachdb#148634 [2] https://cockroachlabs.slack.com/archives/C023S0V4YEB/p1750542607409039?thread_ts=1750341116.758639&cid=C023S0V4YEB Epic: none Release note: None

stevendanna · 2025-06-23T12:13:47Z

Test failure is unrelated (but important and fixed here #148660)

You can run this test with the go fuzzer with something like: go test ./pkg/kv/kvnemesis/ -test.fuzz=FuzzKVNemesisSingleNode \ -test.fuzzcachedir=_fuzzcache -v -test.run=^$ \ -tags crdb_test -timeout=300m -parallel=4 It can also be run under bazel, but I have not yet sorted out all of the flags needed to get a coverage enabled build and to ensure that the failing test cases get written somewhere that can be referenced on subsequent runs. The idea here is that the fuzzer provides a []byte that then determines the output of all random decisions in KVNemesis. This doesn't account for metamorphic decisions made outside of KVNemesis. KVNemesis is a rather heavyweight test which seemed to be a problem for running it reliably under go-fuzz; however, go-fuzz's poor diagnostics when the test worker crash has made it hard to determine the exact cause so far. Epic: none Release note: None

stevendanna · 2025-06-30T18:14:30Z

Overall, this test might be a bit too big for a fuzz-based approach. But, I think this might still be worth committing just so we have an example of how it might work.

miraradeva

Looks great, just trying to understand how fuzzing works. Naively I would have just let the fuzzer pick a randomness source or just a seed, and pass it to kvnemesis, but then again I don't know how the fuzzer actually navigates a coverage area such as kvnemesis, which itself makes random choices. Feel free to point me to reading material; I just did the go fuzzing tutorial and that didn't do into depth.

Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @srosenberg)

pkg/kv/kvnemesis/kvnemesis_test.go line 443 at r1 (raw file):

		// reliably. With all default settings the test runner fails without
		// printing any useful info. I _think_ it might be the result of a
		// hard-coded 10s timeout in the go-fuzz test worker.

It seems like that can be changed with -fuzztime?

pkg/kv/kvnemesis/kvnemesis_test.go line 458 at r1 (raw file):

			assertRaftApply:              true,
		})
		f.Add(rndSource.Output())

Another fuzzing noob question: it seems important to actually run kvnemesis with the generated randomness source and pass the resulting output to the fuzzer? Why is that exactly? Isn't the output the same as grabbing a random value from the source a fixed number of times? Is the problem that we don't know what that fixed number is?

pkg/util/randutil/rand.go line 301 at r1 (raw file):

// fuzzer's next step can change a single decision without affecting all
// previous decisions in the test, giving it some ability to direct its
// exploration.

Fuzzing noob question: one step of the fuzzer is one run of the given test (e.g. kvnemesis), which might consist of multiple random decisions, right? If yes, then what does it mean for "the fuzzer's next step can change a single decision without affecting all previous decisions in the test"?

srosenberg

Naively I would have just let the fuzzer pick a randomness source or just a seed

The fuzzer uses a different API than stdlib's Random; see LLVMFuzzerTestOneInput in [1]. Most other fuzzers including Go's essentially work the same way. That is, they use an internal PRNG to generate and mutate an input, i.e., a byte array. (You can see an example of a Go's fuzzing mutator in [2]. Go's fuzzer uses reflection to extract out typed inputs from the byte array.) Fuzzers usually implement their own Random, mostly for performance reasons. (Fundamentally, there is no difference.)

Since fuzzer input generation is guided by coverage, it's important that the entire set of inputs is generated by the fuzzer. Essentially we have to adapt Random to read directly from the fuzzer's byte array. We have to be careful when the fuzzer's input is exhausted before we can generate a typed (kvnemesis) input; i.e., it's not generally safe to return a sentinel value [3].

[1] https://llvm.org/docs/LibFuzzer.html
[2] https://github.com/golang/go/blob/fdd7713fe5a7bc054fe08f79a1877f90c7188e53/src/internal/fuzz/mutator.go#L37
[3] https://cockroachlabs.slack.com/archives/C023S0V4YEB/p1750687993416619?thread_ts=1750341116.758639&cid=C023S0V4YEB

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva and @stevendanna)

stevendanna

I don't know how the fuzzer actually navigates a coverage area such as kvnemesis, which itself makes random choices.

I don't either, this was based on some assumptions about what might be useful across a couple of different fuzzers that have been discussed, I put an argument for it in the response to one of your other questions but it is a hand-wave for sure. My assumption here is that if we just let the fuzzer choose the seed, then the fuzzer isn't any different than just letting the test run under stress.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva)

pkg/util/randutil/rand.go line 301 at r1 (raw file):

Previously, miraradeva (Mira Radeva) wrote…

Fuzzing noob question: one step of the fuzzer is one run of the given test (e.g. kvnemesis), which might consist of multiple random decisions, right? If yes, then what does it mean for "the fuzzer's next step can change a single decision without affecting all previous decisions in the test"?

What I was trying to get at is this. Say that a single run of the test makes 5 random decisions driven from 5*8 byte buffer that (for ease of notation) represents the following uint64s:

[ 45, 67, 12, 56, 12]

My thinking here is that a profile guided fuzzer could then, on its next fuzz iteration, mutate this input such that it looked like:

[ 45, 67, 12, 57, 12]
              ^- mutated

The random choices cause by the common prefix [ 45, 67, 12 would be the same and then the test would diverse. Whereas if we just had the fuzzer choose the seed, then the next fuzz iteration would likely represent a completely different series of choices.

Note, I have no idea if this works out in practice as I don't know how the fuzzers we plan on using work.

pkg/kv/kvnemesis/kvnemesis_test.go line 443 at r1 (raw file):

Previously, miraradeva (Mira Radeva) wrote…

It seems like that can be changed with -fuzztime?

The timeout I was referring to here is this rather unfortunately hard-coded timeout:

https://github.com/golang/go/blob/86fca3dcb63157b8e45e565e821e7fb098fcf368/src/internal/fuzz/worker.go#L492-L494

But, I wrote this before I changed the random source to skip the test on short input, so maybe we won't hit this as much anymore.

pkg/kv/kvnemesis/kvnemesis_test.go line 458 at r1 (raw file):

Previously, miraradeva (Mira Radeva) wrote…

Another fuzzing noob question: it seems important to actually run kvnemesis with the generated randomness source and pass the resulting output to the fuzzer? Why is that exactly? Isn't the output the same as grabbing a random value from the source a fixed number of times? Is the problem that we don't know what that fixed number is?

The f.Add function allows you to add initial "input" into the corpus of tests that the fuzzer will use. This initial input can then be used to get initial coverage information and (depending on the fuzzer) serve as the basis for future, mutated, fuzz inputs. So my idea here was that by allowing the test to just run through completely and recording all the decisions, we would at least have some corpus data where the input byte array was a sufficient length to generate all the decisions kvnemesis needs to make. To be honest I'm not sure how well that works out in practice.

miraradeva

Thanks for the explanations! I think I understand: we essentially take the randomness away from kvnemesis and let the fuzzer fully control it.

Reviewed 3 of 3 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @stevendanna)

pkg/util/randutil/rand.go line 301 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

What I was trying to get at is this. Say that a single run of the test makes 5 random decisions driven from 5*8 byte buffer that (for ease of notation) represents the following uint64s:
[ 45, 67, 12, 56, 12]
My thinking here is that a profile guided fuzzer could then, on its next fuzz iteration, mutate this input such that it looked like:
[ 45, 67, 12, 57, 12]
              ^- mutated
The random choices cause by the common prefix [ 45, 67, 12 would be the same and then the test would diverse. Whereas if we just had the fuzzer choose the seed, then the next fuzz iteration would likely represent a completely different series of choices.

Note, I have no idea if this works out in practice as I don't know how the fuzzers we plan on using work.

Yeah, I get it. We make the random choices explicit to the fuzzer, so it can vary them more granularly. Though maybe it is smart enough to figure out: given random seed X that produces choices [ 45, 67, 12, 56, 12], what random seed would produce choices [ 45, 67, 12, 57, 12].

pkg/kv/kvnemesis/kvnemesis_test.go line 458 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

The f.Add function allows you to add initial "input" into the corpus of tests that the fuzzer will use. This initial input can then be used to get initial coverage information and (depending on the fuzzer) serve as the basis for future, mutated, fuzz inputs. So my idea here was that by allowing the test to just run through completely and recording all the decisions, we would at least have some corpus data where the input byte array was a sufficient length to generate all the decisions kvnemesis needs to make. To be honest I'm not sure how well that works out in practice.

I agree, though it seems like it's really the length of the input array that matters. E.g. if we know exactly how many random choices kvnemesis will make, we can simulate the resulting rndSource.Output() by just drawing that many random values from rndSource. It could speed up the test if it seems like a correct assumption.

stevendanna · 2025-07-01T15:02:37Z

I agree, though it seems like it's really the length of the input array that matters. E.g. if we know exactly how many random choices kvnemesis will make, we can simulate the resulting rndSource.Output() by just drawing that many random values from rndSource. It could speed up the test if it seems like a correct assumption.

Good idea, I'll take a look.

stevendanna · 2025-07-07T23:02:00Z

bors r=miraradeva

craig · 2025-07-07T23:54:03Z

Build succeeded:

stevendanna requested a review from a team as a code owner June 21, 2025 21:45

stevendanna requested a review from srosenberg June 21, 2025 21:50

stevendanna force-pushed the ssd/kvnemesis-fuzz branch from 9a180d0 to 616d692 Compare June 21, 2025 23:55

stevendanna force-pushed the ssd/kvnemesis-fuzz branch from 616d692 to ee29395 Compare June 30, 2025 18:12

stevendanna requested a review from miraradeva June 30, 2025 18:13

miraradeva reviewed Jun 30, 2025

View reviewed changes

srosenberg reviewed Jun 30, 2025

View reviewed changes

stevendanna commented Jul 1, 2025

View reviewed changes

miraradeva approved these changes Jul 1, 2025

View reviewed changes

craig bot merged commit 221c3df into cockroachdb:master Jul 7, 2025
21 of 22 checks passed

celeste-cockroachdb bot added the target-release-25.4.0 label Jul 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

kvnemesis: first step towards a fuzzed KVNemesis #148634

kvnemesis: first step towards a fuzzed KVNemesis #148634

Uh oh!

stevendanna commented Jun 21, 2025

Uh oh!

cockroach-teamcity commented Jun 21, 2025

Uh oh!

stevendanna commented Jun 23, 2025

Uh oh!

stevendanna commented Jun 30, 2025

Uh oh!

miraradeva left a comment

Uh oh!

srosenberg left a comment

Uh oh!

stevendanna left a comment

Uh oh!

miraradeva left a comment

Uh oh!

stevendanna commented Jul 1, 2025

Uh oh!

stevendanna commented Jul 7, 2025

Uh oh!

craig bot commented Jul 7, 2025

Uh oh!

Uh oh!

Uh oh!

kvnemesis: first step towards a fuzzed KVNemesis #148634

kvnemesis: first step towards a fuzzed KVNemesis #148634

Uh oh!

Conversation

stevendanna commented Jun 21, 2025

Uh oh!

cockroach-teamcity commented Jun 21, 2025

Uh oh!

stevendanna commented Jun 23, 2025

Uh oh!

stevendanna commented Jun 30, 2025

Uh oh!

miraradeva left a comment

Choose a reason for hiding this comment

Uh oh!

srosenberg left a comment

Choose a reason for hiding this comment

Uh oh!

stevendanna left a comment

Choose a reason for hiding this comment

Uh oh!

miraradeva left a comment

Choose a reason for hiding this comment

Uh oh!

stevendanna commented Jul 1, 2025

Uh oh!

stevendanna commented Jul 7, 2025

Uh oh!

craig bot commented Jul 7, 2025

Uh oh!

Uh oh!

Uh oh!