
Consider running tests in parallel by default #15953

Open
matklad opened this issue Jun 4, 2023 · 8 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone
0.12.0

Comments

@matklad
Contributor

matklad commented Jun 4, 2023

Today, when I run the tests using zig test / zig build test, tests inside a single translation unit are being run sequentially:

for (test_fn_list, 0..) |test_fn, i| {

I'd love to see the tests being run in parallel.

There are two motivations here:

Directly, I have 14 cores on the laptop I use, and it'd be nice to use all of them when working on well-tested codebases.

More importantly, and indirectly, defaults shape testing culture. It's very easy to write the first three tests such that they are order-dependent and can't be run in parallel. That isn't a problem at three tests, but once you get to 3,000 tests it becomes one, and at that point fixing the tests requires a lot of effort.

In other words, "my tests can't be run in parallel" is a boiling-frog kind of problem, and it would help if the language averted it before it gets big, hence "parallel by default". As a case study, Rust went for parallel tests, and, imo, it has had lasting positive effects on the ecosystem.

Of course, some software is fundamentally single-threaded and it's important to avoid pessimizing that case, but Zig already has single-threaded builds just for that.

One really hard problem here is what to do with test output (logging or direct use of stderr). If you run a bunch of tests in parallel and something gets printed to stderr, how do you know which test is responsible? As far as I know, this problem isn't really solvable, and, to get test output, the tests should be run sequentially.

@matu3ba
Contributor

matu3ba commented Jun 4, 2023

Of course, some software is fundamentally single-threaded and it's important to avoid pessimizing that case, but Zig already has single-threaded builds just for that.

  1. Tests with little or no kernel interaction (i.e. allocation) typically perform better without spawning another thread. What annotations does Rust offer to accommodate this use case?
  2. Unit tests are typically designed to have as little kernel interaction as possible (this includes allocations, which are so far not optimized), so what heuristic for deciding when to use another thread is reasonable?

@matklad
Contributor Author

matklad commented Jun 4, 2023

Rust doesn't really let you control whether the tests are threaded or not (there's a CLI option on the test binary controlling how many threads to use, but you can't say "this test suite requires a single thread").

Regarding whether it makes sense to use parallelism, I think “always” would be a good choice here: just spawn num_cpus threads, and let them eagerly pick up tests to run by atomically fetch-adding a shared test index. Sorbet’s multiplexJob would be a nice API shape for this task.
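
As a rough sketch of that scheme (not the actual test runner; names like workerFn and runAllTests are invented here, and the exact std.Thread and atomic APIs differ between Zig versions), the work distribution could look something like this:

const std = @import("std");

// Hypothetical sketch: one worker per CPU, each claiming the next test by
// atomically fetch-adding a shared counter. A real runner would also record
// pass/fail/skip results and report progress.
fn workerFn(next_index: *usize, test_fns: []const *const fn () anyerror!void) void {
    while (true) {
        const i = @atomicRmw(usize, next_index, .Add, 1, .Monotonic);
        if (i >= test_fns.len) return;
        test_fns[i]() catch {}; // failure bookkeeping elided
    }
}

pub fn runAllTests(gpa: std.mem.Allocator, test_fns: []const *const fn () anyerror!void) !void {
    var next_index: usize = 0;
    const thread_count = try std.Thread.getCpuCount();
    const threads = try gpa.alloc(std.Thread, thread_count);
    defer gpa.free(threads);
    for (threads) |*t| t.* = try std.Thread.spawn(.{}, workerFn, .{ &next_index, test_fns });
    for (threads) |t| t.join();
}

A single-threaded build, per the earlier point about not pessimizing such programs, would just run the same loop with one worker.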

@andrewrk added the proposal label Jun 4, 2023
@andrewrk added this to the 0.12.0 milestone Jun 4, 2023
@andrewrk
Member

andrewrk commented Jun 4, 2023

I think this could make a lot of sense, especially given that the build runner has recently become a sort of "driver" for the test runner:

fn evalZigTest(
    self: *Run,
    child: *std.process.Child,
    prog_node: *std.Progress.Node,
) !StdIoResult {
    const gpa = self.step.owner.allocator;
    const arena = self.step.owner.allocator;

    var poller = std.io.poll(gpa, enum { stdout, stderr }, .{
        .stdout = child.stdout.?,
        .stderr = child.stderr.?,
    });
    defer poller.deinit();

    try sendMessage(child.stdin.?, .query_test_metadata);

    const Header = std.zig.Server.Message.Header;
    const stdout = poller.fifo(.stdout);
    const stderr = poller.fifo(.stderr);

    var fail_count: u32 = 0;
    var skip_count: u32 = 0;
    var leak_count: u32 = 0;
    var test_count: u32 = 0;

    var metadata: ?TestMetadata = null;
    var sub_prog_node: ?std.Progress.Node = null;
    defer if (sub_prog_node) |*n| n.end();

    poll: while (true) {
        while (stdout.readableLength() < @sizeOf(Header)) {
            if (!(try poller.poll())) break :poll;
        }
        const header = stdout.reader().readStruct(Header) catch unreachable;
        while (stdout.readableLength() < header.bytes_len) {
            if (!(try poller.poll())) break :poll;
        }
        const body = stdout.readableSliceOfLen(header.bytes_len);

        switch (header.tag) {
            .zig_version => {
                if (!std.mem.eql(u8, builtin.zig_version_string, body)) {
                    return self.step.fail(
                        "zig version mismatch build runner vs compiler: '{s}' vs '{s}'",
                        .{ builtin.zig_version_string, body },
                    );
                }
            },
            .test_metadata => {
                const TmHdr = std.zig.Server.Message.TestMetadata;
                const tm_hdr = @ptrCast(*align(1) const TmHdr, body);
                test_count = tm_hdr.tests_len;

                const names_bytes = body[@sizeOf(TmHdr)..][0 .. test_count * @sizeOf(u32)];
                const async_frame_lens_bytes = body[@sizeOf(TmHdr) + names_bytes.len ..][0 .. test_count * @sizeOf(u32)];
                const expected_panic_msgs_bytes = body[@sizeOf(TmHdr) + names_bytes.len + async_frame_lens_bytes.len ..][0 .. test_count * @sizeOf(u32)];
                const string_bytes = body[@sizeOf(TmHdr) + names_bytes.len + async_frame_lens_bytes.len + expected_panic_msgs_bytes.len ..][0..tm_hdr.string_bytes_len];

                const names = std.mem.bytesAsSlice(u32, names_bytes);
                const async_frame_lens = std.mem.bytesAsSlice(u32, async_frame_lens_bytes);
                const expected_panic_msgs = std.mem.bytesAsSlice(u32, expected_panic_msgs_bytes);

                const names_aligned = try arena.alloc(u32, names.len);
                for (names_aligned, names) |*dest, src| dest.* = src;

                const async_frame_lens_aligned = try arena.alloc(u32, async_frame_lens.len);
                for (async_frame_lens_aligned, async_frame_lens) |*dest, src| dest.* = src;

                const expected_panic_msgs_aligned = try arena.alloc(u32, expected_panic_msgs.len);
                for (expected_panic_msgs_aligned, expected_panic_msgs) |*dest, src| dest.* = src;

                prog_node.setEstimatedTotalItems(names.len);
                metadata = .{
                    .string_bytes = try arena.dupe(u8, string_bytes),
                    .names = names_aligned,
                    .async_frame_lens = async_frame_lens_aligned,
                    .expected_panic_msgs = expected_panic_msgs_aligned,
                    .next_index = 0,
                    .prog_node = prog_node,
                };

                try requestNextTest(child.stdin.?, &metadata.?, &sub_prog_node);
            },
            .test_results => {
                const md = metadata.?;

                const TrHdr = std.zig.Server.Message.TestResults;
                const tr_hdr = @ptrCast(*align(1) const TrHdr, body);
                fail_count += @boolToInt(tr_hdr.flags.fail);
                skip_count += @boolToInt(tr_hdr.flags.skip);
                leak_count += @boolToInt(tr_hdr.flags.leak);

                if (tr_hdr.flags.fail or tr_hdr.flags.leak) {
                    const name = std.mem.sliceTo(md.string_bytes[md.names[tr_hdr.index]..], 0);
                    const msg = std.mem.trim(u8, stderr.readableSlice(0), "\n");
                    const label = if (tr_hdr.flags.fail) "failed" else "leaked";
                    if (msg.len > 0) {
                        try self.step.addError("'{s}' {s}: {s}", .{ name, label, msg });
                    } else {
                        try self.step.addError("'{s}' {s}", .{ name, label });
                    }
                    stderr.discard(msg.len);
                }

                try requestNextTest(child.stdin.?, &metadata.?, &sub_prog_node);
            },
            else => {}, // ignore other messages
        }

        stdout.discard(body.len);
    }

    if (stderr.readableLength() > 0) {
        const msg = std.mem.trim(u8, try stderr.toOwnedSlice(), "\n");
        if (msg.len > 0) try self.step.result_error_msgs.append(arena, msg);
    }

    // Send EOF to stdin.
    child.stdin.?.close();
    child.stdin = null;

    return .{
        .stdout = &.{},
        .stderr = &.{},
        .stdout_null = true,
        .stderr_null = true,
        .test_results = .{
            .test_count = test_count,
            .fail_count = fail_count,
            .skip_count = skip_count,
            .leak_count = leak_count,
        },
        .test_metadata = metadata,
    };
}

One really hard problem here is what to do with test output (logging or direct use of stderr). If you run a bunch of tests in parallel and something gets printed to stderr, how do you know which test is responsible? As far as I know, this problem isn't really solvable, and, to get test output, the tests should be run sequentially.

This could be solved in a way related to #1356, which is to execute each test in parallel in a different sub-process. This gives up a bit of efficiency, especially on Windows where spawning child processes is more expensive, but it enables excellent error reporting, and it enables the use case of testing that an expected safety check was indeed triggered.
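
Very roughly, and only as an illustration (the --test-index flag is invented here; the real communication would go through the std.zig.Server messages shown above, and Child-process APIs vary between Zig versions), per-test sub-process execution could look like:

const std = @import("std");

// Hypothetical sketch: run a single test in its own child process so that a
// crash or an expected panic only takes down that child, and so that stderr
// can be attributed to exactly one test.
fn runTestInChild(gpa: std.mem.Allocator, test_exe: []const u8, index: usize) !bool {
    var index_buf: [32]u8 = undefined;
    const index_arg = try std.fmt.bufPrint(&index_buf, "{d}", .{index});

    const argv = [_][]const u8{ test_exe, "--test-index", index_arg };
    var child = std.process.Child.init(&argv, gpa);
    child.stderr_behavior = .Pipe; // capture output per test
    try child.spawn();

    const output = try child.stderr.?.readToEndAlloc(gpa, 1024 * 1024);
    defer gpa.free(output);

    const term = try child.wait();
    const passed = switch (term) {
        .Exited => |code| code == 0,
        else => false, // crashed or was signaled; an expected-panic check would hook in here
    };
    if (!passed and output.len > 0) {
        // The child ran exactly one test, so this stderr belongs to it unambiguously.
        std.debug.print("test {d} failed:\n{s}\n", .{ index, output });
    }
    return passed;
}

The build runner could then keep up to num_cpus of these children running at a time, which also addresses the output-attribution problem quoted above.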

@matu3ba
Contributor

matu3ba commented Jun 4, 2023

This could be solved in a way related to #1356, which is to execute each test in parallel in a different sub-process.

I have unfinished code for this here: https://github.com/matu3ba/zig/tree/enable_testing_panics. For now it is designed only to spawn a process if error.SpawnZigTest is returned, and to compare the result against data written into the TLS section (a user-customized expected panic), so I should be able to benchmark the performance costs by 07.06.2023.

ADDENDUM: Not sure if you want to spawn all processes in parallel, or what the limit should be, etc.
UPDATE: Took longer because of personal life.

@matklad
Contributor Author

matklad commented Jun 4, 2023

which is to execute each test in parallel in a different sub-process.

Oh, but this is fantastic! With build.zig in control, this can run multiple build/test jobs concurrently without oversubscribing the CPU, and without the complexity of a make jobserver!

@rohlem
Contributor

rohlem commented Jun 7, 2023

Also related is #11080 (global state in testing), which has discussion on randomizing test order (with subprocesses, though for now it only considers sequential execution within one process).

@jcalabro
Sponsor Contributor

jcalabro commented Jun 8, 2023

I love this proposal! In addition to everything already mentioned, the practice of running tests in parallel also helps in identifying race conditions.

When I develop Go code, I run my tests in parallel with the race detector enabled, and often run all the tests repeatedly in CI. Go's race detector is based on ThreadSanitizer (#1471) and surfaces any detected data races. It has saved my butt more times than I can count.

It looks something like:

go test ./... -race -count=X

There is also a -parallel flag that defaults to GOMAXPROCS and specifies the number of tests allowed to run at once (Go's -p is a separate flag that controls package-level parallelism; see the excerpt below). You can always set it to 1 if you really want tests to run serially.

From go help testflag:

-parallel n
    Allow parallel execution of test functions that call t.Parallel, and
    fuzz targets that call t.Parallel when running the seed corpus.
    The value of this flag is the maximum number of tests to run simultaneously.
    While fuzzing, the value of this flag is the maximum number of
    subprocesses that may call the fuzz function simultaneously, regardless of
    whether T.Parallel is called. By default, -parallel is set to the value of GOMAXPROCS.
    Setting -parallel to values higher than GOMAXPROCS may cause degraded
    performance due to CPU contention, especially when fuzzing.
    Note that -parallel only applies within a single test binary.
    The 'go test' command may run tests for different packages
    in parallel as well, according to the setting of the -p flag
    (see 'go help build').

@matu3ba
Contributor

matu3ba commented Jun 9, 2023

For the spawn approach, take a look at #15991. To keep it simple, it is not yet parallelized and does not use a seed.
