
Consider running tests in parallel by default #15953

Open
matklad opened this issue Jun 4, 2023 · 8 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone
0.12.0

Comments

@matklad
Contributor

matklad commented Jun 4, 2023

Today, when I run the tests using zig test / zig build test, tests inside a single translation unit are being run sequentially:

for (test_fn_list, 0..) |test_fn, i| {

I'd love to see the tests being run in parallel.

There are two motivations here:

Directly, I have 14 cores on the laptop I use, and it'd be nice to use all of them when working on well-tested codebases.

More importantly, and indirectly, defaults shape testing culture. It's very easy to write the first three tests such that they are order-dependent and can't be run in parallel. That isn't a problem at three tests, but once you get to 3,000 tests it becomes one, and at that point fixing the tests requires a lot of effort.

In other words, "my tests can't be run in parallel" is a boiling-frog kind of problem, and it would help if the language averted it before it gets big, hence "parallel by default". As a case study, Rust went for parallel tests, and, imo, it has had lasting positive effects on the ecosystem.

Of course, some software is fundamentally single-threaded and it's important to avoid pessimizing that case, but Zig already has single-threaded builds just for that.

One really hard problem here is what to do with test output (logging or direct use of stderr). If you run a bunch of tests in parallel and something gets printed to stderr, how do you know which test is responsible? As far as I know, this problem isn't really solvable, and, to get test output, the tests should be run sequentially.

@matu3ba
Contributor

matu3ba commented Jun 4, 2023

Of course, some software is fundamentally single-threaded and it's important to avoid pessimizing that case, but Zig already has single-threaded builds just for that.

  1. Tests with little or no kernel interaction (i.e. allocation) typically perform better without spawning another thread. What annotations does Rust offer to accommodate this use case?
  2. Unit tests are typically designed to have as little kernel interaction as possible (this includes allocations, which are so far not optimized), so what heuristic for deciding when to use another thread is reasonable?

@matklad
Contributor Author

matklad commented Jun 4, 2023

Rust doesn't really let you control whether the tests are threaded or not (there's a CLI option on the test binary controlling how many threads to use, but you can't say "this test suite requires a single thread").

Regarding whether it makes sense to use parallelism, I think “always” would be a good choice here: just spawn num_cpus threads, and let them eagerly pick up tests to run by atomically fetch-adding a shared test index. Sorbet’s multiplexJob would be a nice API shape for this task.
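
As a rough sketch of that scheme (not the actual test runner; names like workerFn and runAllTests are invented here, and the exact std.Thread and atomic APIs differ between Zig versions), the work distribution could look something like this:

const std = @import("std");

// Hypothetical sketch: one worker per CPU, each claiming the next test by
// atomically fetch-adding a shared counter. A real runner would also record
// pass/fail/skip results and report progress.
fn workerFn(next_index: *usize, test_fns: []const *const fn () anyerror!void) void {
    while (true) {
        const i = @atomicRmw(usize, next_index, .Add, 1, .Monotonic);
        if (i >= test_fns.len) return;
        test_fns[i]() catch {}; // failure bookkeeping elided
    }
}

pub fn runAllTests(gpa: std.mem.Allocator, test_fns: []const *const fn () anyerror!void) !void {
    var next_index: usize = 0;
    const thread_count = try std.Thread.getCpuCount();
    const threads = try gpa.alloc(std.Thread, thread_count);
    defer gpa.free(threads);
    for (threads) |*t| t.* = try std.Thread.spawn(.{}, workerFn, .{ &next_index, test_fns });
    for (threads) |t| t.join();
}

A single-threaded build, per the earlier point about not pessimizing such programs, would just run the same loop with one worker.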

@andrewrk added the proposal label Jun 4, 2023
@andrewrk added this to the 0.12.0 milestone Jun 4, 2023
@andrewrk
Member

andrewrk commented Jun 4, 2023

I think this could make a lot of sense, especially given that the build runner has recently become a sort of "driver" for the test runner:

fn evalZigTest(
    self: *Run,
    child: *std.process.Child,
    prog_node: *std.Progress.Node,
) !StdIoResult {
    const gpa = self.step.owner.allocator;
    const arena = self.step.owner.allocator;

    var poller = std.io.poll(gpa, enum { stdout, stderr }, .{
        .stdout = child.stdout.?,
        .stderr = child.stderr.?,
    });
    defer poller.deinit();

    try sendMessage(child.stdin.?, .query_test_metadata);

    const Header = std.zig.Server.Message.Header;
    const stdout = poller.fifo(.stdout);
    const stderr = poller.fifo(.stderr);

    var fail_count: u32 = 0;
    var skip_count: u32 = 0;
    var leak_count: u32 = 0;
    var test_count: u32 = 0;

    var metadata: ?TestMetadata = null;
    var sub_prog_node: ?std.Progress.Node = null;
    defer if (sub_prog_node) |*n| n.end();

    poll: while (true) {
        while (stdout.readableLength() < @sizeOf(Header)) {
            if (!(try poller.poll())) break :poll;
        }
        const header = stdout.reader().readStruct(Header) catch unreachable;
        while (stdout.readableLength() < header.bytes_len) {
            if (!(try poller.poll())) break :poll;
        }
        const body = stdout.readableSliceOfLen(header.bytes_len);

        switch (header.tag) {
            .zig_version => {
                if (!std.mem.eql(u8, builtin.zig_version_string, body)) {
                    return self.step.fail(
                        "zig version mismatch build runner vs compiler: '{s}' vs '{s}'",
                        .{ builtin.zig_version_string, body },
                    );
                }
            },
            .test_metadata => {
                const TmHdr = std.zig.Server.Message.TestMetadata;
                const tm_hdr = @ptrCast(*align(1) const TmHdr, body);
                test_count = tm_hdr.tests_len;

                const names_bytes = body[@sizeOf(TmHdr)..][0 .. test_count * @sizeOf(u32)];
                const async_frame_lens_bytes = body[@sizeOf(TmHdr) + names_bytes.len ..][0 .. test_count * @sizeOf(u32)];
                const expected_panic_msgs_bytes = body[@sizeOf(TmHdr) + names_bytes.len + async_frame_lens_bytes.len ..][0 .. test_count * @sizeOf(u32)];
                const string_bytes = body[@sizeOf(TmHdr) + names_bytes.len + async_frame_lens_bytes.len + expected_panic_msgs_bytes.len ..][0..tm_hdr.string_bytes_len];

                const names = std.mem.bytesAsSlice(u32, names_bytes);
                const async_frame_lens = std.mem.bytesAsSlice(u32, async_frame_lens_bytes);
                const expected_panic_msgs = std.mem.bytesAsSlice(u32, expected_panic_msgs_bytes);

                const names_aligned = try arena.alloc(u32, names.len);
                for (names_aligned, names) |*dest, src| dest.* = src;

                const async_frame_lens_aligned = try arena.alloc(u32, async_frame_lens.len);
                for (async_frame_lens_aligned, async_frame_lens) |*dest, src| dest.* = src;

                const expected_panic_msgs_aligned = try arena.alloc(u32, expected_panic_msgs.len);
                for (expected_panic_msgs_aligned, expected_panic_msgs) |*dest, src| dest.* = src;

                prog_node.setEstimatedTotalItems(names.len);
                metadata = .{
                    .string_bytes = try arena.dupe(u8, string_bytes),
                    .names = names_aligned,
                    .async_frame_lens = async_frame_lens_aligned,
                    .expected_panic_msgs = expected_panic_msgs_aligned,
                    .next_index = 0,
                    .prog_node = prog_node,
                };

                try requestNextTest(child.stdin.?, &metadata.?, &sub_prog_node);
            },
            .test_results => {
                const md = metadata.?;

                const TrHdr = std.zig.Server.Message.TestResults;
                const tr_hdr = @ptrCast(*align(1) const TrHdr, body);
                fail_count += @boolToInt(tr_hdr.flags.fail);
                skip_count += @boolToInt(tr_hdr.flags.skip);
                leak_count += @boolToInt(tr_hdr.flags.leak);

                if (tr_hdr.flags.fail or tr_hdr.flags.leak) {
                    const name = std.mem.sliceTo(md.string_bytes[md.names[tr_hdr.index]..], 0);
                    const msg = std.mem.trim(u8, stderr.readableSlice(0), "\n");
                    const label = if (tr_hdr.flags.fail) "failed" else "leaked";
                    if (msg.len > 0) {
                        try self.step.addError("'{s}' {s}: {s}", .{ name, label, msg });
                    } else {
                        try self.step.addError("'{s}' {s}", .{ name, label });
                    }
                    stderr.discard(msg.len);
                }

                try requestNextTest(child.stdin.?, &metadata.?, &sub_prog_node);
            },
            else => {}, // ignore other messages
        }

        stdout.discard(body.len);
    }

    if (stderr.readableLength() > 0) {
        const msg = std.mem.trim(u8, try stderr.toOwnedSlice(), "\n");
        if (msg.len > 0) try self.step.result_error_msgs.append(arena, msg);
    }

    // Send EOF to stdin.
    child.stdin.?.close();
    child.stdin = null;

    return .{
        .stdout = &.{},
        .stderr = &.{},
        .stdout_null = true,
        .stderr_null = true,
        .test_results = .{
            .test_count = test_count,
            .fail_count = fail_count,
            .skip_count = skip_count,
            .leak_count = leak_count,
        },
        .test_metadata = metadata,
    };
}

One really hard problem here is what to do with test output (logging or direct use of stderr). If you run a bunch of tests in parallel and something gets printed to stderr, how do you know which test is responsible? As far as I know, this problem isn't really solvable, and, to get test output, the tests should be run sequentially.

This could be solved in a way related to #1356, which is to execute each test in parallel in a different sub-process. This gives up a bit of efficiency, especially on Windows where spawning child processes is more expensive, but it enables excellent error reporting, and it enables the use case of testing that an expected safety check was indeed triggered.
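
Very roughly, and only as an illustration (the --test-index flag is invented here; the real communication would go through the std.zig.Server messages shown above, and Child-process APIs vary between Zig versions), per-test sub-process execution could look like:

const std = @import("std");

// Hypothetical sketch: run a single test in its own child process so that a
// crash or an expected panic only takes down that child, and so that stderr
// can be attributed to exactly one test.
fn runTestInChild(gpa: std.mem.Allocator, test_exe: []const u8, index: usize) !bool {
    var index_buf: [32]u8 = undefined;
    const index_arg = try std.fmt.bufPrint(&index_buf, "{d}", .{index});

    const argv = [_][]const u8{ test_exe, "--test-index", index_arg };
    var child = std.process.Child.init(&argv, gpa);
    child.stderr_behavior = .Pipe; // capture output per test
    try child.spawn();

    const output = try child.stderr.?.readToEndAlloc(gpa, 1024 * 1024);
    defer gpa.free(output);

    const term = try child.wait();
    const passed = switch (term) {
        .Exited => |code| code == 0,
        else => false, // crashed or was signaled; an expected-panic check would hook in here
    };
    if (!passed and output.len > 0) {
        // The child ran exactly one test, so this stderr belongs to it unambiguously.
        std.debug.print("test {d} failed:\n{s}\n", .{ index, output });
    }
    return passed;
}

The build runner could then keep up to num_cpus of these children running at a time, which also addresses the output-attribution problem quoted above.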

@matu3ba
Contributor

matu3ba commented Jun 4, 2023

This could be solved in a way related to #1356, which is to execute each test in parallel in a different sub-process.

I have unfinished code for this here: https://github.com/matu3ba/zig/tree/enable_testing_panics. For now it is designed only to spawn a process if error.SpawnZigTest is returned, and to compare the result against data written into the TLS section (a user-customized expected panic), so I should be able to benchmark the performance costs by 07.06.2023.

ADDENDUM: Not sure if you want to spawn all processes in parallel, or what the limit should be, etc.
UPDATE: Took longer because of personal life.

@matklad
Contributor Author

matklad commented Jun 4, 2023

which is to execute each test in parallel in a different sub-process.

Oh, but this is fantastic! With build.zig in control, this can run multiple build/test jobs concurrently without oversubscribing the CPU, and without the complexity of a make jobserver!

@rohlem
Contributor

rohlem commented Jun 7, 2023

Also related is #11080 (global state in testing), which has discussion on randomizing test order (with subprocesses, though for now it only considers sequential execution within one process).

@jcalabro
Sponsor Contributor

jcalabro commented Jun 8, 2023

I love this proposal! In addition to everything already mentioned, the practice of running tests in parallel also helps in identifying race conditions.

When I develop Go code, I run my tests in parallel with the race detector enabled, and often run all the tests repeatedly in CI. Go's race detector is based on ThreadSanitizer (#1471) and surfaces any detected data races. It has saved my butt more times than I can count.

It looks something like:

go test ./... -race -count=X

There is also a -parallel flag that defaults to GOMAXPROCS and specifies the number of tests allowed to run at once (Go's -p is a separate flag that controls package-level parallelism; see the excerpt below). You can always set it to 1 if you really want tests to run serially.

From go help testflag:

-parallel n
    Allow parallel execution of test functions that call t.Parallel, and
    fuzz targets that call t.Parallel when running the seed corpus.
    The value of this flag is the maximum number of tests to run simultaneously.
    While fuzzing, the value of this flag is the maximum number of
    subprocesses that may call the fuzz function simultaneously, regardless of
    whether T.Parallel is called. By default, -parallel is set to the value of GOMAXPROCS.
    Setting -parallel to values higher than GOMAXPROCS may cause degraded
    performance due to CPU contention, especially when fuzzing.
    Note that -parallel only applies within a single test binary.
    The 'go test' command may run tests for different packages
    in parallel as well, according to the setting of the -p flag
    (see 'go help build').

@matu3ba
Contributor

matu3ba commented Jun 9, 2023

For the spawn approach, take a look at #15991. To keep it simple, it is not yet parallelized and does not use a seed.
