Built-in benchmarking support #917

Closed
4 tasks done
zoubingwu opened this issue Mar 9, 2022 · 29 comments · Fixed by #1029
Labels
enhancement New feature or request

Comments

@zoubingwu

zoubingwu commented Mar 9, 2022

Clear and concise description of the problem

Since we can now write unit tests in source files with Vitest, I wish to also write benchmarks like we do in Rust/Go. Deno recently added a deno bench subcommand in the 1.20.0 release (denoland/deno#13713).

In Node.js most projects use a library called benchmark, whose last commit was 5 years ago. It works pretty well, honestly, but things have changed a lot since then; in particular we are now suffering from the pain of ESM/CJS interop. Vitest could provide out-of-the-box ESM and even TypeScript support to bring a much friendlier DX.

Suggested solution

Rust uses the #[bench] attribute; you pass the --bench flag or just run cargo bench:

#![feature(test)]
extern crate test;
use test::Bencher;
// (the `fibonacci` helpers and `BENCH_SIZE` constant are omitted for brevity)

// function to benchmark must be annotated with `#[bench]`
#[bench]
fn recursive_fibonacci(b: &mut Bencher) {
    // exact code to benchmark must be passed as a closure to the iter
    // method of Bencher
    b.iter(|| {
        (0..BENCH_SIZE).map(fibonacci).collect::<Vec<u32>>()
    })
}

#[bench]
fn iterative_fibonacci(b: &mut Bencher) {
    b.iter(|| {
        fibonacci_sequence().take(BENCH_SIZE).collect::<Vec<u32>>()
    })
}

In Go, benchmarking is similar to unit testing; run it with go test -bench=.:

// from fib_test.go
func BenchmarkFib10(b *testing.B) {
        // run the Fib function b.N times
        for n := 0; n < b.N; n++ {
                Fib(10)
        }
}

In Deno, see denoland/deno#13713:

// Defaults to 1000 warmup iterations and 1000 bench iterations
Deno.bench("URL parsing", { n: 250, warmup: 100 }, () => {
  new URL("https://deno.land/");  
});


Deno.bench("URL resolving", { n: 1000 }, () => {
  new URL("./foo.js", import.meta.url);
});
$ deno bench --unstable example_bench.ts
running 2 benches from file:///Users/ib/dev/deno/example_bench.ts
bench URL parsing ... 250 iterations 23,063 ns/iter (208..356,041 ns/iter) ok (1s)
bench URL resolving ... 1000 iterations 48,045 ns/iter (250..128,625 ns/iter) ok (4s)

bench result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out (5s)

My first thought was to provide a benchmark function; it works just like describe, except it only runs when a bench flag is passed:

import { benchmark } from 'vitest'

benchmark('fib', (bench) => {
  bench('recursive', () => {
    recursiveFibonacci()
  })
  bench('iterative', () => {
    iterativeFibonacci()
  })
})

Also, the output could be much prettier and more modern compared to the benchmark package.

Alternative

No response

Additional context

No response

Validations

@sheremet-va sheremet-va added the enhancement New feature or request label Mar 9, 2022
@antfu
Member

antfu commented Mar 26, 2022

I generally like this idea. As I am not very familiar with benchmarking personally, my perspective might be biased or completely wrong. Feel free to correct me or leave your thoughts.

New command vitest bench

I think we shouldn't run tests and benches together, and benches should run strictly in serial, not in parallel, since parallelization could take up computational resources and make the results less accurate. (From what I've learned, the ideal situation is that you restart your system and have a minimal number of applications running, to provide a clean environment for the bench to run.)

xxxx.bench.js Convention

Since I propose running test and bench as different commands, maybe we could also have a different convention than xxx.spec.js and xxx.test.js, to be explicit and to let Vitest grab them more efficiently (configurable). Users could even put them under __bench__ if they want to.
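
For illustration, a hypothetical sketch of what such a configurable convention could look like in a Vitest config file; the benchmark block and its include option are made-up names here, not an existing API:

// vitest.config.ts (hypothetical sketch, option names are illustrative only)
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    // proposed: a separate glob so `vitest bench` only collects bench files
    benchmark: {
      include: ['**/*.bench.{js,ts}', '**/__bench__/**/*.{js,ts}'],
    },
  },
})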

Benchmark guard

This is the area I am not very sure about, but I imagine people who write benchmarks would expect some degree of automation with it. I think the bench suite would be much more useful if users could provide some "expectations" on it, like (API design to be discussed):

bench('recursive', () => recursiveFibonacci())
  .operationsPerSecond
  .toBeGreaterThan(10_000) 
bench('recursive', () => recursiveFibonacci())
  .operationsPerSecond
  .notSlowerThanPrevious({ tolerance: 0.05 }) // have 5% of tolerance, a bit like snapshot

So the CI could fail when you make changes that affect the performance.

I am not sure how to make it stable in practice, as the results may vary on different hardware. (Or could we limit the performance of a Node worker/thread, or provide some relative data?)

Programmatic benchmark results

I think this could also be useful to ensure that one implementation is faster than another:

const resultA = bench('recursiveA', () => recursiveA())
const resultB = bench('recursiveB', () => recursiveB())

expect(resultA.operationsPerSecond / resultB.operationsPerSecond)
  .toBeGreaterThan(2) // expect the A version to be 2x faster than B

Reports to Markdown

This would be a good future improvement; I can see it being useful for packages that want to showcase their benchmarks in their README. A command to update it automatically would help keep the data up to date:

vitest bench --reporter=md --file README.md

@antfu antfu changed the title [feature request] Add benchmarking support Built-in benchmarking support Mar 26, 2022
@antfu antfu pinned this issue Mar 26, 2022
@poyoho
Member

poyoho commented Mar 26, 2022

I strongly agree with the idea of separating benchmark tests from unit tests. In fact, when I was writing this implementation, I thought it would not be a good choice if benchmarks were mixed with unit tests. For example, I don't need test coverage results in benchmark tests.

I am not very familiar with benchmarking personally either, but I think it would be nice if you could run multiple different types of benchmarks in parallel to get the results out earlier (run fib and calc in different processes; I guess running them in separate processes should not affect the results 😀).

import { benchmark } from 'vitest'

benchmark('fib', (bench) => {
  bench('recursive', () => {
    recursiveFibonacci()
  })
  bench('iterative', () => {
    iterativeFibonacci()
  })
})

benchmark('calc', (bench) => {
  bench('add', () => {
    add(Math.random(), Math.random())
  })
  bench('subtraction', () => {
    subtraction(Math.random(), Math.random())
  })
})

So the CI could fail when you make changes that affect the performance.

This is a good point. It could even track a performance curve over time, just like coverage does for unit tests.

@patak-dev
Member

So the CI could fail when you make changes that affect the performance.

I am not sure how to make it stable in practice, as the results may vary on different hardware. (Or could we limit the performance of a Node worker/thread, or provide some relative data?)

Interesting article about relative benchmarking using GitHub Actions:
https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/

We have seen that GitHub Actions is indeed good enough to perform relative benchmarking in a reliable way as long as several passes are averaged together

It may be expensive to do it on every push if results need to be averaged to be reliable, but doing it on demand looks interesting. It may also be good to run the CI on main regularly, every day or a few times a week, and generate some performance evolution graphs for a library.
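
A toy sketch of the "average several passes" idea from the article, assuming a plain timing loop (not any particular tool's implementation): run the same workload a few times and report the mean, so a single noisy pass on shared CI hardware doesn't dominate the result.

function timeOnePass(fn: () => void, iterations: number): number {
  const start = performance.now()
  for (let i = 0; i < iterations; i++) fn()
  // average time per operation for this pass, in milliseconds
  return (performance.now() - start) / iterations
}

function averagedPasses(fn: () => void, passes = 5, iterations = 10_000): number {
  let total = 0
  for (let p = 0; p < passes; p++) total += timeOnePass(fn, iterations)
  return total / passes
}

// example usage with a trivial workload
console.log(`${averagedPasses(() => Math.sqrt(12345)).toFixed(6)} ms/op (mean of 5 passes)`)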

@antfu
Member

antfu commented Mar 26, 2022

I imagine we could run a bench fixture from Vitest to get the relative performance of the running hardware, like:

// in vitest, maybe customizable
export const relativeBenchScore = bench('fixture', () => {
  // just for example, to be discussed
  let i = 100_000
  while (i--) {}
})

And then in userland:

bench('recursive', () => recursiveFibonacci())
  .operationsPerSecond
  .toHaveRelativeScoreRange(80, 100) // ratio to `relativeBenchScore`

This way we could reduce the impact of hardware differences.

@rockwotj

rockwotj commented Mar 26, 2022

I think a microbenchmark addition would be great! As someone who did a lot of microbenchmarking in my previous job (Firebase), I can say it's a very tricky problem to get right. Oracle has a good article here describing problems with benchmarking in JIT'd languages. It's about Java, but I assume Node/V8 will try to do the same optimizations.
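
As a generic illustration of the kind of JIT pitfall such articles describe (a sketch not tied to any benchmarking library): if a benchmarked result is never used, the engine is free to dead-code-eliminate the work, and you end up timing an empty loop.

function fib(n: number): number {
  return n < 2 ? n : fib(n - 1) + fib(n - 2)
}

// naive: the return value is discarded, so the optimizer may drop the call entirely
for (let i = 0; i < 1_000; i++) {
  fib(20)
}

// safer: keep the result observable so the work cannot be elided ("blackhole" pattern)
let sink = 0
for (let i = 0; i < 1_000; i++) {
  sink += fib(20)
}
console.log(sink)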

I'm not really feeling confident about having assertions on performance. I have never seen this actually work well; there is simply too much variation between machines to get it right. Results can even vary based on background processes running and potentially stealing CPU time, on the current fragmentation state of your RAM, etc. Every place I've seen has great reporting mechanisms and publishes graphs of important benchmarks (run on dedicated machines) at some periodic interval. Here is a benchmark that gRPC publishes as an example.

Even trying to relativize the results seems like an impossible target to hit, as it is going to depend on the version of Node and the instruction set of the CPU (for example, SIMD instructions might make a particular benchmark faster in a way that doesn't match the benchmark you're comparing against). One other thing to consider is that an API like notSlowerThan or moreIterationsThan could easily allow someone to improve the benchmark but forget to raise the baseline, so a regression back to the old value would not be caught. That being said, if people would find assertions useful then by all means don't let me talk you out of it; let's just make sure they are optional 😀

One bit of functionality that is massively helpful is being able to compare two benchmarks to get the relative performance difference for a change to the system. Google Benchmark comes with a script to generate this from two benchmark runs.
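
A rough sketch of that idea for Node, assuming the runner can emit JSON results; the file layout and field names here are invented for illustration (Google Benchmark ships its own compare script for its own format):

import { readFileSync } from 'node:fs'

interface BenchResult { name: string; opsPerSecond: number }

function loadRun(path: string): Map<string, number> {
  const results: BenchResult[] = JSON.parse(readFileSync(path, 'utf8'))
  const entries: [string, number][] = results.map(r => [r.name, r.opsPerSecond])
  return new Map(entries)
}

const baseline = loadRun('bench-main.json')
const candidate = loadRun('bench-pr.json')

for (const [name, base] of baseline) {
  const cur = candidate.get(name)
  if (cur === undefined) continue
  // positive delta = faster than baseline, negative = regression
  const delta = ((cur - base) / base) * 100
  console.log(`${name}: ${delta.toFixed(1)}% vs baseline`)
}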

Another requirement from my perspective for a benchmarking tool is some support for profiling the benchmark. Thankfully Node.js has some decent built-in support for profiling, but being able to see a flame graph of what the CPU was doing is a must.

Outside of this, as far as the API here goes, I wholly agree there should be a different command to run benchmarks; I don't think it's good practice to run benchmarks and tests at the same time in CI, for example.

However, in terms of API, I would personally rather reuse the existing describe block and have a bench export that is functionally the same as test but is a benchmark. It's important to allow for benchmark setup and have the beforeAll and beforeEach hooks work just like they normally do, as most benchmarks need setup that isn't executed as part of the code you're measuring.
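
For illustration, a hypothetical sketch of that shape; bench is the proposed export and does not exist in Vitest yet, so a stand-in is defined here just to keep the sketch self-contained:

import { describe, beforeAll } from 'vitest'

// stand-in for the proposed `bench` export; here it just runs the function once
const bench = (name: string, fn: () => void) => { fn() }

function recursiveFibonacci(n: number): number {
  return n < 2 ? n : recursiveFibonacci(n - 1) + recursiveFibonacci(n - 2)
}

describe('fibonacci', () => {
  let inputs: number[]

  beforeAll(() => {
    // setup runs through the normal hook and stays outside the measured code
    inputs = Array.from({ length: 100 }, (_, i) => i % 25)
  })

  bench('recursive', () => {
    for (const n of inputs) recursiveFibonacci(n)
  })
})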

I'm also not sure why there would need to be a restriction to put these in another file? That seems opposite to the rest of the API decisions that Vitest has made. I can write my tests inline in source files, but my benchmarks have to be in another file? I would assume that the functions provided to bench are only executed as part of the bench command and ignored otherwise (and test blocks would be ignored during benchmarking). This would allow sharing fixtures between unit tests and performance tests.

@antfu
Member

antfu commented Mar 26, 2022

Re @rockwotj

Thanks for the great feedback!

assertions on performance

One usage I've seen is people using CI to run the same bench on the same machine for PRs, comparing the changes against the main branch. That way you can see how the changes in a PR affect performance (as a guard for decision making, etc.). /cc @posva IIRC you made one, right?

Do you think this would be a more reasonable usage?

reusing the existing describe and hooks

That sounds like a good idea to me!

restriction to put these in another file?

No, just an idea. To me, being able to is different from having to. Vitest can run tests in source files, while people can choose not to use that and always keep tests in separate files. So yes, it should be possible to write bench and test in the same file, or even in the source, but I think it would be good to also allow running them from separate files with a different file convention.

@zoubingwu
Author

zoubingwu commented Mar 26, 2022

The benchmark guard part sounds interesting, but I'm not sure we really need it; the other parts look pretty good to me.

When we write benchmarks, we are comparing our own implementation with others. They may be totally different ones, like swc vs Babel, or esbuild vs Rollup vs webpack. In that case, we only need the numbers, not assertions on them, and we definitely don't want to introduce CI failures.

Or they can be the exact same implementation but in different "versions". If the new version brings a performance regression, most of the time it's because we had no way around it. By keeping those results we could even have a full record of the history of performance changes, just like Deno does.

Or we know we are doing it wrong, and the comparison is clear enough to show that. I believe it is better to rely on the results to tell us the difference between two implementations rather than on arbitrary assertions, because those are just too hard to use in the real world. It is very tricky to define how good or bad something should be to pass or fail CI: too loose makes it pointless, and too strict will frequently cause CI failures, which would be very frustrating and annoying.

@posva
Contributor

posva commented Mar 26, 2022

One usage I've seen is people using CI to run the same bench on the same machine for PRs, comparing the changes against the main branch. That way you can see how the changes in a PR affect performance (as a guard for decision making, etc.). /cc

I only did this for the size comparison, but in the end it was just running one script and outputting the results. I copied the approach from https://github.com/andresz1/size-limit-action and added it to multiple repos, like the router, with some modifications. The comment feature only works for PRs from branches on the same repo and not for PRs coming from forks, because something changed in the permissions available to GitHub Actions.

@rockwotj

I think having an official GitHub Action to go along with a benchmarking feature in Vitest would be wonderful 💯

I do think that's the right place for assertions about regressions, however.

@Uzlopak

Uzlopak commented Apr 3, 2022

I did not read the whole thread.

As I wrote in the original PR, it was suggested to join forces on forking benchmark.js, and I am open to it.

I improved the code in my fork significantly, like ripping out lodash as much as possible, at least in the Node-only part. I also integrated the TS typings into my fork.

I also want to add the ability to detect async functions, so that the whole deferred setup for testing async functions does not need to be done manually.
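
For illustration, a rough sketch of what that automatic detection could look like; the helper name here is hypothetical, and only the { defer, fn } options shape is actual benchmark.js API:

type BenchFn = () => unknown | Promise<unknown>

function toBenchmarkOptions(fn: BenchFn) {
  if (fn.constructor.name !== 'AsyncFunction') {
    // synchronous functions are passed through unchanged
    return { fn }
  }
  // async functions are adapted to benchmark.js's deferred style automatically,
  // so callers no longer have to call deferred.resolve() by hand
  return {
    defer: true,
    fn: (deferred: { resolve: () => void }) => {
      Promise.resolve(fn()).then(() => deferred.resolve())
    },
  }
}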

I also wanted to integrate the PR that fixes the statistics part.

I added a PR to the original benchmark.js for esm support.

I also wanted to test whether there is a way to spawn Node processes to avoid the issue that benchmarks get different results depending on the order in which they are run, because V8 optimizes and subsequent benchmarks then use the optimized code. That would probably make it necessary to require modules in the setup stage of the benchmark.
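
A rough sketch of that isolation idea, with placeholder file names: run each bench file in a fresh Node process so V8 optimization state from one benchmark cannot leak into the next.

import { execFileSync } from 'node:child_process'

const benchFiles = ['fib.bench.js', 'calc.bench.js'] // placeholders

for (const file of benchFiles) {
  // a new process per file means a cold V8 instance for every benchmark suite
  const output = execFileSync(process.execPath, [file], { encoding: 'utf8' })
  console.log(output)
}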

Link to my fork

https://github.com/cthulhu-node/benchmark.js

@Uzlopak

Uzlopak commented Apr 5, 2022

I wrote to the maintainers of bestiejs on Twitter asking if I could take over maintenance of the benchmark.js project.

Let's hope for an answer :)

@edimitchel
Member

edimitchel commented Apr 5, 2022

Let's do tinyetabli? @Aslemammad 😅

Or tinybench?

@Aslemammad
Member

Tinylibs welcomes these projects! If anyone's interested (and the team agrees), I can give them access to tinylibs.

@Uzlopak

Uzlopak commented Apr 5, 2022

Putting benchmark.js into tinylibs?

@edimitchel
Member

Putting benchmark.js into tinylibs?

Yes, why not, if the official one is no longer supported; we can probably do something to deliver the same features with a minimum of weight.

@Uzlopak

Uzlopak commented Apr 20, 2022

Should I refactor benchmark.js further to make it as tiny as possible? :D

@Aslemammad
Member

@Uzlopak Yep. The goals are:

- an easy, understandable API
- tiny, preferably no deps

Send me a DM on Discord so I can get you access to tinylibs!

@Uzlopak

Uzlopak commented Apr 23, 2022

I worked further on the benchmark fork.
It is still a drop-in replacement for the original benchmark.js. I can't test in browsers, since I use Linux and, to be honest, I don't want the pain of running Windows and installing a bunch of old and buggy browsers. I did make it compatible with Node 4 (if you still use old Mocha) to ensure some backwards compatibility, and you can run the unit tests in the browser with test.html: it loads Mocha and Chai from a CDN, and in my modern Chrome the unit tests pass.

It does not have any production dependencies, so no lodash, platform.js, etc.

I am currently implementing a small wrapper for automatically detecting async functions
tinylibs/tinybench.old#19

I'm still working on handling error cases properly.

But yeah, it is kind of tiny:

uzlopak@Battlestation:~/workspace/benchmark.js$ npm pack --dry-run
npm notice 
npm notice 📦  benchmark@2.1.4
npm notice === Tarball Contents === 
npm notice 1.1kB  LICENSE         
npm notice 68.0kB benchmark.js    
npm notice 144B   index.js        
npm notice 1.3kB  package.json    
npm notice 2.2kB  README.md       
npm notice 33.4kB types/index.d.ts
npm notice === Tarball Details === 
npm notice name:          benchmark                               
npm notice version:       2.1.4                                   
npm notice filename:      benchmark-2.1.4.tgz                     
npm notice package size:  26.8 kB                                 
npm notice unpacked size: 106.1 kB                                
npm notice shasum:        f7af00d3861cc8db8a3db53cb807216e29d20930
npm notice integrity:     sha512-kaKfVFw/Wi8je[...]8UhptxV9nFTqw==
npm notice total files:   6                                       
npm notice 
benchmark-2.1.4.tgz

The original project is about 1.6 MB (lodash + platform.js in node_modules).

Looking for feedback.

@Aslemammad
Member

It's really tiny! gonna give you access to tinylibs!

@Uzlopak

Uzlopak commented Apr 24, 2022

Just a remark:

It should also be configured so that it works with the benchmark GitHub Action:

https://github.com/benchmark-action/github-action-benchmark

Maybe we should also think about the relative performance comparison described here: https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/

@Uzlopak

Uzlopak commented Apr 29, 2022

@Aslemammad You requested that I refactor benchmark.js to TypeScript. It is a humongous task because the whole codebase is a big mangled mess.

So either you refactor it yourself, or you'll have to wait a few weeks until I have unmangled the code.

@Aslemammad
Member

@Uzlopak no rush for now, I'd be able to help in the future, we can keep it as is!

@zoubingwu
Author

zoubingwu commented May 5, 2022

Just saw a beautiful bench output on Twitter.

I really like the idea of showing p75/p99.

[image: screenshot of the benchmark output]
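
For reference, a small illustrative sketch of how such percentiles can be derived from raw per-iteration timings (nearest-rank method); this is not any particular tool's implementation:

function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b)
  // nearest-rank: smallest value with at least p% of the samples at or below it
  const rank = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, rank)]
}

const samplesNs = [120, 95, 101, 99, 540, 103, 98, 97, 110, 102]
console.log('p75:', percentile(samplesNs, 75), 'ns')
console.log('p99:', percentile(samplesNs, 99), 'ns')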

@Uzlopak

Uzlopak commented May 5, 2022

You mean the upper percentiles, p75, p95 and p995?

I published tinybench.

@mangs

mangs commented May 14, 2022

Thank you for working on this 😀

Bun's benchmarking output is beautiful and certainly something to use as a guide. The repo is private for now, so if you want access to Bun's repo, just join their Discord via bun.sh.

Deno also has really nice benchmark output via its deno bench command. You can see more in their manual here: https://deno.land/manual/tools/benchmarker. The output is quite different now from what is shown in the first post. Here's a decent real-world example, better than what's in the manual: denoland/deno#14368

It seems that both Bun and Deno are internally using a tool called mitata. This could hopefully make your job a little easier.

Those are both good examples of really nice output formatting; this team already does a terrific job of it, so I know you'll do a great job with this topic. I just hope this helps a little.

@Uzlopak

Uzlopak commented May 17, 2022

I published tinybench 1.0.2.

To get the same output with the 0.75, 0.99 and 0.995 percentiles, we would need the corresponding t- and u-tables for those.

Do you want to use mitata instead of tinybench?

@Aslemammad
Member

An update on this: I've got plans for tinybench, so I need some time to study benchmarking so we can get the best experience out of it! After that, I'll try integrating it with Vitest!
Thank you all!

@Aslemammad
Member

We've got tinybench here 🎉

Shout out to @poyoho for getting vitest benchmark ready!

@mangs

mangs commented Aug 30, 2022

Woo nice job team! Looking forward to trying it out.

@sheremet-va sheremet-va unpinned this issue Sep 4, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Jun 13, 2023