Built-in benchmarking support #917

Closed
4 tasks done
zoubingwu opened this issue Mar 9, 2022 · 29 comments · Fixed by #1029
Labels
enhancement New feature or request

Comments

@zoubingwu

zoubingwu commented Mar 9, 2022

Clear and concise description of the problem

Since we can now write unit tests in source files with Vitest, I wish to also write benchmarks like we do in Rust/Go. Deno recently added a deno bench subcommand in the 1.20.0 release (denoland/deno#13713).

In Node.js most projects use a library called benchmark, whose last commit was 5 years ago. It works pretty well, honestly, but things have changed a lot since then; in particular we are now suffering from the pain of ESM/CJS interop. Vitest could provide out-of-the-box ESM and even TypeScript support to bring a much friendlier DX.

Suggested solution

Rust uses the #[bench] attribute; you pass the --bench flag or just run cargo bench:

#![feature(test)]
extern crate test;
use test::Bencher;
// (the `fibonacci` helpers and `BENCH_SIZE` constant are omitted for brevity)

// function to benchmark must be annotated with `#[bench]`
#[bench]
fn recursive_fibonacci(b: &mut Bencher) {
    // exact code to benchmark must be passed as a closure to the iter
    // method of Bencher
    b.iter(|| {
        (0..BENCH_SIZE).map(fibonacci).collect::<Vec<u32>>()
    })
}

#[bench]
fn iterative_fibonacci(b: &mut Bencher) {
    b.iter(|| {
        fibonacci_sequence().take(BENCH_SIZE).collect::<Vec<u32>>()
    })
}

In Go, benchmarking is similar to unit testing; run it with go test -bench=.:

// from fib_test.go
func BenchmarkFib10(b *testing.B) {
        // run the Fib function b.N times
        for n := 0; n < b.N; n++ {
                Fib(10)
        }
}

In Deno, see denoland/deno#13713:

// Defaults to 1000 warmup iterations and 1000 bench iterations
Deno.bench("URL parsing", { n: 250, warmup: 100 }, () => {
  new URL("https://deno.land/");  
});


Deno.bench("URL resolving", { n: 1000 }, () => {
  new URL("./foo.js", import.meta.url);
});
$ deno bench --unstable example_bench.ts
running 2 benches from file:///Users/ib/dev/deno/example_bench.ts
bench URL parsing ... 250 iterations 23,063 ns/iter (208..356,041 ns/iter) ok (1s)
bench URL resolving ... 1000 iterations 48,045 ns/iter (250..128,625 ns/iter) ok (4s)

bench result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out (5s)

My first thought was to provide a benchmark function; it works just like describe, except it only runs when a bench flag is passed:

import { benchmark } from 'vitest'

benchmark('fib', (bench) => {
  bench('recursive', () => {
    recursiveFibonacci()
  })
  bench('iterative', () => {
    iterativeFibonacci()
  })
})

Also, the output could be much prettier and more modern compared to the benchmark package.

Alternative

No response

Additional context

No response

Validations

@sheremet-va sheremet-va added the enhancement New feature or request label Mar 9, 2022
@antfu
Member

antfu commented Mar 26, 2022

I generally like this idea. As I am not very familiar with benchmarking personally, my perspective might be biased or completely wrong. Feel free to correct me or leave your thoughts.

New command vitest bench

I think we shouldn't run tests and benches together, and benches should run strictly in serial, not in parallel, since parallelization could take up computational resources and make the results less accurate. (From what I've learned, the ideal situation is that you restart your system and have a minimal number of applications running, to provide a clean environment for the bench to run.)

xxxx.bench.js Convention

Since I propose running test and bench as different commands, maybe we could also have a different convention than xxx.spec.js and xxx.test.js, to be explicit and to let Vitest grab them more efficiently (configurable). Users could even put them under __bench__ if they want to.
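
For illustration, a hypothetical sketch of what such a configurable convention could look like in a Vitest config file; the benchmark block and its include option are made-up names here, not an existing API:

// vitest.config.ts (hypothetical sketch, option names are illustrative only)
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    // proposed: a separate glob so `vitest bench` only collects bench files
    benchmark: {
      include: ['**/*.bench.{js,ts}', '**/__bench__/**/*.{js,ts}'],
    },
  },
})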

Benchmark guard

This is the area I am not very sure about, but I imagine people who write benchmarks would expect some degree of automation with it. I think the bench suite would be much more useful if users could provide some "expectations" on it, like (API design to be discussed):

bench('recursive', () => recursiveFibonacci())
  .operationsPerSecond
  .toBeGreaterThan(10_000) 
bench('recursive', () => recursiveFibonacci())
  .operationsPerSecond
  .notSlowerThanPrevious({ tolerance: 0.05 }) // have 5% of tolerance, a bit like snapshot

So the CI could fail when you make changes that affect the performance.

I am not sure how to make it stable in practice, as the results may vary on different hardware. (Or could we limit the performance of a Node worker/thread, or provide some relative data?)

Programmatic benchmark results

I think this could also be useful to ensure that one implementation is faster than another:

const resultA = bench('recursiveA', () => recursiveA())
const resultB = bench('recursiveB', () => recursiveB())

expect(resultA.operationsPerSecond / resultB.operationsPerSecond)
  .toBeGreaterThan(2) // expect the A version to be 2x faster than B

Reports to Markdown

This would be a good future improvement; I can see it being useful for packages that want to showcase their benchmarks in their README. A command to update it automatically would help keep the data up to date:

vitest bench --reporter=md --file README.md

@antfu antfu changed the title [feature request] Add benchmarking support Built-in benchmarking support Mar 26, 2022
@antfu antfu pinned this issue Mar 26, 2022
@poyoho
Member

poyoho commented Mar 26, 2022

I strongly agree with the idea of separating benchmark tests from unit tests. In fact, when I was writing this implementation, I thought it would not be a good choice if benchmarks were mixed with unit tests. For example, I don't need test coverage results in benchmark tests.

I am not very familiar with benchmarking personally either, but I think it would be nice if you could run multiple different types of benchmarks in parallel to get the results out earlier (run fib and calc in different processes; I guess running them in separate processes should not affect the results 😀).

import { benchmark } from 'vitest'

benchmark('fib', (bench) => {
  bench('recursive', () => {
    recursiveFibonacci()
  })
  bench('iterative', () => {
    iterativeFibonacci()
  })
})

benchmark('calc', (bench) => {
  bench('add', () => {
    add(Math.random(), Math.random())
  })
  bench('subtraction', () => {
    subtraction(Math.random(), Math.random())
  })
})

So the CI could fail when you make changes that affect the performance.

This is a good point. It could even track a performance curve over time, just like coverage does for unit tests.

@patak-dev
Member

So the CI could fail when you make changes that affect the performance.

I am not sure how to make it stable in practice, as the results may vary on different hardware. (Or could we limit the performance of a Node worker/thread, or provide some relative data?)

Interesting article about relative benchmarking using GitHub Actions:
https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/

We have seen that GitHub Actions is indeed good enough to perform relative benchmarking in a reliable way as long as several passes are averaged together

It may be expensive to do it on every push if results need to be averaged to be reliable, but doing it on demand looks interesting. It may also be good to run the CI on main regularly, every day or a few times a week, and generate some performance evolution graphs for a library.
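
A toy sketch of the "average several passes" idea from the article, assuming a plain timing loop (not any particular tool's implementation): run the same workload a few times and report the mean, so a single noisy pass on shared CI hardware doesn't dominate the result.

function timeOnePass(fn: () => void, iterations: number): number {
  const start = performance.now()
  for (let i = 0; i < iterations; i++) fn()
  // average time per operation for this pass, in milliseconds
  return (performance.now() - start) / iterations
}

function averagedPasses(fn: () => void, passes = 5, iterations = 10_000): number {
  let total = 0
  for (let p = 0; p < passes; p++) total += timeOnePass(fn, iterations)
  return total / passes
}

// example usage with a trivial workload
console.log(`${averagedPasses(() => Math.sqrt(12345)).toFixed(6)} ms/op (mean of 5 passes)`)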

@antfu
Member

antfu commented Mar 26, 2022

I imagine we could run a bench fixture from Vitest to get the relative performance of the running hardware, like:

// in vitest, maybe customizable
export const relativeBenchScore = bench('fixture', () => {
  // just for example, to be discussed
  let i = 100_000
  while (i--) {}
})

And then in userland:

bench('recursive', () => recursiveFibonacci())
  .operationsPerSecond
  .toHaveRelativeScoreRange(80, 100) // ratio to `relativeBenchScore`

This way we could reduce the impact of hardware differences.

@rockwotj

rockwotj commented Mar 26, 2022

I think a microbenchmark addition would be great! As someone who did a lot of microbenchmarking in my previous job (Firebase), I can say it's a very tricky problem to get right. Oracle has a good article here describing problems with benchmarking in JIT'd languages. It's about Java, but I assume Node/V8 will try to do the same optimizations.
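
As a generic illustration of the kind of JIT pitfall such articles describe (a sketch not tied to any benchmarking library): if a benchmarked result is never used, the engine is free to dead-code-eliminate the work, and you end up timing an empty loop.

function fib(n: number): number {
  return n < 2 ? n : fib(n - 1) + fib(n - 2)
}

// naive: the return value is discarded, so the optimizer may drop the call entirely
for (let i = 0; i < 1_000; i++) {
  fib(20)
}

// safer: keep the result observable so the work cannot be elided ("blackhole" pattern)
let sink = 0
for (let i = 0; i < 1_000; i++) {
  sink += fib(20)
}
console.log(sink)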

I'm not really feeling confident about having assertions on performance. I have never seen this actually work well; there is simply too much variation between machines to get it right. Results can even vary based on background processes running and potentially stealing CPU time, on the current fragmentation state of your RAM, etc. Every place I've seen has great reporting mechanisms and publishes graphs of important benchmarks (run on dedicated machines) at some periodic interval. Here is a benchmark that gRPC publishes as an example.

Even trying to relativize the results seems like an impossible target to hit, as it is going to depend on the version of Node and the instruction set of the CPU (for example, SIMD instructions might make a particular benchmark faster in a way that doesn't match the benchmark you're comparing against). One other thing to consider is that an API like notSlowerThan or moreIterationsThan could easily allow someone to improve the benchmark but forget to raise the baseline, so a regression back to the old value would not be caught. That being said, if people would find assertions useful then by all means don't let me talk you out of it; let's just make sure they are optional 😀

One bit of functionality that is massively helpful is being able to compare two benchmarks to get the relative performance difference for a change to the system. Google Benchmark comes with a script to generate this from two benchmark runs.
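
A rough sketch of that idea for Node, assuming the runner can emit JSON results; the file layout and field names here are invented for illustration (Google Benchmark ships its own compare script for its own format):

import { readFileSync } from 'node:fs'

interface BenchResult { name: string; opsPerSecond: number }

function loadRun(path: string): Map<string, number> {
  const results: BenchResult[] = JSON.parse(readFileSync(path, 'utf8'))
  const entries: [string, number][] = results.map(r => [r.name, r.opsPerSecond])
  return new Map(entries)
}

const baseline = loadRun('bench-main.json')
const candidate = loadRun('bench-pr.json')

for (const [name, base] of baseline) {
  const cur = candidate.get(name)
  if (cur === undefined) continue
  // positive delta = faster than baseline, negative = regression
  const delta = ((cur - base) / base) * 100
  console.log(`${name}: ${delta.toFixed(1)}% vs baseline`)
}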

Another requirement from my perspective for a benchmarking tool is some support for profiling the benchmark. Thankfully Node.js has some decent built-in support for profiling, but being able to see a flame graph of what the CPU was doing is a must.

Outside of this, as far as the API here goes, I wholly agree there should be a different command to run benchmarks; I don't think it's good practice to run benchmarks and tests at the same time in CI, for example.

However, in terms of API, I would personally rather reuse the existing describe block and have a bench export that is functionally the same as test but is a benchmark. It's important to allow for benchmark setup and have the beforeAll and beforeEach hooks work just like they normally do, as most benchmarks need setup that isn't executed as part of the code you're measuring.
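
For illustration, a hypothetical sketch of that shape; bench is the proposed export and does not exist in Vitest yet, so a stand-in is defined here just to keep the sketch self-contained:

import { describe, beforeAll } from 'vitest'

// stand-in for the proposed `bench` export; here it just runs the function once
const bench = (name: string, fn: () => void) => { fn() }

function recursiveFibonacci(n: number): number {
  return n < 2 ? n : recursiveFibonacci(n - 1) + recursiveFibonacci(n - 2)
}

describe('fibonacci', () => {
  let inputs: number[]

  beforeAll(() => {
    // setup runs through the normal hook and stays outside the measured code
    inputs = Array.from({ length: 100 }, (_, i) => i % 25)
  })

  bench('recursive', () => {
    for (const n of inputs) recursiveFibonacci(n)
  })
})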

I'm also not sure why there would need to be a restriction to put these in another file? That seems opposite to the rest of the API decisions that Vitest has made. I can write my tests inline in source files, but my benchmarks have to be in another file? I would assume that the functions provided to bench are only executed as part of the bench command and ignored otherwise (and test blocks would be ignored during benchmarking). This would allow sharing fixtures between unit tests and performance tests.

@antfu
Member

antfu commented Mar 26, 2022

Re @rockwotj

Thanks for the great feedback!

assertions on performance

One usage I've seen is people using CI to run the same bench on the same machine for PRs, comparing the changes against the main branch. That way you can see how the changes in a PR affect performance (as a guard for decision making, etc.). /cc @posva IIRC you made one, right?

Do you think this would be a more reasonable usage?

reusing the existing describe and hooks

That sounds like a good idea to me!

restriction to put these in another file?

No, just an idea. To me, being able to is different from having to. Vitest can run tests in source files, while people can choose not to use that and always keep tests in separate files. So yes, it should be possible to write bench and test in the same file, or even in the source, but I think it would be good to also allow running them from separate files with a different file convention.

@zoubingwu
Author

zoubingwu commented Mar 26, 2022

The benchmark guard part sounds interesting, but I'm not sure we really need it; the other parts look pretty good to me.

When we write benchmarks, we are comparing our own implementation with others. They may be totally different ones, like swc vs Babel, or esbuild vs Rollup vs webpack. In that case, we only need the numbers, not assertions on them, and we definitely don't want to introduce CI failures.

Or they can be the exact same implementation but in different "versions". If the new version brings a performance regression, most of the time it's because we had no way around it. By keeping those results we could even have a full record of the history of performance changes, just like Deno does.

Or we know we are doing it wrong, and the comparison is clear enough to show that. I believe it is better to rely on the results to tell us the difference between two implementations rather than on arbitrary assertions, because those are just too hard to use in the real world. It is very tricky to define how good or bad something should be to pass or fail CI: too loose makes it pointless, and too strict will frequently cause CI failures, which would be very frustrating and annoying.

@posva
Contributor

posva commented Mar 26, 2022

One usage I've seen is people using CI to run the same bench on the same machine for PRs, comparing the changes against the main branch. That way you can see how the changes in a PR affect performance (as a guard for decision making, etc.). /cc

I only did this for the size comparison, but in the end it was just running one script and outputting the results. I copied the approach from https://github.com/andresz1/size-limit-action and added it to multiple repos, like the router, with some modifications. The comment feature only works for PRs from branches on the same repo and not for PRs coming from forks, because something changed in the permissions available to GitHub Actions.

@rockwotj

I think having an official GitHub Action to go along with a benchmarking feature in Vitest would be wonderful 💯

I do think that's the right place for assertions about regressions, however.

@Uzlopak

Uzlopak commented Apr 3, 2022

I did not read the whole thread.

As I wrote in the original PR, it was suggested to join forces on forking benchmark.js, and I am open to it.

I improved the code in my fork significantly, like ripping out lodash as much as possible, at least in the Node-only part. I also integrated the TS typings into my fork.

I also want to add the ability to detect async functions, so that the whole deferred setup for testing async functions does not need to be done manually.
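
For illustration, a rough sketch of what that automatic detection could look like; the helper name here is hypothetical, and only the { defer, fn } options shape is actual benchmark.js API:

type BenchFn = () => unknown | Promise<unknown>

function toBenchmarkOptions(fn: BenchFn) {
  if (fn.constructor.name !== 'AsyncFunction') {
    // synchronous functions are passed through unchanged
    return { fn }
  }
  // async functions are adapted to benchmark.js's deferred style automatically,
  // so callers no longer have to call deferred.resolve() by hand
  return {
    defer: true,
    fn: (deferred: { resolve: () => void }) => {
      Promise.resolve(fn()).then(() => deferred.resolve())
    },
  }
}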

I also wanted to integrate the PR that fixes the statistics part.

I added a PR to the original benchmark.js for esm support.

I also wanted to test whether there is a way to spawn Node processes to avoid the issue that benchmarks get different results depending on the order in which they are run, because V8 optimizes and subsequent benchmarks then use the optimized code. That would probably make it necessary to require modules in the setup stage of the benchmark.
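
A rough sketch of that isolation idea, with placeholder file names: run each bench file in a fresh Node process so V8 optimization state from one benchmark cannot leak into the next.

import { execFileSync } from 'node:child_process'

const benchFiles = ['fib.bench.js', 'calc.bench.js'] // placeholders

for (const file of benchFiles) {
  // a new process per file means a cold V8 instance for every benchmark suite
  const output = execFileSync(process.execPath, [file], { encoding: 'utf8' })
  console.log(output)
}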

Link to my fork

https://github.com/cthulhu-node/benchmark.js

@Uzlopak

Uzlopak commented Apr 5, 2022

I wrote to the maintainers of bestiejs on Twitter asking if I could take over maintenance of the benchmark.js project.

Let's hope for an answer :)

@edimitchel
Member

edimitchel commented Apr 5, 2022

Let's do tinyetabli? @Aslemammad 😅

Or tinybench?

@Aslemammad
Member

Tinylibs welcomes these projects! If anyone's interested (and the team agrees), I can give them access to tinylibs.

@Uzlopak

Uzlopak commented Apr 5, 2022

Putting benchmark.js into tinylibs?

@edimitchel
Member

Putting benchmark.js into tinylibs?

Yes, why not, if the official one is no longer supported; we can probably do something to deliver the same features with a minimum of weight.

@Uzlopak

Uzlopak commented Apr 20, 2022

Should I refactor benchmark.js further to make it as tiny as possible? :D

@Aslemammad
Member

@Uzlopak Yep. The goals are:

- an easy, understandable API
- tiny, preferably no deps

Send me a DM on Discord so I can get you access to tinylibs!

@Uzlopak

Uzlopak commented Apr 23, 2022

I worked further on the benchmark fork.
It is still a drop-in replacement for the original benchmark.js. I can't test in browsers, since I use Linux and, to be honest, I don't want the pain of running Windows and installing a bunch of old and buggy browsers. I did make it compatible with Node 4 (if you still use old Mocha) to ensure some backwards compatibility, and you can run the unit tests in the browser with test.html: it loads Mocha and Chai from a CDN, and in my modern Chrome the unit tests pass.

It does not have any production dependencies, so no lodash, platform.js, etc.

I am currently implementing a small wrapper for automatically detecting async functions
tinylibs/tinybench.old#19

I'm still working on handling error cases properly.

But yeah, it is kind of tiny:

uzlopak@Battlestation:~/workspace/benchmark.js$ npm pack --dry-run
npm notice 
npm notice 📦  benchmark@2.1.4
npm notice === Tarball Contents === 
npm notice 1.1kB  LICENSE         
npm notice 68.0kB benchmark.js    
npm notice 144B   index.js        
npm notice 1.3kB  package.json    
npm notice 2.2kB  README.md       
npm notice 33.4kB types/index.d.ts
npm notice === Tarball Details === 
npm notice name:          benchmark                               
npm notice version:       2.1.4                                   
npm notice filename:      benchmark-2.1.4.tgz                     
npm notice package size:  26.8 kB                                 
npm notice unpacked size: 106.1 kB                                
npm notice shasum:        f7af00d3861cc8db8a3db53cb807216e29d20930
npm notice integrity:     sha512-kaKfVFw/Wi8je[...]8UhptxV9nFTqw==
npm notice total files:   6                                       
npm notice 
benchmark-2.1.4.tgz

The original project is about 1.6 MB (lodash + platform.js in node_modules).

Looking for feedback.

@Aslemammad
Member

It's really tiny! gonna give you access to tinylibs!

@Uzlopak

Uzlopak commented Apr 24, 2022

Just a remark:

It should also be configured so that it works with the benchmark GitHub Action:

https://github.com/benchmark-action/github-action-benchmark

Maybe we should also think about the relative performance comparison described here: https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/

@Uzlopak

Uzlopak commented Apr 29, 2022

@Aslemammad You requested that I refactor benchmark.js to TypeScript. It is a humongous task because the whole codebase is a big mangled mess.

So either you refactor it yourself, or you'll have to wait a few weeks until I have unmangled the code.

@Aslemammad
Member

@Uzlopak no rush for now, I'd be able to help in the future, we can keep it as is!

@zoubingwu
Author

zoubingwu commented May 5, 2022

Just saw a beautiful bench output on Twitter.

I really like the idea of showing p75/p99.

[image: screenshot of the benchmark output]
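
For reference, a small illustrative sketch of how such percentiles can be derived from raw per-iteration timings (nearest-rank method); this is not any particular tool's implementation:

function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b)
  // nearest-rank: smallest value with at least p% of the samples at or below it
  const rank = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, rank)]
}

const samplesNs = [120, 95, 101, 99, 540, 103, 98, 97, 110, 102]
console.log('p75:', percentile(samplesNs, 75), 'ns')
console.log('p99:', percentile(samplesNs, 99), 'ns')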

@Uzlopak

Uzlopak commented May 5, 2022

You mean the upper percentiles, p75, p95 and p995?

I published tinybench.

@mangs

mangs commented May 14, 2022

Thank you for working on this 😀

Bun's benchmarking output is beautiful and certainly something to use as a guide. The repo is private for now, so if you want access to Bun's repo, just join their Discord via bun.sh.

Deno also has really nice benchmark output via its deno bench command. You can see more in their manual here: https://deno.land/manual/tools/benchmarker. The output is quite different now from what is shown in the first post. Here's a decent real-world example, better than what's in the manual: denoland/deno#14368

It seems that both Bun and Deno are internally using a tool called mitata. This could hopefully make your job a little easier.

Those are both good examples of really nice output formatting; this team already does a terrific job of it, so I know you'll do a great job with this topic. I just hope this helps a little.

@Uzlopak

Uzlopak commented May 17, 2022

I published tinybench 1.0.2.

To get the same output with the 0.75, 0.99 and 0.995 percentiles, we would need the corresponding t- and u-tables for those.

Do you want to use mitata instead of tinybench?

@Aslemammad
Member

An update on this: I've got plans for tinybench, so I need some time to study benchmarking so we can get the best experience out of it! After that, I'll try integrating it with Vitest!
Thank you all!

@Aslemammad
Member

We've got tinybench here 🎉

Shout out to @poyoho for getting vitest benchmark ready!

@mangs

mangs commented Aug 30, 2022

Woo nice job team! Looking forward to trying it out.

@sheremet-va sheremet-va unpinned this issue Sep 4, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Jun 13, 2023