Built-in benchmarking support #917
Comments
I generally like this idea. As I am not very familiar with benchmarking personally, my perspective might be biased or completely wrong. Feel free to correct me or leave your thoughts.

New command
|
I strongly agree with the idea of separating benchmark tests from unit tests. In fact, when I was writing this implementation, I thought it would not be a good choice if benchmarks were mixed with unit tests. For example, I don't need test coverage results in benchmark tests. I am not very familiar with benchmarking personally either, but I think it would be nice if you could run multiple different types of benchmarks in parallel to get the results out earlier:

```ts
import { benchmark } from 'vitest'

benchmark('fib', (bench) => {
  bench('recursive', () => {
    recursiveFibonacci()
  })
  bench('iterative', () => {
    iterativeFibonacci()
  })
})

benchmark('calc', (bench) => {
  bench('add', () => {
    add(Math.random(), Math.random())
  })
  bench('subtraction', () => {
    subtraction(Math.random(), Math.random())
  })
})
```
This is a good point. It could even track the performance curve over time, which plays the same role for benchmarks as coverage does for unit tests. |
Interesting article about relative benchmarking using GitHub actions
It may be expensive to do it on every push if results need to be averaged to be reliable, but doing it on demand looks interesting. It may also be good to run the CI on main regularly, every day or a few times a week, and generate performance evolution graphs for a library. |
I imagine we could run a bench fixture from Vitest to get the relative performance of the running hardware, like:

```ts
// in vitest, maybe customizable
export const relativeBenchScore = bench('fixture', () => {
  // just for example, to be discussed
  let i = 100_000
  while (i--) {}
})
```

And then in userland:

```ts
bench('recursive', () => recursiveFibonacci())
  .operationsPerSecond
  .toHaveRelativeScoreRange(80, 100) // ratio to `relativeBenchScore`
```

This way we could reduce the impact of differences in hardware. |
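To make the normalization idea concrete, here is a minimal sketch of how such a relative score could be computed, assuming each bench run exposes an ops/sec figure; the names (`BenchResult`, `relativeScore`) are made up for illustration and are not a proposed Vitest API:

```ts
// Hypothetical result shape: a bench run resolves to its measured ops/sec.
interface BenchResult {
  name: string
  opsPerSecond: number
}

// Score a user bench relative to the machine-calibration fixture, so
// assertions compare hardware-independent ratios instead of raw timings.
function relativeScore(result: BenchResult, fixture: BenchResult): number {
  return (result.opsPerSecond / fixture.opsPerSecond) * 100
}

// If the fixture reaches 1_000_000 ops/sec on this machine and `recursive`
// reaches 850_000, the relative score is 85 on any comparable hardware.
const fixture: BenchResult = { name: 'fixture', opsPerSecond: 1_000_000 }
const recursive: BenchResult = { name: 'recursive', opsPerSecond: 850_000 }
console.log(relativeScore(recursive, fixture)) // 85
```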
I think a microbenchmark addition would be great! As someone who did a lot of microbenchmarking in my previous job (Firebase), I can say it's a very tricky problem to get correct. Oracle has a good article here describing the problems with benchmarking in JIT'd languages. It's talking about Java, but I assume Node / V8 will try to do the same optimizations.

I'm not really feeling confident about having assertions on performance. I have never seen this actually work well; there is simply too much variation between machines to get it right. Results can even vary with background processes running and potentially stealing CPU time, with the current fragmentation state of your RAM, etc. Everywhere I've seen has great reporting mechanisms and publishes graphs of important benchmarks (run on dedicated machines) at some periodic interval. Here is a benchmark that gRPC publishes as an example. Even trying to relativize them seems like an impossible-to-hit target, as this is going to depend on the version of Node and the instruction set of the CPU (for example, if there are some SIMD calculations that make a particular benchmark faster, they may not match the benchmark you're comparing against). One other thing to consider is that the current API is

One bit of functionality that is massively helpful is being able to compare two benchmarks to get the relative performance difference for a change to the system. Google Benchmark comes with a script to generate this from two benchmark runs.

Another requirement from my perspective for a benchmarking tool is some support for profiling the benchmark. Thankfully Node.js has some decent built-in support for profiling, but a must is being able to see a flamegraph of what the CPU was doing.

Outside of this, as far as the API here goes, I wholly agree there should be a different command to run benchmarks; I don't think it's good practice to run benchmarks and tests at the same time in CI, for example. However, in terms of API, I personally would rather reuse the existing

I'm also not sure why there would need to be a restriction to put these in another file? That seems opposite to the rest of the API decisions that |
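As a concrete illustration of the "compare two runs" point above, here is a rough sketch of what such a comparison could look like; the result shape and `compareRuns` function are invented for this example and are not Google Benchmark's actual script or any existing Vitest API:

```ts
// Compare two benchmark runs (e.g. main vs. a PR branch) and report the
// relative change per benchmark.
type RunResults = Record<string, number> // benchmark name -> ops/sec

function compareRuns(baseline: RunResults, candidate: RunResults) {
  const report: Record<string, string> = {}
  for (const [name, baseOps] of Object.entries(baseline)) {
    const newOps = candidate[name]
    if (newOps === undefined) continue // benchmark removed or renamed
    const changePct = ((newOps - baseOps) / baseOps) * 100
    report[name] = `${changePct >= 0 ? '+' : ''}${changePct.toFixed(1)}%`
  }
  return report
}

// Example output: { 'fib/recursive': '-5.0%', 'fib/iterative': '+11.1%' }
console.log(compareRuns(
  { 'fib/recursive': 1200, 'fib/iterative': 45000 },
  { 'fib/recursive': 1140, 'fib/iterative': 50000 },
))
```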
Re @rockwotj Thanks for the great feedback!
One usage I have seen is people using CI to run the same bench on the same machine for PRs, comparing the changes against the main branch. That way you can see how the changes in a PR affect performance (as a guard for decision making, etc.). /cc @posva IIRC you made one, right? Do you think this would be a more reasonable usage?
That sounds like a good idea to me!
No, just an idea. To me, being able to do something is different from having to do it. Vitest can run tests in source files, while people can choose not to use that and always keep tests in separate test files. So yes, it should be possible to write benches and tests in the same file, or even in the source, but I think it would also be good to allow running them from separate files with a different file convention. |
The benchmark guard part sounds interesting, but I'm not sure we really need it; the other parts look pretty good to me.

When we want to write some benchmarks, we are comparing our own implementation with others. They may be totally different projects, like swc vs. babel, or esbuild vs. rollup vs. webpack. In that case, we only need the numbers, not assertions on them, and we definitely don't want to bring in CI failures. Or they can be the exact same implementation but in different "versions". If the new version brings a performance regression, most of the time it's because we had no other way around it. With those results we can even keep a full record to measure the history of performance changes, just like Deno did. Or, if we are doing it wrong, the comparison result is clear enough to show that. I believe it is better to rely on the results to tell us the difference between two implementations than on arbitrary assertions, because those are just too hard to use in the real world.

It is very tricky to define how good/bad something should be to pass/fail CI: too loose makes it pointless, and too strict will frequently cause CI failures, which would be very frustrating and annoying. |
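To sketch what a "full record of performance history" could look like, here is a small example of persisting each run; the file name, result shape, and helper are made up for illustration, not how Deno or Vitest actually store results:

```ts
// Append each benchmark run to a JSON history file so regressions can be
// charted over time instead of gating CI on pass/fail thresholds.
import { existsSync, readFileSync, writeFileSync } from 'node:fs'

interface HistoryEntry {
  date: string
  commit: string
  results: Record<string, number> // benchmark name -> ops/sec
}

function appendRun(file: string, entry: HistoryEntry): void {
  const history: HistoryEntry[] = existsSync(file)
    ? JSON.parse(readFileSync(file, 'utf8'))
    : []
  history.push(entry)
  writeFileSync(file, JSON.stringify(history, null, 2))
}

appendRun('bench-history.json', {
  date: new Date().toISOString(),
  // GITHUB_SHA is set when running inside GitHub Actions
  commit: process.env.GITHUB_SHA ?? 'local',
  results: { 'fib/recursive': 1180, 'fib/iterative': 49500 },
})
```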
I only did this for the size comparison, but in the end it was just running one script and outputting the results. I copied the thing from https://github.com/andresz1/size-limit-action and added it to multiple repos like the router, with some modifications. The comment feature only works for PRs from branches on the same repo and not for PRs coming from forks, because something changed in the available permissions for GitHub Actions. |
I think having an official GitHub Action to go along with a benchmarking feature in Vitest would be wonderful 💯 I do think that's the right place for assertions about regressions, however. |
I did not read the whole thread. As I wrote in the original PR, it was suggested to join forces on forking benchmark.js, and I am open to it. I improved the code significantly in my work on it, like ripping out lodash as much as possible, at least in the Node-only part. I also integrated the TS typings in my fork. I want to add the ability to detect async functions, so that the whole deferred stuff for testing async functions does not need to be done manually. I also wanted to integrate the PR that fixes the statistics part. I added a PR to the original benchmark.js for ESM support. I also wanted to test whether there is a way to spawn Node processes to avoid the issue that benchmarks get different results depending on the order in which they are run, because V8 optimizes and subsequent benchmarks then use optimized code. That would probably make it necessary to require modules in the setup stage of the benchmark. Link to my fork |
I asked the maintainers of bestiejs on Twitter whether I could take over maintenance of the benchmark.js project. Let's hope for an answer :) |
Let's do Or |
Tinylibs welcomes these projects! If anyone's interested (and the team agrees), I can give them access to tinylibs. |
Putting benchmark.js into tinylibs? |
Yes, why not, if the official one is not supported yet; we can probably do something to deliver the same features with a minimum of weight. |
Should I refactor benchmark.js further to make it as tiny as possible? :D |
@Uzlopak Yep, send me a DM on Discord so I can get you access to tinylibs! |
I worked further on the benchmark fork. It does not have any production dependencies, so no lodash, platform.js, etc. I am currently implementing a small wrapper for automatically detecting async functions, and I'm still working on handling error cases properly. But yeah, it is kind of tiny.
The original project is about 1.6 MB (lodash + platform.js in node_modules). Looking for feedback. |
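For the async-detection wrapper mentioned above, a minimal sketch of one common approach is to check both the function's constructor name and whether the call returns a promise; this is only an illustration, not the fork's actual code:

```ts
// Detect whether a benchmarked function is async so the harness can await it,
// instead of requiring benchmark.js's manual "deferred" pattern.
function isAsyncFunction(fn: (...args: unknown[]) => unknown): boolean {
  return fn.constructor.name === 'AsyncFunction'
}

async function runOnce(fn: () => unknown): Promise<void> {
  const result = fn()
  // Also handle plain functions that return a promise without being
  // declared `async`.
  if (isAsyncFunction(fn) || result instanceof Promise)
    await result
}

// Usage:
await runOnce(async () => { await new Promise(r => setTimeout(r, 1)) }) // awaited
await runOnce(() => 1 + 1)                                              // synchronous
```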
It's really tiny! gonna give you access to tinylibs! |
Just a remark: it should also be configured so that it works with the benchmark GitHub Action: https://github.com/benchmark-action/github-action-benchmark
Maybe we should also think about relative performance comparison: https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/ |
@Aslemammad You requested that I refactor benchmark.js to TypeScript. It is a humongous task because the whole codebase is a big, mangled mess. So you either refactor it yourself, or you have to wait a few weeks until I have unmangled the code. |
@Uzlopak no rush for now, I'd be able to help in the future, we can keep it as is! |
Just saw a beautiful bench output on twitter. Really like the idea of showing p75/p99 |
You mean the upper percentiles, p75, p95, and p995? I published tinybench. |
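For reference, a minimal sketch of computing those upper percentiles from raw per-iteration timings using the simple nearest-rank method; real benchmark tools usually interpolate and also report a margin of error:

```ts
// Nearest-rank percentile over a set of per-iteration timings (in ms).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(0, rank - 1)]
}

const samples = [1.1, 1.2, 1.2, 1.3, 1.4, 1.8, 2.0, 2.4, 3.1, 9.7]
console.log({
  p75: percentile(samples, 75),    // 2.4
  p99: percentile(samples, 99),    // 9.7
  p995: percentile(samples, 99.5), // 9.7
})
```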
Thank you for working on this 😀

bun's benchmarking output is beautiful and is certainly something to use as a guide. The repo is private for now, so if you want access to bun's repo, just visit their Discord via bun.sh. Deno also has really nice benchmark output via its `deno bench` subcommand.

It seems that both bun and Deno are internally using a tool called mitata. This could hopefully make your job a little easier.

Those are both good examples of really nice output formatting; this team already does a terrific job of it, so I know you'll do a great job with this topic. I just hope this helps a little. |
I published tinybench 1.0.2. To get the same output with the 0.75, 0.99, and 0.995 percentiles, we would need the corresponding t- and u-tables for them. Do you want to use mitata instead of tinybench? |
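For context on why a t-table matters: benchmark.js-style tools turn sample variance into the "±x%" relative margin of error. Below is a simplified sketch that uses a fixed large-sample critical value (1.96) instead of a full Student's t-table, which a real implementation would look up per sample size:

```ts
// Relative margin of error (as a percentage) for per-iteration timings,
// using the normal-approximation critical value instead of a full t-table.
function relativeMarginOfError(samples: number[], critical = 1.96): number {
  const n = samples.length
  const mean = samples.reduce((a, b) => a + b, 0) / n
  const variance = samples.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1)
  const sem = Math.sqrt(variance / n) // standard error of the mean
  return (critical * sem / mean) * 100
}

const samples = [1.1, 1.2, 1.2, 1.3, 1.4, 1.8, 2.0, 2.4, 3.1, 9.7]
console.log(`±${relativeMarginOfError(samples).toFixed(1)}%`)
```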
An update on this: I've got plans for tinybench, so I need some time to study benchmarking so we can get the best experience out of it! After that, I'll try integrating it with Vitest! |
Woo nice job team! Looking forward to trying it out. |
Clear and concise description of the problem
Since we can now write unit tests in source files with Vitest, I wish we could also write benchmarks like we do in Rust/Go.
Deno CLI will support a bench command soon (denoland/deno#13713). Deno added the `deno bench` subcommand recently in the 1.20.0 release.

In Node.js, most projects use a library called benchmark, whose last commit was 5 years ago. It works pretty well honestly, but things have changed a lot since then; in particular, we are now suffering from the pain of ESM/CJS interop. Vitest could provide out-of-the-box ESM or even TypeScript support to bring a much friendlier DX.
Suggested solution
Rust uses the `#[bench]` attribute, and the `--bench` flag should be passed in, or just run `cargo bench`.

Benchmarking is similar to unit testing in Golang; run it with `go test -bench=.`.

In Deno, see denoland/deno#13713.

My first thought was to provide a `benchmark` function that works just like `describe`, except it only runs with a bench flag.

Also, the output could be much prettier and more modern compared to the `benchmark` package.

Alternative
No response
Additional context
No response