
Integrate CI for Running Benchmarks #436

Closed
tusharmath opened this issue Oct 7, 2023 · 24 comments
Labels: 💎 Bounty · type: chore (Routine tasks like conversions, reorganization, and maintenance work.)

Comments

tusharmath (Contributor) commented Oct 7, 2023

Overview

We need to enhance our CI pipeline to not only run our existing Criterion benchmarks but also ensure that the build time stays within acceptable limits. Moreover, a clear and concise report should be generated and published on the PR for quick and easy understanding.

Requirements

  1. Run Criterion Benchmarks: Our CI should execute the existing Criterion benchmarks as part of the pipeline.
  2. Fail on Degradation: The CI should fail if there's a performance degradation when compared to the main branch.
  3. Build Time Check: The CI should compare the build time of the current PR with the average build time of the last few successful builds on the main branch. If the build time increases by more than 10%, the CI should fail.
  4. Benchmark Report: A tabular comparison of benchmarks should be published on the PR, similar to the Codecov report. It should display the current benchmark results alongside the results from the main branch for easy comparison.
  5. Conditional Execution: Benchmarks and build time checks should only run when the benchmark label is added to a PR or commit (see the workflow sketch after this list).
  6. Support for Forks: The CI process should be designed to run benchmarks on forks as well.
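
A minimal GitHub Actions sketch of how requirements 1, 2, and 5 could fit together. The `benchmark` label comes from requirement 5; the workflow layout and the baseline names are assumptions, not the project's actual config:

name: benchmark

on:
  pull_request:
    types: [opened, synchronize, labeled]

jobs:
  bench:
    # Requirement 5: run only when the PR carries the `benchmark` label.
    if: contains(github.event.pull_request.labels.*.name, 'benchmark')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # main's history is needed for the baseline run
      - uses: dtolnay/rust-toolchain@stable
      # Requirement 2: benchmark main first to establish the baseline...
      - run: |
          git checkout origin/main
          cargo bench -- --save-baseline main
      # ...then re-run on the PR head; Criterion compares each result
      # against the saved baseline and reports regressions.
      - run: |
          git checkout ${{ github.event.pull_request.head.sha }}
          cargo bench -- --baseline main

Criterion only reports the comparison; actually failing the job on a regression (requirement 2) and posting the tabular report on the PR (requirement 4) would need an extra step, for example via benchmark-action/github-action-benchmark.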

Rationale

Performance is paramount for our project. It's essential to ensure that any changes do not adversely affect either the runtime performance (tracked by benchmarks) or the build time. By integrating these checks into our CI, we can maintain a high standard of performance and ensure that contributors receive immediate feedback on any potential issues.

tusharmath changed the title from "CI for Benchmarks" to "Integrate CI for Running Benchmarks" on Oct 7, 2023
tusharmath (Contributor Author) commented:

/bounty 150$

algora-pbc bot commented Oct 7, 2023

💎 $150 bounty created by tailcallhq
🙋 If you'd like to work on this issue, comment below to get assigned
👉 To claim this bounty, submit a pull request that includes the text /claim #436 somewhere in its body
📝 Before proceeding, please make sure you can receive payouts in your country
💵 Payment arrives in your account 2-5 days after the bounty is rewarded
💯 You keep 100% of the bounty award
🙏 Thank you for contributing to tailcallhq/tailcall!
🙋‍♂️ Join our Discord channel if you need help.

5war00p commented Oct 7, 2023

/attempt #436


algora-pbc bot commented Oct 8, 2023

The bounty is up for grabs! Everyone is welcome to /attempt #436 🙌

algora-pbc bot commented Oct 8, 2023

Note: The user @cheikh2shift is already attempting to complete issue #436 and claim the bounty. If you attempt to complete the same issue, there is a chance that @cheikh2shift will complete the issue first, and be awarded the bounty. We recommend discussing with @cheikh2shift and potentially collaborating on the same solution versus creating an alternate solution.

algora-pbc bot commented Oct 9, 2023

The bounty is up for grabs! Everyone is welcome to /attempt #436 🙌

neo773 (Contributor) commented Oct 9, 2023

/attempt #436

meskill (Contributor) commented Oct 10, 2023

Using GitHub Actions to measure performance can be quite unreliable because of the way the runners execute; this is even mentioned in Criterion's FAQ.

tusharmath (Contributor Author) commented:

Interesting @meskill. Have you used iai? It seems like it's not actively maintained.

meskill (Contributor) commented Oct 10, 2023

> Interesting @meskill. Have you used iai? It seems like it's not actively maintained.

I have only heard about this kind of benchmarking tool but have never used one. Indeed, the project seems abandoned, but the question links to other projects that look more actively maintained.

tusharmath (Contributor Author) commented:

This seems like an alternative to iai: https://github.com/Joining7943/iai-callgrind

shambu2k commented Oct 22, 2023

/attempt #436


tusharmath (Contributor Author) commented:

The approach that I have in mind is to use iai-style tools to capture the following —

  Instructions:                1733
  L1 Hits:                     2358
  L2 Hits:                        0
  RAM Hits:                       3
  Total read+write:            2361
  Estimated Cycles:            2463

Since these are exact values, we could fail the build whenever any of these parameters increases. A sketch of such a check follows.
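
A minimal sketch of that gate, assuming it runs as a small helper binary in CI: it reads iai-callgrind's output (the format shown above) from stdin and exits non-zero if any metric grew. The helper itself and its wiring are hypothetical, not part of tailcall:

use std::io::{self, Read};
use std::process::exit;

fn main() {
  let mut output = String::new();
  io::stdin()
    .read_to_string(&mut output)
    .expect("failed to read benchmark output");

  let mut regressed = false;
  for line in output.lines() {
    // iai-callgrind prints lines like "Instructions: 10798 (-89.16146%)";
    // pull the signed percentage out from between "(" and "%)".
    if let (Some(open), Some(close)) = (line.find('('), line.find("%)")) {
      if open + 1 < close {
        if let Ok(delta) = line[open + 1..close].parse::<f64>() {
          if delta > 0.0 {
            eprintln!("regression: {}", line.trim());
            regressed = true;
          }
        }
      }
    }
  }
  if regressed {
    exit(1); // any increase in any parameter fails the build
  }
}

Usage would be something like cargo bench --bench json_like_bench 2>&1 | regression-gate, where regression-gate is a made-up name for this helper.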

algora-pbc bot commented Oct 23, 2023

The bounty is up for grabs! Everyone is welcome to /attempt #436 🙌

alankritdabral (Contributor) commented:

/attempt #436


algora-pbc bot commented Nov 4, 2023

@alankritdabral: The Tailcall Inc. team prefers to assign a single contributor to the issue rather than let anyone attempt it right away. We recommend waiting for a confirmation from a member before getting started.

tusharmath (Contributor Author) commented:

@alankritdabral would you be taking the approach that I have described above, using iai?

alankritdabral (Contributor) commented:

Yes, I am taking the same approach @tusharmath.

alankritdabral (Contributor) commented Nov 5, 2023

@tusharmath I am thinking of converting each bench file from Criterion to iai-callgrind so I can get exact data to compare. Just wanted to know if you agree with the approach.
benches/json_like_bench.rs
Original code:

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use serde_json::json;

fn benchmark_batched_body(c: &mut Criterion) {
  c.bench_function("test_batched_body", |b| {
    b.iter(|| {
      let input = json!({
          "data": [
              {"user": {"id": "1"}},
              {"user": {"id": "2"}},
              {"user": {"id": "3"}},
              {"user": [
                  {"id": "4"},
                  {"id": "5"}
                  ]
              },
          ]
      });

      black_box(
        serde_json::to_value(tailcall::json::gather_path_matches(
          &input,
          &["data".into(), "user".into(), "id".into()],
          vec![],
        ))
        .unwrap(),
      );
    })
  });
}

criterion_group!(benches, benchmark_batched_body);
criterion_main!(benches);

Edited Code:

use iai_callgrind::{black_box, library_benchmark, library_benchmark_group, main};
use serde_json::json;

// Simplified local stand-in for tailcall::json::gather_path_matches:
// it only follows plain object keys along the path.
fn gather_path_matches(input: &serde_json::Value, path: &[&str]) -> Option<serde_json::Value> {
  let mut current = input;
  for key in path {
    current = match current.get(key) {
      Some(value) => value,
      None => return None, // the key doesn't exist at this level
    };
  }
  Some(current.clone())
}

#[library_benchmark]
fn benchmark_batched_body() {
  let input = json!({
      "data": [
          {"user": {"id": "1"}},
          {"user": {"id": "2"}},
          {"user": {"id": "3"}},
          {"user": [
              {"id": "4"},
              {"id": "5"}
          ]},
      ]
  });

  black_box(gather_path_matches(&input, &["data", "user", "id"]));
}

library_benchmark_group!(
    name = batched_body;
    benchmarks = benchmark_batched_body
);

main!(library_benchmark_groups = batched_body);

Results:

~/Desktop/git/tailcall$ cargo bench --bench json_like_bench
   Compiling tailcall v0.1.0 (/home/aloo/Desktop/git/tailcall)
    Finished bench [optimized] target(s) in 5.80s
     Running benches/json_like_bench.rs (target/release/deps/json_like_bench-6b149ebecce48594)
json_like_bench::batched_body::benchmark_batched_body
  Instructions:               10798 (-89.16146%)
  L1 Hits:                    15605 (-87.84515%)
  L2 Hits:                        1 (-99.35065%)
  RAM Hits:                     137 (-70.02188%)
  Total read+write:           15743 (-87.79575%)
  Estimated Cycles:           20405 (-85.94213%)



tusharmath (Contributor Author) commented:

@alankritdabral As discussed on Discord, we need both kinds of benchmarks. They signal orthogonal things, and it's possible that a reduction in instructions still results in a performance regression. A sketch of how both can coexist follows.
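
For illustration, a sketch of how both could live side by side in Cargo.toml, keeping the existing Criterion bench and adding a separate iai-callgrind target (the second file name is hypothetical). Both frameworks supply their own harness, so each target needs harness = false:

[[bench]]
name = "json_like_bench"      # Criterion: wall-clock time
harness = false

[[bench]]
name = "json_like_bench_iai"  # iai-callgrind: instruction counts
harness = false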

tusharmath (Contributor Author) commented:

Fixed in #762.

github-actions bot added the type: chore label (Routine tasks like conversions, reorganization, and maintenance work.) on Dec 31, 2023
epompeii (Contributor) commented Feb 8, 2024

@tusharmath I'm a little late to the party here, but down the road, if you want a more robust continuous benchmarking solution, you might consider checking out Bencher: https://github.com/bencherdev/bencher

tusharmath (Contributor Author) commented:

> @tusharmath I'm a little late to the party here, but down the road, if you want a more robust continuous benchmarking solution, you might consider checking out Bencher: https://github.com/bencherdev/bencher

This is a nice tool. We are a small team, so we take help from contributors to get things done. Happy to add a bounty for integrating the tool into our CI.

epompeii (Contributor) commented Feb 8, 2024

Thank you for the kind words! I would be more than happy to help with the integration. 😃
