Create benchmarking tools for saving run/measurement data (with Falcon7b example) and model-demo utilities for verifying tokens/perf #9659

Merged: 2 commits merged into main on Jun 26, 2024

Conversation

skhorasganiTT (Contributor)

Ticket

Problem description

  • Currently, model tests (notably the demos, which are our key deliverables) do not have a standardized way of tracking run and measurement data. There are parallel efforts from the dev-ops and data science teams to bring up tools for monitoring such data; however, model developers still need a set of tools for logging and saving the data in the appropriate formats.

What's changed

  • Created a file models/perf/benchmarking_utils.py containing tools for profiling data and saving CSVs in the appropriate formats (the requirements for these CSVs were specified by the data science team). These tools can be used for any type of test, not only demos. Two CSVs are saved for each run: "run_<start_ts>.csv" and "measurement_<start_ts>.csv" (a usage sketch follows this list).
  • Created a file models/demos/utils/llm_demo_utils.py containing a function which adds demo measurements using the benchmarking tools mentioned above. This is only applicable to model demos and defines certain requirements for data produced by the demos.
  • In the same llm_demo_utils.py file mentioned above, created functions for output token verification and output perf verification (also sketched below).
  • Modified the Falcon7b demo to save run/measurement CSVs using the tools mentioned above.
  • Added 128 and 2k sequence length options for the Falcon7b demo (perf mode).
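
To make the CSV output concrete, here is a minimal sketch of what saving the two files might look like. The helper name, column layout, and timestamp format are assumptions for illustration only; the actual schema was specified by the data science team and lives in models/perf/benchmarking_utils.py. The generated/benchmarks output path is taken from the review discussion below.

```python
# Illustrative sketch only: the function name, columns, and timestamp format are
# assumptions; the real helpers live in models/perf/benchmarking_utils.py.
import csv
from datetime import datetime
from pathlib import Path


def save_run_and_measurement_csvs(run_info: dict, measurements: list[dict],
                                  output_dir: str = "generated/benchmarks") -> None:
    """Write run_<start_ts>.csv and measurement_<start_ts>.csv for a single test run."""
    start_ts = datetime.now().strftime("%Y-%m-%dT%H-%M-%S")  # assumed timestamp format
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # One row describing the run itself (e.g. model name, git hash, start time).
    with open(out / f"run_{start_ts}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(run_info.keys()))
        writer.writeheader()
        writer.writerow(run_info)

    # One row per measurement (e.g. step name, iteration, measured value).
    with open(out / f"measurement_{start_ts}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(measurements[0].keys()))
        writer.writeheader()
        writer.writerows(measurements)
```

Similarly, the token and perf verification helpers in llm_demo_utils.py could look roughly like this; the function names and signatures are hypothetical, not the actual API:

```python
# Hypothetical sketches of the demo verification helpers; names and signatures are
# illustrative, not the actual llm_demo_utils.py API.
def verify_output_tokens(generated_tokens, expected_tokens) -> bool:
    """Check that the demo's generated tokens match a stored reference."""
    return list(generated_tokens) == list(expected_tokens)


def verify_perf(measured_tokens_per_s: float, target_tokens_per_s: float,
                tolerance: float = 0.05) -> bool:
    """Check that measured throughput is within a tolerance of the target."""
    return measured_tokens_per_s >= target_tokens_per_s * (1.0 - tolerance)
```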

Checklist

  • Post commit CI passes
  • Model regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

cc @uaydonat

tt-rkim (Collaborator) commented Jun 25, 2024

Two things before I approve:

  • resolve all convos I started
  • could you please show that these are generated on the demo CI, and upload them as an artifact? generated/benchmarks should be a good path to upload as an artifact

skhorasganiTT (Contributor, Author)

> Have you considered using https://pypi.org/project/pytest-benchmark/ and building extra code on top of it?

Thanks for sharing that Bill, I hadn't considered it. It seems like a good tool for benchmarking functions/tests, but I don't think it will be easy to integrate with our requirements for the CSVs since we require a very specific format for timestamps (which is why I added the BenchmarkProfiler) and we will need to benchmark blocks of code, not only functions.

eyonland (Contributor) left a comment

I don't see any tests with the following decorator/annotation tags. We are trying to make sure that all demos collect metrics.
Look for existing tests with these markings.
@pytest.mark.models_device_performance_bare_metal
@pytest.mark.models_performance_bare_metal

Note that we document this here: https://tenstorrent.github.io/tt-metal/latest/ttnn/ttnn/demos.html
See for example:
tests/ttnn/integration_tests/resnet/test_performance.py
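
For reference, a test carrying these markers generally looks like the following minimal sketch; the test names, bodies, and any thresholds are placeholders, not code from test_performance.py:

```python
# Placeholder illustration of the marker usage referred to above; test names and
# bodies are hypothetical.
import pytest


@pytest.mark.models_performance_bare_metal
def test_demo_e2e_perf():
    ...  # run the demo end-to-end and assert the measured latency meets its target


@pytest.mark.models_device_performance_bare_metal
def test_demo_device_perf():
    ...  # collect device-side perf counters and assert them against expected values
```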

skhorasganiTT (Contributor, Author)

Hi @eyonland, those decorators are used to specify which tests should be included in perf pipelines. The demos belong to a separate pipeline and should not use them. Also, the decorators of existing demo tests are orthogonal to this PR and are not affected. The purpose of this PR is to create new benchmarking tools for measuring metrics that will be initially adopted by the demos and subsequently by other tests.

TT-billteng (Collaborator)

> Have you considered using https://pypi.org/project/pytest-benchmark/ and building extra code on top of it?
>
> Thanks for sharing that Bill, I hadn't considered it. It seems like a good tool for benchmarking functions/tests, but I don't think it will be easy to integrate with our requirements for the CSVs since we require a very specific format for timestamps (which is why I added the BenchmarkProfiler) and we will need to benchmark blocks of code, not only functions.

you could move those blocks of code into functions :)

skhorasganiTT (Contributor, Author)

> you could move those blocks of code into functions :)

We may want to log many steps (such as in the demo) so making everything a function might be overkill (plus the point about the timestamps).
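
To illustrate the block-based approach being argued for here, the sketch below times named blocks of a demo with a context manager instead of wrapping each step in its own function. The class and method names are hypothetical stand-ins, not the actual BenchmarkProfiler interface.

```python
# Hypothetical stand-in for the block-based profiling idea; not the real
# BenchmarkProfiler API from models/perf/benchmarking_utils.py.
import time
from contextlib import contextmanager


class StepTimer:
    def __init__(self):
        self.steps = {}  # step name -> (start timestamp, end timestamp)

    @contextmanager
    def step(self, name):
        start = time.time()
        try:
            yield
        finally:
            self.steps[name] = (start, time.time())


timer = StepTimer()
with timer.step("tokenize"):
    ...  # tokenize the demo inputs
with timer.step("prefill"):
    ...  # run prefill
with timer.step("decode"):
    ...  # run the decode loop
```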

eyonland (Contributor)

> Hi @eyonland, those decorators are used to specify which tests should be included in perf pipelines. The demos belong to a separate pipeline and should not use them. Also, the decorators of existing demo tests are orthogonal to this PR and are not affected. The purpose of this PR is to create new benchmarking tools for measuring metrics that will be initially adopted by the demos and subsequently by other tests.

Could we have a meeting on this? It sounds like we want to deviate from the expectations of demos as described here: https://tenstorrent.github.io/tt-metal/latest/ttnn/ttnn/demos.html
@uaydonat is this a shift in expectations on demos and how we track their e2e and device performance going forward?

eyonland (Contributor) left a comment

@skhorasganiTT / @uaydonat , based on our conversation, feel free to add the update to this PR for how this new pipeline fits into the docs here: https://tenstorrent.github.io/tt-metal/latest/ttnn/ttnn/demos.html
Thanks for the clarity on this.

…h Falcon7b example) and model-demo utilities for verifying tokens/perf

Signed-off-by: Salar Hosseini <skhorasgani@tenstorrent.com>
Signed-off-by: Salar Hosseini <skhorasgani@tenstorrent.com>
skhorasganiTT merged commit 336292a into main on Jun 26, 2024
5 checks passed
skhorasganiTT deleted the skhorasgani/falcon7b_democsv branch on July 10, 2024 at 17:20