Create benchmarking tools for saving run/measurement data (with Falcon7b example) and model-demo utilities for verifying tokens/perf #9659

Merged: 2 commits merged into main on Jun 26, 2024

Conversation

skhorasganiTT (Contributor)

Ticket

Problem description

  • Currently, model tests (notably the demos, which are our key deliverables) do not have a standardized way of tracking run and measurement data. There are parallel efforts from the dev-ops and data science teams to bring up tools for monitoring such data; however, model developers still need a set of tools for logging and saving the data in the appropriate formats.

What's changed

  • Created a file models/perf/benchmarking_utils.py containing tools for profiling data and saving CSVs in the appropriate formats (the requirements for these CSVs were specified by the data science team). These tools can be used for any type of test, not only demos. Two CSVs are saved for each run: "run_<start_ts>.csv" and "measurement_<start_ts>.csv" (a usage sketch follows this list).
  • Created a file models/demos/utils/llm_demo_utils.py containing a function which adds demo measurements using the benchmarking tools mentioned above. This is only applicable to model demos and defines certain requirements for data produced by the demos.
  • In the same llm_demo_utils.py file mentioned above, created functions for output token verification and output perf verification (also sketched below).
  • Modified the Falcon7b demo to save run/measurement CSVs using the tools mentioned above.
  • Added 128 and 2k sequence length options for the Falcon7b demo (perf mode).
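
To make the CSV output concrete, here is a minimal sketch of what saving the two files might look like. The helper name, column layout, and timestamp format are assumptions for illustration only; the actual schema was specified by the data science team and lives in models/perf/benchmarking_utils.py. The generated/benchmarks output path is taken from the review discussion below.

```python
# Illustrative sketch only: the function name, columns, and timestamp format are
# assumptions; the real helpers live in models/perf/benchmarking_utils.py.
import csv
from datetime import datetime
from pathlib import Path


def save_run_and_measurement_csvs(run_info: dict, measurements: list[dict],
                                  output_dir: str = "generated/benchmarks") -> None:
    """Write run_<start_ts>.csv and measurement_<start_ts>.csv for a single test run."""
    start_ts = datetime.now().strftime("%Y-%m-%dT%H-%M-%S")  # assumed timestamp format
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # One row describing the run itself (e.g. model name, git hash, start time).
    with open(out / f"run_{start_ts}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(run_info.keys()))
        writer.writeheader()
        writer.writerow(run_info)

    # One row per measurement (e.g. step name, iteration, measured value).
    with open(out / f"measurement_{start_ts}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(measurements[0].keys()))
        writer.writeheader()
        writer.writerows(measurements)
```

Similarly, the token and perf verification helpers in llm_demo_utils.py could look roughly like this; the function names and signatures are hypothetical, not the actual API:

```python
# Hypothetical sketches of the demo verification helpers; names and signatures are
# illustrative, not the actual llm_demo_utils.py API.
def verify_output_tokens(generated_tokens, expected_tokens) -> bool:
    """Check that the demo's generated tokens match a stored reference."""
    return list(generated_tokens) == list(expected_tokens)


def verify_perf(measured_tokens_per_s: float, target_tokens_per_s: float,
                tolerance: float = 0.05) -> bool:
    """Check that measured throughput is within a tolerance of the target."""
    return measured_tokens_per_s >= target_tokens_per_s * (1.0 - tolerance)
```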

Checklist

  • Post commit CI passes
  • Model regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

cc @uaydonat

tt-rkim (Collaborator) commented Jun 25, 2024

Two things before I approve:

  • resolve all convos I started
  • could you please show that these are generated on the demo CI, and upload them as an artifact? generated/benchmarks should be a good path to upload as an artifact

skhorasganiTT (Contributor, Author)

> Have you considered using https://pypi.org/project/pytest-benchmark/ and building extra code on top of it?

Thanks for sharing that Bill, I hadn't considered it. It seems like a good tool for benchmarking functions/tests, but I don't think it will be easy to integrate with our requirements for the CSVs since we require a very specific format for timestamps (which is why I added the BenchmarkProfiler) and we will need to benchmark blocks of code, not only functions.

eyonland (Contributor) left a comment

I don't see any tests with the following decorator/annotation tags. We are trying to make sure that all demos collect metrics.
Look for existing tests with these markings.
@pytest.mark.models_device_performance_bare_metal
@pytest.mark.models_performance_bare_metal

Note that we document this here: https://tenstorrent.github.io/tt-metal/latest/ttnn/ttnn/demos.html
See for example:
tests/ttnn/integration_tests/resnet/test_performance.py
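
For reference, a test carrying these markers generally looks like the following minimal sketch; the test names, bodies, and any thresholds are placeholders, not code from test_performance.py:

```python
# Placeholder illustration of the marker usage referred to above; test names and
# bodies are hypothetical.
import pytest


@pytest.mark.models_performance_bare_metal
def test_demo_e2e_perf():
    ...  # run the demo end-to-end and assert the measured latency meets its target


@pytest.mark.models_device_performance_bare_metal
def test_demo_device_perf():
    ...  # collect device-side perf counters and assert them against expected values
```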

skhorasganiTT (Contributor, Author)

Hi @eyonland, those decorators are used to specify which tests should be included in perf pipelines. The demos belong to a separate pipeline and should not use them. Also, the decorators of existing demo tests are orthogonal to this PR and are not affected. The purpose of this PR is to create new benchmarking tools for measuring metrics that will be initially adopted by the demos and subsequently by other tests.

TT-billteng (Collaborator)

> Have you considered using https://pypi.org/project/pytest-benchmark/ and building extra code on top of it?
>
> Thanks for sharing that Bill, I hadn't considered it. It seems like a good tool for benchmarking functions/tests, but I don't think it will be easy to integrate with our requirements for the CSVs since we require a very specific format for timestamps (which is why I added the BenchmarkProfiler) and we will need to benchmark blocks of code, not only functions.

you could move those blocks of code into functions :)

skhorasganiTT (Contributor, Author)

> you could move those blocks of code into functions :)

We may want to log many steps (such as in the demo) so making everything a function might be overkill (plus the point about the timestamps).
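
To illustrate the block-based approach being argued for here, the sketch below times named blocks of a demo with a context manager instead of wrapping each step in its own function. The class and method names are hypothetical stand-ins, not the actual BenchmarkProfiler interface.

```python
# Hypothetical stand-in for the block-based profiling idea; not the real
# BenchmarkProfiler API from models/perf/benchmarking_utils.py.
import time
from contextlib import contextmanager


class StepTimer:
    def __init__(self):
        self.steps = {}  # step name -> (start timestamp, end timestamp)

    @contextmanager
    def step(self, name):
        start = time.time()
        try:
            yield
        finally:
            self.steps[name] = (start, time.time())


timer = StepTimer()
with timer.step("tokenize"):
    ...  # tokenize the demo inputs
with timer.step("prefill"):
    ...  # run prefill
with timer.step("decode"):
    ...  # run the decode loop
```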

eyonland (Contributor)

> Hi @eyonland, those decorators are used to specify which tests should be included in perf pipelines. The demos belong to a separate pipeline and should not use them. Also, the decorators of existing demo tests are orthogonal to this PR and are not affected. The purpose of this PR is to create new benchmarking tools for measuring metrics that will be initially adopted by the demos and subsequently by other tests.

Could we have a meeting on this? It sounds like we want to deviate from the expectations of demos as described here: https://tenstorrent.github.io/tt-metal/latest/ttnn/ttnn/demos.html
@uaydonat is this a shift in expectations on demos and how we track their e2e and device performance going forward?

eyonland (Contributor) left a comment

@skhorasganiTT / @uaydonat , based on our conversation, feel free to add the update to this PR for how this new pipeline fits into the docs here: https://tenstorrent.github.io/tt-metal/latest/ttnn/ttnn/demos.html
Thanks for the clarity on this.

…h Falcon7b example) and model-demo utilities for verifying tokens/perf

Signed-off-by: Salar Hosseini <skhorasgani@tenstorrent.com>
Signed-off-by: Salar Hosseini <skhorasgani@tenstorrent.com>
skhorasganiTT merged commit 336292a into main on Jun 26, 2024
5 checks passed
skhorasganiTT deleted the skhorasgani/falcon7b_democsv branch on July 10, 2024 at 17:20