Skip to content

Comments

Use stateful API in fabric worker sync path#19440

Merged
yugaoTT merged 15 commits intomainfrom
yugao/noc2
Mar 22, 2025
Merged

Use stateful API in fabric worker sync path#19440
yugaoTT merged 15 commits intomainfrom
yugao/noc2

Conversation

@yugaoTT
Copy link
Contributor

@yugaoTT yugaoTT commented Mar 20, 2025

@yugaoTT yugaoTT self-assigned this Mar 20, 2025
@yugaoTT yugaoTT force-pushed the yugao/noc2 branch 3 times, most recently from ea4601c to 922f2c2 Compare March 21, 2025 18:10
@yugaoTT yugaoTT marked this pull request as ready for review March 21, 2025 18:10
Copy link
Contributor

@SeanNijjar SeanNijjar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good - minor suggestion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should take the write_cmd_buf here so this can be useable by the write to local noc and the forward to next EDM paths (I believe those use different cmd bufs atm?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean expose the cmd_buf as input arg?
forward to next edm is not using this api (use the stateful one).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's what I meant. And good point, I forgot about that detail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to highlight that on BH we only do inline writes to stream registers. Due to HW bug inline writes to L1 are emulated by first writing the value into src L1 and then doing a Noc async write

@yugaoTT yugaoTT added this pull request to the merge queue Mar 22, 2025
Merged via the queue into main with commit 2f53464 Mar 22, 2025
30 of 32 checks passed
@yugaoTT yugaoTT deleted the yugao/noc2 branch March 22, 2025 14:21
github-merge-queue bot pushed a commit that referenced this pull request Mar 22, 2025
update fabric perf tests to use IRAM by defaults and update targets
accordingly.

Note this has additional changes which are from this pr:
#19440

I have built this branch on top because that will be merged soon and it
updates the targets

Additionally, several additional unicast tests were added (and a small
performance bug in the test kernel resolved).

New numbers are as follows (for 4k packet size):
| Test Case | Bandwidth GB/s/dir | Comment |
|-|-|-|
|
![image](https://github.com/user-attachments/assets/ed373382-91f5-4748-b9ed-613e96590e6f)
| 11.03 | This was the old unicast test but performance has improved due
to fix in worker kernel, unicast 1 hop |
|
![image](https://github.com/user-attachments/assets/16bf7606-62b5-4cf2-bbd6-06c30c8fc272)
| 9.74 | Unicast, multiple producers/dir, multiple hops, bidir |
|
![image](https://github.com/user-attachments/assets/9d65f78c-cdb9-4a87-ad59-abadeecf3cf7)
| 9.35 | Currently unclear why this performs worse than above. There
should be enough buffering for latency hiding but maybe not | Unicast,
single producer/dir, multiple hops, bidir |


### Checklist
- [x] [All post
commit](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml)
CI: https://github.com/tenstorrent/tt-metal/actions/runs/14010116751
- [x] New/Existing tests provide coverage for changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants