v0.89.0
What's New
- Circular buffer allocation is now tracked per-core for more accurate L1 pressure analysis, with support for modern core-range formats
- Per-core tensor allocation rendering and reporting is significantly faster, and remains backward compatible with legacy formats
- Performance table columns can be shown or hidden on demand, with footer totals that stay consistent through column toggles
- Performance table reload is smoother and preserves table state between data refreshes
- Stack trace click-throughs are now available in additional contexts across the visualizer
- Multi-host profiler reports are supported with or without rank suffixes on descriptor files
- Linked operation nodes can be located directly from the operation graph, and duplicate edges between the same endpoints are now collapsed
- Remote sync surfaces clearer errors and falls back to scp when sftp transfer fails
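
As an illustration of the edge collapsing noted above, here is a minimal sketch of deduplicating edges by endpoint. It is not the visualizer's actual code: the `Edge` type and `dedupe_edges` function are hypothetical names, and it assumes edges are directed, so `(a, b)` and `(b, a)` count as different edges.

```python
from typing import Iterable

# Hypothetical edge representation: (source node id, target node id).
Edge = tuple[str, str]


def dedupe_edges(edges: Iterable[Edge]) -> list[Edge]:
    """Keep the first edge seen for each (source, target) pair, preserving order."""
    seen: set[Edge] = set()
    unique: list[Edge] = []
    for edge in edges:
        if edge not in seen:
            seen.add(edge)
            unique.append(edge)
    return unique


if __name__ == "__main__":
    # The repeated ("op_1", "op_2") edge is collapsed into a single entry.
    print(dedupe_edges([("op_1", "op_2"), ("op_1", "op_2"), ("op_2", "op_3")]))
```

Using a set for membership checks keeps the pass linear in the number of edges, while the separate list preserves the original display order.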
What's Changed
- Improve sync error messaging and add scp fallback by @dcblundell in #1623
- Remove uvicorn dep by @mert-kurttutan in #1627
- Add linked node locate capability by @aidemsined in #1620
- Deduplicate displayed edges by endpoint by @aidemsined in #1621
- Allow users to toggle perf table columns by @dcblundell in #1624
- More gracefully (re)load performance table data by @dcblundell in #1630
- dev: relax deps version by @mert-kurttutan in #1625
- Support per-core CB allocation calculation and core range parsing by @aidemsined in #1628
- Support pre-aggregated buffer-pages chunks for new and legacy formats by @aidemsined in #1631
- Add stack trace click throughs in additional contexts by @dcblundell in #1639
- Support profiler report files with and without rank suffixes by @smountenay-tt in #1642
- Updated performance table footer totals computation by @dcblundell in #1640
New Contributors
- @mert-kurttutan made their first contribution in #1627
Full Changelog: v0.88.0...v0.89.0