Today, when we run the benchmarks, all the tests are lumped into a single comparison table, so unrelated algorithms are compared against each other. Can we group by file / target so that sorts are compared against sorts, and so on?
Running a blanket benchmark today produces one big table (note how many different algorithms are compared in the same table; it is hard to see which sort is the best):

For now, a workaround is limiting the set of benchmarks:
# Only run benchmarks in the sorting directory whose names contain "lg" (i.e., the large-scale benches).
uv run pytest --benchmark-only tests/sort/ -k "lg"
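For the grouping itself, pytest-benchmark already has a `--benchmark-group-by` option that splits the results into one table per group; a sketch of how it might apply here (the `tests/sort/` path is taken from the workaround above, and `sort-lg` is a hypothetical group name we would have to add ourselves):

```shell
# Group results by the fully qualified test function, so each test module /
# parametrization gets its own comparison table instead of one big one.
uv run pytest --benchmark-only --benchmark-group-by=fullfunc tests/

# Alternatively, tag related benches in the test code with
# @pytest.mark.benchmark(group="sort-lg") and group on that marker:
uv run pytest --benchmark-only --benchmark-group-by=group tests/
```

`--benchmark-group-by` also accepts `name`, `fullname`, `func`, `param`, and `param:NAME`, so we could experiment with which split reads best before settling on explicit group markers.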
