Skip `find_existing_run` call if head and tail pairs sorted differently #143495

AngelicosPhosphoros · 2025-07-05T15:32:29Z

This helps avoid running the comparator on all elements when the user has pushed an element to the end of a sorted Vec, breaking the order only at the last element.
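A minimal sketch of the idea, not the PR's actual code. The helper name `head_and_tail_agree` is hypothetical, and the sketch assumes that run detection in slice::sort_unstable only pays off when the whole slice is a single ascending or strictly descending run. Under that assumption, two comparisons are enough to rule out a full-slice run whenever the first and last pairs point in different directions:

```rust
/// Hypothetical helper (an illustration, not the PR's implementation).
///
/// Returns `false` when the first pair and the last pair of `v` are ordered
/// in different directions, which means `v` cannot be one fully ascending
/// or fully strictly descending run, so a full scan would be wasted.
fn head_and_tail_agree<T, F>(v: &[T], is_less: &mut F) -> bool
where
    F: FnMut(&T, &T) -> bool,
{
    let len = v.len();
    if len < 4 {
        // Assumption: for tiny inputs the extra check is not worth it,
        // so report "agree" and let the caller scan as usual.
        return true;
    }
    // A pair counts as descending only if it is strictly decreasing.
    let head_descending = is_less(&v[1], &v[0]);
    let tail_descending = is_less(&v[len - 1], &v[len - 2]);
    head_descending == tail_descending
}
```

A `true` result does not prove the slice is sorted; it only means the cheap check could not rule it out, so the caller would still need the full scan.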

r? @Voultapher

rustbot · 2025-07-05T15:32:33Z

Failed to set assignee to Voultapher: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

AngelicosPhosphoros · 2025-07-05T15:34:12Z

r? libs

Kobzol · 2025-07-05T17:27:51Z

@bors2 try @rust-timer queue

rust-bors · 2025-07-05T17:27:55Z

⌛ Trying commit 5735153 with merge f359b11…

To cancel the try build, run the command @bors2 try cancel.

…_run_detection_in_sort_unstable, r=<try> Skip `find_existing_run` call if head and tail pairs sorted differently This would help to avoid running comparator for all elements when user pushed an element to end of sorted Vec, breaking order only in last element. r? `@Voultapher`

rust-bors · 2025-07-05T19:48:45Z

☀️ Try build successful (CI)
Build commit: f359b11 (f359b11e57d9db9db47aefd5d5d19819ca964698, parent: 6dec76f1c2809fded082dd44d3752d3f6220d767)

AngelicosPhosphoros · 2025-07-05T20:02:29Z

@bors2 try cancel

rust-bors · 2025-07-05T20:02:32Z

@AngelicosPhosphoros: 🔑 Insufficient privileges: not in try users

rust-timer · 2025-07-05T21:00:40Z

Finished benchmarking commit (f359b11): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (secondary -2.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.8%	[-2.8%, -2.8%]	1
All ❌✅ (primary)	-	-	0

Cycles

Results (primary 1.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	1.8%	[1.3%, 2.4%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.8%	[1.3%, 2.4%]	2

Binary size

Results (primary 0.2%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.3%	[0.0%, 1.1%]	4
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	4
Improvements ✅ (primary)	-0.2%	[-0.2%, -0.2%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.2%	[-0.2%, 1.1%]	5

Bootstrap: 461.008s -> 460.509s (-0.11%)
Artifact size: 372.14 MiB -> 372.18 MiB (0.01%)

the8472 · 2025-07-06T12:33:18Z

@bors2 try @rust-timer queue

rust-bors · 2025-07-06T12:33:22Z

⌛ Trying commit 67ac435 with merge 0ccfeea…

To cancel the try build, run the command @bors2 try cancel.

…_run_detection_in_sort_unstable, r=<try> Skip `find_existing_run` call if head and tail pairs sorted differently This would help to avoid running comparator for all elements when user pushed an element to end of sorted Vec, breaking order only in last element. r? `@Voultapher`

rust-bors · 2025-07-06T14:47:52Z

☀️ Try build successful (CI)
Build commit: 0ccfeea (0ccfeea66d37d52cc691133f803b3b143f50efba, parent: e804cd4a5f1a5b658ddca245c80bef96a576c018)

rust-timer · 2025-07-06T19:03:36Z

Finished benchmarking commit (0ccfeea): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.3%	[0.2%, 0.3%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.0%	[-0.0%, -0.0%]	1
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results (secondary -2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.1%	[-2.1%, -2.1%]	1
All ❌✅ (primary)	-	-	0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary 0.1%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 0.9%]	6
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	4
Improvements ✅ (primary)	-0.1%	[-0.1%, -0.1%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[-0.1%, 0.9%]	8

Bootstrap: 462.855s -> 461.018s (-0.40%)
Artifact size: 372.14 MiB -> 372.14 MiB (0.00%)

Voultapher · 2025-07-06T19:20:33Z

I'll try to look at this PR this coming week.

Voultapher · 2025-07-07T17:32:49Z

@AngelicosPhosphoros thanks for tagging me, here is a review of the idea rather than the code itself, which is fine other than maybe the h0 etc. variable names.

I can totally understand the motivation for this change. Why waste N-2 comparisons if it can be avoided by doing 2-4 additional comparisons? At first glance it's a nice algorithmic improvement with no downsides, but upon closer inspection it leaves me with mixed feelings. Before going further let's look at some real performance figures, since the motivation for this change is rooted in improving performance. All tests were performed with rustc 1.90.0-nightly (28f1c8079 2025-06-24) on my main Zen 3 machine. random is the fully random pattern, and random_s95 is 95% sorted followed by 5% unsorted, which simulates append + sort as described here. random_snl_x is a derivative of random_s where everything but the last x elements is sorted. I've also included slice::sort for reasons that will become apparent shortly.

As we can see, there is a small but noticeable improvement for random_snl_1, which simulates the case where exactly one element is appended to an already sorted vector. The effect is of similar strength for both u64 and String, despite String being more expensive to compare in general; other constant costs seem to outweigh the saved comparisons in this case. random_snl_2 already runs into a 50% chance that the new heuristic will fail to detect that a full scan will be futile. This is enough to nullify the improvement in practice. From this we can hypothesize that this improvement will only be meaningful for the very specific case where exactly one element was added to an already sorted input and slice::sort_unstable is then called.
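The 50% figure for random_snl_2 can be checked by enumeration. The sketch below is an illustration under stated assumptions, not the benchmark code: `tail_pair_ascending`, `sorted_with_tail` and `undetected_cases` are hypothetical names, and the tail check is a simplified stand-in for the heuristic. It builds a sorted prefix followed by two distinct trailing values and counts how often the last pair still looks ascending, so that a head/tail check would not skip the scan:

```rust
/// Simplified stand-in for the tail half of the head/tail check:
/// does the last pair look ascending?
fn tail_pair_ascending(v: &[u64]) -> bool {
    let n = v.len();
    n < 2 || v[n - 2] <= v[n - 1]
}

/// Builds a random_snl_2-like input: the sorted prefix 0..prefix_len
/// followed by two arbitrary trailing values.
fn sorted_with_tail(prefix_len: u64, a: u64, b: u64) -> Vec<u64> {
    let mut v: Vec<u64> = (0..prefix_len).collect();
    v.push(a);
    v.push(b);
    v
}

/// Enumerates every ordered pair of distinct tail values drawn from
/// 0..prefix_len. Each such input is unsorted as a whole, yet the tail
/// pair is ascending in some of them. Returns (undetected, total).
fn undetected_cases(prefix_len: u64) -> (u32, u32) {
    let mut undetected = 0;
    let mut total = 0;
    for a in 0..prefix_len {
        for b in 0..prefix_len {
            if a == b {
                continue;
            }
            total += 1;
            if tail_pair_ascending(&sorted_with_tail(prefix_len, a, b)) {
                undetected += 1;
            }
        }
    }
    (undetected, total)
}
```

For any prefix length, exactly half of the ordered pairs satisfy a < b, so `undetected_cases` reports half of all cases as undetected, which matches the 50% estimate above.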

Zooming out, there is a larger issue. Essentially this is trying to optimize a known and documented performance sub-optimality in a way that only works for a very narrow use-case. The documentation for slice::sort_unstable currently contains the following:

It is typically faster than stable sorting, except in a few special cases, e.g., when the slice is partially sorted.

If users can predict this use-case they are much better served with slice::sort which gracefully and efficiently handles any kind of pre-sorted sub-segments as seen in the benchmark results.
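To compare the two sorts on an append + sort input, one option is to count comparator calls. This is a rough sketch with hypothetical helper names (`count_stable`, `count_unstable`, `sorted_plus_one`); comparison counts are only a proxy for run time, and the exact numbers depend on the standard library version, so none are claimed here:

```rust
use std::cell::Cell;

/// Sorts `v` with the stable slice::sort_by and returns how many times
/// the comparator was called.
fn count_stable(v: &mut [u64]) -> usize {
    let calls = Cell::new(0usize);
    v.sort_by(|a, b| {
        calls.set(calls.get() + 1);
        a.cmp(b)
    });
    calls.get()
}

/// Same measurement for slice::sort_unstable_by.
fn count_unstable(v: &mut [u64]) -> usize {
    let calls = Cell::new(0usize);
    v.sort_unstable_by(|a, b| {
        calls.set(calls.get() + 1);
        a.cmp(b)
    });
    calls.get()
}

/// Sorted values 1..=n followed by a single out-of-order element,
/// mimicking one push onto an already sorted Vec.
fn sorted_plus_one(n: u64) -> Vec<u64> {
    let mut v: Vec<u64> = (1..=n).collect();
    v.push(0);
    v
}
```

Calling both counters on clones of `sorted_plus_one(n)` for a few values of `n` gives a quick feel for how each sort copes with a long pre-sorted prefix on a given toolchain.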

With all this combined I'm not convinced that this change - which represents a small but non-zero increase in code complexity - should be merged. It's a non-ideal situation that the generally faster slice::sort_unstable loses out to slice::sort in the quite common sort + append workload, especially for users with code structured in a way that makes it hard to prefer one over the other. There are some more robust approaches that could potentially improve this situation, namely bidirectional initial scanning or, even better, some form of in-place rotation-based merging. @orlp and I initially decided against pursuing these ideas to keep binary size and compile times in check, but it certainly doesn't seem impossible to achieve even with a tight budget.

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 5, 2025

rustbot assigned Mark-Simulacrum Jul 5, 2025

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 5, 2025

AngelicosPhosphoros marked this pull request as draft July 5, 2025 19:58

rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 5, 2025

Skip find_existing_run call if head and tail pairs sorted differently

67ac435

AngelicosPhosphoros force-pushed the angelicos_phosphoros/skip_run_detection_in_sort_unstable branch from 5735153 to 67ac435 Compare July 5, 2025 20:34

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 5, 2025

AngelicosPhosphoros marked this pull request as ready for review July 6, 2025 00:09

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 6, 2025

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 6, 2025

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 6, 2025
