Skip to content

wpwoodjr/rust-merge-sort

Repository files navigation

Benchmarks for proposed two stage merge sort vs std TimSort

This repo includes all files used for benchmarking a proposed two stage merge sort with pre-sorted prefix optimization, vs Rust's current implementation of TimSort. The two stages are:

  • Top-down recursive depth-first merge, which helps data locality
  • Small slices sort using a fast insertion sort, then merge

The new sort benefits from pre-sorted prefix optimization for forward and reverse sorted sub-slices. The total running time is O(n * log(n)) worst-case.

TL;DR

The proposed two stage merge sort indicates a speedup of 13.1% for random unsorted data, with 95% of sort runs being faster. For random data, including unsorted and forward / reverse sorted variants, the results indicate a speedup of 20.2%, with 92% of sort runs being faster. Other data patterns are faster too, except for the sawtooth pattern, which is about the same. The number of comparisons performed is -2.92% less over all tests.

Background info

This benchmark is based on https://github.com/notriddle/quickersort/blob/master/examples/perf_txt.rs, (C) 2015 Michael Howell michael@notriddle.com

Updates to the benchmark include:

  • test that new sort result is identical to standard sort (including stability)
  • randomize vec sizes within decade (eg decade 100 has random sizes between 0 and 999)
  • option for multiple benchmark runs
  • option for quick eq testing with multiple vec sizes
  • track comparison ratio between newsort and standard sort
  • determine number of iterations using timer instead of fixed count
  • add "reverse sorted" variant
  • add parallel sort option
  • call stdsort from a module or from lib-std

How the becnhmark was run

The benchmark was run on six OS's / CPUs:

  • Ubuntu 20.04.3 LTS / Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
  • WIN 10 KVM on Ubuntu / Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
  • macOS Big Sur Version 11.6 / MacBook Pro (16-inch, 2019) 2.4 GHz 8-Core Intel Core i9
  • Debian Crostini Linux / Google PixelBook Intel(R) 7th Gen Core(TM) i7-7Y75
  • Debian Crostini Linux / Asus Chromebook Flip CX5 Intel(R) 11th Gen Core(TM) i5-1130G7
  • Debian Crostini Linux / HP Chromebook X2 11 Qualcomm(R) Snapdragon(TM) 7c

For each OS/CPU, the benchmark was run on vecs of i16, i32, i64, i128, and String. The vecs were of random lengths from 0 to 10,000,000.

The results were collected in a separate log file for each OS/CPU/vec type combination, and then collected into a single all log file for each OS/CPU combination. For instance, this command and its output, which was run on Ubuntu:

$ ./do-all-builds-std ub-ssf ressw2
i32
resstd/ub-ssf-i32.log
   Compiling newsort v1.0.0 (/home/wpwoodjr/rust/newsort)
    Finished release [optimized] target(s) in 9.80s
Std: Std, Newsort: Newsort
Range 10 to 1000000
String array size = 1095696
Running benchmark 1 of 1: strings... 10's... 100's... 1000's... 10000's... 100000's... 1000000's...
benchmark completed; new to standard comparisons ratio: 0.9609

real	8m19.979s
user	8m13.343s
sys	0m6.539s
Tue 02 Nov 2021 05:02:49 PM EDT
i16
resstd/ub-ssf-i16.log
   Compiling newsort v1.0.0 (/home/wpwoodjr/rust/newsort)
    Finished release [optimized] target(s) in 9.68s
Std: Std, Newsort: Newsort
Range 10 to 1000000
String array size = 1095696
Running benchmark 1 of 1: strings... 10's... 100's... 1000's... 10000's... 100000's... 1000000's...
benchmark completed; new to standard comparisons ratio: 0.9974

real	8m18.201s
user	8m13.943s
sys	0m4.164s
Tue 02 Nov 2021 05:16:16 PM EDT
i64
resstd/ub-ssf-i64.log
   Compiling newsort v1.0.0 (/home/wpwoodjr/rust/newsort)
    Finished release [optimized] target(s) in 9.62s
Std: Std, Newsort: Newsort
Range 10 to 1000000
String array size = 1095696
Running benchmark 1 of 1: strings... 10's... 100's... 1000's... 10000's... 100000's... 1000000's...
benchmark completed; new to standard comparisons ratio: 0.9800

real	8m57.096s
user	8m41.635s
sys	0m15.386s
Tue 02 Nov 2021 05:30:23 PM EDT
i128
resstd/ub-ssf-i128.log
   Compiling newsort v1.0.0 (/home/wpwoodjr/rust/newsort)
    Finished release [optimized] target(s) in 9.70s
Std: Std, Newsort: Newsort
Range 10 to 1000000
String array size = 1095696
Running benchmark 1 of 1: strings... 10's... 100's... 1000's... 10000's... 100000's... 1000000's...
benchmark completed; new to standard comparisons ratio: 0.9794

real	10m0.251s
user	9m27.921s
sys	0m32.248s
Tue 02 Nov 2021 05:45:33 PM EDT
$

resulted in these files:

$ ls -lat ressw2/ub*
-rw-rw-r-- 1 wpwoodjr wpwoodjr 807544 Oct 26 15:31 ressw2/ub-ssf-all.log
-rw-rw-r-- 1 wpwoodjr wpwoodjr 201886 Oct 26 15:31 ressw2/ub-ssf-i128.log
-rw-rw-r-- 1 wpwoodjr wpwoodjr 201886 Oct 26 15:16 ressw2/ub-ssf-i64.log
-rw-rw-r-- 1 wpwoodjr wpwoodjr 201886 Oct 26 15:03 ressw2/ub-ssf-i16.log
-rw-rw-r-- 1 wpwoodjr wpwoodjr 201886 Oct 26 14:49 ressw2/ub-ssf-i32.log

Results

After running the benchmark on all OS/CPUs, the OS/CPU all files were collected into one big all file and statistics run as follows:

$ log=ressw2/all* ./do-stats
filter: /
all                               nrec     % speedup     is faster     cmp ratio
  ressw2/all-ssf-all.log:   	 34584         33.1%           80%        -2.92%

forward sorted                    nrec     % speedup     is faster     cmp ratio
  ressw2/all-ssf-all.log:   	  4944         30.5%           87%         0.00%

not forward sorted or plateau     nrec     % speedup     is faster     cmp ratio
  ressw2/all-ssf-all.log:   	 23736         14.4%           78%        -0.15%

reverse sorted:                   nrec     % speedup     is faster     cmp ratio
  ressw2/all-ssf-all.log:   	  4944         62.6%           87%       -14.27%

rand                              nrec     % speedup     is faster     cmp ratio
  ressw2/all-ssf-all.log:   	  6888         20.2%           92%        -2.14%

rand, not forw/rev sorted:        nrec     % speedup     is faster     cmp ratio
  ressw2/all-ssf-all.log:   	  4920         13.1%           95%        -0.67%

shuffle ident:                    nrec     % speedup     is faster     cmp ratio
  ressw2/all-ssf-all.log:   	   984         11.4%           70%         0.80%

sawtooth ident:                   nrec     % speedup     is faster     cmp ratio
  ressw2/all-ssf-all.log:   	   984          3.9%           49%         4.76%

stagger ident:                    nrec     % speedup     is faster     cmp ratio
  ressw2/all-ssf-all.log:   	   984          7.6%           75%         3.40%

strings:                          nrec     % speedup     is faster     cmp ratio
  ressw2/all-ssf-all.log:   	   144          6.6%           78%        -3.61%

strings, not forward sorted:      nrec     % speedup     is faster     cmp ratio
  ressw2/all-ssf-all.log:   	   120          6.6%           78%        -4.33%

From the above benchmark, the proposed two stage merge sort indicates a speedup of 13.1% for random unsorted data, with 95% of sort runs being faster. For random data, including unsorted and forward / reverse sorted variants, the results indicate a speedup of 20.2%, with 92% of sort runs being faster. Other data patterns are faster too, except for the sawtooth pattern, which is about the same. The number of comparisons performed is -2.92% less over all tests.

These stats compare results by pattern / variant:

$ log=ressw2/all* ./do-stats-by-pattern 
filter: /
sawtooth                                          nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  6888         15.4%           68%         1.76%
  ident          ressw2/all-ssf-all.log:   	   984          3.9%           49%         4.76%
  reverse        ressw2/all-ssf-all.log:   	   984         -0.5%           44%        11.76%
  reverse_front  ressw2/all-ssf-all.log:   	   984         14.0%           54%         2.07%
  reverse_back   ressw2/all-ssf-all.log:   	   984          4.2%           54%         4.04%
  sorted         ressw2/all-ssf-all.log:   	   984         30.6%           88%         0.00%
  reverse_sorted ressw2/all-ssf-all.log:   	   984         42.3%           89%        -9.85%
  dither         ressw2/all-ssf-all.log:   	   984         13.4%           95%        -0.47%
rand                                              nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  6888         20.2%           92%        -2.14%
  ident          ressw2/all-ssf-all.log:   	   984         12.9%           95%        -0.73%
  reverse        ressw2/all-ssf-all.log:   	   984         13.1%           95%        -0.63%
  reverse_front  ressw2/all-ssf-all.log:   	   984         13.3%           95%        -0.87%
  reverse_back   ressw2/all-ssf-all.log:   	   984         12.8%           95%        -0.36%
  sorted         ressw2/all-ssf-all.log:   	   984         29.7%           86%         0.00%
  reverse_sorted ressw2/all-ssf-all.log:   	   984         45.9%           82%       -11.61%
  dither         ressw2/all-ssf-all.log:   	   984         13.5%           96%        -0.77%
stagger                                           nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  6888         14.2%           82%         1.15%
  ident          ressw2/all-ssf-all.log:   	   984          7.6%           75%         3.40%
  reverse        ressw2/all-ssf-all.log:   	   984          8.3%           78%         2.26%
  reverse_front  ressw2/all-ssf-all.log:   	   984          8.6%           77%         2.63%
  reverse_back   ressw2/all-ssf-all.log:   	   984         10.3%           86%         0.60%
  sorted         ressw2/all-ssf-all.log:   	   984         31.2%           88%         0.00%
  reverse_sorted ressw2/all-ssf-all.log:   	   984         21.5%           82%        -3.11%
  dither         ressw2/all-ssf-all.log:   	   984         11.8%           91%         2.29%
plateau                                           nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  6888         99.0%           83%       -14.15%
  ident          ressw2/all-ssf-all.log:   	   984         26.5%           82%         1.87%
  reverse        ressw2/all-ssf-all.log:   	   984        183.4%           94%       -40.39%
  reverse_front  ressw2/all-ssf-all.log:   	   984        245.3%           97%       -38.97%
  reverse_back   ressw2/all-ssf-all.log:   	   984         23.6%           82%         2.00%
  sorted         ressw2/all-ssf-all.log:   	   984         30.9%           87%         0.00%
  reverse_sorted ressw2/all-ssf-all.log:   	   984        186.3%           98%       -41.32%
  dither         ressw2/all-ssf-all.log:   	   984         -3.1%           39%        17.73%
shuffle                                           nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  6888         17.2%           76%        -1.21%
  ident          ressw2/all-ssf-all.log:   	   984         11.4%           70%         0.80%
  reverse        ressw2/all-ssf-all.log:   	   984          8.4%           63%        -0.42%
  reverse_front  ressw2/all-ssf-all.log:   	   984         30.9%           74%        -4.91%
  reverse_back   ressw2/all-ssf-all.log:   	   984          8.4%           64%         1.39%
  sorted         ressw2/all-ssf-all.log:   	   984         30.6%           87%         0.00%
  reverse_sorted ressw2/all-ssf-all.log:   	   984         18.0%           82%        -5.53%
  dither         ressw2/all-ssf-all.log:   	   984         12.5%           95%         0.19%
strings                                           nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	   144          6.6%           78%        -3.61%
  ident          ressw2/all-ssf-all.log:   	    24          4.8%           83%        -2.96%
  reverse        ressw2/all-ssf-all.log:   	    24          3.0%           71%        -2.45%
  reverse_front  ressw2/all-ssf-all.log:   	    24          3.4%           67%        -3.68%
  reverse_back   ressw2/all-ssf-all.log:   	    24          1.2%           71%        -1.07%
  sorted         ressw2/all-ssf-all.log:   	    24          6.4%           79%         0.00%
  reverse_sorted ressw2/all-ssf-all.log:   	    24         20.7%          100%       -11.48%
  dither         ressw2/all-ssf-all.log:   	     0

These stats compare results by variant / pattern:

$ log=ressw2/all* ./do-stats-by-variant 
filter: /
ident                                             nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  4944         12.4%           74%         2.00%
  sawtooth       ressw2/all-ssf-all.log:   	   984          3.9%           49%         4.76%
  rand           ressw2/all-ssf-all.log:   	   984         12.9%           95%        -0.73%
  stagger        ressw2/all-ssf-all.log:   	   984          7.6%           75%         3.40%
  plateau        ressw2/all-ssf-all.log:   	   984         26.5%           82%         1.87%
  shuffle        ressw2/all-ssf-all.log:   	   984         11.4%           70%         0.80%
  strings        ressw2/all-ssf-all.log:   	    24          4.8%           83%        -2.96%
reverse                                           nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  4944         42.4%           75%        -5.47%
  sawtooth       ressw2/all-ssf-all.log:   	   984         -0.5%           44%        11.76%
  rand           ressw2/all-ssf-all.log:   	   984         13.1%           95%        -0.63%
  stagger        ressw2/all-ssf-all.log:   	   984          8.3%           78%         2.26%
  plateau        ressw2/all-ssf-all.log:   	   984        183.4%           94%       -40.39%
  shuffle        ressw2/all-ssf-all.log:   	   984          8.4%           63%        -0.42%
  strings        ressw2/all-ssf-all.log:   	    24          3.0%           71%        -2.45%
reverse_front                                     nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  4944         62.1%           79%        -7.99%
  sawtooth       ressw2/all-ssf-all.log:   	   984         14.0%           54%         2.07%
  rand           ressw2/all-ssf-all.log:   	   984         13.3%           95%        -0.87%
  stagger        ressw2/all-ssf-all.log:   	   984          8.6%           77%         2.63%
  plateau        ressw2/all-ssf-all.log:   	   984        245.3%           97%       -38.97%
  shuffle        ressw2/all-ssf-all.log:   	   984         30.9%           74%        -4.91%
  strings        ressw2/all-ssf-all.log:   	    24          3.4%           67%        -3.68%
reverse_back                                      nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  4944         11.8%           76%         1.52%
  sawtooth       ressw2/all-ssf-all.log:   	   984          4.2%           54%         4.04%
  rand           ressw2/all-ssf-all.log:   	   984         12.8%           95%        -0.36%
  stagger        ressw2/all-ssf-all.log:   	   984         10.3%           86%         0.60%
  plateau        ressw2/all-ssf-all.log:   	   984         23.6%           82%         2.00%
  shuffle        ressw2/all-ssf-all.log:   	   984          8.4%           64%         1.39%
  strings        ressw2/all-ssf-all.log:   	    24          1.2%           71%        -1.07%
sorted                                            nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  4944         30.5%           87%         0.00%
  sawtooth       ressw2/all-ssf-all.log:   	   984         30.6%           88%         0.00%
  rand           ressw2/all-ssf-all.log:   	   984         29.7%           86%         0.00%
  stagger        ressw2/all-ssf-all.log:   	   984         31.2%           88%         0.00%
  plateau        ressw2/all-ssf-all.log:   	   984         30.9%           87%         0.00%
  shuffle        ressw2/all-ssf-all.log:   	   984         30.6%           87%         0.00%
  strings        ressw2/all-ssf-all.log:   	    24          6.4%           79%         0.00%
reverse_sorted                                    nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  4944         62.6%           87%       -14.27%
  sawtooth       ressw2/all-ssf-all.log:   	   984         42.3%           89%        -9.85%
  rand           ressw2/all-ssf-all.log:   	   984         45.9%           82%       -11.61%
  stagger        ressw2/all-ssf-all.log:   	   984         21.5%           82%        -3.11%
  plateau        ressw2/all-ssf-all.log:   	   984        186.3%           98%       -41.32%
  shuffle        ressw2/all-ssf-all.log:   	   984         18.0%           82%        -5.53%
  strings        ressw2/all-ssf-all.log:   	    24         20.7%          100%       -11.48%
dither                                            nrec     % speedup     is faster     cmp ratio
  all            ressw2/all-ssf-all.log:   	  4920          9.6%           83%         3.79%
  sawtooth       ressw2/all-ssf-all.log:   	   984         13.4%           95%        -0.47%
  rand           ressw2/all-ssf-all.log:   	   984         13.5%           96%        -0.77%
  stagger        ressw2/all-ssf-all.log:   	   984         11.8%           91%         2.29%
  plateau        ressw2/all-ssf-all.log:   	   984         -3.1%           39%        17.73%
  shuffle        ressw2/all-ssf-all.log:   	   984         12.5%           95%         0.19%
  strings        ressw2/all-ssf-all.log:   	     0

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published