Ryzen (CPU model unknown)

travisdowns edited this page May 20, 2018 · 2 revisions
Welcome to uarch-bench (e1d92fb-dirty)
Median CPU speed: 1.499 GHz
Running benchmarks groups using timer clock

** Running benchmark group Default Group **
                     Benchmark   Cycles    Nanos
           Dependent add chain     1.00     0.67
         Independent add chain     0.25     0.17
        Dependent imul 64->128     3.00     2.00
         Dependent imul 64->64     3.00     2.00
      Independent imul 64->128     2.00     1.33
          Same location stores     1.00     0.67
      Disjoint location stores     1.00     0.67
      Dependent push/pop chain     7.00     4.67
    Independent push/pop chain     1.00     0.67

** Inverse throughput for load/16-bit **
offset      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 16 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 32 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 48 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

** Inverse throughput for load/32-bit **
offset      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 :     0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5
 16 :     0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  1.0  1.0  1.0
 32 :     0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5
 48 :     0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  1.0  1.0  1.0

** Inverse throughput for load/64-bit **
offset      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 :     0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5
 16 :     0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 32 :     0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5
 48 :     0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  1.0  1.0  1.0  1.0  1.0  1.0  1.0

** Inverse throughput for load/128-bit **
offset      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 :     0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5
 16 :     0.5  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 32 :     0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5
 48 :     0.5  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

** Inverse throughput for load/256-bit **
offset      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 :     1.0  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5
 16 :     1.0  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5
 32 :     1.0  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5
 48 :     1.0  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5  1.5

** Inverse throughput for store/16-bit **
offset      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0
 16 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0
 32 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0
 48 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0

** Inverse throughput for store/32-bit **
offset      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0  5.0  5.0
 16 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0  5.0  5.0
 32 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0  5.0  5.0
 48 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0  5.0  5.0

** Inverse throughput for store/64-bit **
offset      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0
 16 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0
 32 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0
 48 :     1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0

** Inverse throughput for store/128-bit **
offset      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 :     1.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0
 16 :     1.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0
 32 :     1.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0
 48 :     1.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0  2.0  5.0  5.0  5.0

** Inverse throughput for store/256-bit **
offset      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  0 :     2.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0
 16 :     2.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0
 32 :     2.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0
 48 :     2.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0  4.0  7.0  7.0  7.0

** Running benchmark group Parallel load/prefetches from fixed-size regions **
                     Benchmark   Cycles    Nanos
         16-KiB parallel-loads     0.53     0.35
    16-KiB parallel-prefetcht0     0.50     0.33
    16-KiB parallel-prefetcht1     0.50     0.33
    16-KiB parallel-prefetcht2     0.50     0.33
   16-KiB parallel-prefetchnta     0.50     0.33
         32-KiB parallel-loads     0.53     0.35
    32-KiB parallel-prefetcht0     0.50     0.33
    32-KiB parallel-prefetcht1     0.50     0.33
    32-KiB parallel-prefetcht2     0.50     0.33
   32-KiB parallel-prefetchnta     0.50     0.33
         64-KiB parallel-loads     2.00     1.33
    64-KiB parallel-prefetcht0     0.50     0.33
    64-KiB parallel-prefetcht1     0.50     0.33
    64-KiB parallel-prefetcht2     0.50     0.33
   64-KiB parallel-prefetchnta     0.50     0.33
        128-KiB parallel-loads     2.00     1.33
   128-KiB parallel-prefetcht0     0.50     0.33
   128-KiB parallel-prefetcht1     0.50     0.33
   128-KiB parallel-prefetcht2     0.50     0.33
  128-KiB parallel-prefetchnta     0.50     0.33
        256-KiB parallel-loads     2.00     1.33
   256-KiB parallel-prefetcht0     0.50     0.33
   256-KiB parallel-prefetcht1     0.50     0.33
   256-KiB parallel-prefetcht2     0.50     0.33
  256-KiB parallel-prefetchnta     0.50     0.33
        512-KiB parallel-loads     2.01     1.34
   512-KiB parallel-prefetcht0     0.50     0.33
   512-KiB parallel-prefetcht1     0.50     0.33
   512-KiB parallel-prefetcht2     0.50     0.34
  512-KiB parallel-prefetchnta     0.50     0.33
       2048-KiB parallel-loads     2.06     1.37
  2048-KiB parallel-prefetcht0     0.51     0.34
  2048-KiB parallel-prefetcht1     0.51     0.34
  2048-KiB parallel-prefetcht2     0.51     0.34
 2048-KiB parallel-prefetchnta     0.50     0.33

** Running benchmark group Serial loads from fixed-size regions **
                     Benchmark   Cycles    Nanos
           16-KiB serial loads     4.00     2.67
           24-KiB serial loads     4.00     2.67
           30-KiB serial loads     4.00     2.67
           31-KiB serial loads     4.00     2.67
           32-KiB serial loads     4.00     2.67
           33-KiB serial loads     5.11     3.41
           34-KiB serial loads     5.76     3.84
           35-KiB serial loads     8.53     5.69
           40-KiB serial loads    12.06     8.04
           48-KiB serial loads    12.16     8.11
           56-KiB serial loads    12.06     8.05
           64-KiB serial loads    12.05     8.03
           80-KiB serial loads    12.06     8.04
           96-KiB serial loads    12.07     8.05
          112-KiB serial loads    12.08     8.06
          128-KiB serial loads    12.05     8.04
          196-KiB serial loads    12.06     8.05
          252-KiB serial loads    12.06     8.04
          256-KiB serial loads    12.06     8.04
          260-KiB serial loads    12.19     8.13
          384-KiB serial loads    17.28    11.53
          512-KiB serial loads    21.46    14.31
         1024-KiB serial loads    35.88    23.93
         2048-KiB serial loads    39.74    26.51

** Running benchmark group Store forwarding latency and throughput **
                     Benchmark   Cycles    Nanos
 Store forward latency delay 0     6.99     4.66
 Store forward latency delay 1     6.99     4.66
 Store forward latency delay 2     6.99     4.66
 Store forward latency delay 3     6.99     4.66
 Store forward latency delay 4     6.99     4.66
 Store forward latency delay 5     6.31     4.21
  Store fwd tput concurrency 1     6.99     4.66
  Store fwd tput concurrency 2     3.50     2.33
  Store fwd tput concurrency 3     2.33     1.55
  Store fwd tput concurrency 4     1.75     1.17
  Store fwd tput concurrency 5     1.40     0.93
  Store fwd tput concurrency 6     1.17     0.78
  Store fwd tput concurrency 7     1.06     0.71
  Store fwd tput concurrency 8     1.00     0.67
  Store fwd tput concurrency 9     1.00     0.67
 Store fwd tput concurrency 10     1.00     0.67

** Running benchmark group Store forwarding latency and throughput **

---------- Oneshot calibration start --------------
                     Benchmark   Cycles    Nanos
       Oneshot overhead min       89.96    60.00
Oneshot overhead median (used)   104.95    70.00
       Oneshot overhead max      104.95    70.00
---------- Oneshot calibration end   --------------

oneshot-dummy @ 0x0x494100
                     Benchmark   Sample   Cycles    Nanos
           Empty oneshot bench        1     0.00     0.00
           Empty oneshot bench        2     0.00     0.00
           Empty oneshot bench        3     0.00     0.00
           Empty oneshot bench        4   -14.99   -10.00
           Empty oneshot bench        5   -14.99   -10.00
           Empty oneshot bench        6     0.00     0.00
           Empty oneshot bench        7   -14.99   -10.00
           Empty oneshot bench        8   -14.99   -10.00
           Empty oneshot bench        9   -14.99   -10.00
           Empty oneshot bench       10     0.00     0.00
           Empty oneshot bench       11   -14.99   -10.00
           Empty oneshot bench       12   -14.99   -10.00
           Empty oneshot bench       13     0.00     0.00
           Empty oneshot bench       14     0.00     0.00
           Empty oneshot bench       15   -14.99   -10.00
           Empty oneshot bench       16   -14.99   -10.00
           Empty oneshot bench       17     0.00     0.00
           Empty oneshot bench       18     0.00     0.00
           Empty oneshot bench       19   -14.99   -10.00
           Empty oneshot bench       20   -14.99   -10.00

oneshot-latency-2 @ 0x0x4a11c0
                     Benchmark   Sample    Cycles     Nanos
   StFwd oneshot lat (delay 2)        1 144767.62  96560.00
   StFwd oneshot lat (delay 2)        2 144482.76  96370.00
   StFwd oneshot lat (delay 2)        3 144481.26  96369.00
   StFwd oneshot lat (delay 2)        4 144482.76  96370.00
   StFwd oneshot lat (delay 2)        5 144467.77  96360.00
   StFwd oneshot lat (delay 2)        6 144481.26  96369.00
   StFwd oneshot lat (delay 2)        7 144482.76  96370.00
   StFwd oneshot lat (delay 2)        8 144481.26  96369.00
   StFwd oneshot lat (delay 2)        9 144482.76  96370.00
   StFwd oneshot lat (delay 2)       10 144467.77  96360.00
   StFwd oneshot lat (delay 2)       11 144481.26  96369.00
   StFwd oneshot lat (delay 2)       12 144482.76  96370.00
   StFwd oneshot lat (delay 2)       13 144482.76  96370.00
   StFwd oneshot lat (delay 2)       14 144481.26  96369.00
   StFwd oneshot lat (delay 2)       15 144482.76  96370.00
   StFwd oneshot lat (delay 2)       16 151587.71 101109.00
   StFwd oneshot lat (delay 2)       17 144482.76  96370.00
   StFwd oneshot lat (delay 2)       18 144482.76  96370.00
   StFwd oneshot lat (delay 2)       19 144481.26  96369.00
   StFwd oneshot lat (delay 2)       20 144482.76  96370.00

oneshot-latency-1 @ 0x0x4a0e40
                     Benchmark   Sample    Cycles     Nanos
   StFwd oneshot lat (delay 1)        1 144767.62  96560.00
   StFwd oneshot lat (delay 1)        2 144482.76  96370.00
   StFwd oneshot lat (delay 1)        3 144481.26  96369.00
   StFwd oneshot lat (delay 1)        4 144482.76  96370.00
   StFwd oneshot lat (delay 1)        5 144482.76  96370.00
   StFwd oneshot lat (delay 1)        6 144481.26  96369.00
   StFwd oneshot lat (delay 1)        7 144482.76  96370.00
   StFwd oneshot lat (delay 1)        8 144482.76  96370.00
   StFwd oneshot lat (delay 1)        9 144481.26  96369.00
   StFwd oneshot lat (delay 1)       10 144482.76  96370.00
   StFwd oneshot lat (delay 1)       11 144466.27  96359.00
   StFwd oneshot lat (delay 1)       12 144467.77  96360.00
   StFwd oneshot lat (delay 1)       13 144482.76  96370.00
   StFwd oneshot lat (delay 1)       14 144481.26  96369.00
   StFwd oneshot lat (delay 1)       15 144482.76  96370.00
   StFwd oneshot lat (delay 1)       16 144482.76  96370.00
   StFwd oneshot lat (delay 1)       17 144481.26  96369.00
   StFwd oneshot lat (delay 1)       18 144482.76  96370.00
   StFwd oneshot lat (delay 1)       19 144482.76  96370.00
   StFwd oneshot lat (delay 1)       20 144481.26  96369.00

oneshot-latency-0 @ 0x0x4a0ac0
                     Benchmark   Sample    Cycles     Nanos
   StFwd oneshot lat (delay 0)        1 144691.15  96509.00
   StFwd oneshot lat (delay 0)        2 144482.76  96370.00
   StFwd oneshot lat (delay 0)        3 144482.76  96370.00
   StFwd oneshot lat (delay 0)        4 144481.26  96369.00
   StFwd oneshot lat (delay 0)        5 144482.76  96370.00
   StFwd oneshot lat (delay 0)        6 144482.76  96370.00
   StFwd oneshot lat (delay 0)        7 144481.26  96369.00
   StFwd oneshot lat (delay 0)        8 144467.77  96360.00
   StFwd oneshot lat (delay 0)        9 152052.47 101419.00
   StFwd oneshot lat (delay 0)       10 144482.76  96370.00
   StFwd oneshot lat (delay 0)       11 144482.76  96370.00
   StFwd oneshot lat (delay 0)       12 144481.26  96369.00
   StFwd oneshot lat (delay 0)       13 144482.76  96370.00
   StFwd oneshot lat (delay 0)       14 144482.76  96370.00
   StFwd oneshot lat (delay 0)       15 144481.26  96369.00
   StFwd oneshot lat (delay 0)       16 144482.76  96370.00
   StFwd oneshot lat (delay 0)       17 144466.27  96359.00
   StFwd oneshot lat (delay 0)       18 144467.77  96360.00
   StFwd oneshot lat (delay 0)       19 144467.77  96360.00
   StFwd oneshot lat (delay 0)       20 144466.27  96359.00


** Running benchmark group Store forward attempts **
oneshot-dummy @ 0x0x494100
                     Benchmark   Sample   Cycles    Nanos
           Empty oneshot bench        1   -14.99   -10.00
           Empty oneshot bench        2   -14.99   -10.00
           Empty oneshot bench        3   -14.99   -10.00
           Empty oneshot bench        4   -14.99   -10.00
           Empty oneshot bench        5     0.00     0.00
           Empty oneshot bench        6     0.00     0.00
           Empty oneshot bench        7   -14.99   -10.00
           Empty oneshot bench        8   -14.99   -10.00
           Empty oneshot bench        9   -14.99   -10.00
           Empty oneshot bench       10     0.00     0.00
           Empty oneshot bench       11   -14.99   -10.00
           Empty oneshot bench       12   -14.99   -10.00
           Empty oneshot bench       13     0.00     0.00
           Empty oneshot bench       14     0.00     0.00
           Empty oneshot bench       15   -14.99   -10.00
           Empty oneshot bench       16   -14.99   -10.00
           Empty oneshot bench       17     0.00     0.00
           Empty oneshot bench       18   -14.99   -10.00
           Empty oneshot bench       19   -14.99   -10.00
           Empty oneshot bench       20   -14.99   -10.00

stfwd-try1 @ 0x0x4a0780
                     Benchmark   Sample   Cycles    Nanos
                    stfwd-try1        1   674.66   450.00
                    stfwd-try1        2    89.96    60.00
                    stfwd-try1        3    89.96    60.00
                    stfwd-try1        4    89.96    60.00
                    stfwd-try1        5    89.96    60.00
                    stfwd-try1        6    74.96    50.00
                    stfwd-try1        7    89.96    60.00
                    stfwd-try1        8    74.96    50.00
                    stfwd-try1        9    89.96    60.00
                    stfwd-try1       10    74.96    50.00
                    stfwd-try1       11    74.96    50.00
                    stfwd-try1       12    74.96    50.00
                    stfwd-try1       13    74.96    50.00
                    stfwd-try1       14    74.96    50.00
                    stfwd-try1       15    89.96    60.00
                    stfwd-try1       16    74.96    50.00
                    stfwd-try1       17    89.96    60.00
                    stfwd-try1       18    74.96    50.00
                    stfwd-try1       19    89.96    60.00
                    stfwd-try1       20    89.96    60.00

stfwd-try2 @ 0x0x4a02c0
                     Benchmark   Sample   Cycles    Nanos
          stfwd-try2 100 loads        1   614.69   410.00
          stfwd-try2 100 loads        2  3658.17  2440.00
          stfwd-try2 100 loads        3   254.87   170.00
          stfwd-try2 100 loads        4   254.87   170.00
          stfwd-try2 100 loads        5   254.87   170.00
          stfwd-try2 100 loads        6   254.87   170.00
          stfwd-try2 100 loads        7   269.87   180.00
          stfwd-try2 100 loads        8   254.87   170.00
          stfwd-try2 100 loads        9   254.87   170.00
          stfwd-try2 100 loads       10   254.87   170.00
          stfwd-try2 100 loads       11   254.87   170.00
          stfwd-try2 100 loads       12   269.87   180.00
          stfwd-try2 100 loads       13   254.87   170.00
          stfwd-try2 100 loads       14   254.87   170.00
          stfwd-try2 100 loads       15   254.87   170.00
          stfwd-try2 100 loads       16   254.87   170.00
          stfwd-try2 100 loads       17   269.87   180.00
          stfwd-try2 100 loads       18   254.87   170.00
          stfwd-try2 100 loads       19   254.87   170.00
          stfwd-try2 100 loads       20   254.87   170.00

stfwd-try2-4 @ 0x0x49d200
                     Benchmark   Sample   Cycles    Nanos
            stfwd-try2 4 loads        1    74.96    50.00
            stfwd-try2 4 loads        2   164.92   110.00
            stfwd-try2 4 loads        3   -14.99   -10.00
            stfwd-try2 4 loads        4     0.00     0.00
            stfwd-try2 4 loads        5     0.00     0.00
            stfwd-try2 4 loads        6   -14.99   -10.00
            stfwd-try2 4 loads        7   -14.99   -10.00
            stfwd-try2 4 loads        8   -14.99   -10.00
            stfwd-try2 4 loads        9     0.00     0.00
            stfwd-try2 4 loads       10     0.00     0.00
            stfwd-try2 4 loads       11   -14.99   -10.00
            stfwd-try2 4 loads       12   -14.99   -10.00
            stfwd-try2 4 loads       13   -14.99   -10.00
            stfwd-try2 4 loads       14     0.00     0.00
            stfwd-try2 4 loads       15     0.00     0.00
            stfwd-try2 4 loads       16   -14.99   -10.00
            stfwd-try2 4 loads       17   -14.99   -10.00
            stfwd-try2 4 loads       18   -14.99   -10.00
            stfwd-try2 4 loads       19     0.00     0.00
            stfwd-try2 4 loads       20     0.00     0.00

stfwd-try2-10 @ 0x0x49d240
                     Benchmark   Sample   Cycles    Nanos
           stfwd-try2 10 loads        1    44.98    30.00
           stfwd-try2 10 loads        2   389.81   260.00
           stfwd-try2 10 loads        3     0.00     0.00
           stfwd-try2 10 loads        4     0.00     0.00
           stfwd-try2 10 loads        5     0.00     0.00
           stfwd-try2 10 loads        6     0.00     0.00
           stfwd-try2 10 loads        7   -14.99   -10.00
           stfwd-try2 10 loads        8   -14.99   -10.00
           stfwd-try2 10 loads        9   -14.99   -10.00
           stfwd-try2 10 loads       10     0.00     0.00
           stfwd-try2 10 loads       11     0.00     0.00
           stfwd-try2 10 loads       12     0.00     0.00
           stfwd-try2 10 loads       13     0.00     0.00
           stfwd-try2 10 loads       14     0.00     0.00
           stfwd-try2 10 loads       15   -14.99   -10.00
           stfwd-try2 10 loads       16   -14.99   -10.00
           stfwd-try2 10 loads       17     0.00     0.00
           stfwd-try2 10 loads       18     0.00     0.00
           stfwd-try2 10 loads       19     0.00     0.00
           stfwd-try2 10 loads       20     0.00     0.00

stfwd-try2-20 @ 0x0x4a01c0
                     Benchmark   Sample   Cycles    Nanos
           stfwd-try2 20 loads        1   509.75   340.00
           stfwd-try2 20 loads        2   734.63   490.00
           stfwd-try2 20 loads        3    14.99    10.00
           stfwd-try2 20 loads        4    29.99    20.00
           stfwd-try2 20 loads        5    14.99    10.00
           stfwd-try2 20 loads        6    29.99    20.00
           stfwd-try2 20 loads        7    14.99    10.00
           stfwd-try2 20 loads        8    14.99    10.00
           stfwd-try2 20 loads        9    14.99    10.00
           stfwd-try2 20 loads       10    14.99    10.00
           stfwd-try2 20 loads       11    14.99    10.00
           stfwd-try2 20 loads       12    14.99    10.00
           stfwd-try2 20 loads       13    14.99    10.00
           stfwd-try2 20 loads       14    29.99    20.00
           stfwd-try2 20 loads       15    14.99    10.00
           stfwd-try2 20 loads       16    29.99    20.00
           stfwd-try2 20 loads       17    14.99    10.00
           stfwd-try2 20 loads       18    14.99    10.00
           stfwd-try2 20 loads       19    14.99    10.00
           stfwd-try2 20 loads       20    14.99    10.00

stfwd-try2-1000 @ 0x0x49d2c0
                     Benchmark   Sample   Cycles    Nanos
         stfwd-try2 1000 loads        1 32188.91 21470.00
         stfwd-try2 1000 loads        2 36236.88 24170.00
         stfwd-try2 1000 loads        3  2968.52  1980.00
         stfwd-try2 1000 loads        4  2968.52  1980.00
         stfwd-try2 1000 loads        5  2953.52  1970.00
         stfwd-try2 1000 loads        6  2953.52  1970.00
         stfwd-try2 1000 loads        7  2968.52  1980.00
         stfwd-try2 1000 loads        8  2968.52  1980.00
         stfwd-try2 1000 loads        9  2968.52  1980.00
         stfwd-try2 1000 loads       10  2953.52  1970.00
         stfwd-try2 1000 loads       11  2953.52  1970.00
         stfwd-try2 1000 loads       12  2968.52  1980.00
         stfwd-try2 1000 loads       13  2968.52  1980.00
         stfwd-try2 1000 loads       14  2968.52  1980.00
         stfwd-try2 1000 loads       15  2953.52  1970.00
         stfwd-try2 1000 loads       16  2953.52  1970.00
         stfwd-try2 1000 loads       17  2968.52  1980.00
         stfwd-try2 1000 loads       18  2968.52  1980.00
         stfwd-try2 1000 loads       19  2968.52  1980.00
         stfwd-try2 1000 loads       20  2953.52  1970.00

stfwd-try2-1000w @ 0x0x49d2c0
                     Benchmark   Sample   Cycles    Nanos
    stfwd-try2 1000 loads warm        1  2983.51  1990.00
    stfwd-try2 1000 loads warm        2 36236.88 24170.00
    stfwd-try2 1000 loads warm        3 36236.88 24170.00
    stfwd-try2 1000 loads warm        4 36236.88 24170.00
    stfwd-try2 1000 loads warm        5 36251.87 24180.00
    stfwd-try2 1000 loads warm        6 36236.88 24170.00
    stfwd-try2 1000 loads warm        7 36236.88 24170.00
    stfwd-try2 1000 loads warm        8 36236.88 24170.00
    stfwd-try2 1000 loads warm        9 36236.88 24170.00
    stfwd-try2 1000 loads warm       10 36235.38 24169.00
    stfwd-try2 1000 loads warm       11 36236.88 24170.00
    stfwd-try2 1000 loads warm       12 36236.88 24170.00
    stfwd-try2 1000 loads warm       13 36236.88 24170.00
    stfwd-try2 1000 loads warm       14 36251.87 24180.00
    stfwd-try2 1000 loads warm       15 36236.88 24170.00
    stfwd-try2 1000 loads warm       16 36236.88 24170.00
    stfwd-try2 1000 loads warm       17 36236.88 24170.00
    stfwd-try2 1000 loads warm       18 36236.88 24170.00
    stfwd-try2 1000 loads warm       19 36236.88 24170.00
    stfwd-try2 1000 loads warm       20 36235.38 24169.00

stfwd-try2b @ 0x0x4a02c0
                     Benchmark   Sample   Cycles    Nanos
          stfwd-try2 100 loads        1   254.87   170.00
          stfwd-try2 100 loads        2   254.87   170.00
          stfwd-try2 100 loads        3   254.87   170.00
          stfwd-try2 100 loads        4   269.87   180.00
          stfwd-try2 100 loads        5   254.87   170.00
          stfwd-try2 100 loads        6   254.87   170.00
          stfwd-try2 100 loads        7   254.87   170.00
          stfwd-try2 100 loads        8   254.87   170.00
          stfwd-try2 100 loads        9   269.87   180.00
          stfwd-try2 100 loads       10   254.87   170.00
          stfwd-try2 100 loads       11   254.87   170.00
          stfwd-try2 100 loads       12   254.87   170.00
          stfwd-try2 100 loads       13   254.87   170.00
          stfwd-try2 100 loads       14   269.87   180.00
          stfwd-try2 100 loads       15   254.87   170.00
          stfwd-try2 100 loads       16   254.87   170.00
          stfwd-try2 100 loads       17   254.87   170.00
          stfwd-try2 100 loads       18   254.87   170.00
          stfwd-try2 100 loads       19   269.87   180.00
          stfwd-try2 100 loads       20   254.87   170.00

stfwd-try2c @ 0x0x49d180
                     Benchmark   Sample   Cycles    Nanos
                 trained loads        1   209.90   140.00
                 trained loads        2   209.90   140.00
                 trained loads        3   149.93   100.00
                 trained loads        4   149.93   100.00
                 trained loads        5    74.96    50.00
                 trained loads        6    74.96    50.00
                 trained loads        7    74.96    50.00
                 trained loads        8    59.97    40.00
                 trained loads        9    59.97    40.00
                 trained loads       10    59.97    40.00
                 trained loads       11    29.99    20.00
                 trained loads       12    44.98    30.00
                 trained loads       13    29.99    20.00
                 trained loads       14    44.98    30.00
                 trained loads       15    44.98    30.00
                 trained loads       16    44.98    30.00
                 trained loads       17    44.98    30.00
                 trained loads       18    29.99    20.00
                 trained loads       19    44.98    30.00
                 trained loads       20    44.98    30.00


** Running benchmark group Miscellaneous tests **
                     Benchmark   Cycles    Nanos
               32-bit add-loop     2.50     1.67
               64-bit add-loop     2.50     1.67
    Can port7 be used by loads     1.50     1.00
          Test micro-fused add     1.00     0.67
                 Add-JO fusion     1.00     0.67
                  Flag merge 1     1.24     0.83
                  Flag merge 2     1.17     0.78
                  Flag merge 3     1.24     0.83
           Loop weirdness fast     6.99     4.66

** Running benchmark group Fusion tests from dendibakh blog **
                     Benchmark   Cycles    Nanos
    Crosses 64-byte i-boundary   300.83   200.65
   No cross 64-byte i-boundary   173.95   116.02
              Fused (original)     1.38     0.92
           Fused (simple addr)     1.36     0.91
Fused (add [reg + reg * 4], 1)     1.38     0.92
          Fused (add [reg], 1)     1.36     0.91
            Unfused (original)     1.61     1.07
               Fused summation     2.15     1.44
             Unfused summation     1.63     1.08

** Running benchmark group BMI false-dependency tests **
                     Benchmark   Cycles    Nanos
          dest-dependent tzcnt     0.50     0.34
          dest-dependent lzcnt     0.25     0.17
         dest-dependent popcnt     0.25     0.17

** Running benchmark group retpoline tests **
                     Benchmark   Cycles    Nanos
   Dense retpoline call  pause    55.60    37.08
   Dense retpoline call lfence    55.48    37.01
     Dense indirect pred calls     4.15     2.77
   Dense indirect unpred calls    21.38    14.26
Sparse retpo indep call  pause    13.69     9.13
Sparse retpo indep call lfence    15.43    10.29
  Sparse retpo dep call  pause    46.79    31.21
  Sparse retpo dep call lfence    47.29    31.54

** Running benchmark group Tests written in C++ **
                     Benchmark   Cycles    Nanos
    Dependent inline divisions    16.99    11.33
    Dependent 64-bit divisions    16.99    11.33
  Independent inline divisions    14.53     9.69
         Independent divisions    14.53     9.69
       Linked-list w/ sentinel     9.74     6.49
          Linked-list w/ count    10.14     6.76

** Running benchmark group Vector unit bypass latency **
                     Benchmark   Cycles    Nanos
 movdqa [mem] -> paddb latency    10.99     7.33
 movdqu [mem] -> paddb latency    10.99     7.33
 movups [mem] -> paddb latency    10.99     7.33
 movupd [mem] -> paddb latency    10.99     7.33
 movq rax,xmm0 -> xmm0,rax lat     6.00     4.00
 movq rax,xmm0 -> xmm0,rax lat     6.00     4.00

** Running benchmark group Vector load-load latency **
                     Benchmark   Cycles    Nanos
      aligned  movdqu load lat     9.99     6.67
      aligned vmovdqu load lat     9.99     6.67
      aligned   lddqu load lat     9.99     6.67
      aligned  vlddqu load lat     9.99     6.67
   misaligned  movdqu load lat    10.99     7.33
   misaligned vmovdqu load lat    10.99     7.33
   misaligned   lddqu load lat    10.99     7.33
   misaligned  vlddqu load lat    10.99     7.33

** Running benchmark group Call/ret benchmarks **
                     Benchmark   Cycles    Nanos
            calls sparsed by 0     4.12     2.75
            calls sparsed by 1     4.19     2.79
            calls sparsed by 2     4.12     2.75
            calls sparsed by 3     4.25     2.83
            calls sparsed by 4     4.31     2.87
            calls sparsed by 5     5.00     3.33
            calls sparsed by 6     6.00     4.00
            calls sparsed by 7     7.00     4.67
            calls chained by 0     4.06     2.71
            calls chained by 1     4.06     2.71
            calls chained by 2     4.06     2.71
            calls chained by 3     4.06     2.71
           calls to pushpop fn     7.00     4.67
           calls to addrsp0 fn    13.99     9.33
           calls to addrsp8 fn    13.99     9.33

** Running benchmark group Oneshot Group **
dep-add-oneshot @ 0x0x494380
                     Benchmark   Sample   Cycles    Nanos
         Oneshot dep add chain        1     1.51     1.01
         Oneshot dep add chain        2     0.70     0.47
         Oneshot dep add chain        3     0.70     0.47
         Oneshot dep add chain        4     0.70     0.47
         Oneshot dep add chain        5     0.70     0.47
         Oneshot dep add chain        6     0.70     0.47
         Oneshot dep add chain        7     0.70     0.47
         Oneshot dep add chain        8     0.70     0.47
         Oneshot dep add chain        9     0.70     0.47
         Oneshot dep add chain       10     0.70     0.47
         Oneshot dep add chain       11     0.70     0.47
         Oneshot dep add chain       12     0.70     0.47
         Oneshot dep add chain       13     0.70     0.47
         Oneshot dep add chain       14     0.70     0.47
         Oneshot dep add chain       15     0.70     0.47
         Oneshot dep add chain       16     0.70     0.47
         Oneshot dep add chain       17     0.70     0.47
         Oneshot dep add chain       18     0.70     0.47
         Oneshot dep add chain       19     0.70     0.47
         Oneshot dep add chain       20     0.70     0.47

indep-add-oneshot @ 0x0x495ac0
                     Benchmark   Sample   Cycles    Nanos
       Oneshot indep add chain        1     2.51     1.68
       Oneshot indep add chain        2     0.19     0.12
       Oneshot indep add chain        3     0.26     0.18
       Oneshot indep add chain        4     0.22     0.15
       Oneshot indep add chain        5     0.22     0.15
       Oneshot indep add chain        6     0.22     0.15
       Oneshot indep add chain        7     0.22     0.15
       Oneshot indep add chain        8     0.26     0.18
       Oneshot indep add chain        9     0.22     0.15
       Oneshot indep add chain       10     0.22     0.15
       Oneshot indep add chain       11     0.22     0.15
       Oneshot indep add chain       12     0.22     0.15
       Oneshot indep add chain       13     0.26     0.18
       Oneshot indep add chain       14     0.22     0.15
       Oneshot indep add chain       15     0.22     0.15
       Oneshot indep add chain       16     0.22     0.15
       Oneshot indep add chain       17     0.22     0.15
       Oneshot indep add chain       18     0.26     0.18
       Oneshot indep add chain       19     0.22     0.15
       Oneshot indep add chain       20     0.22     0.15

dep-add128 @ 0x0x4941c0
                     Benchmark   Sample   Cycles    Nanos
128 dependent add instructions        1     3.98     2.66
128 dependent add instructions        2     0.59     0.39
128 dependent add instructions        3     0.59     0.39
128 dependent add instructions        4     0.59     0.39
128 dependent add instructions        5     0.59     0.39
128 dependent add instructions        6     0.59     0.39
128 dependent add instructions        7     0.59     0.39
128 dependent add instructions        8     0.59     0.39
128 dependent add instructions        9     0.70     0.47
128 dependent add instructions       10     0.70     0.47
128 dependent add instructions       11     0.70     0.47
128 dependent add instructions       12     0.70     0.47
128 dependent add instructions       13     0.70     0.47
128 dependent add instructions       14     0.70     0.47
128 dependent add instructions       15     0.70     0.47
128 dependent add instructions       16     0.70     0.47
128 dependent add instructions       17     0.59     0.39
128 dependent add instructions       18     0.59     0.39
128 dependent add instructions       19     0.59     0.39
128 dependent add instructions       20     0.59     0.39

oneshot-dummy-touch @ 0x0x494180
                     Benchmark   Sample   Cycles    Nanos
   Empty touched oneshot bench        1    44.98    30.00
   Empty touched oneshot bench        2   -14.99   -10.00
   Empty touched oneshot bench        3   -14.99   -10.00
   Empty touched oneshot bench        4     0.00     0.00
   Empty touched oneshot bench        5     0.00     0.00
   Empty touched oneshot bench        6   -14.99   -10.00
   Empty touched oneshot bench        7   -14.99   -10.00
   Empty touched oneshot bench        8     0.00     0.00
   Empty touched oneshot bench        9     0.00     0.00
   Empty touched oneshot bench       10   -14.99   -10.00
   Empty touched oneshot bench       11   -14.99   -10.00
   Empty touched oneshot bench       12     0.00     0.00
   Empty touched oneshot bench       13     0.00     0.00
   Empty touched oneshot bench       14   -14.99   -10.00
   Empty touched oneshot bench       15   -14.99   -10.00
   Empty touched oneshot bench       16     0.00     0.00
   Empty touched oneshot bench       17   -14.99   -10.00
   Empty touched oneshot bench       18   -14.99   -10.00
   Empty touched oneshot bench       19   -14.99   -10.00
   Empty touched oneshot bench       20     0.00     0.00

oneshot-dummy-notouch @ 0x0x494140
                     Benchmark   Sample   Cycles    Nanos
 Empty untouched oneshot bench        1    74.96    50.00
 Empty untouched oneshot bench        2   -14.99   -10.00
 Empty untouched oneshot bench        3     0.00     0.00
 Empty untouched oneshot bench        4     0.00     0.00
 Empty untouched oneshot bench        5   -14.99   -10.00
 Empty untouched oneshot bench        6   -14.99   -10.00
 Empty untouched oneshot bench        7     0.00     0.00
 Empty untouched oneshot bench        8     0.00     0.00
 Empty untouched oneshot bench        9   -14.99   -10.00
 Empty untouched oneshot bench       10   -14.99   -10.00
 Empty untouched oneshot bench       11     0.00     0.00
 Empty untouched oneshot bench       12   -14.99   -10.00
 Empty untouched oneshot bench       13   -14.99   -10.00
 Empty untouched oneshot bench       14   -14.99   -10.00
 Empty untouched oneshot bench       15     0.00     0.00
 Empty untouched oneshot bench       16   -14.99   -10.00
 Empty untouched oneshot bench       17   -14.99   -10.00
 Empty untouched oneshot bench       18     0.00     0.00
 Empty untouched oneshot bench       19     0.00     0.00
 Empty untouched oneshot bench       20   -14.99   -10.00