
A10 7850K with DDR3 at 2133 MHz

wyldckat edited this page Jan 1, 2016 · 7 revisions


Introduction

This page records the performance achieved with the AMD A10-7850K, fitted with 2 DDR3 modules of 8GB each at 2133 MHz.

Notes
mpirun is used merely as a helper application to launch several copies at once. The avxtest* binaries are not running cooperatively.
Keep in mind that these results are not statistically robust, since each one comes from a single run.

Runtimes

These runs were executed on Ubuntu 15.10 x86_64. The binaries were built with the native options:

    g++ -O3 -march=native avxtest.cpp -o avxtest
    g++ -O3 -march=native avxtest64.cpp -o avxtest64

1 core

32-bit:
   ./avxtest
  • x86:
    • Time taken (ms): 38771.457031
  • AVX:
    • Time taken (ms): 5620.1640625
64-bit:
   ./avxtest64
  • x86_64:
    • Time taken (ms): 35007.7949999999983
  • AVX:
    • Time taken (ms): 11375.2019999999993

2 cores

32-bit:
   mpirun -n 2 ./avxtest
  • x86:
    • Time taken (ms): 39284.503906
    • Time taken (ms): 39318.621094
  • AVX:
    • Time taken (ms): 5788.0839844
    • Time taken (ms): 5784.5478516
64-bit:
   mpirun -n 2 ./avxtest64
  • x86_64:
    • Time taken (ms): 35840.3660000000018
    • Time taken (ms): 35871.1630000000005
  • AVX:
    • Time taken (ms): 11422.7570000000014
    • Time taken (ms): 11434.7350000000006

4 cores

32-bit:
   mpirun -n 4 ./avxtest
  • x86:
    • Time taken (ms): 40336.046875
    • Time taken (ms): 40380.726562
    • Time taken (ms): 40352.101562
    • Time taken (ms): 40398.40625
  • AVX:
    • Time taken (ms): 9202.1962891
    • Time taken (ms): 9115.0634766
    • Time taken (ms): 9115.4482422
    • Time taken (ms): 9178.2919922
64-bit:
   mpirun -n 4 ./avxtest64
  • x86_64:
    • Time taken (ms): 36870.6850000000049
    • Time taken (ms): 36970.1229999999996
    • Time taken (ms): 37072.8969999999972
    • Time taken (ms): 37073.6549999999988
  • AVX:
    • Time taken (ms): 14274.6990000000005
    • Time taken (ms): 14295.7170000000006
    • Time taken (ms): 14362.3040000000001
    • Time taken (ms): 14448.5300000000007
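
For collecting the numbers listed above, a small parser can help. This is a minimal sketch, assuming each process prints its timing on a line containing `Time taken (ms): <number>` as shown in the listings; the exact layout of the binaries' output is not shown on this page, and the name `parse_times` is hypothetical.

```python
import re

# Matches lines such as "Time taken (ms): 38771.457031".
# Assumption: every timing printed by avxtest/avxtest64 follows this pattern.
TIME_RE = re.compile(r"Time taken \(ms\):\s*([0-9]+(?:\.[0-9]+)?)")


def parse_times(output):
    """Return every 'Time taken (ms)' value found in the text, in order."""
    return [float(value) for value in TIME_RE.findall(output)]


# Sample text copied from the 2-core 32-bit AVX results.
sample = """
Time taken (ms): 5788.0839844
Time taken (ms): 5784.5478516
"""

times = parse_times(sample)
print(times)
```

With `mpirun -n 4`, the captured output would contain one such line per process for each mode, so the returned list would still need to be split by mode before averaging.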

Summary

| Architecture/Mode | 1 core | 2 cores (std-dev) | 4 cores (std-dev) |
|---|---|---|---|
| x86 (ms) | 38771.457 | 39301.5625 (24.124) | 40366.82031225 (28.016) |
| x86_64 (ms) | 35007.795 | 35855.7645 (21.777) | 36996.84 (97.150) |
| AVX float (ms) | 5620.164 | 5786.315918 (2.500) | 9152.750000025 (44.381) |
| AVX double (ms) | 11375.202 | 11428.746 (8.470) | 14345.3125 (78.291) |
| - | - | - | - |
| Core frequency (MHz) (cpufreq-aperf) | 3959 (3811 AVX) | 3885 (3774 AVX) | 3885 (3663 AVX) |
| downscale ratio (c1/cx) | 1 | 1.0190 (1.0098 AVX) | 1.0190 (1.0404 AVX) |
| x86 | 1 | 1.0137 | 1.0411 |
| x86_64 | 1 | 1.0242223053 | 1.0568172031 |
| AVX float | 1 | 1.0295635355 | 1.6285556792 |
| AVX double | 1 | 1.0047070812 | 1.2611039786 |
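
The summary values can be re-derived from the raw runtimes. The sketch below does this for the 4-core AVX float case, assuming the reported std-dev is the sample standard deviation (n - 1 denominator); that assumption is consistent with the 44.381 figure, but the page does not state which formula was used.

```python
from statistics import mean, stdev

# Raw runtimes (ms) copied from the 32-bit AVX results on this page.
single_core_ms = 5620.1640625
four_core_ms = [9202.1962891, 9115.0634766, 9115.4482422, 9178.2919922]

avg = mean(four_core_ms)
sd = stdev(four_core_ms)       # sample standard deviation (n - 1 denominator)
ratio = avg / single_core_ms   # runtime ratio relative to the 1-core run

print(f"mean = {avg:.3f} ms, std-dev = {sd:.3f} ms, ratio = {ratio:.4f}")
```

The same three lines of arithmetic apply to the other rows by swapping in their runtimes.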

Inferences

  1. The downscale ratio for the x86/x86_64 calculations is roughly within the expected downscale range, although for 4 logical cores the ratio given by cpufreq-aperf doesn't correlate as well, possibly because the internal scheduler had to juggle operations between the logical and the real cores.
    • The values had to be revised based on cpufreq-aperf, because the frequency values at cpu-world.com didn't match up well, which could be due to many reasons.
  2. With 2 logical cores and AVX calculations, it worked as expected and within the expected downscale range.
  3. With 4 logical cores and AVX calculations, it isn't clear why the performance downscale was so wide.
    • The AVX 64-bit run operated within reason: the runtime was 26% longer than the 1-core run (ratio 1.2611 in the summary, about 25.5% longer than the 2-core run), which is within the expected 50% margin. Or at least it seems fairly optimized, since the 4 logical cores are essentially acting as schedulers for the 2 real AVX cores.
    • The AVX 32-bit run did not operate within reason. The runtime was 63% longer than the 1-core run (ratio 1.6286 in the summary, about 58% longer than the 2-core run), which seems to imply that something wasn't properly optimized somewhere, either in the assembly code or in the internal CPU schedulers.
  4. For an additional reference, cpubenchmark.net gives an index of 5566 to this CPU.
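
The percentages quoted in these inferences can be checked against the frequency downscale with a few lines of arithmetic. This is a minimal sketch using the mean runtimes and the cpufreq-aperf AVX frequencies from the summary; the helper name `slowdown_percent` is hypothetical.

```python
# Mean runtimes (ms) from the summary for 1, 2 and 4 cores.
runtimes_ms = {
    "AVX float": (5620.164, 5786.315918, 9152.750000025),
    "AVX double": (11375.202, 11428.746, 14345.3125),
}

# AVX core frequencies (MHz) reported by cpufreq-aperf for 1, 2 and 4 cores.
avx_freq_mhz = (3811, 3774, 3663)


def slowdown_percent(t_ref, t):
    """Extra runtime of t relative to t_ref, in percent."""
    return (t / t_ref - 1.0) * 100.0


# Slowdown that the frequency drop alone would explain (c1/c4).
freq_ratio_4 = avx_freq_mhz[0] / avx_freq_mhz[2]
print(f"frequency downscale, 4 cores: {freq_ratio_4:.4f}")

for name, (t1, t2, t4) in runtimes_ms.items():
    print(
        f"{name}: 4 cores vs 1 core = {slowdown_percent(t1, t4):.1f}%, "
        f"4 cores vs 2 cores = {slowdown_percent(t2, t4):.1f}%"
    )
```

The frequency drop accounts for only about 4% of extra runtime at 4 cores, so most of the AVX slowdown has to come from elsewhere, such as the sharing of the AVX units between logical cores.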

Addendum

Also tested building it with a custom GCC 5.3.0, to check whether the compiler was the problem; there were no substantial performance improvements.