
A collection of simple demos for SIMD and cache tutorials.

The demos cover SIMD tricks and cache-blocking tricks, with more to be added.

To compile:

Set each of the three switches at the beginning of the Makefile to 'yes' or 'no'. 'VECREPORT' works only with Intel compilers.


Demonstrates memcpy() versus a simple copy loop. The compiler is smart enough to recognize the simple loop pattern and optimize it automatically.

No external dependencies.
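A minimal sketch of the two copy variants being compared. It is not the repo's source; the function names copy_loop and copy_memcpy are hypothetical. At -O2/-O3, GCC and Clang typically vectorize the plain loop, and they may replace it with a memcpy call when they can prove the buffers do not overlap.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy n doubles with a plain loop.
// Because src and dst could alias, the compiler may add runtime overlap
// checks before using a vectorized path.
void copy_loop(const double* src, double* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Hypothetical helper: copy n doubles with the C library routine.
void copy_memcpy(const double* src, double* dst, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(double));
}
```

To compare them, time both on a large buffer (e.g. tens of MB) and inspect the generated assembly, for instance on an online compiler explorer.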


Demonstrates different array operations. The simplest version can be the fastest, because the compiler may automatically interchange the nested loops, which significantly improves the reuse of data in cache.

No external dependencies.
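A sketch of the loop-order effect, assuming row-major n x n matrices stored in a std::vector. The operation (element-wise c = a + b) and the function names are illustrative choices, not taken from the repo. Every iteration is independent, so a compiler is allowed to interchange the loops in the column-order version. GCC, for example, has -floop-interchange, but whether the transformation actually happens depends on the compiler and flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical: element-wise c = a + b, with the inner loop over contiguous
// memory. Each cache line is fully used before it is evicted.
void add_row_order(const std::vector<double>& a, const std::vector<double>& b,
                   std::vector<double>& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            c[i * n + j] = a[i * n + j] + b[i * n + j];
}

// Hypothetical: same operation with the loops swapped. The inner loop
// strides by n doubles, so for large n almost every access touches a new
// cache line unless the compiler interchanges the loops.
void add_column_order(const std::vector<double>& a, const std::vector<double>& b,
                      std::vector<double>& c, std::size_t n) {
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            c[i * n + j] = a[i * n + j] + b[i * n + j];
}
```

Both functions produce identical results. Only the memory access pattern differs, so any timing gap comes from the cache.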


Demonstrates different fast inverse square root routines.

No external dependencies.


Demonstrates cache blocking.

The Eigen dgemm function requires Eigen to compile.
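A plain C++ sketch of cache blocking for C += A * B on row-major n x n matrices, without Eigen. The function names and the default tile size bs = 64 are illustrative assumptions, not tuned values from the repo. Each tile loop keeps a small working set of A, B, and C in cache. For a fixed (i, j), the k order is unchanged, so the blocked result matches the naive one bit for bit.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical: naive i-k-j order, streaming all of B for every row of A.
void matmul_naive(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += aik * B[k * n + j];
        }
}

// Hypothetical: same product computed tile by tile (bs x bs tiles).
// std::min handles matrix sizes that are not multiples of bs.
void matmul_blocked(const std::vector<double>& A, const std::vector<double>& B,
                    std::vector<double>& C, std::size_t n, std::size_t bs = 64) {
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs) {
                const std::size_t i_end = std::min(ii + bs, n);
                const std::size_t k_end = std::min(kk + bs, n);
                const std::size_t j_end = std::min(jj + bs, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = A[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += aik * B[k * n + j];
                    }
            }
}
```

The best tile size depends on the cache sizes of the target machine. A tuned library such as Eigen also adds register blocking and packing on top of this idea.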


Demonstrates the performance of kernel sums.

The pvfmm kernel-sum function requires pvfmm to compile.
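The README does not say which kernel is summed. The sketch below assumes the 3D Laplace kernel 1/(4 pi r), a common choice in FMM codes, and a hypothetical function name. It is the direct O(N_src x N_trg) sum that such benchmarks usually compare against. Points are stored as x0, y0, z0, x1, y1, z1, ..., and coincident source/target pairs are skipped. The 1/sqrt in the inner loop is where SIMD and fast inverse square root tricks usually pay off.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical: u(t) = sum_s q_s / (4 pi |t - s|) for every target t.
std::vector<double> laplace_kernel_sum(const std::vector<double>& src,
                                       const std::vector<double>& charge,
                                       const std::vector<double>& trg) {
    const double pi = std::acos(-1.0);
    const double inv4pi = 1.0 / (4.0 * pi);
    const std::size_t ns = charge.size();
    const std::size_t nt = trg.size() / 3;
    std::vector<double> pot(nt, 0.0);
    for (std::size_t t = 0; t < nt; ++t) {
        double sum = 0.0;
        for (std::size_t s = 0; s < ns; ++s) {
            const double dx = trg[3 * t] - src[3 * s];
            const double dy = trg[3 * t + 1] - src[3 * s + 1];
            const double dz = trg[3 * t + 2] - src[3 * s + 2];
            const double r2 = dx * dx + dy * dy + dz * dz;
            if (r2 > 0.0) sum += charge[s] / std::sqrt(r2);  // skip r = 0
        }
        pot[t] = sum * inv4pi;
    }
    return pot;
}
```

Storing coordinates as separate x, y, z arrays (structure of arrays) instead of interleaved triples usually makes the inner loop easier to vectorize.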


Demonstrates the effect of Horner's form (HornerForm) on polynomial evaluation.
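A sketch contrasting term-by-term evaluation with Horner's form p(x) = c0 + x(c1 + x(c2 + ...)). The function names are hypothetical. Horner's form needs one multiply and one add per coefficient and no pow calls, and it maps naturally onto fused multiply-add instructions. The Legendre polynomial P3(x) = (5x^3 - 3x)/2 serves only as an illustrative input.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical: p(x) = c[0] + c[1] x + ... + c[n-1] x^(n-1), each power via std::pow.
double eval_naive(const std::vector<double>& c, double x) {
    double sum = 0.0;
    for (std::size_t k = 0; k < c.size(); ++k)
        sum += c[k] * std::pow(x, static_cast<double>(k));
    return sum;
}

// Hypothetical: Horner's form, iterating from the highest coefficient down.
double eval_horner(const std::vector<double>& c, double x) {
    double result = 0.0;
    for (std::size_t k = c.size(); k-- > 0;)
        result = result * x + c[k];
    return result;
}
```

For P3, the coefficient vector is {0.0, -1.5, 0.0, 2.5}, and both functions return -0.4375 at x = 0.5.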

