Important: these are instructor notes, remove this file before showing the materials to the students. The notes can be added after the lecture, of course.
- Puzzling issues (how swapping two nested for-loops makes out for a >4× slowdown
- a more thorough benchmark is here (ignore the blue vertical lines for now)
- Have a look at the historical evolution of speeds of different components in a computer:
- the CPU clock rate
- the memory (RAM) clock rate
- the memory (RAM) transfer rate
- some storage media access rates
- the need for a hierarchical access to data for the CPU should be clear now
- draw a sketch of the CPU/registers/L1-cache/L2-cache/L3-cache/RAM/Disk/Network hierarchy (something on the lines of this)
- understand the trade-offs involved (speed, costs, heat dissipation, capacity)
- try
lstopo
on my machine, talk about temporal and spacial locality of data - now go back to the Python benchmark and interpret the vertical blue lines
- measure size and timings for the memory hierarchy on my machine with a low level C benchmark
- explain the concepts of latency (also known as delay) and bandwidth (also known as speed or throughput) and their relationship:
latency ≈ (time-when-data-available-on-output-pins – time-when-data-requested) #measured in ns, depends on clock ticks ⟶ clock_ticks × number-of-ticks-per-operation
bandwidth ≈ clock_tick x data-transfer/tick x bus-width
- it is difficult to “defeat” the cache and the CPU intelligence (think about registers for speculative execution and the latest news about Spectre and Meltdown vulnerabilities in modern Intel CPUs)
- by the way, this is the reason why, for matrix multiplication, the clever ATLAS or openBLAS implementations are much faster than your manual C, or Cython or numba implementation ;-)
- if you use a “proper”
numpy
installation, it is already linked with an optimized BLAS implementation - if there is time, you may want to explain what BLAS, LAPACK, ATLAS, openBLAS is all about, what
numpy
andscipy
do about it, why life is so much easier in this case in the Linux world