ASPP/MemoryBoundComputations

Memory-bound Problems

Important: these are instructor notes; remove this file before showing the materials to the students. The notes can of course be shared after the lecture.

Introduction

  • Puzzling issues: how can swapping two nested for-loops account for a >4× slowdown?
  • a more thorough benchmark is here (ignore the blue vertical lines for now)
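The loop-swap effect can be reproduced with a short sketch (array size and variable names are arbitrary; numpy assumed available): summing a C-ordered array one contiguous row at a time versus one strided column at a time. The data and the arithmetic are identical; only the memory access pattern changes.

```python
import time
import numpy as np

N = 4000
a = np.ones((N, N))  # C (row-major) order: each row is contiguous in memory

def sum_by_rows(m):
    """Inner traversal along contiguous memory: cache lines are fully used."""
    total = 0.0
    for i in range(m.shape[0]):
        total += m[i, :].sum()
    return total

def sum_by_cols(m):
    """Inner traversal jumps N*8 bytes per element: far more cache misses."""
    total = 0.0
    for j in range(m.shape[1]):
        total += m[:, j].sum()
    return total

for f in (sum_by_rows, sum_by_cols):
    t0 = time.perf_counter()
    s = f(a)
    print(f"{f.__name__}: {time.perf_counter() - t0:.3f} s (sum = {s:.0f})")
```

The exact slowdown depends on the machine's cache sizes and line length, which is what the rest of these notes unpacks.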

A digression on CPU architecture and the memory hierarchy

  • Have a look at the historical evolution of speeds of different components in a computer:
    • the CPU clock rate
    • the memory (RAM) clock rate
    • the memory (RAM) transfer rate
    • some storage media access rates
  • the need for the CPU to access data through a hierarchy should be clear now
  • draw a sketch of the CPU/registers/L1-cache/L2-cache/L3-cache/RAM/Disk/Network hierarchy (something on the lines of this)
  • understand the trade-offs involved (speed, costs, heat dissipation, capacity)
  • try lstopo on my machine, talk about temporal and spatial locality of data
  • now go back to the Python benchmark and interpret the vertical blue lines
  • measure size and timings for the memory hierarchy on my machine with a low level C benchmark
  • explain the concepts of latency (also known as delay) and bandwidth (also known as speed or throughput) and their relationship:
    • latency ≈ (time-when-data-available-on-output-pins − time-when-data-requested), measured in ns; in terms of clock ticks: clock-period × number-of-ticks-per-operation
    • bandwidth ≈ clock-rate × data-transfers-per-tick × bus-width
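The two formulas above can be turned into a worked example. The figures below (DDR4-3200 on a single 64-bit channel with CL22 timings) are illustrative, not measurements from any particular machine:

```python
# Peak bandwidth ≈ clock-rate × data-transfers-per-tick × bus-width.
# DDR transfers data twice per clock tick, so 1600 MHz I/O clock -> 3200 MT/s.
transfers_per_second = 3200e6   # 3200 MT/s
bus_width_bytes = 8             # 64-bit memory channel

bandwidth = transfers_per_second * bus_width_bytes
print(f"peak bandwidth: {bandwidth / 1e9:.1f} GB/s")   # 25.6 GB/s

# Latency ≈ clock-period × number-of-ticks-per-operation.
clock_period_ns = 1 / 1.6       # 1600 MHz I/O clock -> 0.625 ns per tick
cas_ticks = 22                  # CL22: ticks between column request and data out
print(f"CAS latency: {clock_period_ns * cas_ticks:.2f} ns")  # 13.75 ns
```

Note how the two numbers are largely independent: raising the transfer rate improves bandwidth, while the latency in nanoseconds has stayed roughly flat across DRAM generations.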

Concluding remarks

  • it is difficult to “defeat” the cache and the CPU intelligence (think about registers for speculative execution and the latest news about Spectre and Meltdown vulnerabilities in modern Intel CPUs)
  • by the way, this is the reason why, for matrix multiplication, the clever ATLAS or OpenBLAS implementations are much faster than your hand-written C, Cython, or numba implementation ;-)
  • if you use a “proper” numpy installation, it is already linked with an optimized BLAS implementation
  • if there is time, you may want to explain what BLAS, LAPACK, ATLAS, and OpenBLAS are all about, what numpy and scipy do about it, and why life is so much easier in this case in the Linux world
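To make the BLAS point concrete, here is a rough sketch (the matrix size is arbitrary) comparing a textbook triple-loop multiplication against numpy's `@` operator, which dispatches to whatever BLAS library numpy is linked against:

```python
import time
import numpy as np

n = 120
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def manual_matmul(a, b):
    """Textbook triple loop: no blocking, no vectorization, cache-oblivious."""
    n = a.shape[0]
    c = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i, k] * b[k, j]
            c[i, j] = s
    return c

t0 = time.perf_counter()
c_manual = manual_matmul(a, b)
t_manual = time.perf_counter() - t0

t0 = time.perf_counter()
c_blas = a @ b  # dispatched to the BLAS implementation numpy is linked with
t_blas = time.perf_counter() - t0

print(f"manual: {t_manual:.3f} s, BLAS: {t_blas:.6f} s")
```

`numpy.show_config()` reveals which BLAS a given installation is actually linked against; the cache-blocked, vectorized BLAS kernel wins by orders of magnitude even at this small size.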

About

Materials for the Memory Bound Computations
