ASPP/MemoryBoundComputations

Memory-bound Problems

Important: these are instructor notes; remove this file before showing the materials to the students. The notes can of course be shared after the lecture.

Introduction

  • Puzzling issues: how can swapping two nested for-loops account for a >4× slowdown?
  • a more thorough benchmark is here (ignore the blue vertical lines for now)
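The loop-swap effect can be reproduced with a short sketch (array size and variable names are arbitrary; numpy assumed available): summing a C-ordered array one contiguous row at a time versus one strided column at a time. The data and the arithmetic are identical; only the memory access pattern changes.

```python
import time
import numpy as np

N = 4000
a = np.ones((N, N))  # C (row-major) order: each row is contiguous in memory

def sum_by_rows(m):
    """Inner traversal along contiguous memory: cache lines are fully used."""
    total = 0.0
    for i in range(m.shape[0]):
        total += m[i, :].sum()
    return total

def sum_by_cols(m):
    """Inner traversal jumps N*8 bytes per element: far more cache misses."""
    total = 0.0
    for j in range(m.shape[1]):
        total += m[:, j].sum()
    return total

for f in (sum_by_rows, sum_by_cols):
    t0 = time.perf_counter()
    s = f(a)
    print(f"{f.__name__}: {time.perf_counter() - t0:.3f} s (sum = {s:.0f})")
```

The exact slowdown depends on the machine's cache sizes and line length, which is what the rest of these notes unpacks.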

A digression on CPU architecture and the memory hierarchy

  • Have a look at the historical evolution of speeds of different components in a computer:
    • the CPU clock rate
    • the memory (RAM) clock rate
    • the memory (RAM) transfer rate
    • some storage media access rates
  • the need for the CPU to access data through a hierarchy should be clear now
  • draw a sketch of the CPU/registers/L1-cache/L2-cache/L3-cache/RAM/Disk/Network hierarchy (something on the lines of this)
  • understand the trade-offs involved (speed, costs, heat dissipation, capacity)
  • try lstopo on my machine, talk about temporal and spatial locality of data
  • now go back to the Python benchmark and interpret the vertical blue lines
  • measure size and timings for the memory hierarchy on my machine with a low level C benchmark
  • explain the concepts of latency (also known as delay) and bandwidth (also known as speed or throughput) and their relationship:
    • latency ≈ (time-when-data-available-on-output-pins − time-when-data-requested), measured in ns; in terms of clock ticks: clock-period × number-of-ticks-per-operation
    • bandwidth ≈ clock-rate × data-transfers-per-tick × bus-width
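The two formulas above can be turned into a worked example. The figures below (DDR4-3200 on a single 64-bit channel with CL22 timings) are illustrative, not measurements from any particular machine:

```python
# Peak bandwidth ≈ clock-rate × data-transfers-per-tick × bus-width.
# DDR transfers data twice per clock tick, so 1600 MHz I/O clock -> 3200 MT/s.
transfers_per_second = 3200e6   # 3200 MT/s
bus_width_bytes = 8             # 64-bit memory channel

bandwidth = transfers_per_second * bus_width_bytes
print(f"peak bandwidth: {bandwidth / 1e9:.1f} GB/s")   # 25.6 GB/s

# Latency ≈ clock-period × number-of-ticks-per-operation.
clock_period_ns = 1 / 1.6       # 1600 MHz I/O clock -> 0.625 ns per tick
cas_ticks = 22                  # CL22: ticks between column request and data out
print(f"CAS latency: {clock_period_ns * cas_ticks:.2f} ns")  # 13.75 ns
```

Note how the two numbers are largely independent: raising the transfer rate improves bandwidth, while the latency in nanoseconds has stayed roughly flat across DRAM generations.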

Concluding remarks

  • it is difficult to “defeat” the cache and the CPU intelligence (think about registers for speculative execution and the latest news about Spectre and Meltdown vulnerabilities in modern Intel CPUs)
  • by the way, this is the reason why, for matrix multiplication, the clever ATLAS or OpenBLAS implementations are much faster than your hand-written C, Cython, or numba implementation ;-)
  • if you use a “proper” numpy installation, it is already linked with an optimized BLAS implementation
  • if there is time, you may want to explain what BLAS, LAPACK, ATLAS, and OpenBLAS are all about, what numpy and scipy do about it, and why life is so much easier in this case in the Linux world
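To make the BLAS point concrete, here is a rough sketch (the matrix size is arbitrary) comparing a textbook triple-loop multiplication against numpy's `@` operator, which dispatches to whatever BLAS library numpy is linked against:

```python
import time
import numpy as np

n = 120
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def manual_matmul(a, b):
    """Textbook triple loop: no blocking, no vectorization, cache-oblivious."""
    n = a.shape[0]
    c = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i, k] * b[k, j]
            c[i, j] = s
    return c

t0 = time.perf_counter()
c_manual = manual_matmul(a, b)
t_manual = time.perf_counter() - t0

t0 = time.perf_counter()
c_blas = a @ b  # dispatched to the BLAS implementation numpy is linked with
t_blas = time.perf_counter() - t0

print(f"manual: {t_manual:.3f} s, BLAS: {t_blas:.6f} s")
```

`numpy.show_config()` reveals which BLAS a given installation is actually linked against; the cache-blocked, vectorized BLAS kernel wins by orders of magnitude even at this small size.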

About

Materials for the Memory Bound Computations
