# Memory management

Numerical software tends to use as much memory as a workstation has.  The memory serves two major purposes: (i) to hold the huge amount of data required, and (ii) to gain speed.

1. Linux memory model: stack, heap, and memory map
2. C memory management API
3. C++ memory management API
4. STL allocator API
5. Instance counter

Modern computers use a hierarchical memory system.  Registers are located on the processor chip and are the fastest and scarcest memory.  The CPU needs no additional cycle to access the bits in registers.

Farther from the CPU, we have cache memory in multiple levels.  It takes 1 to 30 cycles to get data from cache memory to the CPU, depending on the level.  Then we reach the main memory.  Accessing data in main memory incurs a latency of 50 to 200 cycles before the data get to the CPU.

# Register, stack, heap, and memory map

All data in a computer program take space in memory.  Depending on the usage, we allocate the data in different places.  The fundamental data types, the numbers to be crunched, will eventually go into the register file.  Temporary small objects are allocated on the stack.  Data to be shared among functions go to dynamically allocated memory.  Depending on the size, the memory manager may choose to use the heap or a memory map (mmap).
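The placement described above can be observed by printing addresses.  The following is a minimal sketch, assuming a Linux system with glibc, where a large `malloc` request is typically served by mmap rather than the heap.  The names `AddressReport` and `probe_addresses` and the 64 MiB size are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper type: the addresses observed for each kind of storage.
struct AddressReport
{
    std::uintptr_t stack_addr;
    std::uintptr_t small_addr;
    std::uintptr_t large_addr;
};

AddressReport probe_addresses()
{
    int local = 0;                   // automatic variable: lives on the stack
    void * small = std::malloc(64);  // small request: typically served from the heap
    // Large request (64 MiB).  Assumption: the memory manager uses mmap for
    // it, which is the usual glibc behavior but not guaranteed by the standard.
    void * large = std::malloc(64u * 1024u * 1024u);
    assert(small != nullptr);

    std::printf("stack variable : %p\n", static_cast<void *>(&local));
    std::printf("small malloc   : %p\n", small);
    std::printf("large malloc   : %p\n", large);

    AddressReport const report{
        reinterpret_cast<std::uintptr_t>(&local),
        reinterpret_cast<std::uintptr_t>(small),
        reinterpret_cast<std::uintptr_t>(large)};

    std::free(large);
    std::free(small);
    return report;
}
```

Calling `probe_addresses()` from `main` prints the three addresses.  On a typical Linux system they fall into clearly separated regions of the address space, but the exact values vary from run to run.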

When talking about memory management, we usually mean dynamic memory management.  Large chunks of static memory in an executable image should be avoided.

# C dynamic memory

The C programming language defines 5 functions for manipulating dynamic memory.

* `void * malloc(size_t size);`
* `void * calloc(size_t num, size_t size);`
* `void * realloc(void * ptr, size_t new_size);`
* `void free(void * ptr);`
* `void * aligned_alloc(size_t alignment, size_t size);`
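The five functions listed above can be exercised as in the following sketch.  It calls them through `<cstdlib>` from C++ and assumes a C++17 compiler on a platform that provides `std::aligned_alloc` (glibc does).  The function name `demo_c_api` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical demo: returns true when every step behaves as expected.
bool demo_c_api()
{
    // malloc: uninitialized storage for 4 ints.
    int * a = static_cast<int *>(std::malloc(4 * sizeof(int)));
    if (!a) { return false; }
    for (int i = 0; i < 4; ++i) { a[i] = i; }

    // realloc: grow to 8 ints; the first 4 values are preserved.  On success
    // the old pointer must not be used anymore.
    int * b = static_cast<int *>(std::realloc(a, 8 * sizeof(int)));
    if (!b) { std::free(a); return false; }
    bool ok = (b[0] == 0) && (b[3] == 3);

    // calloc: storage for 4 ints, all bits set to zero.
    int * z = static_cast<int *>(std::calloc(4, sizeof(int)));
    if (!z) { std::free(b); return false; }
    ok = ok && (z[0] == 0) && (z[3] == 0);

    // aligned_alloc: 256 bytes aligned to 64 bytes.  The size is kept a
    // multiple of the alignment, as older C standards require.
    void * p = std::aligned_alloc(64, 256);
    if (!p) { std::free(z); std::free(b); return false; }
    ok = ok && (reinterpret_cast<std::uintptr_t>(p) % 64 == 0);

    // free: release every block exactly once.
    std::free(p);
    std::free(z);
    std::free(b);
    return ok;
}
```

Note that the pointer returned by `aligned_alloc` is released with the same `free` as the others.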

For convenience, we call a library or part of a library that implements the dynamic memory management APIs a memory manager.  Although we should focus on C++ code, it is crucial to know how a C memory manager works, because

1. C++ memory managers are usually implemented on top of C memory managers.
2. C memory management API can do what C++ cannot.
3. C memory managers sometimes are faster than C++.

!make cmem; echo "--- built; run:"; ./cmem

# C++ dynamic memory: new and delete

Objects in C++ have 4 storage durations:

1. static
2. thread
3. automatic
4. dynamic

The first 3 of them, static, thread, and automatic storage durations, are distinguished by the declarations.  The last one, dynamic storage duration, is managed by `operator new/delete` and is our focus in memory management.

There are 3 frequent use cases of the `new/delete` expressions:
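As a minimal sketch, the four storage durations can appear side by side like this.  The variable names and the function `sum_storage_durations` are hypothetical and only serve the illustration.

```cpp
#include <cassert>

int g_static = 1;               // static storage duration: lives for the whole program
thread_local int g_thread = 2;  // thread storage duration: one copy per thread

int sum_storage_durations()
{
    int automatic = 3;           // automatic storage duration: ends with the scope
    int * dynamic = new int(4);  // dynamic storage duration: ends at delete

    int const total = g_static + g_thread + automatic + *dynamic;

    delete dynamic;              // we are responsible for ending the lifetime
    return total;
}
```

Only the last object requires an explicit action to release it; the other three are released by the language according to their declarations.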

1. Single object allocation.
2. Array allocation.
3. Placement new.

Precisely speaking, only the first two cases are fully related to memory management.  The third use case doesn't allocate or deallocate memory by itself; it lets us use a `new` expression to construct an object on an already-allocated block of memory.  Such an object is destroyed by calling its destructor explicitly rather than by a `delete` expression.
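The three use cases can be sketched as follows.  The class `Point` and the function `demo_new_delete` are hypothetical names introduced for this example; `Point::alive` tracks the number of live objects so that we can check that every construction is matched by a destruction.

```cpp
#include <cassert>
#include <new>  // required for placement new

// Hypothetical class used only for illustration.
struct Point
{
    Point() : x(0), y(0) { ++alive; }
    Point(int x_, int y_) : x(x_), y(y_) { ++alive; }
    ~Point() { --alive; }
    int x, y;
    static int alive;  // number of currently live Point objects
};
int Point::alive = 0;

int demo_new_delete()
{
    // 1. Single object allocation: new pairs with delete.
    Point * p = new Point(1, 2);
    delete p;

    // 2. Array allocation: new[] pairs with delete[].
    Point * arr = new Point[3];
    delete[] arr;

    // 3. Placement new: construct in memory we already own (here a suitably
    //    aligned buffer on the stack).  No memory is allocated.
    alignas(Point) unsigned char buffer[sizeof(Point)];
    Point * q = new (buffer) Point(3, 4);
    int const sum = q->x + q->y;
    q->~Point();  // destroy explicitly; there is no delete for placement new

    return sum;
}
```

After `demo_new_delete()` returns, `Point::alive` is back to zero because each of the three constructions was paired with the matching destruction.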

!make cppmem; echo "--- built; run:"; ./cppmem

# STL allocator

STL uses another set of template APIs to allocate memory for most of its containers.  By default, the STL containers use the `std::allocator` class template for memory allocation.  We are allowed to provide custom allocators to the containers.
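A custom allocator only needs a few members to work with the standard containers.  The following is a minimal sketch, assuming C++11 or later; `CountingAllocator`, `ByteTally`, and `demo_counting_allocator` are hypothetical names, and the allocator simply forwards to `malloc`/`free` while tallying the bytes that pass through it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical global tally shared by all instantiations of the allocator.
struct ByteTally
{
    static std::size_t allocated;
    static std::size_t deallocated;
};
std::size_t ByteTally::allocated = 0;
std::size_t ByteTally::deallocated = 0;

template <class T>
struct CountingAllocator
{
    using value_type = T;

    CountingAllocator() = default;
    // Converting constructor: containers rebind the allocator to other types.
    template <class U>
    CountingAllocator(CountingAllocator<U> const &) noexcept {}

    T * allocate(std::size_t n)
    {
        std::size_t const bytes = n * sizeof(T);
        void * p = std::malloc(bytes);
        if (!p) { throw std::bad_alloc(); }
        ByteTally::allocated += bytes;
        return static_cast<T *>(p);
    }

    void deallocate(T * p, std::size_t n) noexcept
    {
        ByteTally::deallocated += n * sizeof(T);
        std::free(p);
    }
};

// The allocator is stateless, so all instances compare equal.
template <class T, class U>
bool operator==(CountingAllocator<T> const &, CountingAllocator<U> const &) noexcept { return true; }
template <class T, class U>
bool operator!=(CountingAllocator<T> const &, CountingAllocator<U> const &) noexcept { return false; }

std::size_t demo_counting_allocator()
{
    {
        std::vector<int, CountingAllocator<int>> v;
        v.reserve(100);
        for (int i = 0; i < 100; ++i) { v.push_back(i); }
    }  // v is destroyed here and returns its memory through deallocate()
    return ByteTally::allocated;
}
```

The tally is not thread-safe; the sketch is meant for single-threaded experiments only.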

!make alloc; echo "--- built; run:"; ./alloc

# Instance counter

In some cases, we want to know how many instances of certain classes are created.  One quick way is to add an instance counter to the specific class.  We can then read the number at any point during execution.
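One possible way to add such a counter without repeating code in every class is a small base class template.  This is a sketch under that assumption, not necessarily the technique used in the accompanying example; `InstanceCounter`, `Grid`, `Cell`, and `demo_instance_counter` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Each class T that derives from InstanceCounter<T> gets its own counters.
// The counters are plain integers, so this sketch is not thread-safe.
template <class T>
class InstanceCounter
{
public:
    InstanceCounter() { ++m_active; ++m_constructed; }
    InstanceCounter(InstanceCounter const &) { ++m_active; ++m_constructed; }
    InstanceCounter & operator=(InstanceCounter const &) = default;
    ~InstanceCounter() { --m_active; }

    static std::size_t active() { return m_active; }            // currently alive
    static std::size_t constructed() { return m_constructed; }  // ever created

private:
    static std::size_t m_active;
    static std::size_t m_constructed;
};

template <class T> std::size_t InstanceCounter<T>::m_active = 0;
template <class T> std::size_t InstanceCounter<T>::m_constructed = 0;

// Hypothetical classes to be counted.
struct Grid : InstanceCounter<Grid> {};
struct Cell : InstanceCounter<Cell> {};

std::size_t demo_instance_counter()
{
    Grid a;
    Grid b(a);    // copies are counted too
    { Grid c; }   // c is created and destroyed inside this scope
    Cell d;       // counted separately from Grid
    return InstanceCounter<Grid>::active();  // a and b are still alive
}
```

Inside `demo_instance_counter()` two `Grid` objects are alive at the return statement.  Once the function returns, the active count drops back to zero while the constructed count keeps the total.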

!make icount; echo "--- built; run:"; ./icount

# Exercises

1. Calling `delete` on the address returned by `new[]` may cause problems.  Write a program and analyze what the problems may be.
2. When using a single thread, what is the runtime overhead of the instance counting technique?  Write a program and analyze.