In [None]:
%matplotlib inline
%env CPATH /System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Headers

import traceback
import sys
import os

import numpy as np
import matplotlib.pyplot as plt
from IPython.core.magic import register_cell_magic
from IPython.display import display, Audio, Image

# C++ and computer architecture

C++ is chosen for performance.  This lecture covers the fundamental knowledge needed to write efficient code for numerical applications:

1. Fundamental data types
2. Object-oriented programming
3. Standard template library (STL)

## Why C++?

C++ is particularly difficult to learn and use.  If all we need is speed, why not invent an easier language that is equally fast?

Two reasons make a new language an incomplete answer.  The first is speed.  It is difficult to invent something that is as fast as C++ in all aspects.  C++ is largely compatible with C, the language in which most operating systems are written, and gives easy access to assembly.  Other languages may outperform C++ on some occasions, but not on most.

The other reason is maintenance.  A numerical system usually takes years to develop.  Once proven useful, it will be maintained for decades.  Such a long-lived system needs a lot of support from the compiler, which C++ offers.

# Fundamental data types

1. Use command-line tools: compiler, linker, header files, external libraries, shared objects
2. Integer: signedness, pointer, array indexing
3. Floating-point: format, rounding, exception handling

# Compile and link

In [None]:
!rm -f helloworld.o
!ls helloworld.o

A compiler is a complex system.  The most common way to use it is to execute the compiler driver from the command line.  For example, in GCC, the C++ compiler driver is `g++`.  It accepts a lot of command-line arguments.

When the compiler runs with the argument `-c`, it takes the source code and outputs the object file.  The object file contains the machine code, but does not include the functions that are not defined in the source file.

The argument `-o` is used to specify the output file name.

In [None]:
# Compile source file to object file
!g++ -c helloworld.cpp -o helloworld.o
!file helloworld.o

In [None]:
# Object file isn't executable yet
!chmod a+x helloworld.o
!./helloworld.o

By dropping the `-c` argument and supplying the object file as the input, `g++` links the object file and the necessary libraries into an executable.

In [None]:
# Link into executable; dropping -c
!g++ helloworld.o -o helloworld
!file helloworld

In [None]:
!./helloworld

## Separate compilation units

### Header file

`hello.hpp`:

```cpp
#pragma once
void hello();
```

`hello.cpp`:

```cpp
#include <iostream>
#include "hello.hpp"
void hello() { std::cout << "hello with standalone compiling unit" << std::endl; }
```

In [None]:
# There's no "main()"; cannot link
!g++ hello.cpp -o hello

In [None]:
# hello.cpp needs to be built as an object file
!g++ -c hello.cpp -o hello.o

### Main program

`hellomain.cpp`:

```cpp
#include "hello.hpp"
int main(int argc, char ** argv) { hello(); }
```

In [None]:
# hellomain.cpp doesn't have everything it needs; cannot link
!g++ hellomain.cpp -o hellomain

### Link object files

In [None]:
# hellomain.cpp also needs to be built as an object file
!g++ -c hellomain.cpp -o hellomain.o

In [None]:
# Link into executable
!g++ hellomain.o hello.o -o hellomain

In [None]:
# Then it can run
!./hellomain

## Create and use a static library

In [None]:
!g++ -c hello.cpp -o hello.o
!ar rcs libhello.a hello.o
!g++ hellomain.o -L. -lhello -o hellomain2
!./hellomain2

## Create and use a shared object

In [None]:
!g++ -c hello.cpp -o hello.o
!g++ -shared hello.o -o libshared_hello.so
!g++ hellomain.o -L. -lshared_hello -o hellomain3
!./hellomain3

### External libraries are linked in the same way

In [None]:
!g++ distance.cpp -o distance -lcblas
!./distance

# C++ integer types

The standard specifies the minimum width of each fundamental type to help write portable code.  The actual width depends on the platform.

* Boolean types: `bool`
* Integer (signed) types: `short` (16+ bits), `int` (16+ bits), `long` (32+ bits), `long long` (64+ bits)
  * Unsigned integer: `unsigned short` (16+ bits), `unsigned int` (16+ bits), `unsigned long` (32+ bits), `unsigned long long` (64+ bits)
* Character type: `char`, `unsigned char` (8+ bits; exactly 8 bits on mainstream platforms)
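
The minimum widths listed above can be checked at compile time.  The following is a minimal sketch and not the content of `types.cpp`; the helper name `bit_width_of` is hypothetical.

```cpp
#include <climits>  // CHAR_BIT: number of bits in a byte
#include <cstddef>  // std::size_t

// Hypothetical helper: storage width of a type in bits.
template <typename T>
constexpr std::size_t bit_width_of()
{
    return sizeof(T) * CHAR_BIT;
}

// The standard guarantees only these minimum widths.  The actual widths are
// chosen by the platform, e.g., long is 64 bits on 64-bit Linux and macOS but
// 32 bits on 64-bit Windows.
static_assert(bit_width_of<char>() >= 8, "char has at least 8 bits");
static_assert(bit_width_of<short>() >= 16, "short has at least 16 bits");
static_assert(bit_width_of<int>() >= 16, "int has at least 16 bits");
static_assert(bit_width_of<long>() >= 32, "long has at least 32 bits");
static_assert(bit_width_of<long long>() >= 64, "long long has at least 64 bits");

// The signed and unsigned variants of a type occupy the same storage.
static_assert(sizeof(int) == sizeof(unsigned int), "same storage size");
```

If any guarantee did not hold, the file would fail to compile, so the checks cost nothing at run time.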

In [None]:
!g++ types.cpp -o types
!./types

## Fixed-width integer types

The C++ standard library provides fixed-width integer types in the header file `<cstdint>`.  They are the same as those in C:

* Signed integer: `int8_t`, `int16_t`, `int32_t`, `int64_t`.
* Unsigned integer: `uint8_t`, `uint16_t`, `uint32_t`, `uint64_t`.

For numerical code, fixed width matters more than the hardware architecture does.  Numerical code is more likely to break because the width of an indexing integer changes than because the address space changes.
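
To illustrate why the width of an indexing integer matters, consider counting the elements of a square matrix.  In this minimal sketch (the names `element_count` and `fits_in_int32` are hypothetical), a 50,000 by 50,000 matrix has 2.5 billion elements, which exceeds the largest `int32_t` value (2,147,483,647) but fits easily in `int64_t`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: number of elements in a rows-by-cols matrix, computed
// in 64-bit arithmetic so that the product does not overflow for the sizes
// used here.
constexpr int64_t element_count(int64_t rows, int64_t cols)
{
    return rows * cols;
}

// Hypothetical helper: whether a non-negative count can be held by a 32-bit
// signed index.
constexpr bool fits_in_int32(int64_t count)
{
    return count <= std::numeric_limits<int32_t>::max();
}

static_assert(fits_in_int32(element_count(40000, 40000)), "1.6e9 fits");
static_assert(!fits_in_int32(element_count(50000, 50000)), "2.5e9 does not fit");
```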

`cstdint.cpp`:

```cpp
#include <iostream>
#include <cstdint>
int main(int argc, char ** argv)
{
    std::cout << "unit is byte" << std::endl;
    std::cout << "sizeof(int8_t): " << sizeof(int8_t) << std::endl;
    std::cout << "sizeof(uint8_t): " << sizeof(uint8_t) << std::endl;
    std::cout << "sizeof(int16_t): " << sizeof(int16_t) << std::endl;
    std::cout << "sizeof(uint16_t): " << sizeof(uint16_t) << std::endl;
    std::cout << "sizeof(int32_t): " << sizeof(int32_t) << std::endl;
    std::cout << "sizeof(uint32_t): " << sizeof(uint32_t) << std::endl;
    std::cout << "sizeof(int64_t): " << sizeof(int64_t) << std::endl;
    std::cout << "sizeof(uint64_t): " << sizeof(uint64_t) << std::endl;
    return 0;
}
```

In [None]:
!g++ cstdint.cpp -o cstdint
!./cstdint

## Signedness

Care should be taken when signed and unsigned integers are both used in code.  The result of comparing a signed integer with an unsigned one is sometimes surprising.

Common wisdom advises against mixing signed and unsigned integers, but in numerical code negative indices are commonplace.  Be especially careful about the sign.

`signness.cpp`:
    
```cpp
#include <iostream>
#include <cstdint>
int main(int, char **)
{
    long sint = -1;
    unsigned long uint = 1;
    std::cout << "sint: " << sint << std::endl;
    std::cout << "uint: " << uint << std::endl;
    if (sint > uint) { std::cout << "sint > uint, although it can't be" << std::endl; }
    return 0;
}
```

In [None]:
!g++ signness.cpp -o signness
!./signness

In [None]:
!g++ signness.cpp -Wsign-compare -Werror -o signness
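
The comparison in `signness.cpp` is surprising because, when a `long` is compared with an `unsigned long`, the signed operand is converted to unsigned, and `-1` becomes the largest `unsigned long`.  A minimal sketch of a comparison by mathematical value follows.  The name `safe_less` is hypothetical; C++20 offers `std::cmp_less` in `<utility>` for the same purpose.

```cpp
#include <limits>

// Converting -1 to unsigned long yields the largest unsigned long.  The same
// conversion happens implicitly in a mixed signed/unsigned comparison.
static_assert(static_cast<unsigned long>(-1L)
                  == std::numeric_limits<unsigned long>::max(),
              "-1 wraps around to the maximum");

// Hypothetical helper: compare by mathematical value.  A negative signed
// value is smaller than any unsigned value; otherwise the conversion to
// unsigned preserves the value and the built-in comparison is safe.
constexpr bool safe_less(long lhs, unsigned long rhs)
{
    return lhs < 0 || static_cast<unsigned long>(lhs) < rhs;
}

static_assert(safe_less(-1, 1), "-1 is less than 1 when compared by value");
static_assert(!safe_less(2, 1), "2 is not less than 1");
```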

## Pointer and array indexing

`arrays.cpp`:

```cpp
#include <iostream>
#include <cstdint>
int main(int, char **)
{
    int32_t data[100];
    int32_t * pdata = data;
    int32_t * odata = pdata + 50;
    for (size_t it=0; it<100; ++it) { data[it] = it + 5000; }
    std::cout << "data[10]: " << data[10] << std::endl;
    std::cout << "pdata[10]: " << pdata[10] << std::endl;
    std::cout << "*(data+20): " << *(data+20) << std::endl;
    std::cout << "*(pdata+20): " << *(pdata+20) << std::endl;
    std::cout << "data[50]: " << data[50] << std::endl;
    std::cout << "odata[0]: " << odata[0] << std::endl;
    std::cout << "data[40]: " << data[40] << std::endl;
    std::cout << "odata[-10]: " << odata[-10] << std::endl;
    std::cout << "*(data+40): " << *(data+40) << std::endl;
    std::cout << "*(odata-10): " << *(odata-10) << std::endl;
    return 0;
}
```

In [None]:
!g++ arrays.cpp -o arrays -Wall -Wextra -Werror
!./arrays
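
The output of `arrays.cpp` follows from the rule that `p[i]` is defined as `*(p + i)`, for arrays and pointers alike.  A negative index is therefore valid as long as the resulting address stays inside the underlying array.  A minimal sketch, in which the helpers `shifted` and `fill_sequence` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a pointer shifted by `offset` elements from `base`.
// The caller must make sure that the result still points into the same array.
inline int32_t * shifted(int32_t * base, std::ptrdiff_t offset)
{
    assert(base != nullptr);
    return base + offset;
}

// Hypothetical helper: fill `data` with 5000, 5001, ... as arrays.cpp does.
inline void fill_sequence(int32_t * data, std::size_t size)
{
    for (std::size_t it = 0; it < size; ++it)
    {
        data[it] = static_cast<int32_t>(it) + 5000;
    }
}
```

With `int32_t data[100]` filled by `fill_sequence(data, 100)` and `int32_t * odata = shifted(data, 50)`, the expressions `odata[-10]` and `data[40]` refer to the same element, because both mean `*(data + 40)`.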

# Floating-point

The x86 architecture follows the IEEE 754-1985 standard for floating-point.  A floating-point value is represented by 3 fields: the sign (denoted by $s$), the exponent (stored with a bias; the true exponent is denoted by $p$), and the fraction (denoted by $f$, with $f<1$).  The value of a normal number is:

$$
(-1)^s \times (1+f)_2 \times 2^p
$$

Note that the number is binary-based.

There are two commonly used floating-point formats: single and double precision.  The C++ type names are `float` and `double`, respectively.

## Single-precision (`float`)

A single-precision floating-point value uses 32 bits (4 bytes).  The lowest 23 bits are the fraction.  The following 8 bits are the exponent.  The last (highest) bit is the sign; 0 is positive while 1 is negative.  In C++, the type name is `float`.

Consider the decimal number 2.75, which we use as an example to show how to get the fields.  Write it in base 2:

\begin{align*}
%
2.75 &= 2 + \frac{1}{2} + \frac{1}{2^2} = 1\times2^1 + 0\times2^0 + 1\times2^{-1} + 1\times2^{-2} \\
&= (10.11)_2 = (1.011)_2 \times 2^1 .
%
\end{align*}

The bit fields for its IEEE 754 single-precision floating-point are:

| sign (1 bit)   | exponent (8 bits)    | fraction (23 bits)             |
|----------------|----------------------|--------------------------------|
| `0`            | `1000 0000`          | `011 0000 0000 0000 0000 0000` |

The exponent bias for single-precision floating-point is 127 ($(\mathtt{111 \, 1111})_2$), so the true exponent 1 is stored as $1 + 127 = 128 = (\mathtt{1000 \, 0000})_2$.
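
The fields in the table can be extracted programmatically.  The following is a minimal sketch, assuming that `float` is the 32-bit IEEE 754 format; the names `FloatFields` and `decompose` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Bit fields of an IEEE 754 single-precision value.
struct FloatFields
{
    uint32_t sign;      // 1 bit
    uint32_t exponent;  // 8 bits, biased by 127
    uint32_t fraction;  // 23 bits
};

// Hypothetical helper: copy the bits of a float into an integer and split
// them.  std::memcpy copies the bytes as they are, without converting the
// value.
inline FloatFields decompose(float value)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32-bit");
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    FloatFields fields;
    fields.sign = bits >> 31;                // highest bit
    fields.exponent = (bits >> 23) & 0xffu;  // next 8 bits
    fields.fraction = bits & 0x7fffffu;      // lowest 23 bits
    return fields;
}
```

For `decompose(2.75f)`, the sign is 0, the exponent is 128, and the fraction is `0x300000`, i.e., $(\mathtt{011})_2$ followed by 20 zero bits.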

A floating-point value is usually inexact.  For example, `0.3`, although rational, cannot be represented exactly in single precision.  Because the format is base-2, you should not rely on the arithmetic intuition learned from the base-10 number system.
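
A small sketch of this inexactness, assuming IEEE 754 `float` and `double` (the name `exact_in_float` is hypothetical): 2.75 is a short sum of powers of 2 and is stored exactly in both formats, while 0.3 is rounded differently in each.

```cpp
// Hypothetical helper: whether a double value survives a round trip through
// float unchanged, i.e., whether float can hold that double value exactly.
constexpr bool exact_in_float(double value)
{
    // The inner cast rounds to the nearest float; the outer cast is exact.
    return static_cast<double>(static_cast<float>(value)) == value;
}

static_assert(exact_in_float(2.75), "2.75 = (10.11)_2 needs only a few bits");
static_assert(!exact_in_float(0.3), "0.3 has an infinite binary expansion");
```

Note that the literal `0.3` is itself already rounded to the nearest `double`; the check shows only that the `float` rounding loses additional bits.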

In [None]:
!g++ float.cpp -o float
!./float

$(3)_{10} = (1.1)_2\times2^1$

## Double-precision (`double`)

A double-precision floating-point value uses 64 bits (8 bytes).  The lowest 52 bits are the fraction.  The following 11 bits are the exponent.  The last (highest) bit is the sign; 0 is positive while 1 is negative.  In C++, the type name is `double`.

Use the same example of 2.75 for the double-precision floating-point.  Write $2.75 = (1.011)_2 \times 2^1$.  The exponent bias for double-precision floating-point is 1023 ($(\mathtt{11 \, 1111 \, 1111})_2$).  The bit fields are:

| sign (1 bit)   | exponent (11 bits)   | fraction (52 bits)                                                 |
|----------------|----------------------|--------------------------------------------------------------------|
| `0`            | `100 0000 0000`      | `0110 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000` |

Compared with the single-precision version:

| sign (1 bit)   | exponent (8 bits)    | fraction (23 bits)             |
|----------------|----------------------|--------------------------------|
| `0`            | `1000 0000`          | `011 0000 0000 0000 0000 0000` |
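
Going in the opposite direction, the value can be rebuilt from the three fields.  The following is a minimal sketch for normal double-precision numbers only (zero, subnormal numbers, infinity, and NaN are not handled); the name `compose_double` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: rebuild a normal double-precision value from its bit
// fields: (-1)^sign * (1 + fraction / 2^52) * 2^(exponent - 1023).
inline double compose_double(uint64_t sign, uint64_t exponent, uint64_t fraction)
{
    // Normal numbers have a stored exponent between 1 and 2046.
    assert(exponent >= 1 && exponent <= 2046);
    assert(fraction < (uint64_t(1) << 52));
    // fraction / 2^52 is the fractional part f, with 0 <= f < 1.
    double const significand = 1.0 + std::ldexp(static_cast<double>(fraction), -52);
    // Remove the bias of 1023 to obtain the true exponent.
    int const true_exponent = static_cast<int>(exponent) - 1023;
    double const magnitude = std::ldexp(significand, true_exponent);
    return sign ? -magnitude : magnitude;
}
```

With the fields in the double-precision table, `compose_double(0, 0x400, 0x6000000000000)` gives $1.375 \times 2^1 = 2.75$.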

## Numeric limits

Both C and C++ provide constants for the value limits of each type.  In C++, they are available through `std::numeric_limits` in the header file `<limits>`.

`nlimits.cpp`:

```cpp
#include <iostream>
#include <cstdint>
#include <limits>
int main(int, char **)
{
    std::cout << "type\t\tlowest()\tmin()\t\tmax()\t\tepsilon()" << std::endl << std::endl;
    std::cout << "float\t\t"
              << std::numeric_limits<float>::lowest() << "\t"
              << std::numeric_limits<float>::min() << "\t"
              << std::numeric_limits<float>::max() << "\t"
              << std::numeric_limits<float>::epsilon() << "\t"
              << std::endl;
    std::cout << "double\t\t"
              << std::numeric_limits<double>::lowest() << "\t"
              << std::numeric_limits<double>::min() << "\t"
              << std::numeric_limits<double>::max() << "\t"
              << std::numeric_limits<double>::epsilon() << "\t"
              << std::endl;
    std::cout << "int32_t\t\t"
              << std::numeric_limits<int32_t>::lowest() << "\t"
              << std::numeric_limits<int32_t>::min() << "\t"
              << std::numeric_limits<int32_t>::max() << "\t"
              << std::numeric_limits<int32_t>::epsilon() << "\t"
              << std::endl;
    std::cout << "uint32_t\t"
              << std::numeric_limits<uint32_t>::lowest() << "\t\t"
              << std::numeric_limits<uint32_t>::min() << "\t\t"
              << std::numeric_limits<uint32_t>::max() << "\t"
              << std::numeric_limits<uint32_t>::epsilon() << "\t"
              << std::endl;
    std::cout << "int64_t\t\t"
              << std::numeric_limits<int64_t>::lowest() << "\t"
              << std::numeric_limits<int64_t>::min() << "\t"
              << std::numeric_limits<int64_t>::max() << "\t"
              << std::numeric_limits<int64_t>::epsilon() << "\t"
              << std::endl;
    std::cout << "uint64_t\t"
              << std::numeric_limits<uint64_t>::lowest() << "\t\t"
              << std::numeric_limits<uint64_t>::min() << "\t\t"
              << std::numeric_limits<uint64_t>::max() << "\t"
              << std::numeric_limits<uint64_t>::epsilon() << "\t"
              << std::endl;
    return 0;
}
```

In [None]:
!g++ nlimits.cpp -o nlimits
!./nlimits
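
The meaning of `epsilon()`, `min()`, and `lowest()` is easy to confuse.  The following minimal sketch assumes IEEE 754 arithmetic with the default round-to-nearest mode; the name `distinguishable_from_one` is hypothetical.  `epsilon()` is the gap between 1 and the next representable value, so adding half of it to 1 rounds back to 1.

```cpp
#include <limits>

// Hypothetical helper: whether adding `delta` to 1 yields a value that the
// type T can tell apart from 1.
template <typename T>
constexpr bool distinguishable_from_one(T delta)
{
    return T(1) + delta != T(1);
}

static_assert(distinguishable_from_one<float>(
                  std::numeric_limits<float>::epsilon()),
              "1 + epsilon is the next float after 1");
static_assert(!distinguishable_from_one<float>(
                  std::numeric_limits<float>::epsilon() / 2),
              "1 + epsilon/2 rounds back to 1");

// For floating-point types, lowest() is the most negative finite value while
// min() is the smallest positive normal value.
static_assert(std::numeric_limits<double>::lowest()
                  == -std::numeric_limits<double>::max(), "lowest() is -max()");
static_assert(std::numeric_limits<double>::min() > 0.0, "min() is positive");
// For integer types, min() and lowest() are the same.
static_assert(std::numeric_limits<int>::min()
                  == std::numeric_limits<int>::lowest(), "same for integers");
```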

## Exception handling

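The pragma `#pragma STDC FENV_ACCESS ON` tells the compiler that the program accesses the floating-point environment, so that optimizations do not interfere with testing the exception flags; not every compiler supports it.  The header file `<cfenv>` defines the following macros for the floating-point exceptions that are supported by the hardware: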

| macro | math error condition | description |
| -- | -- | -- |
| FE_DIVBYZERO | pole error | math result was infinite or undefined |
| FE_INEXACT | inexact result | rounding was required for the operation |
| FE_INVALID | domain error | the argument was outside the domain in which the math operation is defined |
| FE_OVERFLOW | range error | the result was too large to be representable |
| FE_UNDERFLOW | range error | the result became subnormal due to loss of precision |
| FE_ALL_EXCEPT | -- | bitwise OR of all supported floating-point exceptions |

`fpexc.cpp`:

```cpp
#include <iostream>
#include <cfenv>
#include <cmath>
#include <limits>
int main(int, char **)
{
    float v1;

    feclearexcept(FE_ALL_EXCEPT);
    v1 = 0.3;
    std::cout << "result: " << v1/0 << std::endl;
    if (fetestexcept(FE_DIVBYZERO)) { std::cout << "  FE_DIVBYZERO" << std::endl; }

    feclearexcept(FE_ALL_EXCEPT);
    v1 = 2;
    std::cout << "std::sqrt(2): " << std::sqrt(v1) << std::endl;
    if (fetestexcept(FE_INEXACT)) { std::cout << "  FE_INEXACT" << std::endl; }

    feclearexcept(FE_ALL_EXCEPT);
    v1 = 2;
    std::cout << "std::acos(2): " << std::acos(v1) << std::endl;
    if (fetestexcept(FE_INVALID)) { std::cout << "  FE_INVALID" << std::endl; }

    feclearexcept(FE_ALL_EXCEPT);
    v1 = std::numeric_limits<float>::max();
    std::cout << "std::numeric_limits<float>::max() * 2: " << v1 * 2 << std::endl;
    if (fetestexcept(FE_OVERFLOW)) { std::cout << "  FE_OVERFLOW" << std::endl; }

    feclearexcept(FE_ALL_EXCEPT);
    v1 = std::numeric_limits<float>::min();
    std::cout << "std::numeric_limits<float>::min() / 10: " << v1 / 10 << std::endl;
    if (fetestexcept(FE_UNDERFLOW)) { std::cout << "  FE_UNDERFLOW" << std::endl; }

    return 0;
}
```

In [None]:
!g++ fpexc.cpp -o fpexc
!./fpexc
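
The clear-compute-test pattern used in `fpexc.cpp` can be wrapped in a helper.  The following is a minimal sketch, assuming a platform whose `<cfenv>` supports the exception flags (for example x86-64 or arm64 with a common C library); the name `flags_of_division` is hypothetical.

```cpp
#include <cfenv>

// Hypothetical helper: perform lhs / rhs and report which floating-point
// exception flags the division raised.  The volatile variables keep the
// compiler from evaluating the division at compile time or moving it across
// the calls that clear and test the flags.
inline int flags_of_division(double lhs, double rhs)
{
    volatile double numerator = lhs;
    volatile double denominator = rhs;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double quotient = numerator / denominator;
    (void) quotient;
    return std::fetestexcept(FE_ALL_EXCEPT);
}
```

The returned value is a bitwise OR of the raised flags, so it is tested with a mask, e.g., `flags_of_division(1.0, 0.0) & FE_DIVBYZERO`.  An exact division such as `1.0 / 2.0` raises no flag at all.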

# Object-oriented programming

1. Class, constructor and destructor
2. Data encapsulation, accessor, reference type
3. Polymorphism
   1. Dynamic type
4. CRTP

# Standard template library (STL)

1. std::vector, its internals, and why holding the buffer address is dangerous
2. std::array, std::deque, std::list
3. std::map, std::unordered_map, std::set, std::unordered_set

# Homework

1. In `arrays.cpp`, what may happen if you write the following code?
```cpp
data[-1] = 0;
```
2. Given 2 single-precision floating-point values, 0.3 and -0.3, reinterpret their data (bits) as 32-bit unsigned integers (do not cast the floating-point values to integers).  What is the integer value after performing XOR on the two integers?  Change the floating-point values to 183.2 and -183.2.  What is the value after XOR this time?