# 1. From `.cpp` to `.o` (object files)
Running `g++ -c main.cpp`...

## 1.1 Parsing & Translation
- Compiler ***parses*** source code into an **Abstract Syntax Tree** (AST)
- Templates are instantiated; inline functions, etc., are expanded

## 1.2 Code Generation
- Compiler ***translates***
    - **high-level constructs**
    - into **assembly** instructions (e.g. `mov`, `add`, `jmp`)
    - for the **specific CPU**

## 1.3 Object File Structure: Structured Binary Format (SBF)
- Object file **isn't** just raw assembly:
    - it has binary machine code + metadata
- Object file is wrapped into a **Structured Binary Format**:
    - `ELF` - Executable & Linkable Format - Linux, BSD
    - `COFF/PE` - Common Object File Format / Portable Executable - Windows
    - `Mach-O` - Mach Object - macOS
- Each **SBF** contains:
    - `.text` - machine code for functions
    - `.data/.bss` - global/static variables
    - `relocation tables` - placeholders for addresses not yet known (e.g. external functions)
    - `symbol tables` - fn & variable names (for linking/debugging)

### 1.3.1 Standard Formats: Executable Files
The `.o` (object) files and final executables are not just random bytes.

They follow a **binary format** that the ***OS knows how to read***.

The **loader program**, built into the kernel, knows how to parse the format. The format defines:

- where **machine code** is: `.text` section  
- where **initialised data** is: `.data` section
- where **uninitialised data** is: `.bss` section
- what **shared libraries** are needed: e.g. `libc.so` or `msvcrt.dll`
- where **execution** starts: `entry point` e.g. `_start`

### 1.3.2 Loader (Inside the OS Kernel)

Running `./program` on Linux:
- Shell (in a forked child process) calls the `execve()` system call: it asks the kernel to replace the current process image with a new program.
- Kernel reads the executable file header, starting with the magic bytes.

### 2. ELF Loader
### 2.1 ELF Magic Numbers: First 4 Bytes
In an `ELF` file, the first 4 bytes are magic bytes that tell the OS it is an `ELF` file:

- `7f` `45` `4c` `46` in hex, or
- `7F 'E' 'L' 'F'` as characters
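
The magic-byte check can be sketched in a few lines of Python. This is only an illustration: the function name `detect_format` and its labels are made up here, and the non-ELF prefixes (`MZ`, the Mach-O value, `#!`) are well-known values added for comparison, not something these notes derive.

```python
# Magic-byte prefixes for a few common file formats.
# MAGIC_PREFIXES and detect_format are hypothetical names for illustration.
MAGIC_PREFIXES = [
    (b"\x7fELF", "ELF"),
    (b"MZ", "PE (DOS/Windows)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O (64-bit, little-endian)"),
    (b"#!", "script (shebang)"),
]


def detect_format(data: bytes) -> str:
    """Return a label for the first magic prefix that matches `data`."""
    for prefix, label in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return label
    return "unknown"


# The four ELF magic bytes followed by two more header bytes.
print(detect_format(bytes([0x7F, 0x45, 0x4C, 0x46]) + b"\x02\x01"))  # ELF
```

A real kernel does roughly this comparison on the first bytes of the file before deciding which loader to hand it to.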

**Overall**, the loader's job is to:

- Load code from file offset `X` into memory address `Y`
- Map `n` pages of writable memory for `.data`
- Set the `stack pointer` to the top of a newly created stack
- Start **execution** at address `Z`


### 2.2 ELF Header: After First 4 Bytes

- At offset `0x18`, the loader reads the entry point address → that’s where it will set the instruction pointer (RIP) when starting the program.

- At offset `0x20`, it learns where the program header table is → this table describes what memory mappings to create.


- The header is a fixed-size binary struct, e.g. for ELF64:

| Offset | Size | Field                 | Meaning                                                |
| ------ | ---- | --------------------- | ------------------------------------------------------ |
| 0x00   | 4    | Magic (`0x7F`, `ELF`) | File type marker                                       |
| 0x04   | 1    | Class                 | 32-bit vs 64-bit                                       |
| 0x05   | 1    | Endianness            | Little vs big endian                                   |
| 0x10   | 2    | Type                  | Relocatable (`.o`), Executable, Shared library, Core dump |
| 0x12   | 2    | Machine               | e.g., x86-64 = `0x3E`                                  |
| 0x18   | 8    | Entry point           | Address where execution starts                         |
| 0x20   | 8    | Program header offset | Where to find load instructions                        |
| 0x28   | 8    | Section header offset | Where to find symbols/debug info                       |
| …      | …    | …                     | …                                                      |
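
The table maps directly onto a `struct` format string. Below is a minimal sketch, assuming a 64-bit little-endian ELF file; the names `ELF64_HEADER` and `parse_elf64_header` are made up for illustration, and the sample header is synthetic rather than read from a real binary.

```python
import struct

# ELF64 header, little-endian: 16-byte ident, then the fields in file order.
# Total size is 64 bytes. (Names here are hypothetical, for illustration.)
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

ELF_TYPES = {1: "Relocatable", 2: "Executable", 3: "Shared object", 4: "Core dump"}


def parse_elf64_header(data: bytes) -> dict:
    """Decode the table's fields from the first 64 bytes of an ELF64 file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    fields = ELF64_HEADER.unpack_from(data)
    _ident, e_type, machine, _version, entry, phoff, shoff = fields[:7]
    return {
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": machine,   # 0x3E means x86-64
        "entry": entry,       # read from offset 0x18
        "phoff": phoff,       # read from offset 0x20
        "shoff": shoff,       # read from offset 0x28
        "phnum": fields[10],  # number of program header entries
    }


# Synthetic header: class=2 (64-bit), data=1 (little-endian), version=1.
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
sample = ELF64_HEADER.pack(ident, 2, 0x3E, 1, 0x401000, 64, 0, 0, 64, 56, 1, 64, 0, 0)

header = parse_elf64_header(sample)
print(header["type"], hex(header["entry"]))  # Executable 0x401000
```

On a Linux x86-64 machine the same function should also accept the first 64 bytes of a real binary, e.g. `parse_elf64_header(open("/bin/ls", "rb").read(64))`, assuming that file is a 64-bit little-endian ELF.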

### 2.3 Program Header Table
The program header table is like a recipe for the loader. Each entry describes one memory region to set up.

| Field  | Meaning                                                  |
| ------ | -------------------------------------------------------- |
| Type   | Is this LOAD, DYNAMIC, INTERP, NOTE, etc.?               |
| Offset | File offset where this segment’s data lives              |
| Vaddr  | Virtual memory address to map it at                      |
| Filesz | How many bytes to copy from the file                     |
| Memsz  | How many total bytes in memory (extra space gets zeroed) |
| Flags  | Read/Write/Execute permissions                           |

So an entry might say:

“LOAD segment: take bytes 0x1000–0x1FFF from the file, map them at virtual address 0x400000, mark them readable and executable.” → that’s the segment holding your `.text` (code) section.

Another might say: “Take bytes at offset 0x2000–0x2FFF, map them to memory at 0x600000, mark them writable.” → your `.data` section.

Another says: “Allocate memory of size 0x800 at 0x601000 and zero it.” → your `.bss` section.
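
The entries above can be decoded with the same `struct` approach. This is a sketch assuming the ELF64 little-endian layout; `parse_phdr` and the two sample entries are hypothetical and built by hand from the numbers in the examples. Note that in practice `.bss` usually has no entry of its own: it is the zero-filled tail of the data segment, where `Memsz` exceeds `Filesz`, which is how the second sample models it.

```python
import struct

# ELF64 program header entry: type, flags, offset, vaddr, paddr,
# filesz, memsz, align. 56 bytes, little-endian.
ELF64_PHDR = struct.Struct("<IIQQQQQQ")

PT_NAMES = {0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE"}
PF_X, PF_W, PF_R = 1, 2, 4  # permission bits in the flags field


def parse_phdr(data: bytes, offset: int = 0) -> dict:
    """Decode one program header entry into the fields from the table."""
    p_type, flags, p_offset, vaddr, _paddr, filesz, memsz, _align = (
        ELF64_PHDR.unpack_from(data, offset)
    )
    perms = "".join(
        ch if flags & bit else "-"
        for ch, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X))
    )
    return {
        "type": PT_NAMES.get(p_type, hex(p_type)),
        "offset": p_offset,
        "vaddr": vaddr,
        "filesz": filesz,
        "memsz": memsz,
        "perms": perms,
        "zero_fill": memsz - filesz,  # bytes the loader must zero (.bss)
    }


# Code: file bytes 0x1000-0x1FFF mapped at 0x400000, read + execute.
code_entry = ELF64_PHDR.pack(1, PF_R | PF_X, 0x1000, 0x400000, 0x400000,
                             0x1000, 0x1000, 0x1000)
# Data: file bytes 0x2000-0x2FFF mapped at 0x600000, plus 0x800 zeroed bytes.
data_entry = ELF64_PHDR.pack(1, PF_R | PF_W, 0x2000, 0x600000, 0x600000,
                             0x1000, 0x1800, 0x1000)

for raw in (code_entry, data_entry):
    print(parse_phdr(raw))
```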


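### 1.3.3.1 `PE` Loader - Windows
- Other formats are recognised by their first bytes too: Unix scripts start with `#!` (shebang), while Windows `PE` files start with `MZ` and carry a `PE\0\0` signature further into the header.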

```python
# Misc Python
# Letters to hex: the ASCII codes behind the ELF magic bytes
print(format(ord('E'), 'x'), format(ord('L'), 'x'), format(ord('F'), 'x'))  # 45 4c 46

# 32-bit x86: 8 General Purpose Registers (GPRs).
# Largest unsigned value is 4294967295; 2**32 byte addresses = 4 GiB.
print(2**32 - 1)

# 64-bit x86-64: 16 GPRs. Largest unsigned value is 18446744073709551615.
# Typically only the lower 48 bits are used for virtual addresses (256 TiB).
print(2**64 - 1)

# Each register can be accessed in multiple sizes:
# RAX (64-bit), EAX (32-bit), AX (16-bit), AL (8-bit low), AH (8-bit high).
```
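
The register sizes listed above are views onto the same 64 bits, which can be imitated with masks and shifts. This is only a sketch in plain Python integers; `sub_registers` is a made-up helper, and it models reading the sub-registers only, not the rules for writing to them.

```python
def sub_registers(rax: int) -> dict:
    """Return the narrower views of a 64-bit RAX value."""
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # full 64 bits
        "EAX": rax & 0xFFFFFFFF,          # low 32 bits
        "AX": rax & 0xFFFF,               # low 16 bits
        "AL": rax & 0xFF,                 # low 8 bits
        "AH": (rax >> 8) & 0xFF,          # bits 8-15
    }


for name, value in sub_registers(0x1122334455667788).items():
    print(f"{name:>3} = {value:#x}")
```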

# 2. Linking `.o` -> Executable
Running `g++ main.o other.o -o program`
- Linker resolves symbols: 
    - replaces placeholders with real memory addresses (e.g. where `printf` lives in `libc`)
- Combines sections:
    - merges `.text`, `.data`, etc from all object files
- Produces executable:
    - runnable file with code, data & startup routine
        - `_start` -> `main`
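
Symbol resolution and section merging can be imitated with a toy model. This is a deliberately simplified sketch, not how a real linker or the ELF relocation format works: each "object file" is a plain dict, every relocation is an 8-byte absolute address placeholder, and the name `link` and the base address `0x400000` are assumptions for illustration.

```python
import struct


def link(objects, base=0x400000):
    """Toy linker: concatenate .text blobs and patch address placeholders."""
    # Pass 1: lay out each object's .text and record absolute symbol addresses.
    symbols, starts, cursor = {}, [], base
    for obj in objects:
        starts.append(cursor)
        for name, off in obj["symbols"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = cursor + off
        cursor += len(obj["text"])

    # Pass 2: merge the sections, overwriting each placeholder with the
    # resolved address of the symbol it refers to.
    image = bytearray()
    for obj in objects:
        text = bytearray(obj["text"])
        for off, name in obj["relocs"]:
            if name not in symbols:
                raise KeyError(f"undefined reference to {name}")
            struct.pack_into("<Q", text, off, symbols[name])
        image += text
    return bytes(image), symbols


# main.o refers to `helper`, which is defined in other.o.
main_o = {"text": b"\xAA" + bytes(8) + b"\xBB",
          "symbols": {"main": 0}, "relocs": [(1, "helper")]}
other_o = {"text": b"\xCC\xDD", "symbols": {"helper": 0}, "relocs": []}

image, symbols = link([main_o, other_o])
print({name: hex(addr) for name, addr in symbols.items()})
```

Leaving `other_o` out of the list raises the toy equivalent of the familiar "undefined reference" link error.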

# 3. Running The Program
## 3.1 Loader (Part of OS)
- Loads executable into memory
- Maps code (`.text`) into the process's virtual address space as read-only, executable pages
- Maps data (`.data/.bss`) into the same address space as readable, writable pages
- Sets up **stack** & **heap**
- Passes control to the entry point (`_start`)
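
The mapping steps can be simulated on plain byte strings. This is a toy sketch under loud assumptions: "memory" is a dict of `bytearray` regions keyed by address, the file contents and segment list are invented, and stack/heap setup and the actual jump to the entry point are left out. `load_segment` and `load` are hypothetical names.

```python
def load_segment(file_bytes: bytes, offset: int, filesz: int, memsz: int) -> bytearray:
    """Copy `filesz` bytes from the file, then zero-fill up to `memsz`."""
    if memsz < filesz:
        raise ValueError("memsz must be at least filesz")
    if offset + filesz > len(file_bytes):
        raise ValueError("segment extends past end of file")
    region = bytearray(memsz)  # starts out all zeros (covers .bss)
    region[:filesz] = file_bytes[offset:offset + filesz]
    return region


def load(file_bytes: bytes, segments: list, entry: int):
    """Build a toy address space and return it with the entry address."""
    memory = {
        seg["vaddr"]: load_segment(file_bytes, seg["offset"],
                                   seg["filesz"], seg["memsz"])
        for seg in segments
    }
    return memory, entry


# Invented file: 4 header bytes, 3 code bytes, 2 initialised data bytes.
file_bytes = b"HDR!" + b"\x90\x90\xc3" + b"\x01\x02"
segments = [
    {"vaddr": 0x400000, "offset": 4, "filesz": 3, "memsz": 3},  # code
    {"vaddr": 0x600000, "offset": 7, "filesz": 2, "memsz": 6},  # data + bss
]

memory, entry = load(file_bytes, segments, entry=0x400000)
print(memory[0x600000].hex())  # 010200000000
```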
## 3.2 Registers & Execution
## 3.3 Instruction Set & CPU Bits

