## The Fundamental Computer System
Three basic parts: the computing units, the memory units, and the communications layers between them.

### Computing Units

- IPC (Instructions Per Cycle): The number of operations the CPU can do in one cycle (e.g., vectorization via SIMD, single instruction, multiple data)
- Clock Speed: How many cycles the CPU can execute per second
- Hyperthreading: Presents a virtual second CPU to the host OS
  - Interleaves two threads of instructions into the execution units of a single CPU (e.g., one thread's floating-point operations alongside another's integer operations)
- Out-of-order execution: Runs instructions that do not depend on previous results while earlier ones are stalled (e.g., waiting for a memory access)
- Multicore architecture
  - Amdahl's law: If parts of a program must run on a single core, those serial parts limit the overall speedup gained from adding more cores.
  - The GIL (Global Interpreter Lock) in Python lets only one thread execute Python bytecode at a time, which is a major hurdle to utilizing multiple cores
    - Analogy: even with 100 people who have questions, only one can ask a question at a time
    - Can be worked around with `multiprocessing`, `numexpr`, `Cython`, or distributed computing
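
Amdahl's law is usually written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the runtime that can be parallelized and n is the number of cores. The sketch below evaluates that formula. The function name `amdahl_speedup` and the 90% parallel fraction are illustrative assumptions, not taken from the notes above.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Theoretical upper bound on speedup from Amdahl's law.

    parallel_fraction: share of the runtime that can be spread across cores (0..1).
    n_cores: number of cores working on the parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always takes the same time; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Hypothetical workload: 90% of the runtime is parallelizable.
for cores in (1, 2, 4, 8, 1000):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With 10% of the work stuck on one core, the speedup creeps toward 1 / 0.1 = 10 but never reaches it, however many cores are added.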
  
### Memory Units

- Used to store bits (e.g., variables, the pixels of an image)
- The read/write speed depends on how the data is accessed
- *Sequential reads* are much faster than *random access*
- Latency is high on hard drives and low on RAM
- Types of memory units, roughly from slowest to fastest: spinning hard drive, solid state drive (SSD), RAM, L1/L2 cache
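
The sequential-versus-random point can be observed with a rough experiment. This is a minimal sketch that assumes NumPy is installed. It sums the same array twice with the same fancy-indexing operation, and only the order of the reads changes. The array size and the helper name `gather_sum` are illustrative choices. The timings depend heavily on the hardware, so treat them as a demonstration rather than a benchmark.

```python
import time

import numpy as np


def gather_sum(data: np.ndarray, order: np.ndarray) -> tuple[float, float]:
    """Sum data[order] and return (total, seconds elapsed)."""
    start = time.perf_counter()
    # Fancy indexing reads the elements in exactly the order given by `order`.
    total = data[order].sum()
    return float(total), time.perf_counter() - start


n = 10_000_000  # about 80 MB of float64 values, larger than typical CPU caches
rng = np.random.default_rng(0)
data = rng.random(n)

sequential_order = np.arange(n)    # 0, 1, 2, ... walks memory front to back
random_order = rng.permutation(n)  # the same indices, shuffled

seq_total, seq_time = gather_sum(data, sequential_order)
rnd_total, rnd_time = gather_sum(data, random_order)

print(f"sequential: {seq_time:.3f} s")
print(f"random:     {rnd_time:.3f} s")
```

Both runs touch the same elements. The shuffled order defeats the prefetching and cache-line reuse that sequential reads benefit from, so it is usually noticeably slower.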

### Communications Layers

- Frontside bus: connects RAM to the L1/L2 cache
- Backside bus: connects the cache to the CPU
- External bus: the route from hardware devices to the CPU and system memory
- A GPU, as a peripheral device, communicates through the PCI bus
- The network is much slower than the frontside bus, which can transfer gigabits per second
- **Bus width**: how much data can be moved in one transfer
- **Bus frequency**: how many transfers can be made per second
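
Bus width and bus frequency multiply together to give the theoretical peak bandwidth. The sketch below does that arithmetic for a hypothetical bus that is 64 bits wide and makes 1.6 billion transfers per second. Those numbers are illustrative and do not describe a specific piece of hardware. Real buses also lose some capacity to protocol overhead.

```python
def bus_bandwidth_bytes_per_s(width_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth = bytes moved per transfer * transfers per second."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * transfers_per_s


# Hypothetical bus: 64 bits wide, 1.6 billion transfers per second.
peak = bus_bandwidth_bytes_per_s(64, 1.6e9)
print(f"{peak / 1e9:.1f} GB/s")  # prints 12.8 GB/s
```

Doubling either the width or the frequency doubles the peak bandwidth.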

## Putting the Fundamental Elements Together

### Idealized Computing Versus the Python Virtual Machine

```python
# checks if a number is prime
import math

def check_prime(number):
    sqrt_number = math.sqrt(number)
    number_float = float(number)
    for i in range(2, int(sqrt_number) + 1):
        if (number_float / i).is_integer():
            return False
    return True

print("check_prime(10000000) = ", check_prime(10000000))
print("check_prime(10000019) = ", check_prime(10000019))
```

```
check_prime(10000000) =  False
check_prime(10000019) =  True
```


- The value of `number` is stored in RAM
- To calculate `sqrt_number` and `number_float`, that value is sent to the CPU
- Optimizations (in an idealized machine):
  - Read from the L1/L2 cache: minimizes the number of times the value of `number` must be read from RAM
  - Use the faster backside bus: minimizes the number of data transfers over the slower frontside bus
- Vectorization:
  - Send the CPU `number_float` together with several values of `i` at the same time
  - Divide `number_float` by all of them at once and check the result of each `number_float / i` pair

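```python
# concept code: check several divisors per step (vectorization)
import math

import numpy as np

def check_prime(number):
    sqrt_number = math.sqrt(number)
    number_float = float(number)
    numbers = np.arange(2, int(sqrt_number) + 1)
    for i in range(0, len(numbers), 5):
        # plain Python cannot divide a float by a list of numbers in one step;
        # a NumPy array applies the division to the whole chunk of 5 at once
        result = (number_float / numbers[i:(i + 5)]) % 1 == 0
        if result.any():
            return False
    return True
```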

### Python's Virtual Machine

- Python's abstractions are convenient, but they come with a large performance cost
  - e.g., no need to worry about allocating memory for arrays, how that memory is arranged, or in what order it is sent to the CPU

- `search_fast` performs better than `search_slow`
  - It skips unnecessary computation by returning as soon as it finds `needle`, even though both functions are O(n)

```python
def search_fast(haystack, needle):
    for item in haystack:
        if item == needle:
            return True
    return False

def search_slow(haystack, needle):
    return_value = False
    for item in haystack:
        if item == needle:
            return_value = True
    return return_value
```
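
One way to see the difference is to count comparisons instead of timing. The helpers below are hypothetical instrumented copies of `search_fast` and `search_slow`. Each one returns its result together with the number of items it examined.

```python
def search_fast_counted(haystack, needle):
    """Like search_fast, but also returns how many items were compared."""
    comparisons = 0
    for item in haystack:
        comparisons += 1
        if item == needle:
            return True, comparisons  # stop at the first match
    return False, comparisons


def search_slow_counted(haystack, needle):
    """Like search_slow, but also returns how many items were compared."""
    comparisons = 0
    return_value = False
    for item in haystack:
        comparisons += 1
        if item == needle:
            return_value = True  # keeps scanning even after a match
    return return_value, comparisons


haystack = list(range(1_000_000))
print(search_fast_counted(haystack, 10))  # (True, 11)
print(search_slow_counted(haystack, 10))  # (True, 1000000)
```

When the needle is absent, both versions scan every item. That is why they share the same O(n) worst case even though their typical costs differ.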

- Python's abstractions
  - Python's garbage collector can fragment memory, which hurts transfers into the CPU caches (e.g., L1/L2)
  - There is no way to directly change how a data structure is laid out in memory to optimize it
- Python is dynamically typed and not compiled
  - A compiler never gets the chance to optimize its instructions or data structures
  - Because the code is not compiled, its functionality can change during runtime, which makes ahead-of-time optimization hard
- Because of the GIL, only one core can execute Python code at a time within a process, even in multithreaded code
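
The next sketch shows in general terms why dynamic typing and runtime changes leave a compiler little to work with. It does not show CPython's internals. The functions `add` and `subtract_instead` are hypothetical examples.

```python
def add(a, b):
    # The types of a and b are unknown until the call happens,
    # so `+` is dispatched on the actual objects every time.
    return a + b


results = [
    add(1, 2),          # int addition
    add(1.5, 2.25),     # float addition
    add("py", "thon"),  # string concatenation
    add([1], [2, 3]),   # list concatenation
]
print(results)


# Functions are ordinary objects that can be rebound while the program runs,
# so an ahead-of-time compiler could not safely assume what `add` does.
def subtract_instead(a, b):
    return a - b


add = subtract_instead  # every later call to `add` now subtracts
print(add(5, 3))
```

A single `add` has to handle four unrelated types, and its meaning can be replaced at any point. Both facts rule out the kind of fixed, type-specialized machine code that a static compiler would generate.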