# CS328 Lecture 5: Processor Architecture & Efficiency

## Benchmark problem

This notebook compares six ways of computing the sum of the positive elements of an array.
Before you can execute the cells in this notebook, you must install:

1. **CMake**
   
   - On Linux, install it with your distribution's package manager (apt,
     yum, ...). On Ubuntu/Debian, run "sudo apt install cmake".

   - On macOS, install it through Homebrew (https://brew.sh)

   - On Windows, install it via the official installer (http://cmake.org)

2. **A C++ compiler** (Xcode on macOS, GCC/Clang on Linux, Visual Studio on Windows)

3. **nanobind**: To install it, run "python -m pip install nanobind" on the command line

Following this, run ``cmake`` or ``cmake-gui`` to generate the build files, then compile the
``psum`` extension module with your C++ compiler.

General information on compiling C++ projects via CMake on various platforms is
available [here](https://preshing.com/20170511/how-to-build-a-cmake-based-project/).

In [None]:
import numpy as np

In [None]:
# Generate a huge input array of random numbers (2**30 float32 values, 4 GiB)

n = 1024*1024*1024
x = np.float32(np.random.randn(n))
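
A note on memory: `np.random.randn` returns float64 values, so the cell above briefly holds an 8 GiB temporary before the 4 GiB float32 copy is made. The following is a minimal sketch of an alternative that samples directly in float32, assuming a NumPy version that provides the `Generator` API (`np.random.default_rng`). The helper name `make_input` and the fixed seed are illustrative and not part of the lecture code.

```python
import numpy as np

def make_input(n, seed=0):
    """Return n standard-normal samples as float32, without a float64 temporary."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n, dtype=np.float32)

# Small demonstration; the notebook itself uses n = 1024*1024*1024
x_small = make_input(1024 * 1024)
print(x_small.dtype, x_small.nbytes / 2**20, "MiB")
```

On a machine with limited RAM, this roughly halves the peak memory needed to build the benchmark input.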

In [None]:
# Python implementation
def psum(x):
    r = 0
    for i in range(len(x)):
        if x[i] > 0:
            r += x[i]
    return r

In [None]:
# Let's look at the bytecode that CPython compiles this function to
import dis
dis.dis(psum)

In [None]:
# Warning: this cell takes a long time!
%time psum(x);
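
Because the pure Python loop is slow on the full array, it can help to time it on a small prefix first and extrapolate. The sketch below assumes that the running time grows linearly with the array length, which is only an approximation. `psum` is repeated so that the block is self-contained; `estimate_runtime` and the sample size are hypothetical choices made for illustration.

```python
import time
import numpy as np

def psum(x):
    r = 0
    for i in range(len(x)):
        if x[i] > 0:
            r += x[i]
    return r

def estimate_runtime(func, x, sample=100_000, full_size=None):
    """Time func on the first `sample` elements and scale linearly to full_size."""
    sample = min(sample, len(x))
    full_size = len(x) if full_size is None else full_size
    start = time.perf_counter()
    func(x[:sample])
    elapsed = time.perf_counter() - start
    return elapsed * full_size / sample

# Small stand-in for the 2**30-element array used in the notebook
x_demo = np.random.default_rng(0).standard_normal(100_000, dtype=np.float32)
est = estimate_runtime(psum, x_demo, full_size=1024**3)
print(f"Estimated time for the full array: {est:.0f} s")
```

The estimate ignores cache and memory effects, so treat it as an order-of-magnitude figure rather than a prediction.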

In [None]:
# NumPy implementation
%time np.sum(np.maximum(x, np.float32(0)));
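
As a sanity check, the loop and the NumPy expression should agree on a small input. The sketch below uses a hand-picked test array (an assumption, chosen so that the result is exactly representable in float32) and also shows boolean masking as an equivalent formulation. On large random inputs the results can differ slightly because the summation order and accumulator precision are not the same.

```python
import numpy as np

def psum(x):
    r = 0
    for i in range(len(x)):
        if x[i] > 0:
            r += x[i]
    return r

x_test = np.array([1.5, -2.0, 0.0, 3.25, -0.5], dtype=np.float32)

loop_result = psum(x_test)                                # explicit loop
clamp_result = np.sum(np.maximum(x_test, np.float32(0)))  # clamp negatives to 0, then sum
mask_result = x_test[x_test > 0].sum()                    # select positives, then sum

# All three compute 1.5 + 3.25
assert np.isclose(loop_result, clamp_result)
assert np.isclose(loop_result, mask_result)
print(float(loop_result))
```

The clamping variant writes a full-size temporary array, while the masking variant builds a boolean mask and a compacted copy. Both make extra passes over memory that a fused loop avoids.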

In [None]:
# Now, let's import our Python extension library!
import psum as p

In [None]:
# Extremely naïve C++ implementation (scalar)
%time p.psum_0(x);

In [None]:
# Somewhat naïve C++ implementation (scalar)
%time p.psum_1(x);

In [None]:
# Vectorized implementation (OK)
%time p.psum_2(x);

In [None]:
# Vectorized implementation (Better)
%time p.psum_3(x);