# Optimizing

Techniques for making code run faster.

## Vectorize

[Discrete signal energy](https://en.wikipedia.org/wiki/Energy_(signal_processing)):
$$ E_{s} \ \ = \ \ \langle x(n), x(n)\rangle \ \  =  \sum_{n=-\infty}^{\infty}{|x(n)|^2}$$
can be computed as a particular case of the [dot product](https://en.wikipedia.org/wiki/Dot_product):
$$ \langle x(n), y(n)\rangle \ \  =  \sum_{n=-\infty}^{\infty}{x(n)y(n)}$$
where both signals are the same (for real-valued signals).

In [None]:
import numpy as np

def non_vectorized_dot_product(x, y):
    """Return the sum of x[i] * y[i] over all indices i.

    Example:

        >>> non_vectorized_dot_product(np.arange(20), np.arange(20))
    
    """
    result = 0
    for i in range(len(x)):
        result += x[i] * y[i]
    return result

signal = np.random.random(1000)

In [None]:
%timeit non_vectorized_dot_product(signal, signal)

In [None]:
non_vectorized_dot_product(signal, signal)

In [None]:
%timeit np.sum(signal*signal)

In [None]:
np.sum(signal*signal)
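
The vectorized energy computation can also be written with NumPy's own dot-product routines. The following is a minimal sketch, not part of the original notebook: the seed and the small complex array `z` are illustrative assumptions. It shows that `np.dot` matches `np.sum(signal*signal)` for a real signal, and that `np.vdot` (which conjugates its first argument) gives the $|x(n)|^2$ sum for a complex one.

```python
import numpy as np

# Assumption: a seeded generator is used here only to make the sketch repeatable.
rng = np.random.default_rng(0)
signal = rng.random(1000)

# Two vectorized ways to get the energy of a real-valued signal.
energy_sum = np.sum(signal * signal)
energy_dot = np.dot(signal, signal)
print(np.isclose(energy_sum, energy_dot))

# For complex signals, np.vdot conjugates its first argument,
# so vdot(z, z) is the sum of |z[n]|**2.
z = np.array([1 + 1j, 2 - 1j])
energy_complex = np.vdot(z, z).real
print(energy_complex)  # 7.0, i.e. |1+1j|**2 + |2-1j|**2 = 2 + 5
```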

Another example:

In [None]:
# https://softwareengineering.stackexchange.com/questions/254475/how-do-i-move-away-from-the-for-loop-school-of-thought
def cleanup(x, missing=-1, value=0):
    """Return an array that's the same as x, except that where x ==
    missing, it has value instead.

    >>> cleanup(np.arange(-3, 3), value=10)
    ... # doctest: +NORMALIZE_WHITESPACE
    array([-3, -2, 10, 0, 1, 2])

    """
    result = []
    for i in range(len(x)):
        if x[i] == missing:
            result.append(value)
        else:
            result.append(x[i])
    return np.array(result)

array = np.arange(-8,8)
print(array)
print(cleanup(array, value=10, missing=0))

In [None]:
array = np.arange(-1000,1000)
%timeit cleanup(array, value=10, missing=0)
print(array[995:1006])
print(cleanup(array, value=10, missing=0)[995:1006])

In [None]:
# https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.where.html
# See "notes".
value = [10]*2000
%timeit [xv if c else yv for (c,xv,yv) in zip(array == 0, value, array)]
print([xv if c else yv for (c,xv,yv) in zip(array == 0, value, array)][995:1006])

In [None]:
# https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.where.html
%timeit np.where(array == 0, 10, array)
print(np.where(array == 0, 10, array)[995:1006])
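
The `np.where` call above can be wrapped so that it has the same interface as `cleanup`. This is a sketch under the assumption that `x` is a NumPy array; the name `cleanup_vectorized` is hypothetical and does not appear in the original notebook.

```python
import numpy as np

def cleanup_vectorized(x, missing=-1, value=0):
    """Return a new array equal to x, with entries equal to `missing`
    replaced by `value` (hypothetical vectorized counterpart of cleanup)."""
    # x == missing builds a boolean mask; np.where picks `value` where the
    # mask is True and the original element elsewhere.
    return np.where(x == missing, value, x)

cleaned = cleanup_vectorized(np.arange(-3, 3), value=10)
print(cleaned)  # [-3 -2 10  0  1  2]
```

No Python-level loop runs here: both the comparison and the selection are performed element-wise inside NumPy.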

## Use in-place operations

In [None]:
a = np.random.random(500000)
print(a[0:10])
b = np.copy(a)
%timeit global a; a = 10*a
a = 10*a
print(a[0:10])

In [None]:
a = np.copy(b)
print(a[0:10])
%timeit global a ; a *= 10
a *= 10
print(a[0:10])
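
The timing difference presumably comes from memory allocation: `a = 10*a` builds a new array and rebinds the name, while `a *= 10` overwrites the existing buffer. A minimal sketch to check this, using variable names that are illustrative assumptions and not part of the original notebook:

```python
import numpy as np

original = np.random.random(500_000)

# Out-of-place: a new array is allocated for the result.
scaled = 10 * original
out_of_place_shares = np.shares_memory(scaled, original)
print(out_of_place_shares)  # False

# In-place: the existing buffer is overwritten, no new array is created.
alias = original
alias *= 10
in_place_same_object = alias is original
print(in_place_same_object)  # True

# The out= argument of a ufunc is the explicit spelling of the same idea.
result = np.multiply(original, 10, out=original)
print(result is original)  # True
```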

## Maximize locality in memory access

In [None]:
a = np.random.rand(100,50)
b = np.copy(a)

In [None]:
def mult(x, val):
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i][j] /= val
%timeit -n 1 -r 1 mult(a, 10)

In [None]:
a = np.copy(b)

def mult2(x, val):
    for j in range(x.shape[1]):
        for i in range(x.shape[0]):
            x[i][j] /= val
            
%timeit -n 1 -r 1 mult2(a, 10)

In [None]:
# http://www.scipy-lectures.org/advanced/optimizing/
# https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.sum.html

In [None]:
c = np.zeros((500, 1000), order='C')

In [None]:
%timeit c.sum(axis=0)
c.sum(axis=0).shape

In [None]:
%timeit c.sum(axis=1)
c.sum(axis=1).shape
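
Which axis is cheaper to traverse depends on how the array is laid out in memory. The sketch below is an illustrative addition (the Fortran-ordered copy `f` is an assumption, not part of the original notebook). It inspects the strides, that is, the number of bytes to skip to reach the next element along each axis.

```python
import numpy as np

c = np.zeros((500, 1000), order='C')   # row-major: rows are contiguous
f = np.asfortranarray(c)               # column-major copy: columns are contiguous

# float64 elements take 8 bytes each.
print(c.strides)  # (8000, 8): next column is 8 bytes away, next row 8000
print(f.strides)  # (8, 4000): next row is 8 bytes away, next column 4000

print(c.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])  # True True
```

Walking along the axis with the smallest stride touches adjacent memory addresses, which is the access pattern that makes best use of the cache.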

## Delegating to C
When you want to speed up your code, or simply need to reuse existing C code, you can call C from Python. There are several alternatives:

1. [Cython](http://cython.org/): A superset of Python that lets you call C functions and declare C types on Python variables.
2. [SWIG (Simplified Wrapper Interface Generator)](http://www.swig.org/): A software development tool to connect C/C++ programs with other languages (including Python).
3. [Ctypes](http://python.net/crew/theller/ctypes/): A Python package that can be used to call functions in shared libraries (`.dll`/`.so`/`.dylib`) from Python.
4. [Python-C-API](https://docs.python.org/3.6/c-api/index.html): A low-level interface between (compiled) C code and Python.

We will show how to use the Python-C-API because it is the most flexible and efficient alternative. However, it is also the hardest to code.
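
For comparison, the ctypes alternative listed above needs no compilation step. The following is a minimal sketch, assuming a Unix-like system where the C standard library can be located with `ctypes.util.find_library`; it calls the C function `abs` and is unrelated to the `sum_array` files used in the rest of this section.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like system. find_library("c") returns the name of the
# C standard library, or None, in which case CDLL loads the symbols already
# linked into the running process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: int abs(int);
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-5))  # 5
```

Declaring `argtypes` and `restype` is optional for this particular function, but it makes ctypes check and convert the arguments instead of guessing.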

### The C code to reuse in Python

In [None]:
!cat sum_array_lib.c

In [None]:
!cat sum_array.c

In [None]:
!gcc -O3 sum_array.c -o sum_array
!./sum_array

### The module

In [None]:
!cat sum_array_module.c

### Module compilation

In [None]:
!cat setup.py

In [None]:
!python setup.py build_ext --inplace

In [None]:
import sum_array_module
import numpy as np
a = np.arange(100000)
%timeit sum_array_module.sumArray(a)

However, remember: vectorize when possible!

In [None]:
%timeit np.sum(a)