## Membership Testing

In [1]:
import numpy as np

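Lists are implemented as variable-length arrays, so `x in some_list` compares `x` against the elements one by one. The cost grows with the position of the match: finding the first element is fast, while finding the last one requires scanning the whole list.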

In [2]:
letters = 'abcdefghijklmnopqrstuvwxyz'

letters_list = [x+y+z for x in letters for y in letters for z in letters]
# Time how long it takes to find 'aaa' and 'zzz' in letters_list.

print('in list')
%timeit -n 100 'aaa' in letters_list
%timeit -n 100 'zzz' in letters_list

in list
46.9 ns ± 2.71 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
267 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
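The gap between the two timings comes from the number of comparisons performed. A minimal sketch that makes this visible without timing anything: `CountingKey` is a hypothetical helper class (not part of the original notebook) that counts how often `==` is evaluated during a membership test.

```python
class CountingKey:
    """Hypothetical wrapper that counts equality comparisons."""
    comparisons = 0  # shared counter, reset before each experiment

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return isinstance(other, CountingKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


items = [CountingKey(i) for i in range(1000)]

# Look for the first element of the list.
CountingKey.comparisons = 0
found_first = CountingKey(0) in items
first_cost = CountingKey.comparisons

# Look for the last element of the list.
CountingKey.comparisons = 0
found_last = CountingKey(999) in items
last_cost = CountingKey.comparisons

print(first_cost, last_cost)  # 1 comparison versus 1000 comparisons
```

The same reasoning explains why `'zzz'` takes far longer to find than `'aaa'` in `letters_list`.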


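Dictionaries are implemented as hash tables: the key is hashed, and the hash determines where the value is stored. A lookup therefore does not depend on where the key sits relative to the others.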

_Dicts_ and _sets_ are fast when looking up elements. Therefore, if you need to check membership very often, use _dict_ or _set_ rather than _list_ or _array_.

Be mindful of the data structures you use, as they can have a big impact on performance.

In [3]:
letters_dict = {x: x for x in letters_list}
# Time how long it takes to find 'aaa' and 'zzz' in letters_dict.

print('in dict')
%timeit -n 100 'aaa' in letters_dict
%timeit -n 100 'zzz' in letters_dict

in dict
44.2 ns ± 1.65 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
43.9 ns ± 1.85 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
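When only membership matters and no value needs to be stored, a set is usually enough. The sketch below rebuilds `letters_list` so it is self-contained; `letters_set`, `queries`, and `hits` are names introduced here for illustration.

```python
letters = 'abcdefghijklmnopqrstuvwxyz'
letters_list = [x + y + z for x in letters for y in letters for z in letters]

# Build the set once; each later lookup hashes the query instead of scanning.
letters_set = set(letters_list)

# 'abcd' has four letters, so it cannot be in the collection.
queries = ['aaa', 'mno', 'zzz', 'abcd']
hits = [q for q in queries if q in letters_set]

print(len(letters_set))
print(hits)
```

Building the set costs one pass over the list, so the conversion pays off when the same collection is queried many times.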


NumPy array membership testing:

In [4]:
x = np.random.random((3,3))
print(x)
print("Locations Found: ", np.where( x > 0.5 ))

[[0.19234096 0.17246625 0.17903226]
 [0.91333091 0.52795107 0.4519976 ]
 [0.78328263 0.5111211  0.12368523]]
Locations Found:  (array([1, 1, 2, 2]), array([0, 1, 0, 1]))


In [5]:
searchkeys = [0, 2, 5]
y = np.arange(9).reshape(3,3)
print(y)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [6]:
ix = np.isin(y, searchkeys)
print(ix)
print(np.where(ix))

[[ True False  True]
 [False False  True]
 [False False False]]
(array([0, 0, 1]), array([0, 2, 2]))
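A few related idioms, sketched on the same `y` and `searchkeys`: the plain `in` operator for a single value, the `invert` argument of `np.isin`, and boolean indexing to pull out the matching values. The variable names on the left-hand side are introduced here for illustration.

```python
import numpy as np

y = np.arange(9).reshape(3, 3)
searchkeys = [0, 2, 5]

# Scalar membership: `in` is true if any element of the array equals the value.
has_five = 5 in y
has_ten = 10 in y

# invert=True marks the elements that are NOT among the search keys.
not_in_keys = np.isin(y, searchkeys, invert=True)

# Boolean indexing returns the matching values rather than their positions.
matched_values = y[np.isin(y, searchkeys)]

print(has_five, has_ten)
print(not_in_keys)
print(matched_values)
```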


Tuple membership testing:

In [7]:
a = (1,2,3)
print(3 in a)

# A tuple is scanned linearly, like a list; this is quick because the tuple is small.

True


----
## String Concatenation

Strings in Python are immutable, so we can't modify an existing string in place (for example, "change all the 'a's to 'b's"). Instead, every modification creates a new string with the desired properties. When a string is built piece by piece in a loop, this continual copying can lead to significant inefficiencies.

In [8]:
%%timeit -n 100
def make_string(a_list):
    mystring = ''
    for x in a_list:
        mystring += x + ' '
    return mystring

mylist = [x for x in 'abcdefghijklmnopqrstuvwxyz']

my_str = make_string(mylist)

3.65 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [9]:
%%timeit -n 100

mylist = [x for x in 'abcdefghijklmnopqrstuvwxyz']

my_str = ' '.join(mylist)

1.12 µs ± 31 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
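The two approaches are not exactly interchangeable, which is worth checking before swapping one for the other. A small sketch, with `make_string_concat` and `make_string_join` as illustrative names for the two cells above:

```python
letters = list('abcdefghijklmnopqrstuvwxyz')


def make_string_concat(a_list):
    """Build the string piece by piece; each += may copy the partial result."""
    result = ''
    for x in a_list:
        result += x + ' '
    return result


def make_string_join(a_list):
    """Let str.join assemble the final string in one call."""
    return ' '.join(a_list)


concat_result = make_string_concat(letters)
join_result = make_string_join(letters)

# The loop appends a space after every item, including the last one;
# join only places the separator between items.
print(repr(concat_result[-2:]), repr(join_result[-2:]))
print(concat_result.rstrip() == join_result)
```

If the trailing separator matters, it can be added once after the `join` call instead of inside a loop.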


----
## Optimizing a calculator
Consider the code below, which implements a simple calculator.

- Time the code to identify which functions take the longest to run
- Optimize the code to speed up the most critical functions
- Compute the speedup ratio as $\frac{T_{\text{original}}}{T_{\text{optimized}}}$
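Outside a notebook, where the `%timeit` magic is unavailable, the ratio can be measured with the standard `timeit` module. This is a minimal sketch: `speedup`, `slow_sum`, and `fast_sum` are hypothetical names used only to show the pattern, not part of the calculator.

```python
import timeit


def speedup(original, optimized, number=10, repeat=5):
    """Return T_original / T_optimized, using the best of `repeat` runs."""
    t_original = min(timeit.repeat(original, number=number, repeat=repeat))
    t_optimized = min(timeit.repeat(optimized, number=number, repeat=repeat))
    return t_original / t_optimized


def slow_sum():
    """Stand-in for an unoptimized function: explicit Python loop."""
    total = 0
    for i in range(100_000):
        total += i
    return total


def fast_sum():
    """Stand-in for the optimized version: built-in sum."""
    return sum(range(100_000))


ratio = speedup(slow_sum, fast_sum)
print(f"speedup: {ratio:.1f}x")
```

Taking the minimum over several repeats reduces the influence of other processes on the measurement; the exact ratio will vary from machine to machine.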

In [10]:
# -----------------------------------------------------------------------------
# calculator.py
# ----------------------------------------------------------------------------- 
import numpy as np

def add(x,y):
    """
    Add two arrays using a Python loop.
    x and y must be two-dimensional arrays of the same shape.
    """
    m,n = x.shape
    z = np.zeros((m,n))
    for i in range(m):
        for j in range(n):
            z[i,j] = x[i,j] + y[i,j]
    return z


def multiply(x,y):
    """
    Multiply two arrays using a Python loop.
    x and y must be two-dimensional arrays of the same shape.
    """
    m,n = x.shape
    z = np.zeros((m,n))
    for i in range(m):
        for j in range(n):
            z[i,j] = x[i,j] * y[i,j]
    return z

def sqrt(x):
    """
    Take the square root of the elements of an array using a Python loop.
    """
    from math import sqrt
    m,n = x.shape
    z = np.zeros((m,n))
    for i in range(m):
        for j in range(n):
            z[i,j] = sqrt(x[i,j])
    return z


def hypotenuse(x,y):
    """
    Return sqrt(x**2 + y**2) for two arrays, x and y.
    x and y must be two-dimensional arrays of the same shape.
    """
    xx = multiply(x,x)
    yy = multiply(y,y)
    zz = add(xx, yy)
    return sqrt(zz)

M = 1000
N = 1000

A = np.random.random((M,N))
B = np.random.random((M,N))

%timeit -n 1 hypotenuse(A,B)

1.64 s ± 6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [11]:
# -----------------------------------------------------------------------------
# calculator.py
# ----------------------------------------------------------------------------- 
import numpy as np

def mod_add(x,y):
    """
    Add two arrays element-wise using NumPy's vectorized np.add.
    x and y must be two-dimensional arrays of the same shape.
    """
    return np.add(x,y)


def mod_multiply(x,y):
    """
    Multiply two arrays element-wise using NumPy's vectorized np.multiply.
    x and y must be two-dimensional arrays of the same shape.
    """
    return np.multiply(x,y)

def mod_sqrt(x):
    """
    Take the square root of the elements of an array using NumPy's vectorized np.sqrt.
    """
    return np.sqrt(x)


def mod_hypotenuse(x,y):
    """
    Return sqrt(x**2 + y**2) for two arrays, x and y.
    x and y must be two-dimensional arrays of the same shape.
    """
    xx = mod_multiply(x,x)
    yy = mod_multiply(y,y)
    zz = mod_add(xx, yy)
    return mod_sqrt(zz)

# M, N, A and B are reused from the previous cell so that both versions
# are timed on the same inputs.

%timeit -n 1 mod_hypotenuse(A,B)

6.54 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
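Plugging the two reported timings into the speedup ratio gives roughly 250x. The sketch below does that arithmetic and also checks that NumPy's built-in `np.hypot` returns the same values as the step-by-step vectorized version; the fixed seed and variable names are choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the check is reproducible
A = rng.random((1000, 1000))
B = rng.random((1000, 1000))

# Same computation as mod_hypotenuse, written out inline.
step_by_step = np.sqrt(np.add(np.multiply(A, A), np.multiply(B, B)))

# np.hypot computes sqrt(x**2 + y**2) element-wise in a single call.
single_call = np.hypot(A, B)
same = np.allclose(step_by_step, single_call)

# Speedup ratio from the timings reported above (1.64 s and 6.54 ms).
t_original = 1.64
t_optimized = 6.54e-3
speedup = t_original / t_optimized

print(same)
print(f"speedup: {speedup:.0f}x")
```

Whether `np.hypot` is faster than the three-step version depends on the machine and array sizes, so it is worth timing both before choosing.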
