# Computation on NumPy Arrays: Universal Functions

Computation on NumPy arrays can be very fast, or it can be very slow. 

The key to making it fast is to use __vectorized__ operations, generally implemented through NumPy's universal functions (ufuncs). 

The relative sluggishness of Python generally shows up where many small operations are repeated, for instance when looping over an array to operate on each element.

For example, imagine we have an array of values and we'd like to compute the reciprocal of each. 

A straightforward approach might look like this:

In [4]:
import numpy as np
np.random.seed(0)

In [5]:
def compute_reciprocals(values):
    output = np.empty(len(values))
    
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
        
    return output

In [7]:
values = np.random.randint(1, 10, size=5)
values

array([4, 6, 3, 5, 8])

In [8]:
compute_reciprocals(values)

array([0.25      , 0.16666667, 0.33333333, 0.2       , 0.125     ])

In [9]:
big_array = np.random.randint(1, 100, size=1000000)

%timeit compute_reciprocals(big_array)

4.11 s ± 48.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


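It takes several seconds to compute these million operations and store the result! When even cell phones have processing speeds measured in gigaflops (i.e., billions of floating-point operations per second), this seems almost absurdly slow. The bottleneck is not the arithmetic itself but the type checking and function dispatch that Python performs on every pass through the loop.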

## UFuncs

For many types of operations, NumPy provides a convenient interface into statically typed, compiled routines that avoid the per-element overhead of a Python loop. This is known as a __vectorized__ operation.

This can be accomplished by simply performing an operation on the array, which will then be applied to each element. 

This vectorized approach is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution.

In [10]:
print(compute_reciprocals(values))
print(1.0 / values)

[0.25       0.16666667 0.33333333 0.2        0.125     ]
[0.25       0.16666667 0.33333333 0.2        0.125     ]


In [11]:
%timeit (1.0 / big_array)

7.19 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [16]:
%%time
(1.0 / big_array)

Wall time: 12 ms


array([0.01408451, 0.01123596, 0.01123596, ..., 0.01219512, 0.01724138,
       0.01408451])
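The comparison above relies on IPython's `%timeit` magic. As a rough sketch of the same check in plain Python, the block below uses the standard-library `timeit` module on a smaller array (the name `sample` and its size are chosen here for illustration) and confirms that the loop and the vectorized expression agree. Timings are machine-dependent, so no particular numbers are claimed.

```python
import timeit

import numpy as np


def compute_reciprocals(values):
    """Element-by-element reciprocal using an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


# A smaller array than big_array, so the comparison finishes quickly.
rng = np.random.default_rng(0)
sample = rng.integers(1, 100, size=10_000)

loop_result = compute_reciprocals(sample)
vectorized_result = 1.0 / sample

# Both approaches should agree to within floating-point tolerance.
results_match = np.allclose(loop_result, vectorized_result)

# Time five runs of each approach; exact numbers depend on the machine.
loop_seconds = timeit.timeit(lambda: compute_reciprocals(sample), number=5)
vectorized_seconds = timeit.timeit(lambda: 1.0 / sample, number=5)

print("results match:", results_match)
print(f"loop:       {loop_seconds:.4f} s for 5 runs")
print(f"vectorized: {vectorized_seconds:.4f} s for 5 runs")
```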

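Vectorized operations are not limited to an array and a scalar; they also work elementwise between two arrays, including multidimensional ones:

In [17]: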
np.arange(5) / np.arange(1, 6)

array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

In [18]:
x = np.arange(9).reshape((3, 3))
2 ** x

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]], dtype=int32)

Ufuncs exist in two flavors: unary ufuncs, which operate on a single input, and binary ufuncs, which operate on two inputs.

In [19]:
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division

x     = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]


In [20]:
print("-x     = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2  = ", x % 2)

-x     =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2  =  [0 1 0 1]


In [21]:
-(0.5*x + 1) ** 2

array([-1.  , -2.25, -4.  , -6.25])
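One way to see the unary/binary distinction directly is the `nin` attribute that every ufunc object carries, which reports how many inputs it takes. The selection of ufuncs below is arbitrary and only meant as an illustration.

```python
import numpy as np

# nin is the number of input arguments a ufunc accepts:
# 1 for unary ufuncs, 2 for binary ufuncs.
for ufunc in (np.negative, np.absolute, np.add, np.power):
    kind = "unary" if ufunc.nin == 1 else "binary"
    print(f"{ufunc.__name__:10s} nin={ufunc.nin} -> {kind}")
```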

Each of these arithmetic operations is simply a convenient __wrapper__ around a specific function built into NumPy; for example, the `+` operator is a wrapper for the `np.add` function:

In [22]:
np.add(x, 2)

array([2, 3, 4, 5])

| Operator | Equivalent ufunc | Description |
|----------|------------------|-------------|
| `+` | `np.add` | Addition (e.g., `1 + 1 = 2`) |
| `-` | `np.subtract` | Subtraction (e.g., `3 - 2 = 1`) |
| `-` | `np.negative` | Unary negation (e.g., `-2`) |
| `*` | `np.multiply` | Multiplication (e.g., `2 * 3 = 6`) |
| `/` | `np.divide` | Division (e.g., `3 / 2 = 1.5`) |
| `//` | `np.floor_divide` | Floor division (e.g., `3 // 2 = 1`) |
| `**` | `np.power` | Exponentiation (e.g., `2 ** 3 = 8`) |
| `%` | `np.mod` | Modulus/remainder (e.g., `9 % 4 = 1`) |
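The operator-to-ufunc correspondence listed above can be checked directly. The following sketch compares each operator with its named ufunc on a small test array; the dictionary name `pairs` and the test values are illustrative choices, not part of NumPy.

```python
import numpy as np

x = np.arange(1, 5)

# Each entry pairs the operator form with the equivalent ufunc call.
pairs = {
    "+": (x + 2, np.add(x, 2)),
    "-": (x - 2, np.subtract(x, 2)),
    "unary -": (-x, np.negative(x)),
    "*": (x * 2, np.multiply(x, 2)),
    "/": (x / 2, np.divide(x, 2)),
    "//": (x // 2, np.floor_divide(x, 2)),
    "**": (x ** 2, np.power(x, 2)),
    "%": (x % 2, np.mod(x, 2)),
}

# Record, per operator, whether both forms give identical arrays.
all_match = {
    op: np.array_equal(via_operator, via_ufunc)
    for op, (via_operator, via_ufunc) in pairs.items()
}

for op, matches in all_match.items():
    print(f"{op:8s} matches ufunc: {matches}")
```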

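The absolute value is likewise available as a ufunc, `np.absolute`, which also goes by the alias `np.abs`:

In [23]: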
x = np.array([-2, -1, 0, 1, 2])

np.absolute(x)

array([2, 1, 0, 1, 2])

In [24]:
np.abs(x)

array([2, 1, 0, 1, 2])

This ufunc can also handle complex data, in which case the absolute value returns the magnitude:

In [25]:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)

array([5., 5., 2., 1.])
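For a complex number $a + bi$, the magnitude is $\sqrt{a^2 + b^2}$. As a quick check, the sketch below recomputes it by hand from the real and imaginary parts and compares the result with `np.abs`; the variable names are illustrative.

```python
import numpy as np

z = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])

# Magnitude as returned by the ufunc.
magnitude = np.abs(z)

# The same quantity computed by hand: sqrt(real^2 + imag^2).
manual = np.sqrt(z.real ** 2 + z.imag ** 2)

print("np.abs(z)          =", magnitude)
print("sqrt(re^2 + im^2)  =", manual)
print("agree:", np.allclose(magnitude, manual))
```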

## Trigonometric functions

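NumPy provides a large number of useful ufuncs, and some of the most useful for the data scientist are the trigonometric functions. These functions interpret their input in radians, so the values in `theta` below (0 through 180) are treated as radians rather than degrees.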

In [26]:
np.pi

3.141592653589793

In [30]:
theta = np.linspace(0, 180, 5)
theta

array([  0.,  45.,  90., 135., 180.])

In [31]:
print("theta      = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

theta      =  [  0.  45.  90. 135. 180.]
sin(theta) =  [ 0.          0.85090352  0.89399666  0.08836869 -0.80115264]
cos(theta) =  [ 1.          0.52532199 -0.44807362 -0.99608784 -0.59846007]
tan(theta) =  [ 0.          1.61977519 -1.99520041 -0.08871576  1.33869021]
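Because `np.sin`, `np.cos`, and `np.tan` expect radians, the results above are the values at 0, 45, 90, 135, and 180 radians. If the angles are meant as degrees, one option is to convert them first with `np.deg2rad`, as in this sketch (the names `theta_deg` and `theta_rad` are illustrative):

```python
import numpy as np

theta_deg = np.linspace(0, 180, 5)   # 0, 45, 90, 135, 180 degrees
theta_rad = np.deg2rad(theta_deg)    # the same angles in radians

sin_vals = np.sin(theta_rad)
cos_vals = np.cos(theta_rad)

# Values that are mathematically zero (sin 180 deg, cos 90 deg) come out
# as tiny nonzero numbers because of floating-point rounding.
print("theta (rad) =", theta_rad)
print("sin(theta)  =", sin_vals)
print("cos(theta)  =", cos_vals)
```

The tangent is left out here because it is undefined at 90 degrees; in floating point, `np.tan` returns a very large finite number there instead.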


## Exponents and logarithms

In [32]:
x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))

x     = [1, 2, 3]
e^x   = [ 2.71828183  7.3890561  20.08553692]
2^x   = [2. 4. 8.]
3^x   = [ 3  9 27]
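The output above mixes floating-point and integer results: `np.exp` and `np.exp2` return floats, while `np.power` with integer inputs stays in integer arithmetic. A small sketch that checks this, and that each ufunc matches the corresponding `**` expression:

```python
import numpy as np

x = np.array([1, 2, 3])

exp_e = np.exp(x)        # e ** x, computed in floating point
exp_2 = np.exp2(x)       # 2 ** x, computed in floating point
pow_3 = np.power(3, x)   # 3 ** x, integer in and integer out

print("exp  dtype:", exp_e.dtype)
print("exp2 dtype:", exp_2.dtype)
print("power dtype:", pow_3.dtype)

# Each ufunc agrees with the plain ** operator.
print("exp  matches e ** x:", np.allclose(exp_e, np.e ** x))
print("exp2 matches 2 ** x:", np.allclose(exp_2, 2.0 ** x))
print("power matches 3 ** x:", np.array_equal(pow_3, 3 ** x))
```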


In [33]:
x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

x        = [1, 2, 4, 10]
ln(x)    = [0.         0.69314718 1.38629436 2.30258509]
log2(x)  = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]
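The three logarithms are related by the change-of-base identity $\log_b x = \ln x / \ln b$, and `np.log` is the inverse of `np.exp`. The sketch below verifies both relationships numerically on the same input; comparisons use `np.allclose` because floating-point results need not match exactly.

```python
import numpy as np

x = np.array([1, 2, 4, 10])

# Change of base: log_b(x) = ln(x) / ln(b).
log2_via_ln = np.log(x) / np.log(2)
log10_via_ln = np.log(x) / np.log(10)

# exp undoes log, up to floating-point rounding.
round_trip = np.exp(np.log(x))

print("log2  matches ln(x)/ln(2): ", np.allclose(np.log2(x), log2_via_ln))
print("log10 matches ln(x)/ln(10):", np.allclose(np.log10(x), log10_via_ln))
print("exp(log(x)) recovers x:    ", np.allclose(round_trip, x))
```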
