# 3. Vectorized operations

In [100]:
import numpy

## Basic operations on arrays with the same shape

The basic arithmetic operations on arrays (addition, subtraction, multiplication, division and power) are applied elementwise.
The simplest case is when both arrays have exactly the same shape: the operation is then applied to each pair of corresponding elements.

In [101]:
# basic operations between two arrays with the same shape:
x = numpy.array([10, 20, 30, 40])
y = numpy.array([5, 7, 52, 34])

print("y - x = ", y - x)
print("x + y = ", x + y)
print("x * y = ", x * y)
print("x / y = ", x / y)

y - x =  [ -5 -13  22  -6]
x + y =  [15 27 82 74]
x * y =  [  50  140 1560 1360]
x / y =  [ 2.          2.85714286  0.57692308  1.17647059]


## Basic operations on arrays with different shapes

Operations between arrays of different shapes are also allowed, provided that the shapes are compatible. Applying an operation to arrays with different shapes is called broadcasting.

The most common cases of broadcasting are:
- Operations between an array and a constant: there are no restrictions on the shape of the array.

- Operations between an array and a row vector: the number of columns in the array has to be the same as the length of the row vector.

- Operations between an array and a column vector: the number of rows in the array has to be the same as the length of the column vector.

When applying operations between an array and a row or column vector, the shapes have to match along the shared dimension.
For example, let $x$ be a $2\times 3$ array, let $y$ be a row vector with 3 elements, and let $z$ be a column vector with 2 elements.

An operation between array $x$ and row vector $y$ is applied to each row of $x$, so the number of columns in $x$ has to be the same as the length of $y$.

An operation between array $x$ and column vector $z$ is applied to each column of $x$, so the number of rows in $x$ has to be the same as the length of $z$.
When the relevant number of rows or columns is not the same, the operation raises an error (a `ValueError`).
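
The failure case described above can be sketched as follows. The array `w` and its reshaped version `w_col` are made up for this illustration; the point is only that a vector of length 2 cannot be combined with the 3 columns of `x`, while the same values arranged as a column vector can.

```python
import numpy

x = numpy.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
w = numpy.array([1, 2])                   # length 2: does not match the 3 columns of x

message = None
try:
    x + w
except ValueError as err:
    # numpy refuses to broadcast shapes (2, 3) and (2,)
    message = str(err)
print("error:", message)

# Reshaped to a column vector of shape (2, 1), the same values are compatible with x
w_col = w.reshape(2, 1)
result = x + w_col
print(result)
```

Here `reshape(2, 1)` turns `w` into a column vector, so it is added to every column of `x`.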

For more information about Broadcasting:
http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

In [102]:
# constant term
x = numpy.array([20, 25, 30, 35])
print("x - 2 = ", x - 2)
print("x * 2 = ", x * 2)
print("x **2 = ", x**2)

x - 2 =  [18 23 28 33]
x * 2 =  [40 50 60 70]
x **2 =  [ 400  625  900 1225]


In [103]:
# operations between array and vector
x = numpy.array([[1, 2, 3], [4, 5, 6]])
y = numpy.array([5, 5, 5]) # row vector
z = numpy.array([[1], [2]]) # column vector

print(x)
print(y)
print(z)

[[1 2 3]
 [4 5 6]]
[5 5 5]
[[1]
 [2]]


In [104]:
# array and row vector
print("Operations between x and y which are applied for each row")
print("x + y = \n", x+y)
print("x * y = \n", x*y)

Operations between x and y which are applied for each row
x + y = 
 [[ 6  7  8]
 [ 9 10 11]]
x * y = 
 [[ 5 10 15]
 [20 25 30]]


In [105]:
# array and column vector
print("Operations between x and z which are applied for each column")
print("x + z = \n", x+z)
print("x * z = \n", x*z)

Operations between x and z which are applied for each column
x + z = 
 [[2 3 4]
 [6 7 8]]
x * z = 
 [[ 1  2  3]
 [ 8 10 12]]


## Vector transformations

Simple examples of operations on vectors are:

- Standardization: `z = (x - mean(x)) / stdev(x)`. Standardized values (z-scores) have zero mean and unit standard deviation. Standardization is often used before applying machine learning algorithms. 

- Feature scaling: `y = (x - min(x)) / (max(x) - min(x))`. This brings the values into the range 0 to 1.

- Conversion between different scales of measurement, for example from Fahrenheit to Celsius, from dollars to euros, or from inches to centimetres.

These transformations can be applied to the whole array, but also to a single row or column.
In that case, indexing is used to select the row or column to transform.
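
As a small sketch of transforming a single column, the example below converts only the first column of a 2-dimensional array. The data and the exchange rate are invented for illustration; they are not real values.

```python
import numpy

# Hypothetical data: one row per product, columns are (price in dollars, weight in kg)
data = numpy.array([[10.0, 1.5],
                    [20.0, 0.5],
                    [ 5.0, 2.0]])

rate = 0.5  # made-up dollars-to-euros rate, chosen only to keep the numbers simple

# data[:, 0] selects the first column; only that column is converted
data[:, 0] = data[:, 0] * rate
print(data)
```

The second column is left untouched, because the assignment only addresses `data[:, 0]`.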

### Exercise 3.1

Define a function `standardize` which converts a vector of numbers to z-scores.


In [106]:
def standardize(x):
    return ...

### Exercise 3.2
- Define a function `to_cm` which takes a vector of measurements in inches and converts them to centimeters.
- Define a function `to_celsius` which takes a vector of measurements in Fahrenheit and converts them to Celsius: `C = (F - 32) / 1.8`


## Boolean operations on arrays

Boolean conditions can also be applied to arrays. They are evaluated for every element, and the result is an array of booleans. The available comparisons are: equal to (`==`), not equal to (`!=`), greater than (`>`), greater than or equal to (`>=`), smaller than (`<`), and smaller than or equal to (`<=`).

In [107]:
# boolean operations on arrays
x = numpy.array([10, 20, 30, 14, 15, 16])
y = numpy.array([7, 5, 5, 7, 5, 7]) 
print("(x > 15) = ", x>15)
print("(y == 7) = ", y==7)

(x > 15) =  [False  True  True False False  True]
(y == 7) =  [ True False False  True False  True]
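
Boolean arrays like the ones above can be combined and counted. The following is a minimal sketch, assuming the elementwise operator `&` ("and") and boolean indexing, neither of which is introduced in this section; the variable name `mask` is chosen only for this example.

```python
import numpy

x = numpy.array([10, 20, 30, 14, 15, 16])

# Elementwise "and" of two conditions; the parentheses around each condition are required
mask = (x >= 15) & (x != 30)
print(mask)

# True counts as 1 and False as 0, so the sum is the number of matching elements
count = mask.sum()
print(count)

# Indexing with a boolean array keeps only the elements where the mask is True
selected = x[mask]
print(selected)
```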


## Mathematical functions applied on vectors

A lot of mathematical functions can be applied to arrays and they are applied elementwise, such as:
- numpy.sqrt(x): square root
- numpy.sin(x): sine
- numpy.cos(x): cosine
- numpy.tan(x): tangent
- numpy.exp(x): exponential
- numpy.log(x): natural logarithm

In [108]:
x = numpy.array([1, 2, 3, 4])
print("x = ", x)
print("sqrt(x) = ", numpy.sqrt(x))
print("sin(x) = ", numpy.sin(x) )
print("cos(x) = ", numpy.cos(x) )
print("tan(x) = ", numpy.tan(x) )
print("exp(x) = ", numpy.exp(x) )
print("log(x) = ", numpy.log(x) )

x =  [1 2 3 4]
sqrt(x) =  [ 1.          1.41421356  1.73205081  2.        ]
sin(x) =  [ 0.84147098  0.90929743  0.14112001 -0.7568025 ]
cos(x) =  [ 0.54030231 -0.41614684 -0.9899925  -0.65364362]
tan(x) =  [ 1.55740772 -2.18503986 -0.14254654  1.15782128]
exp(x) =  [  2.71828183   7.3890561   20.08553692  54.59815003]
log(x) =  [ 0.          0.69314718  1.09861229  1.38629436]


## Reductions

Some functions can be applied to the entire array or along only one dimension:

- x.sum() and numpy.cumsum(x)
- x.min() and x.argmin()
- x.max() and x.argmax()

These functions have a parameter called `axis`. With `axis=0` the sum (or the minimum, etc.) per column is returned; with `axis=1` the result per row is returned. In higher-dimensional arrays, the same logic applies.

One important thing to notice: when `argmin` or `argmax` is applied to the entire array (without `axis`), the index that is returned is the linear index into the flattened array, not an index per dimension (see [2b_arrays.ipynb](2b_arrays.ipynb)).


In [109]:
x = numpy.array([[1, 6, 5], [2, 7, 8]])

# functions applied to the entire array:
print("sum:", x.sum())
print("minimum:", x.min(), "and index of minimum:", x.argmin())
print("maximum:", x.max(), "and index of maximum:", x.argmax())

sum: 29
minimum: 1 and index of minimum: 0
maximum: 8 and index of maximum: 5


In [110]:
# functions applied to only one dimension of the array:
print("column sums:", x.sum(axis=0))
print("row sums:", x.sum(axis=1))
print("minimum per column:", x.min(axis=0))
print("maximum per row:", x.max(axis=1))

column sums: [ 3 13 13]
row sums: [12 17]
minimum per column: [1 6 5]
maximum per row: [6 8]
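
The linear index returned by `argmax` can be converted back to a row and column position. The sketch below uses `numpy.unravel_index` for this, and also shows `argmax` and `numpy.cumsum` with an `axis` argument; the variable names are chosen only for this example.

```python
import numpy

x = numpy.array([[1, 6, 5], [2, 7, 8]])

flat = x.argmax()                               # 5: position in the flattened array
row, col = numpy.unravel_index(flat, x.shape)   # row 1, column 2
print(flat, row, col, x[row, col])

# With an axis argument, argmax returns one index per column or per row
per_column = x.argmax(axis=0)   # row index of the maximum in each column
per_row = x.argmax(axis=1)      # column index of the maximum in each row
print(per_column, per_row)

# cumsum along axis=1 gives the running total within each row
running = numpy.cumsum(x, axis=1)
print(running)
```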


### Exercise 3.3
Define function `scale` which takes a vector of numbers and brings them to the range from 0 to 1:
$$\mathrm{scale}(x_i) = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

In [111]:
def scale(x):
    return (x - x.min())/(x.max() - x.min())

### Exercise 3.4

The function `softmax` is often used in machine learning and statistics to convert a vector of arbitrary numbers into a vector of probabilities summing up to $1$. Softmax is computed by taking the exponential of each number, and then dividing each exponential by the sum of all the exponentials:
$$ \mathrm{softmax}(x_i) = \frac{\exp(x_i)}{\sum_{k=1}^N \exp(x_k)}$$

Implement the softmax function. Verify that all numbers in the resulting vector are between 0 and 1, and that they sum up to $1$.



In [112]:
def softmax(x):
    ...

## Sorting
Arrays can be sorted, similar to sorting lists in Python, using the functions `sort` and `argsort`.
When applied to a 2-dimensional array, the sort operates by default along the last axis, i.e. within each row, so the indices returned by `argsort` are positions within the row.

In [113]:
# sorting a 1-dimensional array:
print("Applied to 1-dimensional array")
x = numpy.array([5, 3, 6, 2, 6, 8])
print("unsorted x:", x)
y = x.argsort()
x.sort()
print("sorted x: ", x)
print("indices of argsort:", y)

Applied to 1-dimensional array
unsorted x: [5 3 6 2 6 8]
sorted x:  [2 3 5 6 6 8]
indices of argsort: [3 1 0 2 4 5]


In [114]:
# sorting a 2-dimensional array:
print("Applied to 2-dimensional array")
x = numpy.array([[5, 3, 6], [2, 6, 8]])
print("unsorted x:", x)
y = x.argsort()
x.sort()
print("sorted x: ", x)
print("indices of argsort:", y)

Applied to 2-dimensional array
unsorted x: [[5 3 6]
 [2 6 8]]
sorted x:  [[3 5 6]
 [2 6 8]]
indices of argsort: [[1 0 2]
 [0 1 2]]
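
The examples above sort in place with `x.sort()` and use the default axis. As a complementary sketch, the function `numpy.sort` returns a sorted copy and accepts an `axis` argument to sort within columns or over the flattened array; the variable names are chosen only for this example.

```python
import numpy

x = numpy.array([[5, 3, 6], [2, 6, 8]])

by_row = numpy.sort(x)             # sorted copy, default axis=-1: within each row
by_col = numpy.sort(x, axis=0)     # sorted within each column
flat = numpy.sort(x, axis=None)    # array is flattened first, then sorted

print(by_row)
print(by_col)
print(flat)
print(x)   # numpy.sort does not modify x, unlike x.sort()
```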


## Reversing

There is a special indexing syntax in `numpy` to obtain a view of the array in the reverse order. 

In [115]:
a = numpy.random.randint(0,10,5)
print(a)
print()
print(a[::-1])

[8 2 8 3 7]

[7 3 8 2 8]
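
Because `a[::-1]` is a view and not a copy, writing to the reversed array also changes the original. A minimal sketch, using a fixed array instead of random numbers so that the result is reproducible:

```python
import numpy

a = numpy.array([8, 2, 8, 3, 7])

r = a[::-1]      # view on the same data, in reverse order
r[0] = 100       # the first element of r is the last element of a
print(a)         # the last element of a is now 100

shared = numpy.shares_memory(a, r)   # True: both arrays use the same memory
print(shared)

c = a[::-1].copy()   # an independent reversed copy
c[0] = -1            # does not affect a
print(a)
```

Use `.copy()` when the reversed array should be modified independently of the original.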


### Exercise 3.5

The file `winequality-red.csv` contains measurements of wine samples, together with a quality rating. You can load this data into a structured array like this:

In [116]:
data = numpy.genfromtxt("winequality-red.csv", names=True, delimiter=';')

- Sort the data according to the quality rating, from lowest to highest
- Now sort the wines from highest to lowest

## Rounding

If you want to round every element in the array then the following rounding functions can be used:
- numpy.round(x, decimals = 2 )
- numpy.floor(x)
- numpy.ceil(x)


In [117]:
# rounding 
x = 10*numpy.random.random((1,5))
print("not rounded:", x)

x1 = numpy.round(x, decimals = 2)
print("round:", x1)

x2 = numpy.floor(x)
print("floor:", x2)

x3 = numpy.ceil(x)
print("ceil:", x3)

not rounded: [[ 6.62056169  9.99372912  5.54082401  5.97427961  1.36261238]]
round: [[ 6.62  9.99  5.54  5.97  1.36]]
floor: [[ 6.  9.  5.  5.  1.]]
ceil: [[  7.  10.   6.   6.   2.]]
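
Two details of these functions are easy to miss: `numpy.round` rounds values that lie exactly halfway to the nearest even number, and `floor` and `ceil` round towards minus and plus infinity, which matters for negative numbers. A short sketch with values picked to show both effects:

```python
import numpy

x = numpy.array([0.5, 1.5, 2.5, -1.5])

rounded = numpy.round(x)   # halfway cases go to the nearest even value: 0, 2, 2, -2
floored = numpy.floor(x)   # largest integer not above each value: 0, 1, 2, -2
ceiled = numpy.ceil(x)     # smallest integer not below each value: 1, 2, 3, -1

print(rounded)
print(floored)
print(ceiled)
```

All three functions return floating point arrays, even though the values are whole numbers.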


## Statistics

The following functions compute basic statistics of a numpy array `x`:
- numpy.median(x) : median
- numpy.mean(x) : mean
- numpy.average(x, axis= , weights= ) : (weighted) average
- numpy.std(x) : standard deviation
- numpy.var(x) : variance
- numpy.cov(x) : covariance matrix
- numpy.corrcoef(x) : Pearson product-moment correlation coefficients

Except for `cov` and `corrcoef`, which by default treat each row as a variable, these functions can be applied to the entire array or along one axis by passing the parameter `axis`. Variants which ignore NaN values also exist: `nanmedian`, `nanmean`, `nanstd`, `nanvar`.
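
The effect of the `axis` parameter and of the NaN-ignoring variants can be sketched on a small array with one missing value. The data and weights below are made up for illustration.

```python
import numpy

x = numpy.array([[1.0, 2.0, 3.0],
                 [4.0, numpy.nan, 6.0]])

col_mean = numpy.mean(x, axis=0)         # the second column contains NaN, so its mean is NaN
nan_col_mean = numpy.nanmean(x, axis=0)  # NaN is ignored: 2.5, 2.0, 4.5
nan_row_mean = numpy.nanmean(x, axis=1)  # mean per row, ignoring NaN: 2.0, 5.0

print(col_mean)
print(nan_col_mean)
print(nan_row_mean)

# Weighted average: (1*3 + 2*1 + 3*1) / (3 + 1 + 1) = 1.6
weighted = numpy.average(numpy.array([1.0, 2.0, 3.0]), weights=numpy.array([3, 1, 1]))
print(weighted)
```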

For more statistical functions in numpy: http://docs.scipy.org/doc/numpy/reference/routines.statistics.html

### Exercise 3.6

Define a function `print_summary` which takes a structured array of numerical values and prints, for each column, basic statistics:

- name (name of the column in the input array)
- mean 
- median
- min (minimum value)
- max (maximum value)
- std (standard deviation)

For example:
```
column: fixed_acidity
mean: 8.31963727329581
median: 7.9
min: 4.6
max: 15.9
std: 1.7405518001102729
column: volatile_acidity
mean: 0.5278205128205128
median: 0.52
min: 0.12
max: 1.58
...
```

### Exercise 3.6b
Modify the above function so that it takes an additional argument where the user can specify the number of decimal digits to display. For example, `print_summary(data, decimals=2)`:
```
column: fixed_acidity
mean: 8.32
median: 7.9
min: 4.6
max: 15.9
std: 1.74
....
```


## Python modules

A Python module is a collection of reusable functions. You can create a module by putting some function definitions in a file with the extension `.py`. For example, put some of the functions you defined above in a file called `functions.py`. You can then use them from any notebook or other Python code by importing like this:

```python
from functions import * 
```
This will import all functions from this module, and they can be used directly.

The alternative is:

```python
import functions as F
```
where `F` is some shortened name. If your module has the function `scale`, you will then call it as `F.scale`.
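
As an illustration, the file `functions.py` could look like the sketch below. It contains only the `scale` function from Exercise 3.3; the docstring is an addition for this example, and your own module will contain more functions.

```python
# Contents of functions.py (illustrative sketch)
import numpy


def scale(x):
    """Bring the values of the vector x to the range from 0 to 1."""
    return (x - x.min()) / (x.max() - x.min())
```

With this file in the same directory as your notebook, `import functions as F` followed by `F.scale(numpy.array([2, 4, 6]))` should return the values 0, 0.5 and 1. Note that the module has to import `numpy` itself if its functions use it.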

Try this in a new notebook.


**For assignment 1 you will need to submit a Python module with a number of function definitions.** Make sure you understand this concept.