# Numpy -  multidimensional data arrays

`numpy` is a package that provides high-performance multidimensional numeric data structures for Python. It is implemented in C and Fortran, and calculations are vectorized (formulated with vectors and matrices) to achieve good performance. The key data structure is the n-dimensional array, also called [numpy.ndarray](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html). `numpy` also provides other useful routines and functions to deal with array-like data structures.

In [None]:
import numpy

## Creating `numpy` arrays

The easiest way to create a `numpy` array is from an existing list or tuple using the `numpy.array` function. Simple vectors are created from a flat list, while matrices and higher dimensional arrays are created from nested lists. Note that all elements of a numpy array share a single data type, which is usually numeric, i.e., `int`, `float` or `complex`.

In [None]:
# Creating a vector from a list
a = numpy.array([1,2,3,4])
# Creating a matrix from a nested list
b = numpy.array([[1, 2], [3, 4]])

`numpy` provides a much larger selection of numeric types than Python. You can pass a second argument to `numpy.array` that sets the numeric type of the data in the array; the same argument can also be given explicitly with the `dtype` keyword. The available types are listed in the [numpy documentation](http://docs.scipy.org/doc/numpy/user/basics.types.html). The most common ones have single character shortcuts, e.g. 'd' (double precision floating point number) and 'i' (C int, typically int32).

In [None]:
a1 = numpy.array([1,2,3,4,5,6],'d')
print(a1)

In [None]:
M = numpy.array([[1, 2], [3, 4]], dtype=complex)
print(M)
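
As a small sketch of the type arguments described above, the following cell checks that the positional form, the `dtype` keyword and an explicit type object all select the same element type. The variable names `f1`, `f2`, `f3` and `c` are illustrative only.

```python
import numpy

# positional second argument, keyword argument, and explicit type object
f1 = numpy.array([1, 2, 3], 'd')
f2 = numpy.array([1, 2, 3], dtype='d')
f3 = numpy.array([1, 2, 3], dtype=numpy.float64)

# all three are double precision floating point arrays
print(f1.dtype, f2.dtype, f3.dtype)

# a complex array stores a real and an imaginary part per element
c = numpy.array([1, 2, 3], dtype=complex)
print(c.dtype, c.itemsize)
```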

## Properties of the numpy arrays

The `numpy.ndarray` object defines some useful properties such as `ndarray.size` (total number of elements in the array), `ndarray.shape` (dimensionality), and `ndarray.dtype` (data type of each element). Note that in the above examples, the objects `a`, `b` and `a1` are all of the type `numpy.ndarray`, but differ in their shape and size properties.

In [None]:
# object type
print(type(a),type(b),type(a1))

In [None]:
# dimensionality of array
print(a.shape,b.shape,a1.shape)

In [None]:
# total number of elements in array
print(a.size,b.size,a1.size)

In [None]:
# data type of elements
print(a.dtype,b.dtype,a1.dtype)

In [None]:
# number of dimensions
print(b.ndim) 

In [None]:
# number of bytes
print(b.nbytes)

In [None]:
# bytes per element
print(b.itemsize)
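
The properties above are related to each other. The sketch below, using a hypothetical array `arr`, shows how `size`, `ndim` and `nbytes` follow from `shape` and `itemsize`.

```python
import numpy

arr = numpy.zeros((3, 4), dtype=numpy.float64)

# size is the product of the entries of shape
print(arr.size == arr.shape[0] * arr.shape[1])

# ndim is the length of the shape tuple
print(arr.ndim == len(arr.shape))

# nbytes is the number of elements times the bytes per element
print(arr.nbytes == arr.size * arr.itemsize)
```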

### Useful functions for creating `numpy` arrays

`numpy` also provides many functions that generate arrays of different forms/patterns. Here are a few examples.

#### arange

In [None]:
# create a range - numpy.arange(start,end,step)
# note that end is not included in the range
x = numpy.arange(0, 10, 1) 
print(x)

In [None]:
x = numpy.arange(-1, 1, 0.1)
print(x)

#### linspace and logspace

In [None]:
# equally spaced numbers - numpy.linspace(start,end,number of points)
# note that end is included in the interval
numpy.linspace(0, 10, 25)

In [None]:
# numbers equally spaced on a logscale - numpy.logspace(start,end,number of points, base)
# note that end is included in the interval. If the base keyword argument is not specified, it defaults to base 10
numpy.logspace(0, 10, 10, base=numpy.e)
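
To make the relation between the two functions concrete, here is a minimal sketch with illustrative names `lin` and `log2`. It assumes nothing beyond the documented behaviour that `logspace` raises the base to equally spaced exponents.

```python
import numpy

# 5 points from 0 to 1, both end points included
lin = numpy.linspace(0, 1, 5)
print(lin)

# logspace(start, end, n, base=b) is b raised to linspace(start, end, n)
log2 = numpy.logspace(0, 4, 5, base=2)
print(log2)
print(numpy.allclose(log2, 2.0 ** numpy.linspace(0, 4, 5)))
```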

#### mgrid

In [None]:
# Generate a grid from base vectors
# similar to the meshgrid command in MATLAB
x, y = numpy.mgrid[0:5, 0:5] 

In [None]:
print("x = {0} \n\nand\n\ny = {1}".format(x,y))
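
A small sketch of what the two grids contain, and of how the same result can be obtained from base vectors with `numpy.meshgrid`. The names `gx`, `gy`, `mx` and `my` are illustrative; note that `indexing='ij'` is needed to match the axis order of `mgrid`.

```python
import numpy

gx, gy = numpy.mgrid[0:3, 0:2]

# gx varies along the first axis, gy along the second
print(gx)
print(gy)

# the same grids built from base vectors with numpy.meshgrid
mx, my = numpy.meshgrid(numpy.arange(3), numpy.arange(2), indexing='ij')
print(numpy.array_equal(gx, mx), numpy.array_equal(gy, my))
```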

#### random data
`numpy` also provides a random module that has functions to generate random numbers

In [None]:
# generate a 3x3 matrix of uniform random numbers in [0,1)
numpy.random.rand(3,3)

In [None]:
# generate a 3x3 matrix of standard normal distributed random numbers
numpy.random.randn(3,3)

#### zeros and ones

In [None]:
numpy.zeros((3,3))

In [None]:
numpy.ones((3,3))

In [None]:
numpy.identity(4,'d')

In [None]:
numpy.eye(3)

## Advantages of `numpy.ndarray`
* Numpy arrays are statically typed, i.e., the type of the elements is determined when the array is created, making them much more efficient than python lists. Static typing of the array also allows for implementation of various algorithms in a fast compiled language such as C/Fortran.
* Improved memory efficiency over python lists.
* Easy integration with existing optimized implementations of mathematical functions and libraries such as BLAS/ATLAS
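
The effect of static typing can be seen in a short sketch (the names `ints` and `mixed` are illustrative): a value assigned into an existing array is converted to the array's type, whereas a list simply stores whatever it is given.

```python
import numpy

ints = numpy.array([1, 2, 3])
print(ints.dtype)

# a float assigned into an integer array is truncated, the dtype is unchanged
ints[0] = 2.7
print(ints, ints.dtype)

# a Python list, in contrast, happily stores mixed types
mixed = [1, 2, 3]
mixed[0] = 2.7
print(mixed)
```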


### Type casting

Since Numpy arrays are *statically typed*, the type of an array does not change once created. But we can explicitly cast an array of one type to another using the `astype` method (see also the similar `asarray` function). By default, `astype` always creates a new array of the new type:

In [None]:
a = numpy.eye(3)
print(a.dtype)
b = a.astype(bool)
print(b.dtype)
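
The difference between `astype` and `asarray` mentioned above can be sketched as follows. The names `src` and `same_type` are illustrative; the behaviour shown is the default one, without any extra keyword arguments.

```python
import numpy

src = numpy.eye(2)

# astype returns a new array, even when the requested type is unchanged
same_type = src.astype(float)
print(same_type is src)

# asarray hands back the very same object when no conversion is needed
print(numpy.asarray(src) is src)

# casting to int truncates the fractional part
print(numpy.array([1.9, -1.9]).astype(int))
```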

## Manipulating arrays

Similar to python lists, numpy arrays are indexed using the `[]` notation. Indexing begins from 0.

In [None]:
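# recreate the vector and the matrix from the first section
a = numpy.array([1,2,3,4])
b = numpy.array([[1, 2], [3, 4]])
# a is a vector with only one dimension, so it takes one index
print(a[0])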

In [None]:
# b is a 2x2 matrix that takes two indices 
print(b[1,1])

If we omit an index of an N-dimensional array, an array with N-1 dimensions is returned.

In [None]:
# b is a 2x2 matrix, b[1] refers to the entire second row
print(b[1])
print(b[1,])
print(b[1,:])
# b[:,1] refers to the second column
print(b[:,1])

numpy arrays are mutable, so we can assign new values to individual elements by explicitly using the indexing:

In [None]:
b[0,0] = 5
print(b)

In [None]:
# assigning entire rows or columns with the same number 
b[1,:] = 0
print(b)
b[:,1] = -1
print(b)

Similar to lists or strings, numpy arrays can be sliced using the `[lower:upper:step]` syntax. Note that, unlike list slices, array slices are views: any change you make to a slice is reflected in the original array from which the slice was created. This is because numpy does not copy the data when slicing but returns a new array object that refers to the same memory.

In [None]:
a = numpy.array([1,2,3,4,5])
print(a[1:3])

In [None]:
a[1:3] = [-2,-3]
print(a)
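
The following sketch contrasts an array slice with a list slice. The names `base`, `view`, `lst` and `part` are illustrative; `numpy.shares_memory` is used only to make the sharing visible.

```python
import numpy

base = numpy.array([1, 2, 3, 4, 5])
view = base[1:3]

# the slice shares its memory with the original array
print(numpy.shares_memory(base, view))

# writing through the slice changes the original
view[0] = 100
print(base)

# a list slice, in contrast, is an independent copy
lst = [1, 2, 3, 4, 5]
part = lst[1:3]
part[0] = 100
print(lst)
```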

Exactly like python lists or strings, negative indices count from the end of the array.

In [None]:
a = numpy.array([1,2,3,4,5])
print(a[-3:])

In the case of multidimensional arrays, slicing works exactly the same way for each dimension.

In [None]:
a = numpy.array([[n+m for n in range(5)] for m in range(5)])
print(a[1:3, 1:3])

In [None]:
print(a[::2, ::2])

You can also use another array/list of indices to slice arrays. This "fancy indexing" feature is quite useful to extract array slices based on conditional expressions. 

In [None]:
rows = [0, 1, 2]
print(a[rows])

In [None]:
cols = [0, 2, -1]
print(a[:, cols])
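
Two details of fancy indexing are easy to miss, so here is a hedged sketch using an illustrative array `grid`: index lists given for several dimensions are paired up element by element, and the result is a copy rather than a view. `numpy.ix_` is one way to select a full block of rows and columns instead.

```python
import numpy

grid = numpy.array([[n + m for n in range(5)] for m in range(5)])

# two index lists are paired up: elements (0,0), (1,2) and (2,4)
print(grid[[0, 1, 2], [0, 2, 4]])

# numpy.ix_ selects the full 3x3 block of those rows and columns instead
print(grid[numpy.ix_([0, 1, 2], [0, 2, 4])])

# fancy indexing returns a copy, so the original is left untouched
picked = grid[[0, 1]]
picked[0, 0] = -1
print(grid[0, 0])
```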

We can also use index masks: if the index mask is a Numpy array with data type `bool`, then an element is selected (True) or not (False) depending on the value of the index mask at the position of each element.

In [None]:
b = numpy.array([n for n in range(5)])
mask = numpy.array([True, False, True, False, False])
print(b[mask])

The index mask highlighted above can be converted to position index using the `where` function. The mask feature is also really useful to conditionally extract elements from an array.

In [None]:
b = numpy.arange(0, 10, 0.5)
mask = (5 < b) * (b < 7.5)
indices = numpy.where(mask)
print(b[mask])
print(indices)

In [None]:
print(b[indices])
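
As a compact sketch of the mask and `where` workflow, the cell below combines two conditions with the element-wise `&` operator and checks that mask indexing and position indexing agree. The names `vals`, `mask2` and `positions` are illustrative.

```python
import numpy

vals = numpy.arange(0, 10, 0.5)

# combine two conditions with the element-wise "and" operator
mask2 = (vals > 5) & (vals < 7.5)

# numpy.where turns the mask into a tuple of position indices
positions = numpy.where(mask2)
print(positions[0])

# both forms of indexing select the same elements
print(vals[mask2])
print(numpy.array_equal(vals[mask2], vals[positions]))
```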

## Linear algebra

### Scalar operations

Simple scalar operations are performed using the standard arithmetic operators.  

In [None]:
v1 = numpy.arange(0,5)
m1 = numpy.ones((5,5))
m2 = numpy.ones((5,5))*3.5

In [None]:
print(v1+2)

In [None]:
print(v1*2)

In [None]:
print(m1*2)

### Element-wise array operations

Unlike MATLAB, the default behavior of arithmetic operators acting on two `numpy` arrays is to perform element-wise operations. The shapes of the two operands must either be exactly the same or be compatible in the broadcasting sense, i.e., compared from the last dimension backwards, each pair of dimensions must be equal or one of them must be 1.

In [None]:
print(m1*m2)

In [None]:
v1 * v1

If we multiply arrays with different but compatible shapes, the smaller array is broadcast across the larger one. Here each row of `m1` is multiplied element-wise by `v1`:

In [None]:
print(m1.shape,v1.shape)

In [None]:
m1*v1
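
A minimal sketch of which shapes are compatible for element-wise operations, using illustrative arrays `mat`, `row` and `col`. The last case is expected to fail and is wrapped in a `try` block for that reason.

```python
import numpy

mat = numpy.ones((3, 4))
row = numpy.arange(4)                  # shape (4,)
col = numpy.arange(3).reshape(3, 1)    # shape (3, 1)

# shape (4,) is stretched across the 3 rows
print(mat * row)
# shape (3, 1) is stretched across the 4 columns
print(mat * col)
# incompatible trailing dimensions raise an error
try:
    mat * numpy.arange(3)
except ValueError as err:
    print("cannot broadcast:", err)
```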

### Matrix algebra

`numpy` provides a host of functions to perform various matrix operations.

In [None]:
# matrix multiplication
print(numpy.dot(m1,m2))

In [None]:
# inner product
print(numpy.dot(v1,v1))

In [None]:
print(numpy.dot(v1,m1))
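
To contrast the element-wise product with the matrix product on small, easily checked inputs, here is a sketch with illustrative arrays `P`, `Q` and `u`. It also uses the `@` operator, which for 1-D and 2-D arrays gives the same result as `numpy.dot`.

```python
import numpy

P = numpy.array([[1, 2], [3, 4]])
Q = numpy.array([[0, 1], [1, 0]])
u = numpy.array([1, 1])

# element-wise product versus matrix product
print(P * Q)
print(numpy.dot(P, Q))

# the @ operator is equivalent to numpy.dot for 1-D and 2-D arrays
print(numpy.array_equal(P @ Q, numpy.dot(P, Q)))
print(P @ u)
```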

Numpy also provides a special type for two dimensional arrays called `matrix`. We can typecast any strictly 2-D numpy array to `matrix`-type. Note that this typecasting changes the behavior of the `*` operator to perform matrix multiplication instead of an element-wise product (`+` and `-` remain element-wise, as in standard matrix algebra). The `matrix`-type also has some special attributes that make it easy to work with. Note that the NumPy documentation no longer recommends `numpy.matrix` for new code; regular arrays with `numpy.dot` cover the same operations.

In [None]:
mmat = numpy.matrix(m1)


In [None]:
print(mmat)

# Transpose
print(mmat.T)

# Conjugate transpose
print(mmat.H)

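# Inverse (mmat is a matrix of ones and therefore singular, so invert mmat + I instead)
print((mmat + numpy.identity(mmat.shape[0])).I)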

In [None]:
print(mmat*mmat)

In [None]:
# Convert from row to column vector
vmat = numpy.matrix(v1).T

In [None]:
# Inner product in matrix notation
vmat.T * vmat

In [None]:
# Standard matrix algebra with column vectors and matrices
mmat*vmat + vmat 

Some other really useful functions for matrix operations are `numpy.cross`, `numpy.linalg.inv`, `numpy.linalg.det`, `numpy.kron` and `numpy.tensordot`.
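
A short sketch of some of these functions on a small, hypothetical matrix `S`. Note that the inverse and the determinant are called through the `numpy.linalg` submodule.

```python
import numpy

S = numpy.array([[2.0, 1.0], [1.0, 3.0]])

# determinant and inverse live in the numpy.linalg submodule
print(numpy.linalg.det(S))
S_inv = numpy.linalg.inv(S)

# the product of a matrix and its inverse is the identity (up to round-off)
print(numpy.allclose(numpy.dot(S, S_inv), numpy.eye(2)))

# cross product of the x and y unit vectors gives the z unit vector
print(numpy.cross([1, 0, 0], [0, 1, 0]))

# Kronecker product of the 2x2 identity with S has shape (4, 4)
print(numpy.kron(numpy.eye(2), S).shape)
```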

### Statistics

Numpy provides a number of functions to calculate statistics of data that is stored in arrays.

In [None]:
m = numpy.random.rand(1000,5)*2-1

#### mean

In [None]:
# mean of the values in column 3
numpy.mean(m[:,3])

#### standard deviations and variance

In [None]:
numpy.std(m[:,3]), numpy.var(m[:,3])

#### min and max

In [None]:
# lowest value in column 3
m[:,3].min()

In [None]:
# highest value in column 3
m[:,3].max()

Using the `axis` argument we can perform row-wise or column-wise analysis.

In [None]:
# global max
m.max()

In [None]:
# max in each column
m.max(axis=0)

In [None]:
# max in each row
m.max(axis=1)
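
Because random data makes the result hard to check by eye, here is the same idea on a small, hand-written array `t` (an illustrative name). The `axis` argument names the dimension that is collapsed.

```python
import numpy

t = numpy.array([[1, 5, 3],
                 [4, 2, 6]])

# axis=0 collapses the rows, leaving one value per column
print(t.max(axis=0))
# axis=1 collapses the columns, leaving one value per row
print(t.max(axis=1))
# the same keyword works for mean, sum, std and similar functions
print(t.mean(axis=0))
```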

#### sum, prod, and trace

In [None]:
a = numpy.arange(0, 5)
print(a)

In [None]:
# sum up all elements
print(numpy.sum(a))
print(a.sum())

In [None]:
# sum up along a particular axis
b = numpy.array([[ i-j for i in range(3) ] for j in range(3) ]) + numpy.eye(3)*numpy.array([1,2,3])
print(b)
print("column sum = {0}".format(numpy.sum(b,axis=0)))
print("row sum    = {0}".format(numpy.sum(b,axis=1)))

In [None]:
# product of all elements
print(numpy.prod(a+1))
print((a+1).prod())

In [None]:
# trace
print(numpy.trace(b))

## Reshaping, resizing and stacking arrays

The shape of a numpy array can be modified without copying the data into a new array. This is very convenient and also makes it really quick to perform certain analyses even on large data sets. Note that, since the data is shared, changes made through the reshaped array also affect the original.

### Reshaping

In [None]:
A = numpy.random.rand(5,5)
print(A)

In [None]:
n, m = A.shape

In [None]:
B = A.reshape((1,n*m))
print(B)

In [None]:
# Modify the array - note that array slices are mutable
B[0,0:5] = 5 
print(B)

In [None]:
# A is also changed, since B is a view of A and not a copy
A 

There are a few other functions/methods for reshaping arrays that create a copy of the data. For example, we can convert a higher dimensional array to a vector using the `flatten` method. Since a copy is created, the cost of this operation grows with the size of the original array.

In [None]:
B = A.flatten()
print(A)

In [None]:
B[0:5] = 10
print(A)
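
Whether two arrays share data can be checked directly. The sketch below uses illustrative names (`orig`, `reshaped`, `flat`) and `numpy.shares_memory` to contrast `reshape` with `flatten`; it also shows the `-1` shorthand for a dimension that numpy should infer.

```python
import numpy

orig = numpy.arange(6).reshape(2, 3)

# reshape of a contiguous array returns a view on the same data
reshaped = orig.reshape(3, 2)
print(numpy.shares_memory(orig, reshaped))

# flatten always returns a copy
flat = orig.flatten()
print(numpy.shares_memory(orig, flat))

# -1 asks numpy to infer that dimension from the total size
print(orig.reshape(-1, 2).shape)
```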

### tile and repeat

In [None]:
a = numpy.array([[1, 2], [3, 4]])

In [None]:
# repeat each element 3 times
numpy.repeat(a, 3)

In [None]:
# tile the matrix 3 times 
numpy.tile(a, 3)
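
Both functions accept further arguments that control the direction of the repetition. A brief sketch with an illustrative array `s`:

```python
import numpy

s = numpy.array([[1, 2], [3, 4]])

# without an axis, repeat flattens the array first
print(numpy.repeat(s, 2))
# with axis=0 each row is repeated
print(numpy.repeat(s, 2, axis=0))
# a tuple of repetitions tiles along rows and columns
print(numpy.tile(s, (2, 1)))
```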

### concatenate

In [None]:
b = numpy.array([[5, 6]])

In [None]:
numpy.concatenate((a, b), axis=0)

In [None]:
numpy.concatenate((a, b.T), axis=1)

### hstack and vstack

In [None]:
numpy.vstack((a,b))

In [None]:
numpy.hstack((a,b.T))
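
For 2-D inputs such as these, `vstack` and `hstack` give the same results as `concatenate` along axis 0 and axis 1. A quick check with illustrative arrays `p` and `q`:

```python
import numpy

p = numpy.array([[1, 2], [3, 4]])
q = numpy.array([[5, 6]])

v = numpy.vstack((p, q))      # shape (3, 2)
h = numpy.hstack((p, q.T))    # shape (2, 3)
print(v.shape, h.shape)
print(numpy.array_equal(v, numpy.concatenate((p, q), axis=0)))
print(numpy.array_equal(h, numpy.concatenate((p, q.T), axis=1)))
```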

## Copying arrays

Remember that assignment statements in Python do not copy the object to a new memory location; rather, a new reference to the same object is created. While this improves performance, it can be a source of bugs. This is true for numpy arrays as well.

In [None]:
a = numpy.array([[1, 2], [3, 4]])
b = a 
b[0,0] = 5 
print(a)

If we really want a copy of an array, we need to use numpy's `copy` function:

In [None]:
b = numpy.copy(a)
b[0,0] = 1
print(b)
print(a)
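
The difference between a second reference and a real copy can be verified explicitly. In this sketch the names `a0`, `alias` and `clone` are illustrative, and the `copy` method is used as the method form of `numpy.copy`.

```python
import numpy

a0 = numpy.array([[1, 2], [3, 4]])
alias = a0            # same object under a second name
clone = a0.copy()     # independent copy of the data

print(alias is a0, clone is a0)
print(numpy.shares_memory(a0, clone))

# changing the copy leaves the original untouched
clone[0, 0] = 99
print(a0[0, 0])
```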

## `any()` and `all()` methods

Numpy provides two array methods, `any` and `all`, that allow arrays to be used in conditional statements.

In [None]:
M = numpy.random.rand(3,3)*10
print(M)

In [None]:
# any() - will evaluate to True if at least one element in M satisfies the condition
if (M > 5).any():
    print("At least one element is > 5")
else:
    print("All elements are <= 5")

In [None]:
# all() - will evaluate to True iff all elements in M satisfy the condition
if (M > 5).all():
    print("All elements are > 5")
else:
    print("At least one element is <= 5")
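
Both methods also accept the `axis` argument, which can be sketched on a small fixed array (`grid2` and `big` are illustrative names) where the result is easy to verify by hand.

```python
import numpy

grid2 = numpy.array([[1, 7, 3],
                     [8, 9, 6]])
big = grid2 > 5

# over the whole array
print(big.any(), big.all())
# per row: does any / do all elements of the row exceed 5?
print(big.any(axis=1))
print(big.all(axis=1))
```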

## Further Reading

* http://numpy.scipy.org
* http://scipy.org/Tentative_NumPy_Tutorial
* http://scipy.org/NumPy_for_Matlab_Users - A Numpy guide for MATLAB users.