# Numpy Arrays

## What is Numpy?

The NumPy package is the foundation of most scientific computing in Python. Its core is implemented in compiled code (primarily C), so operations on arrays are typically much faster than equivalent loops over native Python data types. NumPy provides:

- the **ndarray** (n-dimensional array), NumPy's primary object
- fast array operations
- a large library of linear algebra procedures
- and much, much, more...

It is customary to import NumPy as `np`:

In [1]:
import numpy as np

## The ndarray

A NumPy ndarray object, or simply array, represents a multidimensional, homogeneous array of fixed-size items. An associated data-type object describes the format of each element in the array. In NumPy, dimensions are referred to as axes. The number of axes is the rank.

Arrays can be constructed in a number of ways. A common method is to construct an array from an existing sequence:

<div style="background-color: #FFF8C6; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px;">
<b>Important</b><br/>
In Numpy we speak of `n` dimensional arrays.  It is common in mathematics to refer to a 2 dimensional array as a *matrix*.  In Numpy, however, this is not the convention.  As you are learning about arrays, I encourage you to think about them as arrays and not matrices.
</div>

In [2]:
a = np.array([10, 20, 30, 40])
b = np.array(['crunchy frog', 'ram bladder', 'lark vomit'])

The first example is an array of four integers. The second is an array of three character strings. Unlike lists, the elements of an ndarray all have the same type. For example, assigning a string that cannot be converted to an integer to an element of `a` raises an error.

In [3]:
a[0] = 'string'

ValueError: invalid literal for int() with base 10: 'string'
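
Values that *can* be converted are accepted, but they are coerced to the array's type rather than changing it. The following is a minimal sketch of that behavior; the name `ints` is illustrative and not part of the examples above.

```python
import numpy as np

ints = np.array([10, 20, 30, 40])

# Assigning a float to an integer array does not change the dtype;
# the fractional part of the value is silently dropped instead.
ints[0] = 3.7

print(ints[0])      # the stored value is an integer
print(ints.dtype)   # the dtype is still an integer type
```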

The data-type of each element in an array is stored in the array's `dtype` attribute:

In [4]:
print(a.dtype)
print(b.dtype)

int64
<U12


The data-type is inferred from its arguments at the time of construction. When elements of different types are passed to a constructor, the type of the resulting array corresponds to the more general or precise one (a behavior known as upcasting).  The data-type can also be explicitly specified:

In [5]:
c = np.array([1, 2, 3], dtype=np.float64)
print(c)
print(c.dtype)

[ 1.  2.  3.]
float64
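
Upcasting can be observed directly by mixing element types in the constructor. This is a small sketch; the variable names `mixed` and `with_complex` are hypothetical.

```python
import numpy as np

# Integers mixed with a float: the result is a float array.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# Floats mixed with a complex number: the result is a complex array.
with_complex = np.array([1, 2.5, 3j])
print(with_complex.dtype)
```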


Multidimensional arrays are constructed from nested sequences:

In [73]:
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
m

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

The number of axes in an array is stored in the `ndim` attribute:

In [79]:
m.ndim

2

The `shape` attribute gives the number of elements along each axis:

In [81]:
m.shape

(3, 3)

The total number of elements in an array is given by the `size` attribute:

In [82]:
m.size

9
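
The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries of `shape`. A short sketch with a three-dimensional array (the name `cube` is illustrative):

```python
import numpy as np

cube = np.array([[[1, 2], [3, 4]],
                 [[5, 6], [7, 8]]])

print(cube.ndim)    # number of axes
print(cube.shape)   # number of elements along each axis
print(cube.size)    # total number of elements

# ndim equals the length of shape; size equals the product of shape.
print(cube.ndim == len(cube.shape))
print(cube.size == np.prod(cube.shape))
```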

## Other methods for creating arrays

NumPy provides many functions to create arrays; the most common are described here. The functions

    zeros, ones, empty

allow creation of arrays with a known shape and with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.   By default, the dtype of the created array is float64.

The functions

    array, zeros_like, ones_like, empty_like
    
allow creation of arrays with the same shape and type as another array.

The functions

    arange, linspace

allow creation of arrays over specified intervals.

### Examples

The function `ones` creates an array full of ones

In [58]:
np.ones(5)

array([ 1.,  1.,  1.,  1.,  1.])

Passing a tuple specifies the shape of a multidimensional array at the time of construction:

In [8]:
np.ones((5,5))

[[ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]]


The function `zeros` creates an array full of zeros:

In [61]:
np.zeros(5)

array([ 0.,  0.,  0.,  0.,  0.])

In [60]:
np.zeros((2,2))

array([[ 0.,  0.],
       [ 0.,  0.]])

The function `empty` creates an array without initializing its entries. The initial content is arbitrary and depends on the state of the memory.

In [65]:
a = np.empty(5)
a

array([  4.94065646e-324,   9.88131292e-324,   1.48219694e-323,
         1.97626258e-323,   2.47032823e-323])

The array method `fill` fills the elements with a constant value

In [67]:
a.fill(3)
a

array([ 3.,  3.,  3.,  3.,  3.])

To create sequences of numbers, NumPy provides `arange`, a function analogous to Python's built-in `range` that returns an array:

In [77]:
start, end, step = 0, 10, 1
np.arange(start, end, step)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

When `arange` is used with floating point arguments, it is generally not possible to predict the number of elements obtained, due to the finite floating point precision. For this reason, it is usually better to use the function `linspace` that receives as an argument the number of elements that we want, instead of the step:

In [75]:
np.linspace(-5, 10, 6)

array([ -5.,  -2.,   1.,   4.,   7.,  10.])
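
The difference between the two functions can be sketched as follows. The interval and step are arbitrary choices for illustration; the length returned by `arange` is deliberately not stated because it depends on floating point rounding.

```python
import numpy as np

start, stop, step = 1.0, 1.3, 0.1

# With a floating point step, whether the last value falls inside the
# interval depends on rounding, so the length is hard to predict.
stepped = np.arange(start, stop, step)
print(len(stepped))

# linspace takes the number of points instead, and by default
# includes both endpoints.
spaced = np.linspace(start, stop, 4)
print(len(spaced))
print(spaced[0], spaced[-1])
```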

The `identity` and `eye` functions return a 2 dimensional identity array

In [78]:
np.identity(3)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [69]:
np.eye(2)

array([[ 1.,  0.],
       [ 0.,  1.]])

The keyword `k` allows specifying the diagonal

In [70]:
np.eye(4, k=1)

array([[ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.]])
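
The `*_like` functions listed earlier take an existing array as a template. A minimal sketch, with `template`, `z`, and `o` as illustrative names:

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])

# Both results take their shape and dtype from the template array.
z = np.zeros_like(template)
o = np.ones_like(template)

print(z)
print(o)
print(z.shape == template.shape, z.dtype == template.dtype)
```

Note that the result here has an integer dtype, taken from `template`, rather than the float64 default of `zeros` and `ones`.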

## Changing the Shape of an Array

The size of an array is *fixed* at the time of creation.  But the *shape* of the array can be changed through a variety of means.

In [99]:
ar = np.arange(12)
ar

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

The `reshape` function returns its argument with a modified shape

In [98]:
ar = ar.reshape(4,3)
print(ar)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]


By default, the last index of an array changes most rapidly as one moves through the array as stored in memory (so-called row-major storage). For a two-dimensional array, this is equivalent to the statement that a matrix is stored by rows. This differs from languages such as MATLAB and Fortran, which use column-major storage. The storage convention can be observed by viewing a flattened multidimensional array:

In [87]:
ar.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

The functions `flatten()` and `reshape()` can also be instructed, using the optional `order` argument, to use Fortran-style (column-major) ordering, in which the leftmost index changes the fastest.
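
The optional argument in question is `order`, which accepts `'C'` (row-major, the default) or `'F'` (column-major). A small sketch, using a hypothetical array named `grid`:

```python
import numpy as np

grid = np.arange(6).reshape((2, 3))
print(grid)

# Default C order: read along each row in turn.
print(grid.flatten())

# Fortran order: read down each column in turn.
print(grid.flatten(order='F'))

# reshape can also fill the new shape in Fortran order.
by_columns = np.arange(6).reshape((2, 3), order='F')
print(by_columns)
```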

The shape of the array can also be changed in place by assigning to its `shape` attribute, though the total size must not change:

In [100]:
ar.shape = 6, 2
ar

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])

In [89]:
ar.transpose()  # equivalently, ar.T

array([[ 0,  2,  4,  6,  8, 10],
       [ 1,  3,  5,  7,  9, 11]])

Whatever reshaping operation is used, the new shape must be consistent with the size of the original array:

In [101]:
ar.reshape(5,3)

ValueError: cannot reshape array of size 12 into shape (5,3)

The `reshape` method returns an array with a modified shape but leaves the original array intact. The `resize` method modifies the array itself:

In [102]:
ar.resize((3,4))
ar

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated:

In [93]:
ar.reshape(2,-1)

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
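
The same inference works for any one axis of a higher-dimensional shape. A brief sketch, with illustrative names `values`, `blocks`, and `flat`:

```python
import numpy as np

values = np.arange(24)

# The middle axis is inferred: 24 / (2 * 4) = 3.
blocks = values.reshape(2, -1, 4)
print(blocks.shape)

# reshape did not modify the original array.
print(values.shape)

# A single -1 collapses everything back to one dimension.
flat = blocks.reshape(-1)
print(flat.shape)
```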

## Joining different arrays

Since the size of NumPy arrays is *fixed* at the time of creation, in-place methods such as the `append` and `extend` methods of the list object make little sense (they change the size of the object). Instead, NumPy provides similar *functions* that return new arrays.

The `np.concatenate` function concatenates 2 or more arrays:

In [137]:
t1 = np.array([1,2,3])
t2 = np.array([3,4,5])
t3 = np.array([6,7,8])
np.concatenate((t1, t2))

array([1, 2, 3, 3, 4, 5])

In [138]:
np.concatenate((t1, t2, t3))

array([1, 2, 3, 3, 4, 5, 6, 7, 8])

The `column_stack` and `row_stack` functions create two-dimensional arrays from one-dimensional arrays (`row_stack` is an alias of `vstack`):

In [56]:
print(np.column_stack((t1, t2)))

[[1 3]
 [2 4]
 [3 5]]


In [57]:
print(np.row_stack((t1, t2)))

[[1 2 3]
 [3 4 5]]


By default, `concatenate` joins multidimensional arrays along the first axis:

In [144]:
t4 = np.array([[1,2,3],[4,5,6]])
t5 = np.array([[7,8,9],[10,11,12]])
np.concatenate((t4, t5))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

The `axis` argument allows concatenating along a different axis

In [145]:
np.concatenate((t4, t5), axis=1)

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

Setting `axis=None` concatenates the flattened arrays:

In [146]:
np.concatenate((t4, t5), axis=None)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

Other functions for joining arrays include:

    append, hstack, vstack, dstack
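
A short sketch of how these functions differ on one-dimensional inputs, reusing the values of `t1` and `t2` from above:

```python
import numpy as np

t1 = np.array([1, 2, 3])
t2 = np.array([3, 4, 5])

h = np.hstack((t1, t2))   # joined end to end
v = np.vstack((t1, t2))   # stacked as rows
d = np.dstack((t1, t2))   # stacked along a new third axis

print(h.shape)
print(v.shape)
print(d.shape)

# np.append returns a new array; t1 itself is unchanged.
longer = np.append(t1, [9, 10])
print(longer)
print(t1)
```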

## Indexing, slicing, and iterating

The syntax for accessing the elements of an array is the same as for accessing the elements of a list - the bracket operator. The expression inside the brackets specifies the index. Remember that the indices start at 0:

In [94]:
a[0]

3.0

For arrays with more than one dimension, a tuple is used to specify the index. Arrays follow the matrix convention for indexing, i.e., the first index is the row.

In [115]:
m[0,1] # second element in first row of m

2

The slice operator works on arrays

In [116]:
m[:, 2]  # third column of m

array([3, 6, 9])

In [118]:
m[[0,-1], :]  #  first and last row of m

array([[1, 2, 3],
       [7, 8, 9]])

In [120]:
m[:, [0, -1]]  # first and last column of m

array([[1, 3],
       [4, 6],
       [7, 9]])

### Fancy Indexing

In addition to the usual indexing and slicing, arrays support *fancy* indexing, that is, indexing with arrays of integers and arrays of booleans:

In [124]:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
i = np.array([0, 3])
a[i]

array([1, 4])

In [125]:
j = np.array([True, True, False, False, True, False, True, False, True])
a[j]

array([1, 2, 5, 7, 9])
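
In practice, boolean index arrays are rarely typed out by hand; they usually come from a comparison on the array itself. A minimal sketch, with `values` and `mask` as illustrative names:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# A comparison on an array yields a boolean array of the same shape.
mask = values % 2 == 0
print(mask)

# Indexing with the mask keeps the elements where it is True.
evens = values[mask]
print(evens)
```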

Indices can be given for more than one dimension. The arrays of indices for each dimension must have the same shape, or be broadcastable to a common shape.

In [121]:
a = np.arange(12).reshape(3,4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [122]:
i = np.array([[0,1],        # indices for the first dim of a
              [1,2]])
j = np.array([[2,1],        # indices for the second dim
              [3,3]])
a[i,j]                  # i and j must have equal shape

array([[ 2,  5],
       [ 7, 11]])

In [123]:
a[:,j]                  # i.e., a[ : , j]

array([[[ 2,  1],
        [ 3,  3]],

       [[ 6,  5],
        [ 7,  7]],

       [[10,  9],
        [11, 11]]])

In [49]:
a[i,2]

array([[ 2,  6],
       [ 6, 10]])
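
Strictly speaking, the index arrays only need to be broadcastable to a common shape, which is why `a[i,2]` works with a scalar second index. The sketch below uses this to select a block of rows and columns; the names `grid`, `rows`, and `cols` are illustrative, and `np.ix_` is shown as one convenient way to build such index arrays.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

rows = np.array([[0], [2]])   # shape (2, 1)
cols = np.array([1, 3])       # shape (2,)

# The two index arrays broadcast to shape (2, 2):
# rows 0 and 2 combined with columns 1 and 3.
block = grid[rows, cols]
print(block)

# np.ix_ builds equivalent index arrays from plain sequences.
same_block = grid[np.ix_([0, 2], [1, 3])]
print(same_block)
```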

Like lists, arrays are mutable. When the bracket operator appears on the left side of an assignment, it identifies the element of the array that will be assigned.

In [126]:
numbers = np.array([17., 123])
numbers[1] = 5
numbers

array([ 17.,   5.])

In [127]:
matrix = np.array([[1, 2], [3, 4]])
matrix[0,1] = 12
matrix

array([[ 1, 12],
       [ 3,  4]])

Indexing with arrays can be used as a target to assign to:

In [135]:
a = np.arange(12)
a[[0, 2, 4]] = [53, 54, 55]
a

array([53,  1, 54,  3, 55,  5,  6,  7,  8,  9, 10, 11])

Subject to some constraints, Numpy will “broadcast” a smaller array across the larger array so that they have compatible shapes:

In [133]:
a.resize(3,4)
a[:,1] = 44
a

array([[53, 44, 54,  3],
       [55, 44,  6,  7],
       [ 8, 44, 10, 11]])
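
Broadcasting is not limited to scalars. The following sketch shows a row and a column being broadcast in assignments and in arithmetic; the array `table` and its values are made up for illustration.

```python
import numpy as np

table = np.zeros((3, 4))

# A length-4 sequence is broadcast down all three rows.
table[:] = [1, 2, 3, 4]

# A length-3 sequence supplies one value per row of the first column.
table[:, 0] = [10, 20, 30]
print(table)

# A (3, 1) column is broadcast across the four columns in arithmetic.
offsets = np.array([[100], [200], [300]])
shifted = table + offsets
print(shifted)
```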

### Iterating

Iterating over multidimensional arrays is done with respect to the first axis:

In [163]:
b = np.array([[0, 1, 2, 3],
              [10, 11, 12, 13],
              [20, 21, 22, 23],
              [30, 31, 32, 33],
              [40, 41, 42, 43]])
for row in b:
    print(row)

[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]


However, to perform an operation on each element in the array, one can use the `flat` attribute, which is an iterator over all the elements of the array:

In [164]:
for element in b.flat:
    print(element, end=' ')

0 1 2 3 10 11 12 13 20 21 22 23 30 31 32 33 40 41 42 43
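
When the position of each element is also needed, `np.ndenumerate` yields index and value pairs. For simple arithmetic, an explicit loop is often unnecessary because operations on arrays apply elementwise. A small sketch with a hypothetical array `small`:

```python
import numpy as np

small = np.array([[0, 1],
                  [10, 11]])

# ndenumerate yields (index tuple, value) pairs in row-major order.
for index, value in np.ndenumerate(small):
    print(index, value)

# Arithmetic on the whole array applies to every element.
doubled = small * 2
print(doubled)
```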


## Copies and views

### No copy

Simple assignments make no copy of array objects or of their data.

In [148]:
a = np.arange(12)
b = a            # no new object is created

In [149]:
b is a           # a and b are two names for the same ndarray object

True

In [150]:
b.shape = 3,4    # changes the shape of a
a.shape

(3, 4)

### View

Different array objects can share the same data. The `view` method creates a new array object that looks at the same data.

In [151]:
c = a.view()
c is a

False

In [152]:
c.base is a                        # c is a view of the data owned by a

True

In [153]:
c.flags.owndata

False

In [154]:
c.shape = 2,6                      # a's shape doesn't change
a.shape

(3, 4)

In [155]:
c[0,4] = 1234                      # a's data changes
a

array([[   0,    1,    2,    3],
       [1234,    5,    6,    7],
       [   8,    9,   10,   11]])

Slicing an array returns a view of it:

In [159]:
s = a[ : , 1:3]   
# spaces added for clarity; could also be written "s = a[:,1:3]"
s[:] = 10           # s[:] is a view of s. Note the difference between s=10 and s[:]=10
a

array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])
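
Not every indexing operation returns a view. Basic slices do, but indexing with a list or array of integers returns a copy. One way to check is `np.shares_memory`, as in this sketch (the names `data`, `sliced`, and `picked` are illustrative):

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

sliced = data[:, 1:3]      # basic slice: a view of data
picked = data[:, [1, 2]]   # integer-list index: a copy

print(np.shares_memory(data, sliced))
print(np.shares_memory(data, picked))

# Writing to the copy leaves the original untouched.
picked[:] = -1
print(data[0])
```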

### Deep copy

The `copy` method makes a complete copy of the array and its data.

In [160]:
d = a.copy()                          # a new array object with new data is created
d is a

False

In [161]:
d.base is a                           # d doesn't share anything with a

False

In [162]:
d[0,0] = 9999
a

array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])