# (Numpy) Collections/Vector/Matrix

Last update: Sept 22nd, 2020

We only discuss lists below, since they are the collection type closest to NumPy arrays. There are other useful collection types, such as tuples and dictionaries, that you can explore on your own.

**UPDATE**: For those who are interested, the NumPy team has published a summary paper in [Nature](https://www.nature.com/articles/s41586-020-2649-2).

## Ordered Collections

### Lists

A Python list is an ordered collection of items.

We can create lists using the following syntax

```python
[item1, item2, ...,  itemN]
```


where the `...` represents any number of additional items.

Each item can be of any type.

Let’s create some lists.

In [2]:
# created, but not assigned to a variable
[2.0, 9.1, 12.5]

[2.0, 9.1, 12.5]

In [4]:
# stored as the variable `x`
x = [2.0, 9.1, 12.5]
print("x has type", type(x))
print(x)

x has type <class 'list'>
[2.0, 9.1, 12.5]


#### What Can We Do with Lists?

We can access items in a list called `mylist` using `mylist[N]`
where `N` is an integer.

Note: Anytime that we use the syntax `x[i]` we are doing what is
called indexing – it means that we are selecting a particular element
of a *collection* `x`.

In [7]:
x[1]

9.1

Wait? Why did `x[1]` return `9.1` when the first element in x is
actually `2.0`?

This happened because Python starts counting at zero!

Let's repeat that one more time for emphasis: **Python starts counting at zero**!

To access the first element of x we must use `x[0]`:

In [4]:
x[0]

2.0

We can also determine how many items are in a list using the `len` function.

In [8]:
len(x)

3

What happens if we try to index with a number higher than the number of
items in a list?

In [9]:
# this raises an error: the valid indices for `x` are 0, 1 and 2
x[3]

IndexError: list index out of range
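As a small sketch of the point above, the last valid index of a list is `len(x) - 1`, and the error raised for a larger index can be caught with `try`/`except`. The variable names `last` and `message` are illustrative only.

```python
x = [2.0, 9.1, 12.5]

# The last valid index is len(x) - 1 because counting starts at zero
last = x[len(x) - 1]

try:
    x[len(x)]                  # one past the end
except IndexError as err:
    message = str(err)         # the text of the error

print(last)      # 12.5
print(message)   # list index out of range
```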

We can check if a list contains an element using the `in` keyword.

In [7]:
2.0 in x

True

In [8]:
1.5 in x

False

For our list `x`, other common operations we might want to do are…

In [10]:
x.reverse()
x

[12.5, 9.1, 2.0]

In [11]:
number_list = [10, 25, 42, 1.0]
print(number_list)
number_list.sort()
print(number_list)

[10, 25, 42, 1.0]
[1.0, 10, 25, 42]


Note that `sort` works here because all elements of the list are numbers (`int` and `float`),
which can be compared with one another; a list mixing numbers and strings cannot be sorted this way.

We could actually do the same with a list of strings. In this case, `sort`
will put the items in alphabetical order.

In [11]:
str_list = ["NY", "AZ", "TX"]
print(str_list)
str_list.sort()
print(str_list)

['NY', 'AZ', 'TX']
['AZ', 'NY', 'TX']


The `append` method adds an element to the end of an existing list.

In [12]:
num_list = [10, 25, 42, 8]
print(num_list)
num_list.append(10)
print(num_list)

[10, 25, 42, 8]
[10, 25, 42, 8, 10]


However, if you call `append` with a list, it adds a `list` to the end,
rather than the numbers in that list.

In [13]:
num_list = [10, 25, 42, 8]
print(num_list)
num_list.append([20, 4])
print(num_list)

[10, 25, 42, 8]
[10, 25, 42, 8, [20, 4]]


To combine the lists instead…

In [14]:
num_list = [10, 25, 42, 8]
print(num_list)
num_list.extend([20, 4])
print(num_list)

[10, 25, 42, 8]
[10, 25, 42, 8, 20, 4]
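The difference between `append` and `extend` can be checked side by side. The sketch below also shows the `+` operator, which is not used above; it builds a new list and leaves the original unchanged. The names `base`, `appended`, `extended` and `combined` are illustrative.

```python
base = [10, 25, 42, 8]

appended = list(base)        # copy, so that `base` is left unchanged
appended.append([20, 4])     # adds ONE item, which is itself a list

extended = list(base)
extended.extend([20, 4])     # adds each item of the argument in turn

combined = base + [20, 4]    # builds a new list; `base` is not modified

print(len(appended))         # 5
print(len(extended))         # 6
print(combined == extended)  # True
print(base)                  # [10, 25, 42, 8]
```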


### The `range` Function

One function you will see often in Python is the `range` function.

It has three versions:

1. `range(N)`: goes from 0 to N-1  
1. `range(a, N)`: goes from a to N-1  
1. `range(a, N, d)`: starts at a and counts by d, stopping before it reaches N  


When we call the `range` function, we get back something that has type `range`:

In [12]:
r = range(5)
print("type(r)", type(r))

type(r) <class 'range'>


To turn the `range` into a list:

In [16]:
list(r)

[0, 1, 2, 3, 4]

With a starting value:

In [14]:
r = range(2, 5)
list(r)

[2, 3, 4]
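The three forms of `range` listed above can be compared directly. This is a minimal sketch; the step values `3` and `-1` are arbitrary choices for illustration. Note that with a step, the last value produced need not be `N-1`.

```python
one_arg = list(range(5))           # start defaults to 0
two_args = list(range(2, 5))       # start at 2, stop before 5
with_step = list(range(1, 10, 3))  # 1, 4, 7; the next value, 10, is excluded
counting_down = list(range(5, 0, -1))  # a negative step counts downwards

print(one_arg)        # [0, 1, 2, 3, 4]
print(two_args)       # [2, 3, 4]
print(with_step)      # [1, 4, 7]
print(counting_down)  # [5, 4, 3, 2, 1]
```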

## NumPy Arrays


<a id='index-2'></a>
The essential problem that NumPy solves is fast array processing.

The most important structure that NumPy defines is an array data type formally called a [numpy.ndarray](http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html).

NumPy arrays power a large proportion of the scientific Python ecosystem.

Let’s first import the library.

In [17]:
import numpy as np

To create a NumPy array containing only zeros we use  [np.zeros](http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html#numpy.zeros)

In [18]:
a = np.zeros(3)
a

array([0., 0., 0.])

In [19]:
type(a)

numpy.ndarray

NumPy arrays are somewhat like native Python lists, except that

- Data *must be homogeneous* (all elements of the same type).  
- These types must be one of the [data types](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html) (`dtypes`) provided by NumPy.  


The most important of these dtypes are:

- float64: 64 bit floating-point number  
- int64: 64 bit integer  
- bool:  8 bit True or False  


There are also dtypes to represent complex numbers, unsigned integers, etc.

On modern machines, the default dtype for arrays is `float64`

In [4]:
a = np.zeros(3)
type(a[0])

numpy.float64

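If we want to use integers we can specify as follows (the exact integer type depends on the platform and NumPy version, so you may see `numpy.int64` instead of `numpy.int32`):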

In [5]:
a = np.zeros(3, dtype=int)
type(a[0])

numpy.int32
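The dtype of an array can also be read from its `dtype` attribute, and a specific width can be requested explicitly. The sketch below is illustrative; it additionally shows that mixed integer and float input is stored as floats, since the data must be homogeneous.

```python
import numpy as np

a = np.zeros(3)
print(a.dtype)                    # float64

b = np.zeros(3, dtype=np.int64)   # request 64 bit integers explicitly
print(b.dtype)                    # int64
print(b.itemsize)                 # 8 (bytes per element)

c = np.array([1, 2.5, 3])         # mixed int/float input is upcast to float
print(c.dtype)                    # float64
```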


<a id='numpy-shape-dim'></a>

### Shape and Dimension


<a id='index-3'></a>
Consider the following assignment

In [22]:
tmp = np.zeros((3, 3))

In [26]:
tmp.size

9

In [20]:
z = np.zeros(10)

Here `z` is a *flat* array with a single dimension, so it is neither a row nor a column vector.

The dimension is recorded in the `shape` attribute, which is a tuple

In [21]:
z.shape

(10,)

Here the shape tuple has only one element, which is the length of the array (tuples with one element end with a comma).

To give it a second dimension, we can change the `shape` attribute

In [8]:
z.shape = (10, 1)
z

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.]])

In [9]:
z = np.zeros(4)
z.shape = (2, 2)
z

array([[0., 0.],
       [0., 0.]])

In the last case, to make the 2 by 2 array, we could also pass a tuple to the `zeros()` function, as
in `z = np.zeros((2, 2))`.
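An alternative to assigning to `shape`, not used in the text above, is the `reshape` method, which returns a reshaped array and leaves the shape of the original untouched. The sketch below assumes `np.arange` to generate sample data and uses the `ndim` attribute to report the number of dimensions.

```python
import numpy as np

z = np.arange(6)          # the integers 0, 1, ..., 5 in a flat array
print(z.ndim, z.shape)    # 1 (6,)

m = z.reshape(2, 3)       # 2 rows, 3 columns; the shape of `z` is unchanged
print(m.ndim, m.shape)    # 2 (2, 3)
print(m.size)             # 6 (total number of elements)
print(z.shape)            # (6,)
```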


<a id='creating-arrays'></a>

### Creating Arrays


<a id='index-4'></a>
As we’ve seen, the `np.zeros` function creates an array of zeros.

You can probably guess what `np.ones` creates.

Related is `np.empty`, which creates arrays in memory that can later be populated with data

In [10]:
z = np.empty(3)
z

array([0., 0., 0.])

The numbers you see here are garbage values, even when they happen to be zeros.

(NumPy allocates 3 contiguous 64 bit pieces of memory, and the existing contents of those memory slots are interpreted as `float64` values)

To set up a grid of evenly spaced numbers use `np.linspace`

In [11]:
z = np.linspace(2, 4, 5)  # From 2 to 4, with 5 elements
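To make the result concrete, the sketch below inspects the grid. Both endpoints are included by default; the `endpoint=False` argument, shown for contrast, drops the right endpoint.

```python
import numpy as np

z = np.linspace(2, 4, 5)
print(z)                 # the values 2.0, 2.5, 3.0, 3.5, 4.0
print(z[1] - z[0])       # 0.5, the spacing (4 - 2) / (5 - 1)

w = np.linspace(2, 4, 5, endpoint=False)
print(w)                 # the values 2.0, 2.4, 2.8, 3.2, 3.6
```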

To create an identity matrix use either `np.identity` or `np.eye`

In [12]:
z = np.identity(2)
z

array([[1., 0.],
       [0., 1.]])

In addition, NumPy arrays can be created from Python lists, tuples, etc. using `np.array`

In [13]:
z = np.array([10, 20])                 # ndarray from Python list
z

array([10, 20])

In [14]:
type(z)

numpy.ndarray

In [15]:
z = np.array((10, 20), dtype=float)    # Here 'float' is equivalent to 'np.float64'
z

array([10., 20.])

In [16]:
z = np.array([[1, 2], [3, 4]])         # 2D array from a list of lists
z

array([[1, 2],
       [3, 4]])

See also `np.asarray`, which performs a similar function, but does not make
a distinct copy of data already in a NumPy array.

In [17]:
na = np.linspace(10, 20, 2)
na is np.asarray(na)   # Does not copy NumPy arrays

True

In [18]:
na is np.array(na)     # Does make a new copy --- perhaps unnecessarily

False
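The practical consequence is that changes made through the result of `np.asarray` are visible in the original array, while changes to the result of `np.array` are not. A minimal sketch, with the illustrative names `same` and `copy`:

```python
import numpy as np

na = np.linspace(10, 20, 2)   # the values 10.0 and 20.0

same = np.asarray(na)         # the very same array object
copy = np.array(na)           # an independent copy

same[0] = -1.0                # this also changes `na`
copy[1] = 99.0                # this does not

print(na[0], na[1])                  # -1.0 20.0
print(np.shares_memory(na, same))    # True
print(np.shares_memory(na, copy))    # False
```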

To read in the array data from a text file containing numeric data use `np.loadtxt`
or `np.genfromtxt`—see [the documentation](http://docs.scipy.org/doc/numpy/reference/routines.io.html) for details.

### Array Indexing


<a id='index-5'></a>
For a flat array, indexing is the same as Python sequences:

In [19]:
z = np.linspace(1, 2, 5)
z

array([1.  , 1.25, 1.5 , 1.75, 2.  ])

In [20]:
z[0]

1.0

In [21]:
z[0:2]  # Two elements, starting at element 0

array([1.  , 1.25])

In [22]:
z[-1]

2.0

For 2D arrays the index syntax is as follows:

In [23]:
z = np.array([[1, 2], [3, 4]])
z

array([[1, 2],
       [3, 4]])

In [24]:
z[0, 0]

1

In [25]:
z[0, 1]

2

And so on.

Note that indices are still zero-based, to maintain compatibility with Python sequences.

Rows and columns can be extracted as follows

In [26]:
z[0, :]

array([1, 2])

In [27]:
z[:, 1]

array([2, 4])

NumPy arrays of integers can also be used to extract elements

In [28]:
z = np.linspace(2, 4, 5)
z

array([2. , 2.5, 3. , 3.5, 4. ])

In [29]:
indices = np.array((0, 2, 3))
z[indices]

array([2. , 3. , 3.5])

Finally, an array of `dtype bool` can be used to extract elements

In [30]:
z

array([2. , 2.5, 3. , 3.5, 4. ])

In [31]:
d = np.array([0, 1, 1, 0, 0], dtype=bool)
d

array([False,  True,  True, False, False])

In [32]:
z[d]

array([2.5, 3. ])

We’ll see why this is useful below.

An aside: all elements of an array can be set equal to one number using slice notation

In [33]:
z = np.empty(3)
z

array([2. , 3. , 3.5])

In [34]:
z[:] = 42
z

array([42., 42., 42.])
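Slice assignment modifies the existing array in place, which is different from rebinding the name `z` to a new object. The sketch below uses a second name, `alias` (illustrative), to make this visible, and also assigns to part of the array.

```python
import numpy as np

z = np.zeros(4)
alias = z          # a second name for the same array

z[:] = 42          # in place: `alias` sees the change too
print(alias)       # all four elements are 42.0

z[1:3] = 0         # assign to elements 1 and 2 only
print(z)           # the values 42.0, 0.0, 0.0, 42.0
print(alias is z)  # True
```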

### Array Methods


<a id='index-6'></a>
Arrays have useful methods, all of which are carefully optimized

In [None]:
a = np.array((4, 3, 2, 1))
a

In [None]:
a.sort()              # Sorts a in place
a

In [None]:
a.sum()               # Sum

In [None]:
a.mean()              # Mean

In [None]:
a.max()               # Max

In [None]:
a.argmax()            # Returns the index of the maximal element

In [None]:
a.cumsum()            # Cumulative sum of the elements of a

In [None]:
a.cumprod()           # Cumulative product of the elements of a

In [None]:
a.var()               # Variance

In [None]:
a.std()               # Standard deviation

In [None]:
a.shape = (2, 2)
a.T                   # Equivalent to a.transpose()
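As a sketch of what the method calls above return for `a = np.array((4, 3, 2, 1))`, the values can be checked as follows. Note that `var` and `std` use the population formulas (dividing by the number of elements) by default.

```python
import numpy as np

a = np.array((4, 3, 2, 1))
a.sort()                 # a now holds 1, 2, 3, 4

print(a.sum())           # 10
print(a.mean())          # 2.5
print(a.max())           # 4
print(a.argmax())        # 3, the index of the element 4
print(a.cumsum())        # the values 1, 3, 6, 10
print(a.cumprod())       # the values 1, 2, 6, 24
print(a.var())           # 1.25
print(a.std())           # the square root of 1.25, about 1.118

a.shape = (2, 2)         # rows are now (1, 2) and (3, 4)
print(a.T)               # rows of the transpose are (1, 3) and (2, 4)
```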

Another method worth knowing is `searchsorted()`.

If `z` is a nondecreasing array, then `z.searchsorted(a)` returns the index of the first element of `z` that is `>= a` (or `len(z)` if there is no such element)

In [None]:
z = np.linspace(2, 4, 5)
z

In [None]:
z.searchsorted(2.2)
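A few calls illustrate the rule. The sketch also shows the optional `side="right"` argument, which instead returns the index of the first element strictly greater than the value.

```python
import numpy as np

z = np.linspace(2, 4, 5)        # the values 2.0, 2.5, 3.0, 3.5, 4.0

print(z.searchsorted(2.2))      # 1, since z[1] = 2.5 is the first element >= 2.2
print(z.searchsorted(3.0))      # 2, since z[2] = 3.0 is the first element >= 3.0
print(z.searchsorted(5.0))      # 5, i.e. len(z): no element is >= 5.0
print(z.searchsorted(3.0, side="right"))   # 3, the first element > 3.0
```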

Many of the methods discussed above have equivalent functions in the NumPy namespace

In [None]:
a = np.array((4, 3, 2, 1))

In [None]:
np.sum(a)

In [None]:
np.mean(a)

## Operations on Arrays


<a id='index-7'></a>

### Arithmetic Operations

The operators `+`, `-`, `*`, `/` and `**` all act *elementwise* on arrays

In [35]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
a + b

array([ 6,  8, 10, 12])

In [36]:
a * b

array([ 5, 12, 21, 32])

We can add a scalar to each element as follows

In [37]:
a + 10

array([11, 12, 13, 14])

Scalar multiplication is similar

In [38]:
a * 10

array([10, 20, 30, 40])

The two-dimensional arrays follow the same general rules

In [39]:
A = np.ones((2, 2))
B = np.ones((2, 2))
A + B

array([[2., 2.],
       [2., 2.]])

In [40]:
A + 10

array([[11., 11.],
       [11., 11.]])

In [41]:
A * B

array([[1., 1.],
       [1., 1.]])


<a id='numpy-matrix-multiplication'></a>
In particular, `A * B` is *not* the matrix product; it is an element-wise product.

### Matrix Multiplication


<a id='index-8'></a>
With Python 3.5 and above, one can use the `@` operator for matrix multiplication, as follows:

In [42]:
A = np.ones((2, 2))
B = np.ones((2, 2))
A @ B

array([[2., 2.],
       [2., 2.]])

(For older versions of Python and NumPy you need to use the [np.dot](http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html) function)
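With matrices of ones the two products are hard to tell apart, so the sketch below uses two illustrative matrices (chosen here, not taken from the text) for which the element-wise product and the matrix product differ. It also confirms that `np.dot` agrees with `@` for 2D arrays.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])   # swaps columns when post-multiplied

elementwise = A * B              # rows (0, 2) and (3, 0)
matrix_product = A @ B           # rows (2, 1) and (4, 3)

print(elementwise)
print(matrix_product)
print(np.array_equal(matrix_product, np.dot(A, B)))   # True
```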

We can also use `@` to take the inner product of two flat arrays

In [43]:
A = np.array((1, 2))
B = np.array((10, 20))
A @ B

50

In fact, we can use `@` when one element is a Python list or tuple

In [44]:
A = np.array(((1, 2), (3, 4)))
A

array([[1, 2],
       [3, 4]])

In [45]:
A @ (0, 1)

array([2, 4])

Since we are post-multiplying, the tuple is treated as a column vector.
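For contrast, a sketch of the pre-multiplying case: when the tuple appears on the left, it is treated as a row vector, and the result picks out a row of `A` rather than a column.

```python
import numpy as np

A = np.array(((1, 2), (3, 4)))

post = A @ (0, 1)    # tuple as a column vector: second column of A
pre = (0, 1) @ A     # tuple as a row vector: second row of A

print(post)          # the values 2 and 4
print(pre)           # the values 3 and 4
```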

### Vectorized Functions


<a id='index-9'></a>
NumPy provides versions of the standard functions `log`, `exp`, `sin`, etc. that act *element-wise* on arrays

In [46]:
z = np.array([1, 2, 3])
np.sin(z)

array([0.84147098, 0.90929743, 0.14112001])

This eliminates the need for explicit element-by-element loops such as

In [47]:
n = len(z)
y = np.empty(n)
for i in range(n):
    y[i] = np.sin(z[i])

Because they act element-wise on arrays, these functions are called *vectorized functions*.

In NumPy-speak, they are also called *ufuncs*, which stands for “universal functions”.
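The loop and the vectorized call give the same values, which can be confirmed with `np.allclose`. The sketch also checks that `np.sin` is an instance of `np.ufunc`. The names `y_loop` and `y_vec` are illustrative.

```python
import numpy as np

z = np.array([1, 2, 3])

# Explicit element-by-element loop
y_loop = np.empty(len(z))
for i in range(len(z)):
    y_loop[i] = np.sin(z[i])

# Vectorized version: one call on the whole array
y_vec = np.sin(z)

print(np.allclose(y_loop, y_vec))     # True
print(isinstance(np.sin, np.ufunc))   # True
```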

As we saw above, the usual arithmetic operations (`+`, `*`, etc.) also
work element-wise, and combining these with the ufuncs gives a very large set of fast element-wise functions.

In [48]:
z

array([1, 2, 3])

In [49]:
(1 / np.sqrt(2 * np.pi)) * np.exp(- 0.5 * z**2)

array([0.24197072, 0.05399097, 0.00443185])
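The expression above is the standard normal density evaluated at each element of `z`. One way to reuse it is to wrap it in a function; the name `std_normal_pdf` below is hypothetical, and the sketch relies on `np.asarray` so that lists and scalars are accepted as well as arrays.

```python
import numpy as np

def std_normal_pdf(x):
    """Standard normal density, evaluated element-wise."""
    x = np.asarray(x, dtype=float)   # accept lists, tuples and scalars
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

values = std_normal_pdf([1, 2, 3])
print(values)               # approximately 0.24197, 0.05399, 0.00443
print(std_normal_pdf(0.0))  # approximately 0.39894, the peak of the density
```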

### Comparisons


<a id='index-10'></a>
As a rule, comparisons on arrays are done element-wise

In [54]:
z = np.array([2, 3])
y = np.array([2, 3])
z == y

array([ True,  True])

In [55]:
y[0] = 5
z == y

array([False,  True])

In [56]:
z != y

array([ True, False])

The situation is similar for `>`, `<`, `>=` and `<=`.

We can also do comparisons against scalars

In [57]:
z = np.linspace(0, 10, 5)
z

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

In [58]:
z > 3

array([False, False,  True,  True,  True])

This is particularly useful for *conditional extraction*

In [59]:
b = z > 3
b

array([False, False,  True,  True,  True])

In [60]:
z[b]

array([ 5. ,  7.5, 10. ])

Of course we can—and frequently do—perform this in one step

In [61]:
z[z > 3]

array([ 5. ,  7.5, 10. ])
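Boolean arrays can be used for more than extraction. The sketch below, which goes slightly beyond the examples above, combines two conditions with `&` (each condition needs its own parentheses), counts matches with `sum`, and assigns through a mask. The name `capped` is illustrative.

```python
import numpy as np

z = np.linspace(0, 10, 5)          # the values 0.0, 2.5, 5.0, 7.5, 10.0

inside = z[(z > 3) & (z < 10)]     # elements strictly between 3 and 10
print(inside)                      # the values 5.0 and 7.5

print((z > 3).sum())               # 3, since True counts as 1

capped = z.copy()                  # copy so that `z` is left unchanged
capped[capped > 3] = 3             # replace every element above 3 by 3
print(capped)                      # the values 0.0, 2.5, 3.0, 3.0, 3.0
```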

### Sub-packages

NumPy provides some additional functionality related to scientific programming
through its sub-packages.

For example, we can generate random variables using `np.random`

In [62]:
z = np.random.randn(10000)  # Generate standard normals
y = np.random.binomial(10, 0.5, size=1000)    # 1,000 draws from Bin(10, 0.5)
y.mean()

4.991

Another commonly used subpackage is `np.linalg`

In [63]:
A = np.array([[1, 2], [3, 4]])

np.linalg.det(A)           # Compute the determinant

-2.0000000000000004

In [64]:
np.linalg.inv(A)           # Compute the inverse

array([[-2. ,  1. ],
       [ 1.5, -0.5]])
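Because of floating-point rounding (visible in the determinant above), results from `np.linalg` are best checked with `np.allclose` rather than exact equality. The sketch below verifies the inverse and, as an extra, solves a linear system with `np.linalg.solve`; the right-hand side `b` is an arbitrary illustrative choice.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
A_inv = np.linalg.inv(A)

# A times its inverse should be the identity, up to rounding error
print(np.allclose(A @ A_inv, np.eye(2)))      # True

# Solve A x = b directly instead of forming the inverse
b = np.array([1, 2])
x = np.linalg.solve(A, b)
print(x)                                      # approximately 0.0 and 0.5
print(np.allclose(A @ x, b))                  # True
```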

For a comprehensive list of what’s available in NumPy see [this documentation](https://docs.scipy.org/doc/numpy/reference/routines.html).