In [1]:
# You must run this cell, but you can ignore its contents.

import hashlib

def hash(ty):
    """Return a unique string for input"""
    ty_str = str(ty).encode()
    m = hashlib.sha256()
    m.update(ty_str)
    return m.hexdigest()[:10]

# numpy basics and creating arrays

Numpy is a widely used library for handling arrays of data, especially numerical data. It would not be an exaggeration to say it is fundamental to the Python data science ecosystem.

The most important part of numpy is the numpy `array` type. A numpy `array` is conceptually similar to a Python `list` or `tuple` but each element has the same data type and the array has a fixed size.

Typically `numpy` is imported as `np`, a conventional shorthand that saves a bit of typing.

In [2]:
import numpy as np

We can create an array from any sequence type, such as lists and tuples:

In [3]:
x = np.array([1,2,3,4])
x

array([1, 2, 3, 4])
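As a small sketch of the "same data type" point above (the variable names `from_tuple` and `mixed` are only illustrative): a tuple works just like a list as input, and when the inputs mix integers and floats, numpy picks a single data type that can hold every element.

```python
import numpy as np

# A tuple is accepted just like a list.
from_tuple = np.array((1, 2, 3))

# Mixing integers and a float: every element is stored as a float.
mixed = np.array([1, 2, 3.5])

print(from_tuple.tolist())  # [1, 2, 3]
print(mixed.dtype)          # float64
```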

We can create an array of `n` elements (from `0` to `n-1`) with the `arange` function.

In [4]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
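Like Python's built-in `range`, `np.arange` also accepts a start, a stop, and a step. The following is a minimal sketch; the name `evens` is only illustrative.

```python
import numpy as np

# start=2, stop=10 (the stop value itself is excluded), step=2
evens = np.arange(2, 10, 2)
print(evens.tolist())  # [2, 4, 6, 8]
```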

We can create an array of `n` equally spaced elements from `start` to `stop` with `np.linspace`. For example, here `start` is `100`, `stop` is `120` and `n` is 11.

In [5]:
np.linspace(100, 120, 11)

array([100., 102., 104., 106., 108., 110., 112., 114., 116., 118., 120.])
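Unlike `np.arange`, `np.linspace` includes the `stop` value by default, so the spacing is `(stop - start) / (n - 1)`. The sketch below checks this using the `retstep` and `endpoint` keyword arguments; the variable names are only illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring samples.
samples, step = np.linspace(100, 120, 11, retstep=True)
print(step)          # 2.0
print(len(samples))  # 11

# endpoint=False leaves out the stop value.
half_open = np.linspace(0, 1, 5, endpoint=False)
print(len(half_open))  # 5
```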

We can also create arrays filled with zeros or ones with a given shape:

In [6]:
np.zeros((3,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [7]:
np.ones((5,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

## array shape

In addition to 1-dimensional numpy arrays, which are very similar to lists or tuples, numpy arrays may also have 2 or more dimensions. The `shape` attribute of a numpy array may be used to get or set the number of dimensions and the size along each dimension.

In [8]:
x = np.arange(12)
x.shape = (3,4)
x

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

The `ndim` attribute is the dimensionality of the array (and, thus, equal to the length of the array `shape`):

In [9]:
x.ndim

2
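An alternative to assigning to `shape` is the `reshape` method, which returns a reshaped array and leaves the original's shape alone. This is a sketch with illustrative names (`flat`, `grid`); it also shows that the new shape must account for exactly the same number of elements.

```python
import numpy as np

flat = np.arange(12)
grid = flat.reshape(3, 4)  # `flat` itself keeps shape (12,)

print(flat.shape)                    # (12,)
print(grid.shape)                    # (3, 4)
print(grid.ndim == len(grid.shape))  # True

# 12 elements cannot fill a 5x5 array, so numpy raises a ValueError.
reshape_failed = False
try:
    flat.reshape(5, 5)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```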

## array operations

Numpy arrays support mathematical operations with other numpy arrays and with single numbers ("scalars").

With scalars, the operation is applied element-wise, as if the scalar were first expanded to an array with the same shape as the numpy array (numpy calls this *broadcasting*).

With other arrays of the same shape, an element-wise operation is performed.

In [10]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [11]:
y = np.arange(10) * 2
y

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [12]:
z = x + y
z

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

In [13]:
x + 3

array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [14]:
x + 3.5

array([ 3.5,  4.5,  5.5,  6.5,  7.5,  8.5,  9.5, 10.5, 11.5, 12.5])

In [15]:
x/5

array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8])
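To illustrate why the shapes matter, here is a small sketch (names are illustrative) that adds two 1-dimensional arrays of different lengths. Numpy cannot pair up the elements, so it raises a `ValueError`.

```python
import numpy as np

a = np.arange(3)
b = np.arange(4)

add_failed = False
try:
    a + b  # lengths 3 and 4 cannot be matched element by element
except ValueError as err:
    add_failed = True
    print("cannot add:", err)

# A scalar, in contrast, combines with an array of any shape.
grid = np.zeros((2, 3)) + 1
print(grid.shape)  # (2, 3)
```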

## array dtype

Just like lists or tuples, every element in a numpy array has a data type. As mentioned above, however, every element in a numpy array has the same data type, and thus we can refer to the "datatype of the array". This can be set when the array is created with the `dtype` keyword argument and read from the `dtype` attribute:

In [16]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
x.dtype

dtype('int64')

In [18]:
# `np.float` was removed in numpy 1.24; the builtin `float` gives the same float64 dtype.
x = np.arange(10, dtype=float)
x

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [19]:
x.dtype

dtype('float64')
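The data type of an existing array can also be converted with the `astype` method, which returns a new array. The sketch below uses illustrative names; note that converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

ints = np.arange(5)
floats = ints.astype(np.float64)  # new array; `ints` is unchanged

print(ints.dtype.kind)  # i  (an integer type; exact width depends on the platform)
print(floats.dtype)     # float64

# Float-to-integer conversion truncates toward zero.
truncated = np.array([1.7, -1.7]).astype(np.int64)
print(truncated.tolist())  # [1, -1]
```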

## array indexing and slicing

Numpy arrays can be indexed and sliced just like other Python sequence types such as lists, tuples, and strings.

Just like with Python lists, the indexes or slices can be read and written. In other words, numpy arrays are *mutable*.

In [20]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]:
x[:4]

array([0, 1, 2, 3])

In [22]:
x[4:]

array([4, 5, 6, 7, 8, 9])

In [23]:
x[4:7]

array([4, 5, 6])

In [24]:
x[::2]

array([0, 2, 4, 6, 8])

Because numpy arrays can have 2 or more dimensions, we can also index and slice them in higher dimensions. For two-dimensional arrays, the first index is always the row index and the second index is always the column index.

In [25]:
x = np.arange(12)
x.shape = (3,4)
x

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [26]:
x[1:, :]

array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [27]:
x[:, 1:]

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11]])

In [28]:
x[1:, 2:] = 99
x

array([[ 0,  1,  2,  3],
       [ 4,  5, 99, 99],
       [ 8,  9, 99, 99]])
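The row-then-column convention also applies when picking out single elements, whole rows, or whole columns. A brief sketch, using a separate illustrative array named `grid`:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

print(grid[1, 2])            # 6   (row 1, column 2)
print(grid[-1, -1])          # 11  (negative indexes count from the end)
print(grid[0].tolist())      # [0, 1, 2, 3]  (first row)
print(grid[:, 0].tolist())   # [0, 4, 8]     (first column)
```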

## References to arrays

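Remember that variable assignment in Python does not create a new object but only creates a variable which points to an existing object. This is very important with numpy: slicing an array likewise returns a *view* that shares its data with the original array rather than a copy, so writing to the slice changes the original.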

In [29]:
x = np.arange(20)
x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [30]:
# Here we create a variable which references the first 10 elements of `x`.
y = x[:10]
y

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [31]:
# Now we assign all the elements of `y` to have the value of 123.
y[:] = 123

In [32]:
# How does this affect the original array `x`?
x

array([123, 123, 123, 123, 123, 123, 123, 123, 123, 123,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19])
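When an independent array is wanted instead, the `copy` method can be used. The following sketch (with illustrative names) contrasts a slice with a copy and uses `np.shares_memory` to check whether two arrays share their underlying data.

```python
import numpy as np

original = np.arange(20)
view = original[:10]                 # shares data with `original`
independent = original[:10].copy()   # has its own data

independent[:] = 123                 # does not touch `original`

print(original[:3].tolist())                    # [0, 1, 2]
print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```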

## Efficient data processing with numpy

Because operations on numpy arrays happen for all elements with a single Python expression, these operations can be performed very fast and efficiently by the computer. For example, if `x` is a numpy array with 10,000 elements, we can avoid a Python for loop with 10,000 iterations by performing our work with numpy.

Below we use the Jupyter "magic command" `%timeit` to measure how long a single expression takes, in this case performing an element-wise multiplication.

In [33]:
# `np.float` was removed in numpy 1.24; the builtin `float` gives the same float64 dtype.
x = np.arange(10000, dtype=float)

In [34]:
%timeit x*x

6.59 µs ± 986 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [35]:
y = x*x
len(y)

10000

In [36]:
assert y[2] == 4

Now let's do the same as above with a Python `list`. We need to create a `list_mul` function.

In [37]:
def list_mul(a,b):
    """element-wise product of `a` and `b`"""
    n = len(a)
    assert(n==len(b))
    c = []
    for i in range(n):
        c.append(a[i] * b[i])
    return c

Now convert `x` from a numpy array to a list.

In [38]:
x = list(x)

In [39]:
%timeit list_mul(x,x)

4.57 ms ± 1.3 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [40]:
y = list_mul(x,x)

In [41]:
assert y[2] == 4
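The `%timeit` magic only works inside Jupyter or IPython. Outside of them, a similar comparison can be sketched with the standard library `timeit` module. The helper `list_mul_zip` and the repeat counts below are illustrative choices, and the measured times will differ from machine to machine.

```python
import timeit

import numpy as np

values = np.arange(10000, dtype=float)
values_list = values.tolist()

def list_mul_zip(a, b):
    """Element-wise product of two equal-length lists (illustrative helper)."""
    return [ai * bi for ai, bi in zip(a, b)]

# timeit.timeit returns the total time in seconds for `number` calls.
numpy_total = timeit.timeit(lambda: values * values, number=200)
list_total = timeit.timeit(lambda: list_mul_zip(values_list, values_list), number=20)

print(f"numpy: {numpy_total / 200 * 1e6:.1f} µs per call")
print(f"list:  {list_total / 20 * 1e6:.1f} µs per call")
```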

## Elementwise numpy operations

Above you have already seen element-wise multiplication, which multiplies every element of two inputs. Similarly, other operations operate element-wise on a single input array.

In [42]:
np.sqrt( np.array([1, 4, 9]))

array([1., 2., 3.])

In [43]:
np.cos( np.linspace( 0, 2*np.pi, 30) )

array([ 1.        ,  0.97662056,  0.90757542,  0.79609307,  0.64738628,
        0.46840844,  0.26752834,  0.05413891, -0.161782  , -0.37013816,
       -0.56118707, -0.72599549, -0.85685718, -0.94765317, -0.99413796,
       -0.99413796, -0.94765317, -0.85685718, -0.72599549, -0.56118707,
       -0.37013816, -0.161782  ,  0.05413891,  0.26752834,  0.46840844,
        0.64738628,  0.79609307,  0.90757542,  0.97662056,  1.        ])
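Element-wise functions can be combined in ordinary expressions. As a quick sketch, the identity cos²θ + sin²θ = 1 can be checked for every angle at once; `np.allclose` is used because floating point results are only approximately equal to 1.

```python
import numpy as np

angles = np.linspace(0, 2 * np.pi, 30)
identity = np.cos(angles) ** 2 + np.sin(angles) ** 2

print(identity.shape)              # (30,)
print(np.allclose(identity, 1.0))  # True
```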

## More numpy operations

In addition to elementwise operations such as `np.cos(x)` or `x * y` where `x` and `y` are same-shaped arrays, numpy can also perform operations on entire arrays.

Take for example the `mean()` function.

In [44]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [45]:
np.mean(x)

4.5

We can also take the mean of a 2D array, either for the entire array or row-wise or column-wise:

In [46]:
x = np.arange(30)
x.shape = (5,6)
x

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29]])

In [47]:
np.mean(x)

14.5

In [48]:
# Average over axis 0 (down the rows), giving one mean per column.
np.mean(x,axis=0)

array([12., 13., 14., 15., 16., 17.])

In [49]:
# Average over axis 1 (along the columns), giving one mean per row.
np.mean(x,axis=1)

array([ 2.5,  8.5, 14.5, 20.5, 26.5])
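One way to keep the axes straight is that the axis passed to `mean` is the one that disappears from the result's shape. The sketch below, using an illustrative array named `table`, checks this and compares one entry against a mean computed by hand from a slice.

```python
import numpy as np

table = np.arange(30).reshape(5, 6)

col_means = table.mean(axis=0)  # axis 0 (length 5) is averaged away
row_means = table.mean(axis=1)  # axis 1 (length 6) is averaged away

print(col_means.shape)  # (6,)
print(row_means.shape)  # (5,)

# The first column is 0, 6, 12, 18, 24, whose mean is 12.
print(table[:, 0].mean() == col_means[0])  # True
```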

In addition to `mean()`, numpy provides `std()`, `sum()`, and more. 

In [50]:
np.std(x)

8.65544144839919

In [51]:
np.sum(x)

435

## argmin and argmax

Important in many scientific computing applications are `argmin` and `argmax` functions. These return the index of the smallest or largest value, respectively.

In [52]:
x = np.array([0, 10, 0, 4, 3, 2, 100, 2, 2, -1])

In [53]:
min_idx = np.argmin(x)
min_idx

9

In [54]:
x[min_idx]

-1

In [55]:
max_idx = np.argmax(x)
max_idx

6

In [56]:
x[max_idx]

100
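Two details are worth a short sketch (array names are illustrative). When the extreme value occurs more than once, `argmin` and `argmax` return the index of the first occurrence. For a 2D array they return an index into the flattened array by default, which `np.unravel_index` converts back to a row and a column.

```python
import numpy as np

# Ties: the first occurrence wins.
tied = np.array([5, 1, 5])
first_max = np.argmax(tied)
print(first_max)  # 0

grid = np.array([[3, 7, 1],
                 [9, 2, 8]])
flat_idx = np.argmax(grid)  # index into the flattened array
row, col = np.unravel_index(flat_idx, grid.shape)

print(flat_idx)        # 3
print(row, col)        # 1 0
print(grid[row, col])  # 9
```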

## Because of its speed, numpy makes it possible to use Python for scientific computing.

You can read more about numpy at its [User Guide](https://numpy.org/doc/1.17/user/index.html) and its [Reference Guide](https://numpy.org/doc/1.17/reference/index.html).

## Q1 Create a 1 dimensional numpy array named `x` with 20 elements from 0 to 19.

In [57]:
# Type your answer here and then run this and the following cell.
x = np.arange(20)

In [58]:
# If this runs without error, it means the answer in your previous cell was correct.
assert(hash(x)=='7a150607a7')

## Q2 Create a 2 dimensional numpy array named `x` of shape 5,6.

In [59]:
# Type your answer here and then run this and the following cell.
x = np.arange(30)
x.shape = (5,6)

In [60]:
# If this runs without error, it means the answer in your previous cell was correct.
assert(hash(x.shape)=='6510380315')

## Q3 Consider the first 100 integers starting with 0. What is their mean value? Put this in a variable `mean100`.

In [61]:
# Type your answer here and then run this and the following cell.
mean100 = np.mean(np.arange(100))

In [62]:
# If this runs without error, it means the answer in your previous cell was correct.
assert(hash(round(mean100*1000))=='90cd8ccb0f')

## Q4 Create an array named `x` with every 4th value of the first 100000 integers.

In [63]:
# Type your answer here and then run this and the following cell.
x = np.arange(100000)[::4]
x

array([    0,     4,     8, ..., 99988, 99992, 99996])

In [64]:
# If this runs without error, it means the answer in your previous cell was correct.
assert(hash(len(x))=='0812a4ef4e')
assert(hash(x)=='17e101b2ef')