# NumPy basics

[NumPy](https://numpy.org) is an important library for numerical computation in Python.  It includes a multidimensional array object, efficient support for array-wise operations, linear algebra primitives, and more.

In this notebook, you'll get a quick introduction to NumPy in a question-and-answer format.  In some of the cells, we've filled in working code for you;  you should think about what these cells do before you execute them (and then figure out why they did what they did after you execute them).  Other cells will require you to write code.  Let's get started!

## Importing NumPy

By convention, many programmers import `numpy` with the alias of `np`.  (You'll be typing this module name a lot, so it'll be nice to save a few keystrokes each time.)

In [1]:
import numpy as np

##  One-dimensional arrays

Let's start by creating a NumPy [array](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html) object. The class is called `ndarray`, although we will usually refer to it simply as an `array`, after the `np.array` function that constructs one.

###  Initializing arrays from Python lists

There are a few ways to initialize an `array`.

How would you initialize an `array` from a Python list?

In [3]:
np.array([1,1,2,3,5,8])

array([1, 1, 2, 3, 5, 8])
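
If you'd like to poke at the result a little more, the following sketch inspects a few attributes of the array we just built. The variable names `fib` and `mixed` are only for illustration. The integer dtype you see depends on your platform, so no particular output is promised for it.

```python
import numpy as np

# Build an array from a plain Python list of integers.
fib = np.array([1, 1, 2, 3, 5, 8])

print(type(fib))   # the class of the object np.array returns
print(fib.shape)   # one dimension with six elements
print(fib.dtype)   # an integer dtype (exact width is platform dependent)

# Mixing ints and floats in the list makes NumPy pick a floating-point dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
```

Unlike a Python list, every element of an array shares a single `dtype`, which is why the mixed list ends up stored as floats.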

###  Initializing arrays from ranges and Python iterators

We might also want to initialize an array with a sequential range of integers.  Recall that the Python built-in `range(n)` gives us an iterable over the numbers from 0 to _n - 1_.

How would you initialize a numpy array containing the numbers `0` through `9`?

In [4]:
# FIXME:  only keep this in the solution

np.array(range(10))

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

That's a good start!  

But, as we'll see regularly in this tutorial, the way you might do something using Python's library and builtins isn't necessarily the way you'd want to do the same thing with NumPy.  NumPy provides a number of special-case operations that are faster or more convenient (or both!) than their standard Python equivalents.  One such example is the `arange` function, which populates an array with a range.

Are there any differences between `np.arange(10)` and `np.array(range(10))`?

In [9]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
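
Eyeballing two printed arrays works for ten elements, but it doesn't scale. Here is a minimal sketch of a programmatic comparison using `np.array_equal`; the variable names are made up for this example.

```python
import numpy as np

from_range = np.array(range(10))   # build via a Python range
from_arange = np.arange(10)        # build via NumPy's own function

# array_equal is True when both arrays have the same shape and elements.
same_values = np.array_equal(from_range, from_arange)
print(same_values)

# The dtypes are worth a look too; they depend on your platform.
print(from_range.dtype, from_arange.dtype)
```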

It looks like they return the same result!  But do you suppose there is a reason to prefer one or the other?

### Sidebar:  timing code in notebooks

`np.arange(10)` takes slightly fewer keystrokes, but that may not make a difference to us.  Perhaps it's also faster to execute?

We can get timings for individual notebook cells in Jupyter by using so-called "Jupyter magic," or special directives in code cells.  In particular,  we'll add `%%time` to the beginning of a code cell and Jupyter will report how long it takes to execute.  Timing very short code executions is error-prone, so we'll construct many arrays using each method to see if one is faster than the other.

In [12]:
%%time

for _ in range(10000):
    np.array(range(100))

CPU times: user 193 ms, sys: 5 ms, total: 198 ms
Wall time: 223 ms


In [13]:
%%time

for _ in range(10000):
    np.arange(100)

CPU times: user 12.3 ms, sys: 2.34 ms, total: 14.7 ms
Wall time: 15.8 ms


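As we can see, `np.arange` is significantly faster:  in the run above it needed about 15 ms of CPU time to build 10,000 arrays, while `np.array(range(100))` needed about 198 ms, more than a tenfold difference.  (Your exact timings will vary from run to run and machine to machine.)  If we're analyzing data, running a simulation, or training a machine learning model, we'll want our numeric code to be as fast as possible, so it's important to choose the best option for a given use case.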
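
The `%%time` magic only works inside a notebook. If you ever want to repeat this comparison in a plain Python script, one option is the standard library's `timeit` module. The sketch below is an assumption about how you might set that up, not part of the original notebook; `repeats`, `t_range`, and `t_arange` are illustrative names.

```python
import timeit

import numpy as np

repeats = 10_000  # same number of constructions as the cells above

# timeit.timeit calls the function `number` times and returns total seconds.
t_range = timeit.timeit(lambda: np.array(range(100)), number=repeats)
t_arange = timeit.timeit(lambda: np.arange(100), number=repeats)

print(f"np.array(range(100)): {t_range:.4f} s")
print(f"np.arange(100):       {t_arange:.4f} s")
```

The absolute numbers will differ from the `%%time` output, since the lambda call adds a little overhead to both measurements, but the relative gap should still be visible.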

### Initializing arrays with a constant value

Remember that we can create a Python list consisting of a constant value with the `*` operator, like this:

In [14]:
[5] * 5

[5, 5, 5, 5, 5]

Edit the following cell to make a numpy array of twelve fives.

In [15]:
# replace None with an expression so that this cell 
# will return a numpy array with twelve fives 

np.array(None)

array(None, dtype=object)

We can instead create a numpy array without initializing its elements (using the `empty` function) and then fill it in with a constant value in place (using the `fill` method of `ndarray`).

In [22]:
a = np.empty(12, 'i')
a.fill(5)
a

array([5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5], dtype=int32)

Let's see which of these is faster!

In [29]:
%%time 

for _ in range(100000):
    np.array([5] * 12) 

CPU times: user 199 ms, sys: 4.61 ms, total: 204 ms
Wall time: 211 ms


In [28]:
%%time 

for _ in range(100000):
    a = np.empty(12, 'i')
    a.fill(5)
    a 

CPU times: user 144 ms, sys: 3.51 ms, total: 147 ms
Wall time: 161 ms


It may also be interesting to compare the performance of these two  techniques while creating larger arrays.

In [32]:
%%time 

for _ in range(100000):
    np.array([5] * 1000) 

CPU times: user 6.09 s, sys: 19.1 ms, total: 6.11 s
Wall time: 6.18 s


In [33]:
%%time 

for _ in range(100000):
    a = np.empty(1000, 'i')
    a.fill(5)
    a 

CPU times: user 181 ms, sys: 4.63 ms, total: 186 ms
Wall time: 218 ms
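
As an aside that goes slightly beyond the two techniques timed above: NumPy also has a function, `np.full`, that allocates and fills in a single call. This notebook doesn't use it, so treat the following as a sketch for comparison; you may want to time it yourself with `%%time`.

```python
import numpy as np

# Allocate twelve C-int elements and set each of them to 5 in one step.
fives = np.full(12, 5, dtype='i')

# The two-step version used in the cells above, for comparison.
filled = np.empty(12, 'i')
filled.fill(5)

print(fives)
print(np.array_equal(fives, filled))
```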


(You may have noticed the `'i'` argument to `np.empty`.   What do you suppose it does?  What happens if you remove it?)
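
One way to investigate that question is to look at the `dtype` and `itemsize` attributes of arrays created with and without the argument. This is only a starting point for your own experiments; the variable names are illustrative, and the values in an array made by `np.empty` are arbitrary until you fill them.

```python
import numpy as np

with_code = np.empty(4, 'i')   # the second positional argument is the dtype
without_code = np.empty(4)     # dtype left at NumPy's default

# Compare the element type and the bytes used per element.
print(with_code.dtype, with_code.itemsize)
print(without_code.dtype, without_code.itemsize)
```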

### Special case:  initializing arrays of zeroes

NumPy provides a special function to initialize an array of zeroes.   Without searching the internet, replace the code in the following cell with a call to this special function that will return the same result.  (Hint:  use `dir(np)` to find the name of this function and `help` to get documentation on it.)

In [34]:
np.array([0] * 100)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Write an experiment to see which is faster, using the `%%time` magic.

In [35]:
# Test the time performance of np.array([0] * 100)

In [36]:
# Test the time performance of the special numpy function 
# that produces an array of one hundred zeroes