# NumPy basics

[NumPy](https://numpy.org) is an important library for numerical computation in Python.  It includes a multidimensional array object, efficient support for array-wise operations, linear algebra primitives, and more.

In this notebook, you'll get a quick introduction to NumPy, some of which is in a question-and-answer format.  In some of the cells, we've filled in working code for you;  you should think about what these cells do before you execute them (and then figure out why they did what they did after you execute them).  Other cells will require you to write code.  

Let's get started!

## Importing NumPy

By convention, many programmers import `numpy` with the alias of `np`.  (You'll be typing this module name a lot, so it'll be nice to save a few keystrokes each time.)

In [None]:
import numpy as np

##  Creating one-dimensional arrays

Let's start by creating a NumPy [array](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html) object. NumPy's array type is called `ndarray`, but we will usually just call these objects arrays, and we'll usually create them with the `np.array` function.
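Here is a quick way to see how the two names relate: `np.array` is a function that builds and returns an `ndarray` object. A minimal sketch:

```python
import numpy as np

# np.array is a factory function; the object it returns is an ndarray.
a = np.array([1, 2, 3])
print(type(a))                    # <class 'numpy.ndarray'>
print(isinstance(a, np.ndarray))  # True
```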

###  Initializing arrays from Python lists

There are a few ways to initialize an `array`.

How would you initialize an `array` from a Python list?

In [None]:
np.array([1,1,2,3,5,8])

###  Initializing arrays from ranges and Python iterators

We might also want to initialize an array with a sequential range of integers. Recall that the Python built-in `range(n)` produces the integers from 0 to _n - 1_.

How would you initialize a numpy array containing the numbers `0` through `9`?

In [None]:
np.array(range(10))

That's a good start!  

But, as we'll see regularly in this tutorial, the way you might do something using Python's standard library and built-ins isn't necessarily the way you'd want to do the same thing with NumPy. NumPy provides a number of special-case operations that are faster or more convenient (or both!) than their standard Python equivalents. One such example is the `arange` function, which populates an array with a range of values.

Are there any differences between `np.arange(10)` and `np.array(range(10))`?

In [None]:
np.arange(10)

It looks like they return the same result!  But do you suppose there is a reason to prefer one or the other?
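One way to confirm that the results match is `np.array_equal`, which compares two arrays element by element. The sketch below also shows that `np.arange`, like `range`, accepts start, stop, and step arguments and, unlike `range`, accepts a non-integer step:

```python
import numpy as np

a = np.arange(10)
b = np.array(range(10))
print(np.array_equal(a, b))   # True: same shape, same values

# Like range(start, stop, step), the stop value is excluded.
print(np.arange(2, 20, 3))    # values 2, 5, 8, 11, 14, 17

# Unlike range, arange accepts a floating-point step.
print(np.arange(0, 1, 0.25))  # values 0.0, 0.25, 0.5, 0.75
```

For non-integer steps, the NumPy documentation recommends `np.linspace` instead, because floating-point rounding can make the number of elements `arange` produces hard to predict.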

### Sidebar:  timing code in notebooks

`np.arange(10)` takes slightly fewer keystrokes, but that may not make much difference to us. Perhaps it's also faster to execute?

We can get timings for individual notebook cells in Jupyter by using so-called "Jupyter magic," or special directives in code cells.  In particular,  we'll add `%%time` to the beginning of a code cell and Jupyter will report how long it takes to execute.  Timing very short code executions is error-prone, so we'll construct many arrays using each method to see if one is faster than the other.

In [None]:
%%time

for _ in range(10000):
    np.array(range(100))

In [None]:
%%time

for _ in range(10000):
    np.arange(100)

You should see that `np.arange` is significantly faster than building an array from a Python `range`. If we're analyzing data, running a simulation, or training a machine learning model, we'll want our numeric code to be as fast as possible, so it's important to choose the best option for a given use case.
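The `%%time` magic only works inside Jupyter. Outside a notebook, the standard library's `timeit` module can do the same job: `timeit.timeit` runs a callable a fixed number of times and returns the total elapsed seconds. (Jupyter also provides `%timeit` and `%%timeit` magics, which repeat the measurement automatically.) A minimal sketch of the same comparison; the exact numbers will vary from machine to machine:

```python
import timeit

import numpy as np

# Total seconds to build each kind of array 10,000 times.
t_from_range = timeit.timeit(lambda: np.array(range(100)), number=10_000)
t_arange = timeit.timeit(lambda: np.arange(100), number=10_000)

print(f"np.array(range(100)): {t_from_range:.4f} s")
print(f"np.arange(100):       {t_arange:.4f} s")
```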

### More on ranges

NumPy also has functions to initialize an array with values evenly spaced across a linear or logarithmic scale.  Can you figure out what the arguments to these functions mean?  (Use `help(np.linspace)` if you need more information!)

In [None]:
np.linspace(0, 5, 16)

In [None]:
np.logspace(0, 10, 11, base=2)
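If you want to check your reading of the arguments, here is one way to test it. This sketch assumes the default `endpoint=True`: `np.linspace(start, stop, num)` returns `num` evenly spaced values from `start` to `stop` inclusive, and `np.logspace(start, stop, num, base=b)` returns `b` raised to each of the values the matching `linspace` call would produce.

```python
import numpy as np

lin = np.linspace(0, 5, 16)
print(lin[0], lin[-1])  # 0.0 5.0: both endpoints are included
print(lin[1] - lin[0])  # spacing is (5 - 0) / (16 - 1), one third

log = np.logspace(0, 10, 11, base=2)
# logspace(a, b, n, base=k) matches k ** linspace(a, b, n)
print(np.allclose(log, 2.0 ** np.linspace(0, 10, 11)))  # True
```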

### Initializing arrays with a constant value

Remember that the `*` operator repeats a Python list, producing a new list that contains several copies of the original's elements. We can use this behavior to make a Python list containing a constant value:

In [None]:
[5] * 5

Edit the following cell to make a numpy array of twelve fives.

In [None]:
# replace None with an expression so that this cell 
# will return a numpy array with twelve fives 

np.array(None)

We could also use the `full` function to generate an array of a given shape (in this case, a one-dimensional array with twelve elements) and the same value in each element:

In [None]:
np.full(12, 5)
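`full` takes a shape as its first argument, so passing a tuple gives a multidimensional array. It infers the element type from the fill value unless you pass `dtype` explicitly. A small sketch:

```python
import numpy as np

grid = np.full((2, 3), 7)   # two rows, three columns, all sevens
print(grid.shape)           # (2, 3)

halves = np.full(4, 0.5)    # a float fill value gives a float array
print(halves.dtype)         # float64

as_floats = np.full(3, 5, dtype=float)  # request floats for an int fill value
print(as_floats)            # [5. 5. 5.]
```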

Let's see which of these is faster!

In [None]:
%%time 

for _ in range(100000):
    np.array([5] * 12) 

In [None]:
%%time 

for _ in range(100000):
    np.full(12, 5)

It may also be interesting to compare the performance of these two  techniques while creating larger arrays.

In [None]:
%%time 

for _ in range(100000):
    np.array([5] * 1000) 

In [None]:
%%time 

for _ in range(100000):
    np.full(1000, 5)

### Special case:  initializing arrays of zeroes or ones

NumPy provides special functions to initialize an array of zeroes or an  array of ones.   Without searching the internet, replace the code in the following cell with a call to this special function that will return the same result.  (Hint:  use `dir(np)` to find the name of this function and `help` to get documentation on it.)

In [None]:
np.array([0] * 100)

Write an experiment to see which is faster, using the `%%time` magic.

In [None]:
%%time

# Test the time performance of np.array([0] * 100)

In [None]:
%%time

# Test the time performance of the special NumPy function
# that produces an array of zeroes by repeatedly generating
# arrays of one hundred zeroes

### Creating empty arrays and filling arrays

We could also create a NumPy array without initializing its elements (using the `empty` function) and then fill it with a constant value in place (using the `fill` method of `ndarray`).

In [None]:
a = np.empty(12, int)
a.fill(15)
a

(You may have noticed the `int` argument to `np.empty`.   What do you suppose it does?  What happens if you remove it?  Stay tuned...)
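One detail worth knowing: `fill` modifies the array in place and returns `None`, so it can't be chained onto `np.empty` in a single assignment. A minimal sketch:

```python
import numpy as np

a = np.empty(12, int)
result = a.fill(15)  # fill modifies a in place...
print(result)        # None: ...and returns nothing useful
print(a[:3])         # [15 15 15]

# A common slip: chaining fill onto empty stores None, not the array.
oops = np.empty(3, int).fill(1)
print(oops)          # None
```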

## Basic array operations

In this section of the notebook, we'll see how the basic operations we know from Python lists also apply to NumPy arrays.

Let's start by creating two similar arrays:

In [None]:
pa = [n for n in range(100)]
na = np.arange(100)

How would you find out how many elements are in a Python list?

In [None]:
len(pa)

Great!  Does this also work for a NumPy array?

In [None]:
len(na)

It does! (You can also use the `size` attribute of a NumPy array to see how many elements it contains, as we'll do in the next cell.)

In [None]:
na.size
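For the one-dimensional arrays in this notebook, `len` and `size` agree. They differ for multidimensional arrays: `len` reports the length of the first axis, while `size` counts every element. A small sketch:

```python
import numpy as np

m = np.zeros((3, 4))  # 3 rows, 4 columns
print(len(m))         # 3: length of the first axis
print(m.size)         # 12: total number of elements
print(m.shape)        # (3, 4)
```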

How would you access the fifth element of a Python list? (Remember that the _first_ element has index `0`!)

In [None]:
# write an expression to access the fifth element of pa

In [None]:
# write an expression to access the fifth element of na

Both expressions should have evaluated to `4`.  You can also update array elements in the same way.  Write some code to change the first element of each array to `1000`.

In [None]:
# write a statement to set the first element of pa to 1000
# write a statement to set the first element of na to 1000

Python lists also support some interesting indexing modes to access elements from the end of the list or to access a _slice_ of contiguous values. For example:

- `pa[-1]` will give you the last element of `pa`,
- `pa[2:]` will give you a _slice_ consisting of the contiguous elements of `pa` from the third element through the end of the list,
- `pa[:5]` will give you a _slice_ consisting of the first five elements of `pa`, and
- `pa[1:6]` will give you a _slice_ consisting of the second through sixth elements of `pa`.
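Here are the four modes above applied to a small example list (`xs` is just an illustrative name, not part of the notebook):

```python
xs = list(range(10))  # [0, 1, 2, ..., 9]

print(xs[-1])   # 9: the last element
print(xs[2:])   # [2, 3, 4, 5, 6, 7, 8, 9]: third element through the end
print(xs[:5])   # [0, 1, 2, 3, 4]: the first five elements
print(xs[1:6])  # [1, 2, 3, 4, 5]: second through sixth elements
```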

You can also combine these modes, for example, to start or end a slice at an offset from the end of the list. Try it out now!

In [None]:
# write an expression to take a slice of pa 
# starting with the fifth element from the 
# last and going through the last element

NumPy arrays also support this slicing and indexing notation.  Try it out!

In [None]:
# write an expression to take a slice of na 
# starting with the fifth element from the 
# last and going through the last element

### Types

Python is a _dynamically typed_ language: values carry types, but variables do not, so you cannot, in general, statically ascribe a nontrivial type to a variable. (Python 3.5 introduced [some support for type annotations](https://www.python.org/dev/peps/pep-0483/), which other tools can use to help check and test your software, but which are generally ignored by the language runtime itself.)
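To see that annotations are not enforced when the program runs, consider this small function (`double` is a hypothetical example). Python happily calls it with a string argument even though the annotation says `int`:

```python
def double(x: int) -> int:
    """Return twice x; the annotations are hints only."""
    return x * 2

print(double(21))    # 42
print(double("ab"))  # abab: no error, despite the int annotation
```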

Furthermore, the _contents_ of Python lists may be heterogeneous (meaning that you cannot, in general, even dynamically ascribe a nontrivial type to the contents of a Python list). It's perfectly acceptable to put a bunch of arbitrary values in a Python list, and it's up to the rest of the program to make sure these are handled properly.

Try out the following cell to see how crazy things can get:

In [None]:
untyped_array = [n for n in range(10)]
untyped_array[2:6] = ["this", "sure", "is", "fun"]
untyped_array[0] = {"any values" : "are welcome"}
untyped_array[-1] = Exception("uh oh")
untyped_array

By contrast, NumPy arrays track their element types.   Try the following two cells to see how this works!

In [None]:
arr = np.arange(15)
print(arr)
arr.dtype

In [None]:
arr = np.logspace(0, 10, 11, base=2)
print(arr)
arr.dtype

This type information is necessary because much of NumPy is implemented with high-performance native libraries, and types are essential for both performance and correctness. NumPy will therefore refuse to store a value in an array if it can't convert that value to the array's element type.

In [None]:
arr[3] = "eight"

In [None]:
arr
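Note that NumPy only rejects values it cannot convert. Values that _can_ be converted to the array's element type are converted, sometimes silently losing information. A sketch with an integer array:

```python
import numpy as np

ints = np.arange(5)
ints[1] = 3.7          # converted to the element type: stored as 3
print(ints)            # [0 3 2 3 4]

try:
    ints[2] = "eight"  # cannot be converted to an integer
except ValueError as err:
    print("rejected:", err)
```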

Let's see what happens if we try to change the type of the array (by changing the type of its elements).

In [None]:
arr.dtype = 'int32'
arr

Is that the result you expected?  What happened there, anyway?  (If you need a hint, run the next cell.)

In [None]:
arr.dtype = 'float64'
arr
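What happened is that assigning to `dtype` doesn't convert the values; it reinterprets the array's underlying bytes as a different type. The `view` method performs the same kind of reinterpretation on the same memory, while `astype` converts each value and copies the result. A minimal sketch contrasting the two:

```python
import numpy as np

floats = np.array([1.0, 2.0, 4.0])

# Reinterpret the same 8-byte float64 values as 4-byte int32 values:
# the bytes are unchanged, so there are now twice as many elements.
reinterpreted = floats.view(np.int32)
print(reinterpreted.size)  # 6

# Convert each value to an integer (this makes a copy).
converted = floats.astype(np.int64)
print(converted)           # [1 2 4]

# Viewing the bytes as float64 again recovers the original values.
print(reinterpreted.view(np.float64))  # [1. 2. 4.]
```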

## A first gotcha

So far we've seen a few performance differences between Python lists and NumPy arrays, but only minor behavioral differences: these data structures generally support similar interfaces. In the next cell, however, we'll show you an important difference between NumPy arrays and Python lists. We'll do that by defining a function that returns the same value whether you pass it a Python list or a NumPy array with the same elements.

In [None]:
def florble(a):
    if len(a) < 5:
        return 0
    
    foo = a[1:-1]
    foo[1] = foo[-1]
    return foo

Read through that function and think about what it does.  We'll try it out next.

In [None]:
na = np.arange(6)
pa = [n for n in range(6)]

In [None]:
florble(na)

In [None]:
florble(pa)

This function returns the same value whether you've passed in a NumPy array or a Python list.  Are these two cases different?  (If so, how?)

_Hint:  if you get stuck, try running the code in [this notebook](hints/hint01.ipynb) and then think about what's going on some more._

We'll be thinking about this difference and similar ones in the rest of this part of the tutorial.  Why might Python lists and NumPy arrays behave differently this way?  What are the tradeoffs?

##  One last thing:  interactively finding more help

You've done a great job of finding some interesting NumPy functions by using `dir` and `help`. But NumPy also provides a function that makes it easier to find a relevant NumPy function based on keywords in its documentation string. (If you've used the `apropos` shell command before, you know that this can be extraordinarily useful.) Since you don't yet know this function's name, we won't make you hunt for it; just run the next cell to try it out.

In [None]:
np.lookfor("keyword search")

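Well, look at that! It looks like `numpy.lookfor` will search documentation for keywords. In the next cell, use `lookfor` to find a function that determines whether or not two arrays may share memory. (Note that `lookfor` was removed in NumPy 2.0, so these cells require NumPy 1.x; on newer versions, use `help` or the online NumPy documentation instead.)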

In [None]:
np.lookfor("edit this cell and put your search string here")

Can you think of why this function might be useful?

Let's go on to the next notebook now!