# Introduction 

Throughout this entire notebook you should be experimenting with the code in the non-text cells. A great way to begin to get a feel for Python is by playing with it. So have some fun by changing the values in the cells and then running them again with Shift-Enter. Before you do, think about what you expect the output to be, and make sure your intuition matches up with what you run. If it doesn't, take some time to think about what happened so you can hone your intuition.

At the end of each section there will be some questions to help further your understanding. Remember, in Python we can always manually test code by running it; however, you should try to think about the answers to these questions before you run some code. This way you can check and verify your understanding of the section's topic.

## A Little Bit More on Numpy Arrays

Okay, so now that we know a little bit about `numpy arrays`, what else can we do with them? Quite a lot, actually. We can index into them, perform calculations on them, ask for aggregate metrics (sums, means, and so on), etc.

#### Indexing 

Let's begin by indexing into them. With `numpy arrays`, we don't have the `.loc[]` or `.iloc[]` indexers (or the older `.ix[]`, which has since been removed from Pandas) like we do on a DataFrame; we simply index into them with brackets, like we would a list. An array is effectively a multidimensional list, though, so we can pass it multiple indexing values, one per dimension, separated by commas. Let's take a look.

In [1]:
import numpy as np

In [2]:

# Reshape will reshape the data to the shape that you tell it to (here 5 rows, 4 columns). 
range_arr = np.arange(0, 20, 1).reshape(5, 4)
range_arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [3]:
range_arr[:, 2] # Grab every row, but only the element at index 2 in those rows. 

array([ 2,  6, 10, 14, 18])

In [4]:
range_arr[0:2] # With no second index, this defaults to taking the rows. 

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [5]:
range_arr[0:2, 1:3] # The first set of numbers refers to the rows to grab, the 
                    # second set the columns.  

array([[1, 2],
       [5, 6]])
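The same bracket notation also accepts negative indices and step values, just like a list. The cell below is a small sketch of that, rebuilding `range_arr` so it runs on its own; the variable names are only for illustration.

```python
import numpy as np

range_arr = np.arange(0, 20, 1).reshape(5, 4)

last_row = range_arr[-1]                 # A negative index counts from the end.
every_other_row = range_arr[::2]         # A step of 2 takes rows 0, 2, and 4.
last_col_of_those = range_arr[::2, -1]   # Same rows, but only the last column.

print(last_row)            # [16 17 18 19]
print(every_other_row)
print(last_col_of_those)   # [ 3 11 19]
```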

#### Other methods

Now let's look at some of the other methods that are available. Again, there is a ton we can do, and we're aiming here to at least get your eyes on a lot of the things that are possible. We also want to give you a notebook here that you can look back at to see what is possible (Google is also amazing for this). 

In [6]:
# Remember what this looks like. 
range_arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

#### We can perform sums in any direction with a method on the arrays.

In [7]:
range_arr.sum(axis=0) # Sum along the rows (i.e. get column totals)

array([40, 45, 50, 55])

In [8]:
range_arr.sum(axis=1) # Sum along the columns (i.e. get row totals)

array([ 6, 22, 38, 54, 70])

In [9]:
range_arr.sum() # Get sum of all elements in numpy array. 

190
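One way to keep the `axis` argument straight is to look at shapes: the axis you name is the one that gets collapsed. The sketch below checks this against a hand-written loop over the columns; `manual_col_totals` is just an illustrative name.

```python
import numpy as np

range_arr = np.arange(0, 20, 1).reshape(5, 4)

col_totals = range_arr.sum(axis=0)  # Collapses the 5 rows, leaving 4 values.
row_totals = range_arr.sum(axis=1)  # Collapses the 4 columns, leaving 5 values.

# The same column totals, computed one column at a time.
manual_col_totals = [int(range_arr[:, j].sum()) for j in range(range_arr.shape[1])]

print(range_arr.shape, col_totals.shape, row_totals.shape)  # (5, 4) (4,) (5,)
print(manual_col_totals)                                    # [40, 45, 50, 55]
```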

#### We can also grab the mean, standard deviation, max, and min values along the rows (i.e. for the columns). We could also do this along the columns, or for the array as a whole (just like we did with `.sum()`).

In [10]:
range_arr.mean(axis=0)

array([  8.,   9.,  10.,  11.])

In [11]:
range_arr.std(axis=0)

array([ 5.65685425,  5.65685425,  5.65685425,  5.65685425])

In [12]:
range_arr.max(axis=0)

array([16, 17, 18, 19])

In [13]:
range_arr.min(axis=0)

array([0, 1, 2, 3])
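One detail worth knowing before moving back to Pandas: `.std()` on a numpy array computes the population standard deviation by default (`ddof=0`), whereas `.std()` in Pandas defaults to the sample standard deviation (`ddof=1`). A short sketch of the difference, using only numpy:

```python
import numpy as np

range_arr = np.arange(0, 20, 1).reshape(5, 4)

population_std = range_arr.std(axis=0)       # Default ddof=0: divide by N.
sample_std = range_arr.std(axis=0, ddof=1)   # ddof=1: divide by N - 1.

print(population_std)  # Each column: sqrt(160 / 5), about 5.657.
print(sample_std)      # Each column: sqrt(160 / 4), about 6.325.
```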

#### If we want to instead grab the **index** at which those min and max values occur (either along the rows or columns), then we can use the `argmin()` and `argmax()` methods available on our numpy array. 

In [14]:
range_arr.argmin(axis=0) # We see that the mins of each column occur at row 1 (index 0).

array([0, 0, 0, 0])

In [15]:
range_arr.argmax(axis=0) # We see that the maxes of each column occur at row 5 (index 4).

array([4, 4, 4, 4])

In [16]:
range_arr.argmin() # With no axis, we get the index of the overall minimum in the flattened array (index 0).

0

In [17]:
range_arr.argmax() # With no axis, we get the index of the overall maximum in the flattened array (the last index).

19
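Since `argmin()` and `argmax()` without an `axis` return a position in the flattened array, it can be handy to turn that position back into a (row, column) pair. A minimal sketch using `np.unravel_index`:

```python
import numpy as np

range_arr = np.arange(0, 20, 1).reshape(5, 4)

flat_index = range_arr.argmax()                           # Position in the flattened array.
row, col = np.unravel_index(flat_index, range_arr.shape)  # Convert to (row, column).

print(flat_index, row, col)   # 19 4 3
print(range_arr[row, col])    # 19, the overall maximum.
```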

#### We can get the cumulative sum or product with the following. 

In [18]:
range_arr.cumsum(axis=0)  # Here it gets the cumsum along the rows (i.e. from top to bottom)

array([[ 0,  1,  2,  3],
       [ 4,  6,  8, 10],
       [12, 15, 18, 21],
       [24, 28, 32, 36],
       [40, 45, 50, 55]])

In [19]:
range_arr.cumprod(axis=0) # Gets the cumprod along the rows

array([[    0,     1,     2,     3],
       [    0,     5,    12,    21],
       [    0,    45,   120,   231],
       [    0,   585,  1680,  3465],
       [    0,  9945, 30240, 65835]])
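As a quick sanity check on what "cumulative" means here, the last row of the cumulative sum along `axis=0` should match the column totals from `.sum(axis=0)`. The sketch below also shows that leaving out `axis` runs the cumulative sum over the flattened array.

```python
import numpy as np

range_arr = np.arange(0, 20, 1).reshape(5, 4)

running = range_arr.cumsum(axis=0)
# The final running total in each column equals that column's sum.
print(np.array_equal(running[-1], range_arr.sum(axis=0)))  # True

flat_running = range_arr.cumsum()  # No axis: cumulative sum over all 20 elements.
print(flat_running.shape)          # (20,)
print(flat_running[-1])            # 190, the same as range_arr.sum().
```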

In [20]:
# We can flatten our arrays as follows.
range_arr.flatten()  # Always returns a copy of the data.
range_arr.ravel()    # Returns a view when it can; the printed result looks the same in this case.

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])
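The two results print identically, but they behave differently when you modify them: `flatten()` always returns a copy, while `ravel()` returns a view of the original data when it can. The sketch below rebuilds `range_arr` so the original is not disturbed, and uses `np.shares_memory` to check.

```python
import numpy as np

range_arr = np.arange(0, 20, 1).reshape(5, 4)

flat_copy = range_arr.flatten()
flat_view = range_arr.ravel()

print(np.shares_memory(range_arr, flat_copy))  # False: flatten() made a copy.
print(np.shares_memory(range_arr, flat_view))  # True for this contiguous array.

flat_copy[0] = -1
print(range_arr[0, 0])  # 0: changing the copy leaves the original alone.

flat_view[0] = -1
print(range_arr[0, 0])  # -1: changing the view changes the original.
```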

##### Numpy Array Method Questions

Given our array above,  
 `array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])`: 
       
1. How would we find the mean along the columns?
2. How would we find the max along the columns?
3. Using bracket notation, how would we grab the first row in the array (i.e. [0, 1, 2, 3])?
4. Again using bracket notation, how would we grab the second element of the first row in the array (i.e. 1)?

## A brief look at a cool Numpy function

The majority, if not all, of the methods that we looked at for numpy arrays are available on Pandas columns. They might have some slightly different naming conventions (`idxmax` on a column versus `argmax` on a numpy array, for example), but since Pandas DataFrames are built on numpy arrays, the methods available on numpy arrays largely coincide with the methods available on Pandas DataFrames. 

Many of these methods are available as functions on the `numpy` module itself, as well. Just like we can call the `argmax()` method on a numpy array, we can call `np.argmax()` and pass in a list or tuple. Before we move back to DataFrames, let's look at one last function that is available in `numpy`, `np.where()`. `np.where()` can help us find which elements in a numpy array meet some condition.

In [21]:
my_ndarray = np.array([2, 4, 6, 8, 24, 3, 8, 9, 12])

In [22]:
# Each call returns a tuple of index arrays, one per dimension of the input.
print(np.where(my_ndarray <= 2)) # Indices where the value is at most 2.
print(np.where(my_ndarray == 8)) # Indices where the value equals 8.
print(np.where(my_ndarray > 6)) # Indices where the value is greater than 6.

(array([0]),)
(array([3, 6]),)
(array([3, 4, 6, 7, 8]),)
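The tuple that `np.where()` returns can be passed straight back into the array to pull out the matching values. On a two-dimensional array, the tuple holds one array of row indices and one of column indices. A small sketch of both, where the `% 7` condition is just an arbitrary example:

```python
import numpy as np

my_ndarray = np.array([2, 4, 6, 8, 24, 3, 8, 9, 12])

indices = np.where(my_ndarray > 6)
values = my_ndarray[indices]          # Use the indices to grab the values themselves.
print(values)                         # [ 8 24  8  9 12]

# Indexing with the boolean condition directly gives the same values.
print(my_ndarray[my_ndarray > 6])     # [ 8 24  8  9 12]

# On a 2D array we get back row indices and column indices.
range_arr = np.arange(0, 20, 1).reshape(5, 4)
rows, cols = np.where(range_arr % 7 == 0)
print(rows, cols)                     # [0 1 3] [0 3 2], i.e. the values 0, 7, and 14.
```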


##### `np.where()` Questions

Given the above array: 
    
1. How would we find the indices where the values are equal to 9?
2. How would we find the indices where the values are greater than or equal to 12?