# Getting used to numpy: a mini review + some extra tips/tools

For the first ~20 minutes of class, we'll be covering numpy - you've already been introduced to numpy arrays, so this will be a refresher.

In the window below, import numpy as np

In [2]:
import numpy as np

Create a 1D numpy array "my_numbers" containing the numbers 1 through 10 in order. What different ways are there to do this?

In [142]:
my_numbers = np.

Now, using numpy's reshape( ) method, reshape my_numbers into a 2 x 5 array

In [152]:
#There are also multiple ways to do this!
my_numbers = 

Using your new, reshaped my_numbers array, use array indexing to print out the second value of the second element (the number 7)

In [None]:
print()

Using slicing techniques, print every other value from the first element (row) of my_numbers

In [None]:
print()

Another convenient way of pulling values from a numpy array is numpy's where() function. If we wanted to pull __the indices__ of all values in my_numbers greater than 6, we could say: <code> np.where(my_numbers > 6) </code>

In [145]:
#go ahead and run this cell to see what the output looks like
np.where(my_numbers > 6)

(array([1, 1, 1, 1]), array([1, 2, 3, 4]))

This output is actually two arrays containing an __index__ of matching rows and columns! e.g. <code>my_numbers[1,1]</code> = 7, <code>my_numbers[1,2]</code> = 8, and so on.
<br> <br>
How would we create a new array called new_array that contains the actual values? Give it a try now
<br><br>
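As a small sketch of how the two index arrays pair up, the cell below uses a separate, made-up array (<code>grid</code> is just an illustrative name, not part of the exercise) and zips the row and column indices together into coordinates.

```python
import numpy as np

# A small 2 x 3 array, made up purely for illustration
grid = np.array([[1, 5, 2],
                 [7, 3, 9]])

# np.where returns one index array per axis: (row indices, column indices)
rows, cols = np.where(grid > 4)

# Pair the two arrays up to get the (row, column) coordinate of each match
coords = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(coords)  # [(0, 1), (1, 0), (1, 2)]
```

Each coordinate points at one matching value: <code>grid[0,1]</code> = 5, <code>grid[1,0]</code> = 7, and <code>grid[1,2]</code> = 9.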


In [153]:
new_array = #fill in answer here
print(new_array)

It's important to note that the __where__ function can do a lot more than return indices: when called as <code>np.where(condition, x, y)</code>, it picks each element from x or y using the same logic as <code>[xv if c else yv for c, xv, yv in zip(condition, x, y)]</code>.

See example below:

In [167]:
#go ahead and run this cell to see output
example_arr = np.arange(1,20,1)

print('Before using np.where(): ','\n',example_arr)

example_arr = np.where(example_arr < 10, example_arr, example_arr-10)

print('After using np.where(), all indices where example_arr < 10 = False are subject to example_arr-10: ',
      '\n',example_arr)

Before using np.where():  
 [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
After using np.where(), all indices where example_arr < 10 = False are subject to example_arr-10:  
 [1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9]
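To make the link to the list comprehension concrete, here is a minimal sketch that runs both versions side by side. The arrays <code>condition</code>, <code>x</code>, and <code>y</code> are made up for illustration.

```python
import numpy as np

condition = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Vectorized selection: take from x where condition is True, otherwise from y
vectorized = np.where(condition, x, y)

# The same logic spelled out one element at a time
looped = [xv if c else yv for c, xv, yv in zip(condition, x, y)]

print(vectorized)  # [ 1 20  3 40]
print(looped)
```

Both hold the same values; the np.where version returns a numpy array and avoids the Python-level loop.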


Finally, while it was touched on last lecture, we'd like to make the point that one can also easily index arrays by creating boolean/mask arrays using conditional statements. In this case, the functionality would look like: <br> <br>
<code> new_array = old_array[old_array <font color = red>CONDITIONAL STATEMENT</font>]</code>
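A minimal sketch of this pattern, using a made-up array (<code>values</code>) and a different condition from the exercise that follows:

```python
import numpy as np

values = np.arange(1, 11)  # the numbers 1 through 10

# The comparison produces a boolean array with the same shape as values
mask = values % 2 == 0

# Indexing with the mask keeps only the positions where the mask is True
evens = values[mask]

print(mask)
print(evens)  # [ 2  4  6  8 10]
```

The mask does not need its own variable: <code>values[values % 2 == 0]</code> gives the same result in one step.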

In the code below, we've created an array __a__ containing the numbers 1 through 20. Use a conditional statement to create a boolean mask array, and reassign a so that it contains only its values that are greater than or equal to 12.

In [178]:
a = np.arange(1,21,1)
a = #fill in answer here
print(a)


# Random number generation in numpy

Numpy comes with a "random" module (np.random) that contains a number of functions for producing random numbers. Some examples include:
<br> <br>
<code>np.random.random(tuple indicating output dimensions, e.g. (2,3) )</code> -> outputs random values drawn from a continuous uniform distribution over 0 (inclusive) to 1 (exclusive), in an array with the dimensions specified by the input tuple
<br><br>
<code>np.random.randn(dimension1, dimension2, dimension3...)</code> -> random values drawn from the standard normal distribution (mean 0, standard deviation 1), so values are not limited to the range 0 to 1
<br><br>
<code>np.random.randint(lower bound, upper bound, size)</code> -> array of the given size containing random integers from the lower bound (inclusive) up to the upper bound (exclusive)
<br> <br><br>
Check out further numpy rng features here:

https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.random.html
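Here is a quick sketch of the three functions, checking the shape and range of what comes back. The seed value, array sizes, and variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so repeated runs give the same numbers

uniform = np.random.random((2, 3))          # uniform on [0, 1), shape (2, 3)
normal = np.random.randn(2, 3)              # standard normal, shape (2, 3)
integers = np.random.randint(1, 7, size=4)  # integers from 1 up to (not including) 7

print(uniform.shape, normal.shape, integers.shape)
print(uniform.min() >= 0, uniform.max() < 1)
print(integers)
```

Note that <code>np.random.random</code> takes its dimensions as one tuple, while <code>np.random.randn</code> takes them as separate arguments.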






Using <code>np.random.random</code>, create two arrays of random numbers from a continuous distribution: array a (100x5), and array b (5x10)

In [180]:
#hint: don't forget that np.random.random takes in a tuple!
a = 
b = 

Next, using the __shape__ attribute of numpy arrays, print the shape of a and the shape of b. 
<br><br>


In [181]:
#hint: attributes, unlike methods, don't require the use of parentheses!
print()
print()

# numpy arrays: operators and a few more useful attributes


Alongside Python's built-in <code>len</code> (which, for an array, only reports the length of the first dimension), numpy arrays have two attributes that can help users keep track of dimensions/elements: <code>size</code> and <code>shape</code>. <code> size </code> refers to the __number of elements__ within an array, while <code> shape </code> returns an array's dimensions.
<br> <br>
Below, print the size and shape of a and b. What do you get?
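As a small sketch of the difference, using a made-up 3 x 4 array (<code>z</code>) rather than a or b:

```python
import numpy as np

z = np.zeros((3, 4))  # 3 rows, 4 columns, all zeros

print(z.size)   # 12 -> total number of elements (3 * 4)
print(z.shape)  # (3, 4) -> length of each dimension
print(len(z))   # 3 -> length of the first dimension only
```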

In [193]:
print() #a size
print() #a shape
print() #b size
print() #b shape







In the last lecture, Jacob covered some operators (+, -, etc.) that can be used on numpy arrays. We just wanted to make the point that matrix multiplication can be quickly accomplished using the <code> @ </code> operator. Below, create a matrix c that is the product of a and b and print the shapes of a, b, and c to demonstrate matrix multiplication.
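For reference, a minimal sketch of the shape rule with two small made-up matrices (<code>left</code> and <code>right</code>): an (n x k) matrix times a (k x m) matrix gives an (n x m) matrix, so the inner dimensions have to match.

```python
import numpy as np

left = np.ones((2, 3))   # 2 x 3 matrix of ones
right = np.ones((3, 4))  # 3 x 4 matrix of ones

product = left @ right   # inner dimensions (3 and 3) match

print(product.shape)     # (2, 4)
# Each entry is a sum of 3 products of 1 * 1
print(product[0, 0])     # 3.0
```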

In [195]:
print() #a shape
print() #b shape
c = 
print()#c shape






Arrays also have lots of useful methods associated with them that perform various functions, including:
<br><br>
<code> array.min()</code> <- returns minimum value of array<br> <br>
<code> array.max()</code> <- returns max value of array<br> <br>
<code> array.mean()</code> <- returns mean value of array<br> <br>
<code> array.astype()</code> <- returns a copy of the array with all elements cast to the specified data type, e.g. <code>array.astype(int)</code>
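A short sketch of these methods on a made-up array (<code>scores</code> is an illustrative name):

```python
import numpy as np

scores = np.array([2.7, 9.1, 4.5])

print(scores.min())   # 2.7
print(scores.max())   # 9.1
print(scores.mean())

# astype returns a new array; casting floats to int drops the decimal part
whole = scores.astype(int)
print(whole)          # [2 9 4]
```

Note that <code>astype(int)</code> truncates rather than rounds, so 2.7 becomes 2.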

For a complete list of attributes and methods associated with numpy arrays, check out this documentation:
 <br> <br>
    https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#array-methods
   <br> <br>
Using this information, find the minimum value of a.
    

In [206]:
print()

## A note on axes in numpy arrays:

We've just told you about some of the neat methods you can use with numpy arrays; however, we haven't covered situations in which you might want to find the max or min of a __single column__ or __row__ of an array (we refer to these as __axes__).

In this case, you'll want to specify *within* the method the axis that you would like to operate on. Let's take our 100x5 array "a" as an example. Let's say that we want to take the mean of each "column" (that is, average down the 100 rows) and return a vector of length 5 containing each of those values. In this case, we would need to specify axis = 0 *as input to the function*, like so:
<br><br>
<code> np.mean(<font color=red>ARRAY</font>,axis=0)</code>
<br><br> Conversely, specifying "axis = 1" would return a vector of length 100 with the mean value of each "row".
Below, create an array "a_mean" that's equal to the mean of array "a" across axis 0.
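To see the two axes on something small enough to check by hand, here is a sketch with a made-up 2 x 3 array (<code>table</code>):

```python
import numpy as np

table = np.array([[1.0, 2.0, 3.0],
                  [5.0, 6.0, 7.0]])

# axis=0 collapses the rows, leaving one mean per column
col_means = np.mean(table, axis=0)
# axis=1 collapses the columns, leaving one mean per row
row_means = np.mean(table, axis=1)

print(col_means)  # [3. 4. 5.]
print(row_means)  # [2. 6.]
```

A handy way to remember it: the axis you pass is the one that disappears from the shape, so a (2, 3) array gives length 3 for axis=0 and length 2 for axis=1.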

In [203]:
a_mean = 
print(a_mean)


# Huzzah! You've made it through part 1 of the lecture! We're going to go through some slides for a bit, and then you'll be prompted to import __pandas__. Go ahead and pick up here whenever that happens :)
<br><br><br><br><br>

In the cell below, import <code> pandas </code> as <code> pd</code>

In [210]:
import pandas as pd

Below, find the list "names_list", consisting of five random names.

In [208]:
names_list = ['Aragorn', 'Legolas', 'Gimli', 'Galadriel', 'Eowyn']

Use the pandas Series() constructor, with names_list as input, to create a series assigned to the variable "names_series". Print the output.

In [None]:
names_series =
print()

Looking at our printed series, it's pretty clear that this no longer looks like a list. There are values all along the left side!! This is the series __index__. Because we didn't specify anything in particular, pandas automatically created an index for us beginning with 0 and continuing through the length of the list we passed it; however, we have control over the index! We can specify values *within* the input we provide the Series() function, like this:
<br><br>
<code> names_series = pd.Series(names_list, index = [5,6,7,8,9])</code>
<br><br> __OR__ if we create a series from a __dictionary__, the index will automatically be built from the dictionary's keys.
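A minimal sketch of the three ways an index can come about, using made-up data (the list <code>colors</code> and the small dictionary are illustrative only):

```python
import pandas as pd

colors = ['red', 'green', 'blue']

default_index = pd.Series(colors)                  # index defaults to 0, 1, 2
custom_index = pd.Series(colors, index=[5, 6, 7])  # index supplied by us
from_dict = pd.Series({'a': 1, 'b': 2})            # index taken from the dict keys

print(list(default_index.index))  # [0, 1, 2]
print(list(custom_index.index))   # [5, 6, 7]
print(list(from_dict.index))      # ['a', 'b']

# Values are looked up by index label, not by position
print(custom_index[6])            # green
```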

<br><br><br>
So how do we make a DataFrame? There are actually myriad ways!! This is __one of many__ - and not the most efficient!
<br> Create a dictionary lotr_data, with the key 'Beings' assigned a list containing the strings 'human', 'elf', 'dwarf', 'elf', and 'human', and the key 'Age' assigned a list containing the values 87, 2931, 139, 7000, and 24.

In [227]:
lotr_data = 


Now, create a dataframe lotr_df by passing lotr_data into the pandas function DataFrame()

In [233]:
lotr_df = #create DataFrame here
print(lotr_df)


We've been using Python's built-in print function throughout this class to look at data - but DataFrames are unique in that they are actually nicer to look at as output! Try using the .head() method associated with DataFrames - head() will normally return the first five rows of a DataFrame, but can take any number as input. In this case, just show the first 3 rows:

Let's say we want to view just the 'Beings' column of our DataFrame - this can be accomplished via the simple command <code> DataFrame[<font color=red>'COLUMN NAME'</font>]</code>. Try looking at just Beings below.

Interestingly, the columns of a DataFrame are actually also __attributes__, meaning they can be accessed using <code> DataFrame.<font color = red>COLUMNS</font></code> notation. Note that this only works when the column name is a valid Python name (no spaces, for example). Try pulling out the Age column this way below.

Egads! It looks like we've forgotten to insert our character names from earlier into our DataFrame! Thankfully, using the same indexing that we see above, this is quite easy to accomplish in a DataFrame - simply create a *new* column with the command <code>DataFrame['NEW COLUMN NAME'] = ________ </code> Below, insert your names_series Series into lotr_df in a column called 'Character Name', and take a look at your updated DataFrame
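The steps described above (head, column access by name, column access by attribute, and adding a column) can be sketched on a small made-up DataFrame. The name <code>pets</code> and its contents are purely illustrative and stand in for lotr_df.

```python
import pandas as pd

pets = pd.DataFrame({'species': ['cat', 'dog', 'fish', 'bird'],
                     'age': [3, 7, 1, 4]})

first_two = pets.head(2)   # first 2 rows instead of the default 5
species = pets['species']  # column access with square brackets
ages = pets.age            # the same kind of access, as an attribute

# Assigning to a new column name adds that column to the DataFrame
pets['name'] = pd.Series(['Tom', 'Rex', 'Bubbles', 'Tweety'])

print(first_two)
print(pets)
```

When the new column is a Series, pandas lines the values up by index label, so this works cleanly here because both the Series and the DataFrame use the default 0, 1, 2, 3 index.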

 This is all well and good, but let's say we get our hands on a *slightly* more complete dataset, and want to import it. pandas has built-in functions to read/import data from __many__ different sources, including (but not limited to) numpy arrays, .xlsx, and .csv files. 
 <br><br> Use the pandas read_csv function, which will take a csv file at a given directory and import it into a DataFrame, to import the lotr_char_age.csv file that you should have put into the same folder as this notebook at the beginning of class. Re-assign your lotr_df DataFrame to the output of this function.
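As a rough sketch of what read_csv does, the cell below parses a tiny CSV held in memory so it runs without any file on disk. The CSV text and its column names are made up and are not the layout of lotr_char_age.csv; with a real file you would pass the file's path as a string instead of the <code>io.StringIO</code> object.

```python
import io

import pandas as pd

# Made-up CSV content: a header row followed by two data rows
csv_text = "name,age\nAda,36\nGrace,45\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (2, 2)
print(list(df.columns))  # ['name', 'age']
print(df)
```

By default the first line of the file is treated as the column names, and each later line becomes one row.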

In [None]:
lotr_df =

What a lovely DataFrame we've created! It doesn't have a *lot* more information, but it should be enough for us to quickly go over how to pull the data you want out of your DataFrame.

In [52]:
my_random_array = np.random.randn(100)
print(my_random_array)

[ 0.67866349  0.6974807  -1.74709009  1.29672406  0.49882626  0.98887658
 -0.17751463 -1.07238797 -0.05591887 -1.39110012  1.59796597  0.08068582
 -0.69362336  1.22925948  0.24712293 -0.77193723 -0.93869541 -0.07732541
 -0.69299071  0.42640159 -0.59010907  1.21257174  1.33632952  1.16498618
 -0.30774921 -0.27269119 -0.70685846  0.60363261 -0.24760296 -2.69380227
 -0.41017918 -0.1455258   0.46809844 -0.04013323  0.67853182 -0.75973737
  1.47592263 -1.10757199  0.02292001 -0.22632626 -0.06664435 -2.45995426
  0.64675548  1.48106383  0.68417371 -0.17641314 -1.07998282  0.9024132
  0.57159997 -1.40917168 -0.39993998  1.30176932 -0.36464923  1.23236001
  1.82680444  0.17498237  0.18623439 -0.09403949 -0.54808126 -0.28497478
 -0.77414718 -1.60821906 -0.42483949 -0.30910183 -1.18246645 -0.35091168
 -0.05194792  0.56553589  0.63056925 -0.74563268  0.9533396  -0.37825757
  0.48221533 -0.41765164  1.99181698 -1.74316776  0.93050269 -1.55250337
 -0.25139287 -0.44956909  0.26097001  1.55814034  2.

In [36]:
my_numbers = my_numbers.reshape(2,5)
print(my_numbers)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]


In [37]:
my_numbers = np.reshape(my_numbers,(2,5))
print(my_numbers)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]


In [49]:
print(my_numbers[0,::2])

[1 3 5]


In [191]:
a = np.random.random((100, 5))
b = np.random.random((2, 3, 5, 10))

In [77]:
#FOR EXPLAINING INDEXING - ALSO AN OPTION
a[np.where(a > 0.5)]
#INDEXING BY ARRAYS OF INTEGERS FOR EACH AXIS

array([0.72255993, 0.58319049, 0.536816  , 0.62196764, 0.54430902,
       0.87900356, 0.84051336, 0.93000772, 0.72645941, 0.71042466,
       0.60517229, 0.64924397, 0.8691445 , 0.61851871, 0.50893645,
       0.71247617, 0.59485699, 0.73464286, 0.85506674, 0.83222676,
       0.92908299, 0.87464149, 0.90559444, 0.52557928, 0.84203377,
       0.50352238, 0.94107043, 0.84602192, 0.64044767, 0.7772278 ,
       0.80422655, 0.51143342, 0.96493792, 0.52775755, 0.57384932,
       0.665076  , 0.69274729, 0.86428971, 0.50899522, 0.54244077,
       0.84500358, 0.62886723, 0.91703817, 0.65069582, 0.74789592,
       0.51517821, 0.94663423, 0.869826  , 0.54633506, 0.84749555,
       0.96879359, 0.83275489, 0.70760488, 0.90943558, 0.7304495 ,
       0.77579979, 0.92386409, 0.97986295, 0.57365481, 0.95220589,
       0.7087915 , 0.87114297, 0.66034492, 0.92712909, 0.9328133 ,
       0.94142945, 0.6878104 , 0.63670917, 0.90134055, 0.74415877,
       0.90835335, 0.5679191 , 0.90698119, 0.73938741, 0.71850

In [67]:
a[None, :] * b[:, None]

array([[9.58708520e-02, 1.71298326e-01, 5.08323063e-02, 3.93229843e-03,
        1.16692581e-01],
       [1.94583407e-01, 3.47674096e-01, 1.03171330e-01, 7.98115390e-03,
        2.36844040e-01],
       [4.54270338e-01, 8.11672648e-01, 2.40861623e-01, 1.86326344e-02,
        5.52931126e-01],
       [2.32604698e-01, 4.15609066e-01, 1.23330846e-01, 9.54065880e-03,
        2.83122992e-01],
       [5.05781909e-01, 9.03711528e-01, 2.68173907e-01, 2.07454650e-02,
        6.15630247e-01],
       [1.78159567e-02, 3.18328615e-02, 9.44631398e-03, 7.30750349e-04,
        2.16853186e-02],
       [7.67211207e-03, 1.37082327e-02, 4.06788031e-03, 3.14684115e-04,
        9.33838115e-03],
       [2.93879632e-01, 5.25092745e-01, 1.55819826e-01, 1.20539496e-02,
        3.57705934e-01],
       [1.17994069e-01, 2.10827233e-01, 6.25624008e-02, 4.83971805e-03,
        1.43620633e-01],
       [4.17768956e-01, 7.46453392e-01, 2.21507988e-01, 1.71354710e-02,
        5.08502184e-01]])

In [68]:
np.outer(b, a)

array([[9.58708520e-02, 1.71298326e-01, 5.08323063e-02, 3.93229843e-03,
        1.16692581e-01],
       [1.94583407e-01, 3.47674096e-01, 1.03171330e-01, 7.98115390e-03,
        2.36844040e-01],
       [4.54270338e-01, 8.11672648e-01, 2.40861623e-01, 1.86326344e-02,
        5.52931126e-01],
       [2.32604698e-01, 4.15609066e-01, 1.23330846e-01, 9.54065880e-03,
        2.83122992e-01],
       [5.05781909e-01, 9.03711528e-01, 2.68173907e-01, 2.07454650e-02,
        6.15630247e-01],
       [1.78159567e-02, 3.18328615e-02, 9.44631398e-03, 7.30750349e-04,
        2.16853186e-02],
       [7.67211207e-03, 1.37082327e-02, 4.06788031e-03, 3.14684115e-04,
        9.33838115e-03],
       [2.93879632e-01, 5.25092745e-01, 1.55819826e-01, 1.20539496e-02,
        3.57705934e-01],
       [1.17994069e-01, 2.10827233e-01, 6.25624008e-02, 4.83971805e-03,
        1.43620633e-01],
       [4.17768956e-01, 7.46453392e-01, 2.21507988e-01, 1.71354710e-02,
        5.08502184e-01]])

In [71]:
np.all(np.arange(10) == np.arange(10))

True

In [None]:
np.isclose?, np.allclose

In [None]:
scipy