<a href="https://colab.research.google.com/github/youminpark/NEUR265/blob/main/notebooks/Numpy_02_05_24.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NumPy

In this notebook, we'll encounter a package for scientific computing in Python: NumPy.

**At the end of this notebook, you'll be able to:**
* Install and import packages for Python
* Create NumPy arrays
* Execute methods & access attributes of arrays


## Importing packages

Before we can use numpy, we need to import it. We can also nickname modules when we import them.

The convention is to import `numpy` as `np`.

In [1]:
# Import packages
import numpy as np

# Use the %whos 'magic command' to list the variables and modules currently defined
%whos

Variable   Type      Data/Info
------------------------------
np         module    <module 'numpy' from '/us<...>kages/numpy/__init__.py'>


## NumPy

**NumPy** is the fundamental package for scientific computing with Python. Its array type lets us store and operate on large datasets more efficiently than built-in Python lists.

### Creating `numpy` arrays

A NumPy **array** is a grid of values that all share the same type (they're homogeneous).

We can create a numpy array in a few different ways:

* from a Python list or tuple
* by using functions that are dedicated to generating numpy arrays, such as `arange`, `linspace`, `empty`, `zeros`, etc.
* by reading data from files
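
As a rough sketch of the options listed above, the cell below builds an array from a tuple, from two of the generating functions, and from a file. The names `from_tuple`, `zeros_array`, `empty_array`, `fake_file`, and `from_file` are made up for illustration, and `io.StringIO` stands in for a real data file so that the example runs on its own.

```python
import io
import numpy as np

# From a tuple instead of a list
from_tuple = np.array((1, 2, 3))

# From dedicated array-generating functions
zeros_array = np.zeros((2, 3))      # 2 rows x 3 columns, filled with 0.0
empty_array = np.empty((2, 3))      # same shape, but the values are uninitialized

# From a file: io.StringIO stands in for a real text file (illustrative data)
fake_file = io.StringIO("1 2 3\n4 5 6")
from_file = np.loadtxt(fake_file)

print(from_tuple)
print(zeros_array.shape)
print(from_file)
```

Note that `np.empty` does not set its values to anything in particular, so its contents should be overwritten before they are used.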

In [2]:
# Create a list
lst = [1,2,3,4,5]

# Make our list into an array
my_vector = np.array(lst)
print(type(my_vector))
print(my_vector)

<class 'numpy.ndarray'>
[1 2 3 4 5]


In [3]:
# If we give numpy a list of lists, it will create a matrix
my_matrix = np.array([lst,lst])
print(my_matrix)

[[1 2 3 4 5]
 [1 2 3 4 5]]


### Accessing attributes of numpy arrays

We can check the shape and size of an array either by reading its `shape` and `size` attributes, or by passing the array to the functions `np.shape()` and `np.size()`.

Other attributes that might be of interest are `ndim` and `dtype`.
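
A minimal sketch of the two routes, using a small array named `example` that we made up for this illustration. Attributes are read without parentheses, while the `np.` functions take the array as an argument; both should report the same values.

```python
import numpy as np

example = np.array([[1, 2, 3], [4, 5, 6]])

print(example.shape)                     # attribute: no parentheses
print(np.shape(example))                 # function: takes the array as an argument
print(example.size, np.size(example))    # total number of elements, both ways
print(example.ndim, np.ndim(example))    # number of dimensions, both ways
```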

In [9]:
# Check the data type of the vector's elements (the name includes the number of bits per element)
print(my_vector.dtype)

# Check the number of dimensions (axes) of the matrix
print(my_matrix.ndim)

print(my_matrix.shape) #returns (rows, columns)

print(my_matrix.size) #total number of elements

int64
2
(2, 5)
10


An array's data type is set when the array is created.

You can explicitly define the data type by passing `dtype=` to `np.array()`. You can set the dtype to be `int`, `float`, `complex`, `bool`, `object`, etc.

In [10]:
# my_matrix.dtype

my_complex_array = np.array([lst,lst],dtype='complex')
my_complex_array.dtype

dtype('complex128')
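
When no `dtype` is given, NumPy picks one that can hold every value in the input. The sketch below (with illustrative names `ints`, `mixed`, and `as_float`) shows that a single float is enough to make the whole array floating point. The exact integer type that gets printed can differ between platforms.

```python
import numpy as np

ints = np.array([1, 2, 3])                    # all integers: an integer dtype is chosen
mixed = np.array([1, 2.5, 3])                 # one float makes the whole array float
as_float = np.array([1, 2, 3], dtype=float)   # integers stored as floats on request

print(ints.dtype)
print(mixed.dtype)
print(as_float)
```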

><b>Task:</b> Create an array of integers called `int_array` that is 2 rows x 3 columns. Access the shape and ndim attributes to confirm its size, and the dtype attribute to confirm that it is indeed an array of integers.



In [14]:
# Your code here
wackdata = [1,2,3]
int_array = np.array([wackdata, wackdata])
print(int_array.shape)
print(int_array.dtype)
print(int_array.ndim)
print(int_array.size)


(2, 3)
int64
2
6


### Indexing & slicing arrays

Indexing and slicing 1D arrays (vectors) is similar to indexing lists.

You can index matrices using `[row,column]`. If you omit the column, it will give you the whole row.

If you use `:` for either row or column, it will give you all of those values.

In [19]:
my_matrix[:,:]

# [:,:] selects every row and every column, so we get the whole matrix back

array([[1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5]])
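
The indexing rules described in this section can be sketched on a small matrix. The array `demo` is made up for this example, with distinct values so that each selection is easy to trace.

```python
import numpy as np

demo = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 8, 9, 10]])

print(demo[1, 2])     # single element: row 1, column 2
print(demo[0])        # column omitted: the whole first row
print(demo[:, 1])     # ':' for the rows: column 1 of every row
print(demo[1, 1:4])   # slice: columns 1 through 3 of row 1
```

Remember that indexing starts at 0, so row 1 is the second row.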

**Booleans** are variables that store `True` or `False`. They are named after the British mathematician George Boole. He first formulated Boolean algebra, which is a set of rules for how to reason with and combine these values. This is the basis of all modern computer logic.

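We can also index arrays using Boolean conditions or lists of indices. A condition such as `my_matrix > 2` produces an array of `True`/`False` values, and indexing with it keeps only the elements where the condition is `True`, so we can think of this as filtering the array. For example: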

In [20]:
# Index with a Boolean condition
bool_matrix = my_matrix[my_matrix>2]  # keeps only the values greater than 2, as a 1D array

# Index with a list of row indices and a list of column indices
list_matrix = my_matrix[[0,1],[1,3]]  # picks the elements at (0,1) and (1,3)

print(my_matrix)
print(bool_matrix)
print(list_matrix)

[[1 2 3 4 5]
 [1 2 3 4 5]]
[3 4 5 3 4 5]
[2 4]
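
To make the filtering step more visible, here is a small sketch that stores the Boolean array in its own variable before indexing with it. The names `values`, `mask`, and `filtered` are illustrative.

```python
import numpy as np

values = np.array([[1, 2, 3], [4, 5, 6]])

mask = values > 2          # Boolean array with the same shape as values
filtered = values[mask]    # 1D array of the elements where mask is True

print(mask)
print(filtered)
```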


><b>Task:</b> Create a variable called `my_boolean_matrix` that is equal to your `bool_matrix` variable, but with `dtype = 'bool'`. Print your `my_boolean_matrix` variable. What happened?




In [21]:
# Your code here
my_boolean_matrix = bool_matrix.astype('bool') # any nonzero number (positive or negative) becomes True; only 0 becomes False
my_boolean_matrix

array([ True,  True,  True,  True,  True,  True])
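
To see how conversion to `bool` treats different numbers, we can try a small array that contains a negative number, a zero, and a positive number. The names `numbers` and `as_bool` are made up for this sketch.

```python
import numpy as np

numbers = np.array([-2, 0, 3])
as_bool = numbers.astype(bool)

# Zero becomes False; every nonzero value, including negatives, becomes True
print(as_bool)
```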

We can also change values in an array, similar to how we would change values in a list. In the code below, the first assignment overwrites the entire first row, and the second overwrites the first column of every row.

In [30]:

my_matrix[0] = 7      # set every value in the first row to 7
my_matrix[:,0] = 8    # ':' selects all rows, so the first column of every row becomes 8

print(my_matrix)

[[8 7 7 7 7]
 [8 2 3 4 5]]


### Benefits of using arrays

In addition to being less clunky & a bit faster than lists of lists, arrays can do a lot of things that lists can't. For example, we can add and multiply them element by element. We can also use the `sum` method to sum across a specific axis.

In [31]:
sum_list = [1,3,5] + [3,5,7]                              # '+' concatenates lists rather than adding their values
sum_array = np.array([1,3,5]) + np.array([3,5,7])         # element-wise addition
mult_array = np.array([1,3,5]) * np.array([3,5,7])        # element-wise multiplication

print(sum_list)
print(mult_array)

[1, 3, 5, 3, 5, 7]
[ 3 15 35]


In [34]:
this_array = np.array([[1,3,5],[3,5,7]])                 # each inner list becomes one row
sum_rows = this_array.sum(axis=1)                        # axis=1 adds along the columns, giving one total per row
print(this_array)
print(sum_rows)

[[1 3 5]
 [3 5 7]]
[ 9 15]
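
The choice of axis is easy to mix up, so here is a short sketch comparing the options on the same values. The names `grid`, `col_totals`, `row_totals`, and `total` are illustrative.

```python
import numpy as np

grid = np.array([[1, 3, 5], [3, 5, 7]])

col_totals = grid.sum(axis=0)   # collapse the rows: one total per column
row_totals = grid.sum(axis=1)   # collapse the columns: one total per row
total = grid.sum()              # no axis given: sum of every element

print(col_totals)
print(row_totals)
print(total)
```

One way to remember this: the axis we name is the one that disappears from the result.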


### NumPy also includes some very useful array generating functions:

* `arange`: like `range`, but returns a NumPy array instead of a `range` object, and accepts non-integer steps
* `linspace`: creates an array with given start and end points and a desired number of evenly spaced points
* `logspace`: same as `linspace`, but the points are evenly spaced on a log scale
* `random`: a submodule whose functions (e.g., `np.random.rand`) create arrays of random numbers
* `concatenate`: joins two or more arrays along an existing axis
* `hstack` and `vstack`: stack arrays horizontally or vertically

Whenever we call these, we need to use whatever name we imported numpy as (here, `np`).
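
A quick sketch of several of these functions in action. All variable names here (`steps`, `powers`, `noise`, `a`, `b`, and so on) are made up for illustration, and the random values will differ on every run.

```python
import numpy as np

steps = np.arange(0, 1, 0.25)        # 0, 0.25, 0.5, 0.75 (the end point is excluded)
powers = np.logspace(0, 3, 4)        # 4 points from 10**0 to 10**3
noise = np.random.rand(2, 3)         # 2x3 array of random values in [0, 1)

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
joined = np.concatenate([a, b])      # one 1D array of length 6
side_by_side = np.hstack([a, b])     # same result as concatenate for 1D arrays
stacked = np.vstack([a, b])          # 2D array with a and b as its rows

print(steps)
print(powers)
print(noise.shape)
print(joined)
print(stacked)
```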

In [35]:
# When using linspace, both end points are included!
np.linspace(0,147,10)

array([  0.        ,  16.33333333,  32.66666667,  49.        ,
        65.33333333,  81.66666667,  98.        , 114.33333333,
       130.66666667, 147.        ])

>**Task**: Create an array called `big_array` that has two rows. The first row should be a list of 10 numbers that are evenly spaced, and range from exactly 1 to 100. The second row should be a list of 10 numbers that begin at 0 and are exactly 10 apart (*hint*: use [np.arange](https://numpy.org/doc/stable/reference/generated/numpy.arange.html)). `big_array` should have a shape (2,10): two rows, and ten columns. Lastly, reassign the last value of each row in the array to be -100.

In [41]:
# Your code here
row1 = np.linspace(1,100,10)
row2 = np.arange(0,100,10)

big_array = np.array([row1, row2])
#big_array.shape

big_array[:,-1] = -100
big_array


array([[   1.,   12.,   23.,   34.,   45.,   56.,   67.,   78.,   89.,
        -100.],
       [   0.,   10.,   20.,   30.,   40.,   50.,   60.,   70.,   80.,
        -100.]])