<a href="https://colab.research.google.com/github/UAPH451551/PH451_551_Sp23/blob/main/Exercises/PythonRefreshers_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Numpy and Pandas (Array-Like Data Structures)
Some resources
- [Numpy Documentation](https://numpy.org/doc/stable/user/absolute_beginners.html)
- [Pandas Documentation](https://pandas.pydata.org/docs/getting_started/index.html#getting-started)

## Numpy

Short for "Numerical Python", this library is **built around storing data in arrays** <br>
**and performing numeric operations on items in the arrays**. Arrays can be <br>
**one-dimensional** like a list [0, 1, 2] *or* they can be **multi-dimensional** <br>
like lists of lists or lists of lists of lists, etc.

The standard nickname convention for numpy is to import it as **np**.

In [None]:
# can shorten module names
import numpy as np

Numpy has support for converting several common data types to arrays. **Below, we** <br>
**are converting the python list [1,2,3,4,5] to a numpy array** with the same <br>
values. Notice how it prints as array([1,2,3,4,5]).

In [None]:
numpy_array = np.array([1,2,3,4,5])
numpy_array

array([1, 2, 3, 4, 5])

Numpy has better support for doing numeric operations than lists do. For <br>
example, **we can easily add a value to every item in the array** at once.

In [None]:
numpy_array + 1

array([2, 3, 4, 5, 6])
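
To make the contrast with lists concrete, here is a small sketch. The names `python_list`, `list_plus_one` and `array_plus_one` are ours, chosen only for illustration. A plain list needs a loop or list comprehension to add 1 to every item, while the array does it in one expression.

```python
import numpy as np

python_list = [1, 2, 3, 4, 5]
numpy_array = np.array(python_list)

# a list needs an explicit loop (here a list comprehension)
list_plus_one = [item + 1 for item in python_list]

# an array applies the operation to every item at once
array_plus_one = numpy_array + 1

print(list_plus_one)   # [2, 3, 4, 5, 6]
print(array_plus_one)  # [2 3 4 5 6]
```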

Similar to the **range()** function, numpy has a function called **arange** which <br>
creates an array of [0,1,2,...,N-1] when given a size N.

In [None]:
size = 1000000  
   
# declaring arrays
array1 = np.arange(size)
array2 = np.arange(size)
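
As a quick sketch of how arange mirrors range(), the example below uses small sizes so the results are easy to read. The variable names are illustrative only. Like range(), arange stops just before the end value and accepts an optional start and step.

```python
import numpy as np

first_five = np.arange(5)        # stops before 5
from_two = np.arange(2, 6)       # start at 2, stop before 6
evens = np.arange(0, 10, 2)      # step of 2

print(first_five)  # [0 1 2 3 4]
print(from_two)    # [2 3 4 5]
print(evens)       # [0 2 4 6 8]

# the matching range() call gives the same values as a list
print(list(range(0, 10, 2)))  # [0, 2, 4, 6, 8]
```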

If we multiply arrays by one another, **items of matching indices will multiply**. <br>
Below we are multiplying all items from indices 1 to 10-1=9. Notice how these <br>
**indices start at 0 and work similarly to list indices**.

In [None]:
array1[1:10] * array2[1:10]

array([ 1,  4,  9, 16, 25, 36, 49, 64, 81])

We can also use operations like **dot()** which **takes the dot product**. <br>
Reminder: [2, 7, 9] dotted with [3, 5, 1] equals 2 \* 3 + 7 \* 5 + 9 \* 1 = 50

In [None]:
# dot product
np.dot(array1[1:10], array2[1:10])

285
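
The reminder above can be checked directly. This is a minimal sketch with our own variable names (`a`, `b`, `by_hand`), showing that the dot product is the same as multiplying matching items and adding up the results.

```python
import numpy as np

a = np.array([2, 7, 9])
b = np.array([3, 5, 1])

# multiply matching items, then add up the results
by_hand = 2 * 3 + 7 * 5 + 9 * 1
elementwise_then_sum = np.sum(a * b)
with_dot = np.dot(a, b)

print(by_hand, elementwise_then_sum, with_dot)  # 50 50 50
```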

A **matrix** is a name for **a two-dimensional array**, one with rows and columns. Below is an <br>
example of a 3x3 matrix.

In [None]:
# matrices
matrix = np.array([[1,2,3],[4,5,6],[7,8,0]])
matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 0]])

We can also **select items from an array or matrix using indices** like we did with <br>
lists. <br>

matrix[0] gets the first row<br>
matrix[0,0] gets the item in the first row and first column<br>
matrix[:,0] gets items in all rows but only for the first column<br>
The symbol : on its own means get all items.<br>

In [None]:
# numpy arrays
print(matrix[0])
print(matrix[0,0])
print(matrix[:,0])

[1 2 3]
1
[1 4 7]
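
A few more selections, as a sketch using the same 3x3 values. Negative indices and slices work the way they do for lists, and each axis gets its own index or slice. The variable names here are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 0]])

last_row = matrix[-1]          # negative indices count from the end
last_column = matrix[:, -1]    # all rows, last column only
top_right = matrix[0:2, 1:3]   # rows 0-1, columns 1-2

print(last_row)     # [7 8 0]
print(last_column)  # [3 6 0]
print(top_right)
# [[2 3]
#  [5 6]]
```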


Numpy does not really care about row/column vectors. In general **numpy can** <br>
**automatically pick axes that make the most sense for your operation**. Be <br> 
careful with this as it means numpy will sometimes successfully do something <br> 
even if it's not what you intended. Here's an example of dot products between a <br>
3x3 matrix and a vector with 3 items.

\begin{align}
\begin{pmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 0 \\
\end{pmatrix}
\
\begin{pmatrix}
0 \\
1 \\
2 \\
\end{pmatrix}
&=
\begin{pmatrix}
8 \\
17 \\
8 \\
\end{pmatrix}
\\
\begin{pmatrix}
0 & 1 & 2
\end{pmatrix}
\
\begin{pmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 0 \\
\end{pmatrix}
&=
\begin{pmatrix}
18 & 21 & 6
\end{pmatrix}
\end{align}

In [None]:
# matrix vector multiplication
print(np.dot(matrix, array1[0:3]))  # matrix times the vector [0, 1, 2]
print(np.dot(array1[0:3], matrix))  # the vector [0, 1, 2] times matrix

[ 8 17  8]
[18 21  6]


In [None]:
# matrix matrix multiplication
np.dot(matrix, matrix)

array([[30, 36, 15],
       [66, 81, 42],
       [39, 54, 69]])
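
One thing that is easy to mix up: `*` between two matrices multiplies items of matching indices, which is not matrix multiplication. The sketch below compares the two, and also uses the `@` operator, which for 2D arrays gives the same result as np.dot. The variable names are ours.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 0]])

matrix_product = matrix @ matrix       # same as np.dot(matrix, matrix) for 2D arrays
elementwise_product = matrix * matrix  # multiplies items of matching indices

print(matrix_product)
# [[30 36 15]
#  [66 81 42]
#  [39 54 69]]
print(elementwise_product)
# [[ 1  4  9]
#  [16 25 36]
#  [49 64  0]]
```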

Numpy also has a module called "random" that's great for generating different <br>
types of random data. What the below code says is, **using the random integer** <br>
**(randint) function** from the random module in numpy, I want to **create random** <br>
**integers from 0 to 9 until I've filled up a 3x4x5 array**.

In [None]:
array_3D = np.random.randint(10, size=(3,4,5))
array_3D  # 3D array

array([[[6, 5, 1, 4, 4],
        [3, 4, 5, 2, 3],
        [3, 9, 4, 2, 3],
        [9, 1, 4, 8, 1]],

       [[9, 6, 2, 9, 2],
        [9, 0, 5, 2, 1],
        [6, 6, 3, 8, 8],
        [6, 2, 7, 4, 6]],

       [[0, 7, 5, 6, 0],
        [7, 7, 8, 2, 8],
        [8, 2, 5, 2, 4],
        [3, 5, 8, 1, 0]]])
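
Because the values are random, your output will differ from the one shown above. If you need the same numbers on every run, one option is a seeded generator. This is a sketch using `np.random.default_rng`, which is a different interface from the `randint` call above. The seed value 42 and the names `array_a` and `array_b` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)            # seed value 42 is arbitrary
array_a = rng.integers(0, 10, size=(3, 4, 5))   # integers from 0 to 9

rng = np.random.default_rng(seed=42)            # same seed, so same numbers
array_b = rng.integers(0, 10, size=(3, 4, 5))

print(array_a.shape)                      # (3, 4, 5)
print(np.array_equal(array_a, array_b))   # True
```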

ndim tells us how many dimensions/axes the array has

In [None]:
array_3D.ndim

3

shape tells us the exact size of those dimensions/axes in order

In [None]:
array_3D.shape  # something python lists can't do

(3, 4, 5)
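
As a small sketch of how these attributes relate, the example below uses an array of zeros with the same 3x4x5 shape. The `size` attribute is not covered above; it is the total number of items, which equals the product of the shape. The names `zeros_3D` and `nested_list` are illustrative.

```python
import numpy as np

zeros_3D = np.zeros((3, 4, 5))

print(zeros_3D.ndim)   # 3
print(zeros_3D.shape)  # (3, 4, 5)
print(zeros_3D.size)   # 60, which is 3 * 4 * 5

# a nested python list only reports the length of its outermost level
nested_list = zeros_3D.tolist()
print(len(nested_list))  # 3
```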

\begin{equation}
M_{ijk} \to M_{ikj}
\end{equation}

**Transpose is a very important function to know** for machine learning. Often <br>
data doesn't come in the exact format or order that you want it in. Being able <br>
to **change the order of the axes** is important. Here we're taking axes [0,1,2] <br>
and reordering them like [0,2,1]. In other words, we're swapping our last two <br> 
axes.

In [None]:
np.transpose(array_3D, axes=[0,2,1])

array([[[6, 3, 3, 9],
        [5, 4, 9, 1],
        [1, 5, 4, 4],
        [4, 2, 2, 8],
        [4, 3, 3, 1]],

       [[9, 9, 6, 6],
        [6, 0, 6, 2],
        [2, 5, 3, 7],
        [9, 2, 8, 4],
        [2, 1, 8, 6]],

       [[0, 7, 8, 3],
        [7, 7, 2, 5],
        [5, 8, 5, 8],
        [6, 2, 2, 1],
        [0, 8, 4, 0]]])
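
To see exactly where items end up, here is a sketch that uses known values instead of random ones. The array holds the numbers 0 to 59 arranged as 3x4x5, and the name `swapped` is ours. After swapping the last two axes, the item at index [i, j, k] sits at index [i, k, j].

```python
import numpy as np

# 60 known values arranged as 3x4x5
array_3D = np.arange(60).reshape(3, 4, 5)
swapped = np.transpose(array_3D, axes=[0, 2, 1])

print(array_3D.shape)  # (3, 4, 5)
print(swapped.shape)   # (3, 5, 4)

# the item at [i, j, k] moves to [i, k, j]
print(array_3D[1, 2, 3], swapped[1, 3, 2])  # 33 33
```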

We can also use **.reshape()** to **change the shape of our data**. Unlike transpose, <br>
reshape does not reorder axes: it keeps the items in the same order and only <br>
regroups them. This is often **less safe than transpose** because it can accept <br>
shapes that don't make much sense given your data. In other ways it **can be more** <br>
**useful like taking 9 items and reshaping it as 3x3 which you can't do with** <br>
**transpose**. Be careful when using this.

In [None]:
array1[0:5]

array([0, 1, 2, 3, 4])

In [None]:
array1[0:5].reshape(5,1)

array([[0],
       [1],
       [2],
       [3],
       [4]])
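
The sketch below shows why reshape should be used with care. Both results have shape (3, 2), but only transpose turns rows into columns. Reshape keeps the items in their original reading order. The names `small`, `transposed` and `reshaped` are chosen for illustration.

```python
import numpy as np

small = np.array([[0, 1, 2],
                  [3, 4, 5]])   # shape (2, 3)

transposed = small.T            # shorthand for np.transpose(small)
reshaped = small.reshape(3, 2)

print(transposed)  # rows become columns
# [[0 3]
#  [1 4]
#  [2 5]]
print(reshaped)    # same reading order, regrouped into rows of 2
# [[0 1]
#  [2 3]
#  [4 5]]
print(np.array_equal(transposed, reshaped))  # False
```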

In [None]:
matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 0]])

Sometimes we may want to flatten our matrix. We can do this with reshape or the <br>
flatten function.

In [None]:
matrix.reshape(9)

array([1, 2, 3, 4, 5, 6, 7, 8, 0])

In [None]:
matrix.flatten()

array([1, 2, 3, 4, 5, 6, 7, 8, 0])
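
One detail worth knowing is that flatten returns a copy, so changing the flattened array leaves the original matrix alone. A minimal sketch, with `flat` as an illustrative name:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 0]])

flat = matrix.flatten()   # flatten returns a copy
flat[0] = 99              # change the copy only

print(flat[0])       # 99
print(matrix[0, 0])  # 1, the original is untouched
```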

Just like how **we could append, delete and overwrite list values**, we can do <br>
similar operations with numpy arrays.

In [None]:
array3 = np.array([3,5,7])

In [None]:
np.append(array3, 1)  # does not change array3, but returns a new array

array([3, 5, 7, 1])

In [None]:
array3 = np.append(array3, [1,2,3])
array3

array([3, 5, 7, 1, 2, 3])

Below we show the delete function. Here we're using it to take array3 and <br>
return a copy of it with the value at index 1 removed.

In [None]:
np.delete(array3, 1)  # delete second element. Also does not change array

array([3, 7, 1, 2, 3])

Since **numpy functions like append and delete make copies of numpy arrays**, you <br>
need to save them to a new array or overwrite your old array to save them.

In [None]:
array3

array([3, 5, 7, 1, 2, 3])
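
Overwriting values works differently from append and delete: assigning to an index or slice changes the array in place, with no copy made. A short sketch using a fresh three-item array:

```python
import numpy as np

array3 = np.array([3, 5, 7])

array3[1] = 10        # overwrite the item at index 1 in place
print(array3)         # [ 3 10  7]

array3[0:2] = 0       # overwrite a slice with a single value
print(array3)         # [0 0 7]
```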

## Before getting to an activity, let's review some concepts

We can create a function using:<br>
def func_name(parameter1, parameter2): <br><br>
We can then use that function in the following way:<br>
func_name(a, b)


In [None]:
def function(parameter1, parameter2):
  print("do something")
function(1, "A")

do something
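
Functions usually hand a result back with return rather than printing it. Here is a minimal sketch with a hypothetical function `add_one`, which works on plain numbers and on numpy arrays alike.

```python
import numpy as np

def add_one(value):
    # return hands the result back to the caller instead of printing it
    return value + 1

result = add_one(4)
array_result = add_one(np.array([1, 2, 3]))

print(result)        # 5
print(array_result)  # [2 3 4]
```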


We have reviewed for loops and how we can use them to iterate but there is <br>
also a type of loop called a while loop. **While loops execute code inside the** <br>
**loop *while* a statement is True**. The following code says, while num is not <br>
equal to 1, subtract 1 from it then print it.

In [None]:
num = 10
while num != 1:
  num -= 1
  print(num)

9
8
7
6
5
4
3
2
1
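
While loops are most useful when we don't know in advance how many steps are needed. As a sketch, the code below halves a value until it drops below 1 and counts the steps. The names `value` and `steps` and the starting value 100 are arbitrary.

```python
value = 100
steps = 0
while value >= 1:
    value = value / 2   # halve the value
    steps += 1          # count how many times we halved it

print(steps)  # 7
print(value)  # 0.78125
```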


The **% or modulo operator gets the remainder of division**. This is often **used** <br>
**to execute a function only every so many steps**. For example, in the following <br> 
code, we're printing only every 1,000 steps. The exact statement is, **if the** <br>
**step number divided by 1000 has a remainder of 0, then print the step number**.

In [None]:
for i in range(5000):
  if i % 1000 == 0:
    print(i)

0
1000
2000
3000
4000


**We can also use modulo to check other numeric properties** such as even or odd <br>
by checking the remainder of dividing by 2.

In [None]:
for i in range(6):
  if i % 2 == 1:
    print(i)

1
3
5
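
The same check also works on a whole numpy array at once. This sketch goes slightly beyond the loop above: the comparison produces one True/False value per item, and using that result as an index keeps only the items marked True. The names `numbers`, `is_odd` and `odd_numbers` are illustrative.

```python
import numpy as np

numbers = np.arange(6)          # [0 1 2 3 4 5]
is_odd = numbers % 2 == 1       # one True/False per item
odd_numbers = numbers[is_odd]   # keep only items where the mask is True

print(is_odd)       # [False  True False  True False  True]
print(odd_numbers)  # [1 3 5]
```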


Arrays can be constructed from lists or from single items.

In [None]:
arr1 = np.array(1)
arr2 = np.array([1,2])
print("arr1:", arr1)
print("arr2:", arr2)

arr1: 1
arr2: [1 2]


If we append an item to an array, it will create a new array which is a copy of <br>
the original array with the new item appended. To keep the new item, we need to <br>
assign the result back to the array in the following way:

In [None]:
print(np.append(arr1, 2))
print("arr1 after running append without assignment:", arr1)
arr1 = np.append(arr1, 2)
print("arr1 after running append with assignment:   ", arr1)

[1 2]
arr1 after running append without assignment: 1
arr1 after running append with assignment:    [1 2]
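
The same pattern works inside a loop. As a sketch, the code below builds up an array one item at a time, assigning the result of np.append back on every step. The name `countdown` and the values are arbitrary. Note that appending a float converts the whole result to floats.

```python
import numpy as np

countdown = np.array(3)          # start from a single item
for value in [2, 1, 0]:
    # np.append returns a new array, so assign it back each time
    countdown = np.append(countdown, value)

print(countdown)  # [3 2 1 0]

# appending a float gives an array of floats
with_float = np.append(countdown, 0.5)
print(with_float.dtype)  # float64
```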


### Activity

**The Collatz conjecture is a simple math statement. It says, start from any positive** <br>
**whole number. If that number is odd, multiply it by 3 then add 1. If it's even,**<br>
**divide it by 2. Eventually that number will go to 1.**<br><br>
**Example:**<br> 
**3 * 3 + 1 = 10** <br>
**10 / 2 = 5** <br>
**5 * 3 + 1 = 16** <br>
**16 / 2 = 8 => 4 => 2 => 1**<br><br>
**Create a function called collatz(number) which will take a starting number** <br>
**and execute the Collatz conjecture until it reaches 1. For each step, append** <br>
**the new value to the end of an array.** <br><br>
**Above, we reviewed some concepts you might need for this activity**.

**Required code:**<br>

```python
def collatz(number):
    array = np.array(number)
    while {write an ending condition here}:
        if number % 2 == 0:
            {write your collatz operation for even numbers}
        else:
            {write your collatz operation for odd numbers}
        {write a statement appending your current value to array}    
    print(array)
collatz(7)
```

In [None]:
#Your code goes here:
########################


########################

In [None]:
#EXAMPLE OUTPUT

[ 7. 22. 11. 34. 17. 52. 26. 13. 40. 20. 10.  5. 16.  8.  4.  2.  1.]
