# Numpy basics

**Table of contents**<a id='toc0_'></a>    
- 1. [Numerical python (numpy)](#toc1_)    
  - 1.1. [Basics](#toc1_1_)    
  - 1.2. [Math](#toc1_2_)    
  - 1.3. [Indexing](#toc1_3_)    
  - 1.4. [Multidimensional arrays](#toc1_4_)    
  - 1.5. [List of good things to know](#toc1_5_)    
  - 1.6. [Small quiz](#toc1_6_)    
- 2. [Extra: Memory](#toc2_)    


**[Numpy](https://numpy.org/)** is the main Python package for handling numerical data.

**Further:** [Very detailed numpy tutorial](https://www.python-course.eu/numpy.php)

## 1. <a id='toc1_'></a>[Numerical python (numpy)](#toc0_)

* So far we have mainly used **base Python**: the set of operations and containers built into the core of Python. 
* Now we import the package **numpy** (which you got through the Anaconda distribution).  
* Numpy is **the** package for handling data that goes into mathematical operations (base Python isn't great there). 
* It is built around its container, the **ndarray**, for which there are many specially made routines. 
* The routines (multiplications, matrix algebra, etc.) are highly efficient because they are implemented in C. 
* A list (which is base Python) and an ndarray are thus different things that behave differently. 

In [1]:
import numpy as np # import the numpy module

A **numpy array** is a lot like a list but with important differences:

1. Elements must be of **one homogeneous type** (ints, floats...).
2. A **slice returns a view** rather than a copy of the content.
3. A numpy array **cannot change size** after creation (there is no in-place append; `np.append` returns a new array).

Thus, numpy arrays are less flexible than lists.  
But that is what allows them to use more efficient mathematical routines. 
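
The three differences listed above can be illustrated with a small sketch. The variable names are made up for this example and do not appear elsewhere in the notebook.

```python
import numpy as np

# 1. One homogeneous type: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)              # float64

# 2. A slice of a list is a copy, a slice of an array is a view
lst = [0, 1, 2, 3]
arr = np.array(lst)
lst_slice = lst[:2]
arr_slice = arr[:2]
lst_slice[0] = 99               # does not affect lst
arr_slice[0] = 99               # also changes arr
print(lst, arr)

# 3. Fixed size: np.append does not grow arr, it builds a new array
longer = np.append(arr, 4)
print(arr.size, longer.size)    # 4 5
```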

### 1.1. <a id='toc1_1_'></a>[Basics](#toc0_)

Numpy arrays can be **created from lists** and can be **multi-dimensional**:

In [2]:
A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) # one dimension
B = np.array([[3.4, 8.7, 9.9], 
              [1.1, -7.8, -0.7],
              [1. , 0.5 , -5.7],
              [4.1, 12.3, 4.8]]) # two dimensions

print('A is a', type(A),'and B is a', type(B)) # type
print('As data is',A.dtype,'Bs data is' , B.dtype) # data type
print('A dimensions:', A.ndim, 'B dimensions:',B.ndim) # dimensions
print('shape of A:',A.shape,'shape of B:',B.shape) # shape (1d: (columns,), 2d: (row,columns))
print('size of A:',A.size,'size of B:',B.size) # size

A is a <class 'numpy.ndarray'> and B is a <class 'numpy.ndarray'>
As data is int32 Bs data is float64
A dimensions: 1 B dimensions: 2
shape of A: (10,) shape of B: (4, 3)
size of A: 10 size of B: 12


Notice that the matrix `B` was a bunch of stacked **rows**. <br> 
Numpy is row-major by default, which means the elements of the ndarray are stored in memory as concatenated rows (Fortran, MATLAB, and Julia are examples of column-major languages). See the extra section at the bottom for more on this. 

**Slicing** a numpy array returns a **view**, which is a **reference** to the part of the array that was sliced out.  
*Remember that both views and copies can be extracted from containers.* <br> 
*Changing a view changes the original object; changing a copy does not.*

In [13]:
A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
V = A[2:6]         # V is a view of (a reference to) a slice of A

# Make changes in V and note what happens in A
V[0] = 0; V[1] = 0  # The ; lets you put several statements on one line
print('V =',V)
print('A =',A,'changed')

# C is a copy, so changing C does not change A
C = A.copy()
C[0] = 99
print('A =',A, 'did not change') 

V = [0 0 4 5]
A = [0 1 0 0 4 5 6 7 8 9] changed
A = [0 1 0 0 4 5 6 7 8 9] did not change
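
If you are unsure whether something is a view or a copy, numpy can tell you. The following is a small sketch using `np.shares_memory` and the `.base` attribute; the arrays are chosen only for illustration.

```python
import numpy as np

A = np.arange(10)
V = A[2:6]         # basic slice -> view
C = A[2:6].copy()  # explicit copy

print(np.shares_memory(A, V))  # True: V looks at A's data
print(np.shares_memory(A, C))  # False: C owns its own data
print(V.base is A)             # True: the view remembers its parent array
print(C.base is None)          # True: a copy has no parent
```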


Numpy arrays can also be created using numpy functions:

In [17]:
print(np.ones((2,3)))
print(np.zeros((4,2)))
print(np.eye(4,))
print(np.linspace(0,100,6)) # linear spacing

[[1. 1. 1.]
 [1. 1. 1.]]
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
[  0.  20.  40.  60.  80. 100.]
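
A few more creation functions are often handy. This is only a short sketch of some of them, not a complete list.

```python
import numpy as np

print(np.arange(0, 10, 2))        # like range(): [0 2 4 6 8], stop is excluded
print(np.full((2, 2), 7.5))       # constant array filled with 7.5
print(np.zeros_like(np.eye(2)))   # zeros with the same shape and dtype as another array

# linspace includes the endpoint by default; retstep=True also returns the step
grid, step = np.linspace(0, 100, 6, retstep=True)
print(grid, step)                 # step is 20.0
```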


**Tip 1:** Try typing a comma inside a function

**Tip 2:** Try to write `?np.linspace` in a cell

In [4]:
?np.linspace

Signature:
np.linspace(
    start,
    stop,
    num=50,
    endpoint=True,
    retstep=False,
    dtype=None,
    axis=0,
)
Docstring:
Return evenly spaced numbers over a specified interval.

Returns `num` evenly spaced samples, calculated over the
interval [`start`, `stop`].

The endpoint of the interval can optionally be excluded.

.. versionchanged:: 1.16.0
    Non-scalar `start` and `stop` are now supported.

.. versionchanged:: 1.20.0
    Values are rounded towards ``-inf`` instead of ``0`` when an
    integer ``dtype`` is specified. The old behavior can
    s

### 1.2. <a id='toc1_2_'></a>[Math](#toc0_)

Numpy arrays are designed for **mathematical operations**.

The operations `*`, `+`, `-` and `/` are applied element-by-element between two ndarrays.

In [30]:
A = np.array([[1,0],[0,1]])
B = np.array([[2,2],[2,2]])


print('A\n',A,'\nB\n',B)
print('A+B\n',A+B,'\n') # Add 2 numpy arrays element-by-element

A
 [[1 0]
 [0 1]] 
B
 [[2 2]
 [2 2]]
A+B
 [[3 2]
 [2 3]] 



In [31]:
# More examples
print('A-B\n',A-B,'\n')
print('A*B\n',A*B,'\n') # element-by-element product
print('A/B\n',A/B,'\n') # element-by-element division
print('A@B\n',A@B,'\n') # matrix product

A-B
 [[-1 -2]
 [-2 -1]] 

A*B
 [[2 0]
 [0 2]] 

A/B
 [[0.5 0. ]
 [0.  0.5]] 

A@B
 [[2 2]
 [2 2]] 
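
Because the identity matrix hides the difference somewhat, here is a sketch with two other (arbitrarily chosen) matrices that makes the distinction between `*` and `@` easier to see.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

elementwise = A * B   # multiplies entries in the same position
matrix = A @ B        # rows of A times columns of B

print(elementwise)    # [[0 2] [3 0]]
print(matrix)         # [[2 1] [4 3]]
print(np.array_equal(matrix, np.matmul(A, B)))  # True: @ is shorthand for np.matmul
```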



**Broadcasting**  
* If the shapes of two arrays do not match, **broadcasting** is applied.  
* When broadcasting, numpy stretches dimensions of length 1 (or missing dimensions) to match the dimensions that **do fit**; if that is not possible, an error is raised.  
* **Simple case:** multiplying a scalar (an array with 1 element) with a larger ndarray.  

In [32]:
A = np.array([[10, 20, 30], 
              [40, 50, 60]]) # shape = (2,3) 
              
B = np.array([1, 2, 3]) # shape = (3,), treated as (1,3) when broadcasting

print('A\n',A, A.shape)
print('B\n',B, B.shape) # Notice the shape: (3,) is treated as a row vector (1,3)
print('\nMultiplication along columns')
print(A*B) 

A
 [[10 20 30]
 [40 50 60]] (2, 3)
B
 [1 2 3] (3,)

Multiplication along columns
[[ 10  40  90]
 [ 40 100 180]]


Another example. Note that B above did not have a 2nd dimension.  
C explicitly has 2 rows and 1 column because it is created as such.

In [34]:
C = np.array([[1],[2]]) 

print(C, C.shape, '\n') 
print(A*C,'\n') # every column is multiplied by C

[[1]
 [2]] (2, 1) 

[[ 10  20  30]
 [ 80 100 120]] 



If you want to e.g. add arrays where broadcasting is not possible, consider **np.newaxis**. <br>
`np.newaxis` also allows you to be more explicit about the operations you want to perform.

In [36]:
A = np.array([1, 2, 3]) # Is only 1D, shape = (3,)
B = np.array([1,2]) # Is only 1D, shape = (2,)

# A and B cannot be broadcast directly: their shapes (3,) and (2,) do not match and neither has length 1.
# Therefore, use newaxis to make one a column vector and the other a row vector
print(A[:,np.newaxis], A[:,np.newaxis].shape, '\n') # Is now (3,1)
print(B[np.newaxis,:], B[np.newaxis,:].shape, '\n') # Is now (1,2)

print(A[:,np.newaxis]*B[np.newaxis,:], '\n') # A is column vector, B is row vector
print(A[np.newaxis,:]*B[:,np.newaxis]) # A is row vector, B is column vector

[[1]
 [2]
 [3]] (3, 1) 

[[1 2]] (1, 2) 

[[1 2]
 [2 4]
 [3 6]] 

[[1 2 3]
 [2 4 6]]


**More on broadcasting:** [Documentation](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html).
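
As a rough sketch of the rule: numpy compares shapes starting from the last dimension, and two dimensions fit if they are equal or if one of them is 1. The helper `try_broadcast` below is hypothetical (written for this example, not part of numpy) and simply tries the multiplication.

```python
import numpy as np

def try_broadcast(x, y):
    """Return the shape of x*y, or None if the shapes cannot be broadcast."""
    try:
        return (x * y).shape
    except ValueError:
        return None

a23 = np.ones((2, 3))
print(try_broadcast(a23, np.ones(3)))        # (2, 3): (3,) is treated as (1, 3)
print(try_broadcast(a23, np.ones((2, 1))))   # (2, 3): the length-1 axis is stretched
print(try_broadcast(a23, np.ones(2)))        # None: trailing dimensions 3 and 2 differ
print(try_broadcast(np.ones((3, 1)), np.ones((1, 2))))  # (3, 2)
```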

A lot of **mathematical procedures** can easily be performed on numpy arrays.

In [42]:
A =  np.array([3.1, 2.3, 9.1, -2.5, 12.1])
print(np.min(A)) # find minimum
print(np.argmin(A)) # find index for minimum
print(np.mean(A)) # calculate mean
print(np.sort(A)) # sort (ascending)

-2.5
3
4.82
[-2.5  2.3  3.1  9.1 12.1]


**Note:** Sometimes a method can be used instead of a function, e.g. ``A.mean()``. <br> 
In numpy most methods are also available as functions, but not all functions are available as methods. Sometimes there are small differences between them, e.g. `np.sort(A)` returns a sorted copy while `A.sort()` sorts `A` in place. <br>
We'll tend to stick to functions.
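
One difference between a function and its method counterpart that is easy to trip over is sorting. A minimal sketch, reusing the numbers from the cell above:

```python
import numpy as np

A = np.array([3.1, 2.3, 9.1, -2.5, 12.1])

S = np.sort(A)       # function: returns a sorted copy, A is untouched
print(A[0], S[0])    # 3.1 -2.5

result = A.sort()    # method: sorts A in place and returns None
print(result, A[0])  # None -2.5
```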

### 1.3. <a id='toc1_3_'></a>[Indexing](#toc0_)

**Multi-dimensional** indexing is done as:

In [65]:
X = np.array([ [11, 12, 13], 
               [21, 22, 23] ])

print(X[0:2,1:3])
print('X=',X[-1:,:])

print('X\n',X)
print('\nX[0]\n',X[0]) # first row, all columns (this is implicitly X[0,:])
print('\nX[:,0]\n',X[:,0]) # all rows, first column
print('\nX[1,2]\n',X[1,2]) # second row, third column
print('\nX[0:2,1:3]\n',X[0:2,1:3]) # rows 1-2, columns 2-3

[[12 13]
 [22 23]]
X= [[21 22 23]]
X
 [[11 12 13]
 [21 22 23]]

X[0]
 [11 12 13]

X[:,0]
 [11 21]

X[1,2]
 23

X[0:2,1:3]
 [[12 13]
 [22 23]]
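
Negative indices and step sizes work as they do for lists, just per dimension. A short sketch with the same `X`:

```python
import numpy as np

X = np.array([[11, 12, 13],
              [21, 22, 23]])

print(X[-1, -1])     # 23: last row, last column
print(X[:, ::2])     # every second column: [[11 13] [21 23]]
print(X[:, ::-1])    # columns in reverse order
print(X[0].shape, X[0:1].shape)  # (3,) (1, 3): an integer index drops the axis, a slice keeps it
```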


Indexes can be **logical**. Logical 'and' is `&` and logical 'or' is `|`. <br>
(How to type `|` varies a lot across keyboards. You should be able to find out how by searching for "vertical line keyboard" for your computer.)

In [67]:
A = np.array([1,2,3,4,1,2,3,4])
B = np.array([3,3,3,3,2,3,2,2])
I = (A < 3) & (B == 3) # note & instead of 'and', indicates element-wise comparison
print(I)
print(A[I],'\n')

# Two ways of getting indices of the elements == True
print(np.where(I)) # A 'where' clause normally asks for where the True elements are.
print(np.nonzero(I)) # Because a True boolean is a 1 while a False is a 0.

[ True  True False False False  True False False]
[1 2 2] 

(array([0, 1, 5], dtype=int64),)
(array([0, 1, 5], dtype=int64),)


In [68]:
I = (A < 3) | (B == 3) # note | instead of 'or'
print(A[I])

[1 2 3 4 1 2]
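
Logical indexes can do a bit more than select elements. The sketch below shows negation with `~`, the three-argument form of `np.where`, and assignment through a mask.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 1, 2, 3, 4])
I = A < 3

print(A[~I])                 # ~ is element-wise 'not': [3 4 3 4]
print(np.where(I, A, 0))     # keep A where I is True, else 0: [1 2 0 0 1 2 0 0]

A[I] = -1                    # a mask can also be used for assignment
print(A)                     # [-1 -1  3  4 -1 -1  3  4]
print(I.sum())               # number of True elements: 4
```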


### 1.4. <a id='toc1_4_'></a>[Multidimensional arrays](#toc0_)

Arrays can have more than two dimensions. They become more difficult to understand, but can be really useful. <br>
The numpy way of understanding them is as matrices storing matrices. <br>
In the shape attribute, the rows and columns of the innermost matrix are then the second-to-last and last elements of the shape.

In [69]:
A_i = np.array([[1,2],
                [3,4]])
A = np.array([A_i,A_i,A_i])

print(A.shape) # (Number of matrices,rows,cols)
print(A)

(3, 2, 2)
[[[1 2]
  [3 4]]

 [[1 2]
  [3 4]]

 [[1 2]
  [3 4]]]


In [70]:
B_i = np.array([[5,5],
                [5,5]])

AB = np.array([[A_i,A_i,A_i],
                [B_i,B_i,B_i]])
print(AB.shape) # (rows in matrix of matrices , columns in matrix of matrices, rows, columns )
print(AB)

(2, 3, 2, 2)
[[[[1 2]
   [3 4]]

  [[1 2]
   [3 4]]

  [[1 2]
   [3 4]]]


 [[[5 5]
   [5 5]]

  [[5 5]
   [5 5]]

  [[5 5]
   [5 5]]]]


**Note:** the inner matrices (`A_i` and `B_i`) must have the same shape.
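
To get a feel for the extra dimension, it can help to index and sum along it. The sketch below uses `np.stack`, which here gives the same result as `np.array([A_i, A_i, A_i])`.

```python
import numpy as np

A_i = np.array([[1, 2],
                [3, 4]])
A = np.stack([A_i, A_i, A_i])   # same result as np.array([A_i, A_i, A_i])

print(A.shape)                  # (3, 2, 2)
print(A[0])                     # first inner matrix
print(A[:, 0, 1])               # element [0,1] of every inner matrix: [2 2 2]
print(A.sum(axis=0))            # sum over the matrices: [[3 6] [9 12]]
print(A.sum(axis=(1, 2)))       # sum within each matrix: [10 10 10]
```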

### 1.5. <a id='toc1_5_'></a>[List of good things to know](#toc0_)

**Attributes and methods** to know:

- size / ndim / shape
- ravel / reshape / sort
- copy

**Functions** to know:

- array / empty / zeros / ones / linspace
- mean / median / std / var / sum / percentile
- min / max / argmin / argmax / fmin / fmax / sort / clip
- meshgrid / hstack / vstack / concatenate / tile / insert
- allclose / isnan / isinf / isfinite / any / all

**Concepts** to know:

- view vs. copy
- broadcasting
- logical indexing

The important thing is not to try to memorize all of this. <br> 
The important thing is to understand the logic of how numpy works and get an idea about the things numpy can help you with.
Googling numpy together with the mathematical operation you want to do will, more often than not, turn up a numpy function that does what you're looking for. 
As you code during your projects, you'll hopefully memorize the functionalities you find most useful.

### 1.6. <a id='toc1_6_'></a>[Small quiz](#toc0_)

What follows is a number of code blocks with print statements. Try to predict what will be printed before looking at the output, and check afterwards if you were correct.

In [80]:
A = np.array([1, 2, 3, 4])
B = np.array([3,3,2,2])
I = (B==3) & A>=1
print(np.nonzero(I))

(array([0], dtype=int64),)
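
The result above comes from operator precedence: `&` binds more tightly than `>=`, so the missing parentheses change the meaning. A sketch comparing the two versions (the names `without` and `with_par` are just for this example):

```python
import numpy as np

A = np.array([1, 2, 3, 4])
B = np.array([3, 3, 2, 2])

without = (B == 3) & A >= 1      # parsed as ((B == 3) & A) >= 1
with_par = (B == 3) & (A >= 1)   # probably what was intended

print((B == 3) & A)   # bitwise and of booleans and integers: [1 0 0 0]
print(without)        # [ True False False False]
print(with_par)       # [ True  True False False]
```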


In [87]:
A = np.ones((4,2))*2
B = A[0:2,1]+ 1 
print(A,"\n")

print(B)

[[2. 2.]
 [2. 2.]
 [2. 2.]
 [2. 2.]] 

[3. 3.]


In [88]:
A = np.ones((4,2))*2
B = A[0:2,1]+ 1 
print(B)

[3. 3.]


In [94]:
A = np.ones((2,2))
B = np.array([[5],[10]])
C=A*B
D=B*A
print(C,"\n",D)
print(np.all(A*B==B*A))

[[ 5.  5.]
 [10. 10.]] 
 [[ 5.  5.]
 [10. 10.]]
True


In [96]:
A = np.array([1,2,3,4,5])
B = A[3:]
print(B)
B[:] = 0
print(np.sum(A))

[4 5]
6


## 2. <a id='toc2_'></a>[Extra: Memory](#toc0_)

Recall that matrices are stored in memory by **rows** (as opposed to columns). You can see how B (the unravelled version of A) looks: it's a single row. 

In [98]:
A = np.array([[3.1,4.2],[5.7,9.3]])
B = A.ravel() # one-dimensional view of A
print(A.shape,A)
print(B.shape,B)

(2, 2) [[3.1 4.2]
 [5.7 9.3]]
(4,) [3.1 4.2 5.7 9.3]
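
The row-wise layout can also be inspected directly. The following sketch assumes the default float64 data type (8 bytes per element) and uses the `flags` and `strides` attributes together with the `order` argument of `ravel`.

```python
import numpy as np

A = np.array([[3.1, 4.2], [5.7, 9.3]])

print(A.flags['C_CONTIGUOUS'])   # True: rows are stored one after another
print(A.strides)                 # (16, 8): 16 bytes to the next row, 8 to the next column
print(A.ravel(order='C'))        # row by row:       [3.1 4.2 5.7 9.3]
print(A.ravel(order='F'))        # column by column: [3.1 5.7 4.2 9.3]
print(np.shares_memory(A, A.ravel()))  # True: here ravel returns a view
```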
