# Introduction to NumPy

**Author**: Yazeed al-Momani

## 1. Foundations

### 1.1 What is NumPy & why use it?
NumPy is a Python library for fast numerical computations. Compared to Python lists:
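* It is faster because its core operations run in compiled C code instead of Python loops.
* It is more memory efficient because all elements share one fixed data type and are stored contiguously.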
* It has built-in math, matrix, and statistical functions.
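
A minimal sketch of these points, assuming a list of 1,000 integers (the variable names are illustrative and not part of the original notebook). The list version loops in Python, while the array version is a single vectorized expression over one compact data buffer.

```python
import numpy as np

values = list(range(1_000))
arr = np.array(values, dtype=np.int64)

# Pure Python: an explicit loop over every element
squares_list = [v * v for v in values]

# NumPy: one vectorized expression; the loop runs in compiled code
squares_arr = arr * arr

print(squares_arr.tolist() == squares_list)  # True
print(arr.itemsize)  # 8 (bytes per int64 element)
print(arr.nbytes)    # 8000 (bytes for the whole data buffer)
```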

### 1.2 Importing

In [1]:
import numpy as np

### 1.3 Creating arrays with `np.array()`
NumPy arrays are `ndarray` objects. The name is short for N-dimensional array, where N is the number of dimensions.

#### 1.3.1 1D Array
Normal list

In [28]:
arr1d = np.array([1, 2, 3, 4])

#### 1.3.2 2D Array
List of lists

In [29]:
arr2d = np.array([[1, 2, 3], [4, 5, 6]])

#### 1.3.3 3D Array and more
List of lists of lists... and so on

In [30]:
arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7,8]]])

### 1.4 Array attributes

#### 1.4.1 shape
Outputs array dimensions

In [12]:
# (length, )
arr1d.shape

(4,)

In [13]:
# (rows, columns)
arr2d.shape

(2, 3)

In [14]:
# (blocks, rows, columns)
arr3d.shape

(2, 2, 2)

#### 1.4.2 ndim
Outputs number of dimensions

In [16]:
arr1d.ndim

1

In [17]:
arr2d.ndim

2

In [18]:
arr3d.ndim

3

#### 1.4.3 dtype
Outputs data type of elements

In [19]:
arr1d.dtype

dtype('int64')

#### 1.4.4 size
Outputs total number of elements

In [24]:
arr1d.size

4

In [25]:
arr2d.size

6

In [26]:
arr3d.size

8
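
The attributes above are related to each other. A small sketch, reusing the `arr3d` values from section 1.3.3, that checks the relationships:

```python
import numpy as np

arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# ndim is the length of the shape tuple
print(arr3d.ndim == len(arr3d.shape))  # True

# size is the product of all entries in the shape tuple
print(arr3d.size == int(np.prod(arr3d.shape)))  # True
```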

#### 1.4.5 Python type
This is just to see the type of these arrays

In [31]:
type(arr1d)

numpy.ndarray

## 2. Array Creation Shortcuts
Instead of typing lists by hand every time, you can use NumPy's shortcuts for creating arrays.

### 2.1 Arrays of zeros & ones

In [37]:
# All ones, shape (5, 2)
arr_ones = np.ones((5, 2))

arr_ones

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [36]:
# All zeros, shape (4, 2, 3)
arr_zeros = np.zeros((4, 2, 3))

arr_zeros

array([[[0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.]]])

### 2.2 Array from a range
Similar to Python's `range()`, but returns an ndarray

In [46]:
# From 0 up to (but not including) 100, in steps of 5.
arr_range = np.arange(0, 100, 5)

arr_range

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80,
       85, 90, 95])

### 2.3 Evenly spaced numbers
`linspace` takes the number of points instead of a step size and includes the stop value, unlike `arange`.

In [45]:
# 5 evenly spaced numbers from 0 to 100.
arr_linspace = np.linspace(0, 100, 5)

arr_linspace

array([  0.,  25.,  50.,  75., 100.])
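
A short side-by-side sketch of the two functions over the same interval. The `endpoint=False` argument is an addition here that the notebook does not use; it makes `linspace` drop the stop value as well.

```python
import numpy as np

# arange: fixed step, stop value excluded
by_step = np.arange(0, 100, 25)
print(by_step)  # [ 0 25 50 75]

# linspace: fixed number of points, stop value included by default
by_count = np.linspace(0, 100, 5)
print(by_count)  # [  0.  25.  50.  75. 100.]

# endpoint=False excludes the stop value, matching arange's behaviour
no_endpoint = np.linspace(0, 100, 4, endpoint=False)
print(no_endpoint)  # [ 0. 25. 50. 75.]
```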

### 2.4 Random integers & floats

In [47]:
# random integers from 0 to 9 (the upper bound 10 is exclusive), shape (3, 4)
rand_int = np.random.randint(0, 10, size=(3, 4))

rand_int

array([[9, 9, 7, 5],
       [7, 0, 5, 7],
       [2, 6, 4, 8]])

In [49]:
# random floats in [0, 1), shape (3, 4). The only argument is the size.
rand_floats = np.random.random((3, 4))

rand_floats

array([[0.08114117, 0.0278857 , 0.25521376, 0.88551601],
       [0.87605716, 0.17801746, 0.78926421, 0.84997492],
       [0.73381016, 0.66462584, 0.68842573, 0.55035101]])

### 2.5 Setting a random seed
Because the seed is set in the same cell, you get the same numbers no matter how many times you run the cell below.

In [51]:
np.random.seed(42)

rand_seed = np.random.randint(0, 10, size=(3, 4))

rand_seed

array([[6, 3, 7, 4],
       [6, 9, 2, 6],
       [7, 4, 3, 7]])
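
A minimal sketch of why this works: resetting the seed restarts the same random sequence. The last lines use `np.random.default_rng`, NumPy's newer generator interface, which is an alternative the notebook does not use.

```python
import numpy as np

np.random.seed(42)
first = np.random.randint(0, 10, size=(3, 4))

np.random.seed(42)  # resetting the seed restarts the same sequence
second = np.random.randint(0, 10, size=(3, 4))

print(np.array_equal(first, second))  # True

# Alternative (not in the original notebook): a generator with its own seed
rng = np.random.default_rng(42)
third = rng.integers(0, 10, size=(3, 4))  # upper bound is exclusive here too
print(third.shape)  # (3, 4)
```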

### 2.6 Converting NumPy array to Pandas DataFrame

In [53]:
import pandas as pd

data = pd.DataFrame(rand_int)

data

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 9 | 9 | 7 | 5 |
| 1 | 7 | 0 | 5 | 7 |
| 2 | 2 | 6 | 4 | 8 |


## 3. Indexing & Slicing

### 3.1 Basic indexing

In [55]:
# Row index 2, column index 3 (indexes start at 0)
rand_int[2, 3]

np.int64(8)

### 3.2 Row & column selection
A bare `:` selects every index along that axis.

In [60]:
# All values in row 2.
rand_int[2, :]

array([2, 6, 4, 8])

In [59]:
# All values in column 2
rand_int[:, 2]

array([7, 5, 4])

### 3.3 Slicing subsets
Similar to normal slicing: `Start:Stop`. Stop is exclusive.

In [62]:
# First 2 rows, all columns
rand_int[0:2, :]

array([[9, 9, 7, 5],
       [7, 0, 5, 7]])

In [63]:
# All rows, first 2 columns
rand_int[:, 0:2]

array([[9, 9],
       [7, 0],
       [2, 6]])
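
Row and column slices can be combined in one expression. The sketch below rebuilds `rand_int` from the output shown earlier so that it runs on its own; the step slice `::2` is an extra that the notebook does not cover.

```python
import numpy as np

# Same values as rand_int in the notebook output
rand_int = np.array([[9, 9, 7, 5],
                     [7, 0, 5, 7],
                     [2, 6, 4, 8]])

# Rows 0-1 and columns 1-2 at the same time
block = rand_int[0:2, 1:3]
print(block)
# [[9 7]
#  [0 5]]

# Omitted start/stop default to the ends; a third number is the step
every_other_col = rand_int[:, ::2]
print(every_other_col)
# [[9 7]
#  [7 5]
#  [2 4]]
```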

### 3.4 Boolean indexing
You can use comparison operations to filter values.

In [70]:
# Returns boolean array
rand_int > 7

array([[ True,  True, False, False],
       [False, False, False, False],
       [False, False, False,  True]])

In [71]:
# Returns array of numbers > 7
rand_int[rand_int > 7]

array([9, 9, 8])
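
Conditions can also be combined, and a mask can be used on the left side of an assignment. A hedged sketch using the same `rand_int` values (the names `between` and `clipped` are illustrative):

```python
import numpy as np

rand_int = np.array([[9, 9, 7, 5],
                     [7, 0, 5, 7],
                     [2, 6, 4, 8]])

# Combine conditions with & (and) or | (or); the parentheses are required
between = rand_int[(rand_int > 4) & (rand_int < 8)]
print(between)  # [7 5 7 5 7 6]

# A mask can also choose which elements to overwrite
clipped = rand_int.copy()
clipped[clipped > 7] = 7
print(clipped.max())  # 7
```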

### 3.5 Unique values
Returns sorted unique elements.

In [72]:
np.unique(rand_int)

array([0, 2, 4, 5, 6, 7, 8, 9])

## 4. Math & Operations

### 4.1 Element-wise operations
Arithmetic operators work element by element when the shapes match. Because the loop runs in compiled code, this is much faster than an equivalent Python loop.

In [87]:
# Define arr_x
arr_x = np.random.randint(1, 10, size=(3, 5))

arr_x

array([[9, 4, 9, 3, 7],
       [6, 8, 9, 5, 1],
       [3, 8, 6, 8, 9]])

In [88]:
# Define arr_y
arr_y = np.random.randint(1, 10, size=(3, 5))

arr_y

array([[4, 1, 1, 4, 7],
       [2, 3, 1, 5, 1],
       [8, 1, 1, 2, 2]])

In [89]:
# Addition
arr_x + arr_y

array([[13,  5, 10,  7, 14],
       [ 8, 11, 10, 10,  2],
       [11,  9,  7, 10, 11]])

In [90]:
# Subtraction
arr_x - arr_y

array([[ 5,  3,  8, -1,  0],
       [ 4,  5,  8,  0,  0],
       [-5,  7,  5,  6,  7]])

In [91]:
# Multiplication
arr_x * arr_y

array([[36,  4,  9, 12, 49],
       [12, 24,  9, 25,  1],
       [24,  8,  6, 16, 18]])

In [92]:
# Division
arr_x / arr_y

array([[2.25      , 4.        , 9.        , 0.75      , 1.        ],
       [3.        , 2.66666667, 9.        , 1.        , 1.        ],
       [0.375     , 8.        , 6.        , 4.        , 4.5       ]])
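
The same element-wise behaviour applies to the other arithmetic operators. A small sketch with two short arrays chosen for illustration (they are not from the notebook), including the equivalent Python loop for addition:

```python
import numpy as np

x = np.array([9, 4, 7])
y = np.array([4, 1, 2])

print(x // y)  # [2 4 3]    floor division
print(x % y)   # [1 0 1]    remainder
print(x ** 2)  # [81 16 49] power with a scalar

# x + y gives the same values as pairing the elements up in a loop
looped = [a + b for a, b in zip(x.tolist(), y.tolist())]
print((x + y).tolist() == looped)  # True
```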

### 4.2 Broadcasting
When an element-wise operation gets arrays whose shapes don't match exactly, NumPy tries to stretch one or both arrays so that they fit.

**Rules**:
* Compare the shape numbers from right to left. If one array has fewer dimensions, its missing leading dimensions are treated as 1.
* Broadcasting succeeds if every compared pair is **equal** or **one of them is 1**. Otherwise, it fails.
* During the comparison, a dimension of 1 is stretched to match the other number. Equal numbers stay as they are.

**Success Example**

In [96]:
# Shape (3,), treated as (1, 3) when broadcasting
a = np.array([1, 2, 3])

# Shape (2, 1)
b = np.array([[10], [20]])

a + b

array([[11, 12, 13],
       [21, 22, 23]])

**Explanation**
1. We start with the rightmost shape numbers. Array a has shape (3,), which is treated as (1, 3).
2. 3 vs 1: Fits the criteria. Since b has 1, it will stretch to 3 if broadcasting succeeds.
3. 1 vs 2: Fits the criteria. Since a's missing dimension counts as 1, it will stretch to 2 if broadcasting succeeds.
4. Since both pairs fit the criteria, broadcasting succeeds and both arrays stretch to match each other.
5. Array a's new shape becomes (2, 3): [[1, 2, 3], [1, 2, 3]] (a's row is repeated twice).
6. Array b's new shape becomes (2, 3): [[10, 10, 10], [20, 20, 20]] (b's column is repeated three times).
7. The operation then proceeds element by element.
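
The stretching described in these steps can be inspected directly. This sketch uses `np.broadcast_arrays`, which the notebook does not mention, to return both inputs expanded to the common shape:

```python
import numpy as np

a = np.array([1, 2, 3])     # shape (3,)
b = np.array([[10], [20]])  # shape (2, 1)

# broadcast_arrays returns both inputs stretched to the common shape
a_stretched, b_stretched = np.broadcast_arrays(a, b)
print(a_stretched.shape, b_stretched.shape)  # (2, 3) (2, 3)
print(a_stretched)
# [[1 2 3]
#  [1 2 3]]
print(b_stretched)
# [[10 10 10]
#  [20 20 20]]

# Adding the stretched arrays gives the same result as a + b
print(np.array_equal(a_stretched + b_stretched, a + b))  # True
```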

**Fail Example**

In [98]:
# Shape (3, 3)
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Shape (2, 1)
b = np.array([[10], [20]])

a + b

ValueError: operands could not be broadcast together with shapes (3,3) (2,1) 

**Explanation**
1. We start with the rightmost shape numbers.
2. 3 vs 1: Fits criteria. Since b has 1, it will stretch to 3 if broadcasting succeeds.
3. 3 vs 2: Does NOT fit criteria. Broadcasting fails.
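
A sketch of handling this failure and one possible fix. The replacement array `b_fixed` is hypothetical: it assumes you actually want one value per row of a, which gives it the compatible shape (3, 1).

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)
b = np.array([[10], [20]])                        # shape (2, 1)

failed = False
try:
    a + b
except ValueError as err:
    failed = True
    print("Broadcast failed:", err)

# Hypothetical fix: one value per row of a, so the shape is (3, 1)
b_fixed = np.array([[10], [20], [30]])
result = a + b_fixed
print(result)
# [[11 12 13]
#  [24 25 26]
#  [37 38 39]]
```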

### 4.3 Math functions

In [104]:
rand_int

array([[9, 9, 7, 5],
       [7, 0, 5, 7],
       [2, 6, 4, 8]])

In [100]:
np.square(rand_int)

array([[81, 81, 49, 25],
       [49,  0, 25, 49],
       [ 4, 36, 16, 64]])

In [101]:
np.mean(rand_int)

np.float64(5.75)

In [105]:
np.max(rand_int)

np.int64(9)

In [106]:
np.min(rand_int)

np.int64(0)

In [108]:
np.var(rand_int) # Variance

np.float64(6.854166666666667)

In [107]:
np.std(rand_int) # Standard Deviation

np.float64(2.6180463454008347)
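
To see what `np.var` and `np.std` compute, here is a sketch that rebuilds both by hand from the mean, using the same `rand_int` values. It assumes the default population formulas (dividing by the number of elements).

```python
import numpy as np

rand_int = np.array([[9, 9, 7, 5],
                     [7, 0, 5, 7],
                     [2, 6, 4, 8]])

mean = rand_int.mean()
# Variance: average squared distance from the mean
variance = ((rand_int - mean) ** 2).mean()
# Standard deviation: square root of the variance
std = np.sqrt(variance)

print(mean)                                    # 5.75
print(np.isclose(variance, np.var(rand_int)))  # True
print(np.isclose(std, np.std(rand_int)))       # True
```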

### 4.4 Reshaping & Transposing

#### 4.4.1 Reshaping
When reshaping, the total number of elements must stay the same.

For example, if you have 12 elements, you can reshape to (3, 4), (4, 3), (2, 6), (12, 1), but not (5, 5).

In [122]:
print("Shape: ", rand_int.shape)
print("Size: ", rand_int.size)

Shape:  (3, 4)
Size:  12


In [123]:
rand_int

array([[9, 9, 7, 5],
       [7, 0, 5, 7],
       [2, 6, 4, 8]])



**Success Examples**

In [124]:
print(rand_int.reshape(4, 3))

[[9 9 7]
 [5 7 0]
 [5 7 2]
 [6 4 8]]


In [125]:
print(rand_int.reshape(2, 6))

[[9 9 7 5 7 0]
 [5 7 2 6 4 8]]


In [127]:
print(rand_int.reshape(1, 12))

[[9 9 7 5 7 0 5 7 2 6 4 8]]


**Fail Example**

In [128]:
print(rand_int.reshape(5, 5))

ValueError: cannot reshape array of size 12 into shape (5,5)
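
Because the total number of elements is fixed, one dimension can be left for NumPy to work out. A short sketch using `-1` as a placeholder, a feature the notebook does not cover:

```python
import numpy as np

rand_int = np.array([[9, 9, 7, 5],
                     [7, 0, 5, 7],
                     [2, 6, 4, 8]])

# -1 asks NumPy to infer that dimension from the total size (12)
print(rand_int.reshape(-1, 6).shape)  # (2, 6)
print(rand_int.reshape(4, -1).shape)  # (4, 3)
print(rand_int.reshape(-1).shape)     # (12,)
```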

#### 4.4.2 Transposing
`.T` reverses the order of the axes, so rows become columns. For example, a shape of (3, 4) becomes (4, 3).

In [131]:
rand_int.shape

(3, 4)

In [132]:
rand_int

array([[9, 9, 7, 5],
       [7, 0, 5, 7],
       [2, 6, 4, 8]])

In [133]:
rand_int.T.shape

(4, 3)

In [134]:
rand_int.T

array([[9, 7, 2],
       [9, 0, 6],
       [7, 5, 4],
       [5, 7, 8]])
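
A brief sketch of what the transpose does to individual elements and to arrays with more than two dimensions. The array `cube` is a made-up example.

```python
import numpy as np

rand_int = np.array([[9, 9, 7, 5],
                     [7, 0, 5, 7],
                     [2, 6, 4, 8]])

# Element [i, j] of the transpose is element [j, i] of the original
print(rand_int.T[3, 0] == rand_int[0, 3])  # True

# With more than two dimensions, .T reverses the order of all axes
cube = np.zeros((2, 3, 4))
print(cube.T.shape)  # (4, 3, 2)
```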

## 5. Dot Product
For two 2D arrays A with shape (x, y) and B with shape (y, z), the dot product has shape (x, z).

Note that the inner numbers (the columns of A and the rows of B) have to be equal.

In [148]:
A = np.random.randint(1, 10, size=(2, 3))

A

array([[1, 2, 1],
       [5, 5, 7]])

In [149]:
B = np.random.randint(1, 10, size=(3, 4))

B

array([[9, 9, 3, 3],
       [3, 4, 8, 6],
       [8, 1, 8, 4]])

In [154]:
# (2, 3) dot (3, 4) = (2, 4)
np.dot(A, B)

array([[ 23,  18,  27,  19],
       [116,  72, 111,  73]])

In [155]:
np.dot(A, B).shape

(2, 4)
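
The shape rule follows from how each entry is computed. This sketch rebuilds the result with explicit loops, using the A and B values shown above, and compares it with `np.dot` and the `@` operator:

```python
import numpy as np

A = np.array([[1, 2, 1],
              [5, 5, 7]])      # shape (2, 3)
B = np.array([[9, 9, 3, 3],
              [3, 4, 8, 6],
              [8, 1, 8, 4]])   # shape (3, 4)

# Entry [i, j] is the sum of products of row i of A and column j of B
manual = np.zeros((2, 4), dtype=A.dtype)
for i in range(2):
    for j in range(4):
        manual[i, j] = np.sum(A[i, :] * B[:, j])

print(np.array_equal(manual, np.dot(A, B)))  # True
print(np.array_equal(A @ B, np.dot(A, B)))   # True; @ is matrix multiplication
```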

The dot product fails when the inner numbers don't match. This is where reshaping and transposing come in handy.

In [152]:
C = np.random.randint(1, 10, size = (3, 4))

C

array([[1, 8, 4, 6],
       [8, 4, 3, 9],
       [3, 9, 2, 2]])

In [153]:
D = np.random.randint(1, 10, size = (3, 4))

D

array([[2, 6, 3, 9],
       [4, 1, 4, 1],
       [5, 4, 8, 8]])

In [156]:
# (3, 4) dot (3, 4) = fail
np.dot(C, D)

ValueError: shapes (3,4) and (3,4) not aligned: 4 (dim 1) != 3 (dim 0)

Transposing D turns its shape into (4, 3), so the inner numbers match:

In [158]:
# (3, 4) dot (4, 3) = (3, 3) success
np.dot(C, D.T)

array([[116,  34, 117],
       [130,  57, 152],
       [ 84,  31,  83]])

## 6. Comparisons & Sorting

### 6.1 Element-wise comparisons
Compares each element.

In [174]:
A = np.random.randint(0, 10, size=(3, 5))

A

array([[7, 1, 5, 6, 1],
       [9, 1, 9, 0, 7],
       [0, 8, 5, 6, 9]])

In [175]:
B = np.random.randint(0, 10, size=(3, 5))

B

array([[6, 9, 2, 1, 8],
       [7, 9, 6, 8, 3],
       [3, 0, 7, 2, 6]])

In [176]:
A > B

array([[ True, False,  True,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True,  True]])

In [177]:
A == B

array([[False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False]])

In [178]:
A > 4

array([[ True, False,  True,  True, False],
       [ True, False,  True, False,  True],
       [False,  True,  True,  True,  True]])

### 6.2 Boolean masks
This is the same idea as boolean indexing: you define a mask with an array comparison, then use it as an index to filter values. This is called masking.

In [179]:
mask = A > B

A[mask]

array([7, 5, 6, 9, 9, 7, 8, 6, 9])
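
A mask is useful for more than filtering. The sketch below, using the A and B values shown above, counts the matches and uses `np.where` (not covered in the notebook) to choose between two arrays element by element:

```python
import numpy as np

A = np.array([[7, 1, 5, 6, 1],
              [9, 1, 9, 0, 7],
              [0, 8, 5, 6, 9]])
B = np.array([[6, 9, 2, 1, 8],
              [7, 9, 6, 8, 3],
              [3, 0, 7, 2, 6]])

mask = A > B
print(mask.sum())  # 9 (each True counts as 1)

# np.where takes values from A where the mask is True and from B elsewhere
larger = np.where(mask, A, B)
print(np.array_equal(larger, np.maximum(A, B)))  # True
```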

### 6.3 Sorting arrays

In [180]:
# Sorts each row individually
np.sort(A)

array([[1, 1, 5, 6, 7],
       [0, 1, 7, 9, 9],
       [0, 5, 6, 8, 9]])

### 6.4 Sorting indexes
Similar to `sort`, but instead of the sorted values it returns the indexes that would sort each row.

In [181]:
np.argsort(A)

array([[1, 4, 2, 3, 0],
       [3, 1, 4, 0, 2],
       [0, 2, 3, 1, 4]])
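
To connect these indexes back to the sorted values, here is a sketch that applies them to A. `np.take_along_axis` is an addition that the notebook does not use.

```python
import numpy as np

A = np.array([[7, 1, 5, 6, 1],
              [9, 1, 9, 0, 7],
              [0, 8, 5, 6, 9]])

order = np.argsort(A)  # per-row indexes, same shape as A

# Applying one row's indexes to that row sorts it
first_row_sorted = A[0][order[0]]
print(first_row_sorted)  # [1 1 5 6 7]

# take_along_axis applies every row's indexes at once
all_sorted = np.take_along_axis(A, order, axis=1)
print(np.array_equal(all_sorted, np.sort(A)))  # True
```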

### 6.5 Argmax & Argmin

In [183]:
# Index of maximum value (flattened array)
np.argmax(A)

np.int64(5)

In [184]:
# Index of minimum value (flattened array)
np.argmin(A)

np.int64(8)
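
The flattened positions can be converted back to row and column indexes. A sketch using `np.unravel_index`, which the notebook does not mention:

```python
import numpy as np

A = np.array([[7, 1, 5, 6, 1],
              [9, 1, 9, 0, 7],
              [0, 8, 5, 6, 9]])

flat_max = np.argmax(A)  # 5
flat_min = np.argmin(A)  # 8

# Convert flat positions back to (row, column) pairs
row_max, col_max = np.unravel_index(flat_max, A.shape)
row_min, col_min = np.unravel_index(flat_min, A.shape)
print(row_max, col_max)  # 1 0
print(row_min, col_min)  # 1 3
print(A[row_max, col_max], A[row_min, col_min])  # 9 0
```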

### 6.6 Axis concept
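Many functions take a parameter called `axis`. For a 2D array, `axis=0` runs down the rows, so the operation is applied to each column, and `axis=1` runs across the columns, so the operation is applied to each row. `axis=None` treats the array as flattened.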

In [187]:
# Sorts the values within each column instead of within each row
np.sort(A, axis=0)

array([[0, 1, 5, 0, 1],
       [7, 1, 5, 6, 7],
       [9, 8, 9, 6, 9]])

In [188]:
# Sorts entire array flattened
np.sort(A, axis=None)

array([0, 0, 1, 1, 1, 5, 5, 6, 6, 7, 7, 8, 9, 9, 9])

In [189]:
# Index of max value per column
np.argmax(A, axis=0)

array([1, 2, 1, 0, 2])

In [190]:
# Max value per column
np.max(A, axis=0)

array([9, 8, 9, 6, 9])
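
The same pattern applies to reductions such as `sum`. A short sketch with the A values shown above, showing which dimension disappears for each choice of `axis`:

```python
import numpy as np

A = np.array([[7, 1, 5, 6, 1],
              [9, 1, 9, 0, 7],
              [0, 8, 5, 6, 9]])

# axis=0 collapses the rows: one result per column
print(A.sum(axis=0))  # [16 10 19 12 17]

# axis=1 collapses the columns: one result per row
print(A.sum(axis=1))  # [20 26 28]

# With no axis given, sum collapses the whole array
print(A.sum())  # 74
```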
