<a href="https://colab.research.google.com/github/victorviro/Machine-Learning-Python/blob/master/Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Numpy

[NumPy](https://github.com/numpy/numpy) is a Python library optimized for numerical computing. It is especially powerful when used together with other packages such as SciPy for scientific functions, Matplotlib for visualization, and Pandas for data analysis. NumPy is short for Numerical Python.

NumPy’s core strength lies in its ability to create and manipulate n-dimensional arrays. This is particularly critical for building machine learning and deep learning models. Data is often represented in a matrix, where each row represents an observation and each column a variable or feature. Hence, NumPy’s 2-D array is a natural fit for storing and manipulating datasets.

This tutorial will cover the basics of NumPy to get us very comfortable working with the package and also get us to appreciate the thinking behind how NumPy works. This understanding forms a foundation from which one can extend and seek solutions from the NumPy reference documentation when a specific functionality is needed.

To begin using NumPy, we’ll start by importing the NumPy module:

In [2]:
import numpy as np

## NumPy 1-D array

Let’s create a simple 1-D NumPy array:

In [16]:
my_array = np.array([2,4,6,8,10])
print(my_array)

# The Python type of a NumPy array object is ndarray
print(type(my_array))

# A NumPy 1-D array can also be seen as a vector with 1 dimension
print(my_array.ndim)

[ 2  4  6  8 10]
<class 'numpy.ndarray'>
1


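Let's check the shape of the array. In general, `shape` is a tuple giving the size of the array along each dimension; for a 2-D array it reads as (rows, columns). A 1-D array has a single dimension, so its shape is a one-element tuple.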

In [9]:
my_array.shape

(5,)

We can also create an array from a Python list.

In [10]:
my_list = [9, 5, 2, 7]
list_to_array = np.array(my_list) # or np.asarray(my_list)
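
The comment above mentions `np.asarray` as an alternative to `np.array`. The two behave the same for a list, but differ when the input is already an ndarray. The following is a small sketch of that difference; the variable names are illustrative only.

```python
import numpy as np

existing = np.array([9, 5, 2, 7])

copied = np.array(existing)      # np.array makes a copy by default
aliased = np.asarray(existing)   # np.asarray reuses the array when no conversion is needed

# Modify the original and see which of the two follows the change
existing[0] = 100
print(copied[0])             # still the old value
print(aliased[0])            # reflects the change
print(aliased is existing)   # the very same object
```

This is why `np.asarray` is often used at the top of functions that should accept lists and arrays alike without copying unnecessarily.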

Let’s explore other useful methods often employed for creating arrays.

In [12]:
# Create an array from a range of numbers
print(np.arange(10))

# Create an array from start to end (exclusive) via a step size (start, stop, step)
print(np.arange(2, 10, 2))

# Create a range of points between two numbers
print(np.linspace(2, 10, 5))

# Create an array of ones
print(np.ones(5))

# Create an array of zeros
print(np.zeros(5))

[0 1 2 3 4 5 6 7 8 9]
[2 4 6 8]
[ 2.  4.  6.  8. 10.]
[1. 1. 1. 1. 1.]
[0. 0. 0. 0. 0.]


## NumPy datatypes

NumPy boasts a broad range of numerical datatypes in comparison with vanilla Python. This extended datatype support is useful for dealing with different kinds of signed and unsigned integer and floating-point numbers as well as booleans and complex numbers for scientific computation. NumPy datatypes include the **bool_**, **int**(8,16,32,64), **uint**(8,16,32,64), **float**(16,32,64), **complex**(64,128) as well as the **int_**, **float_**, and **complex_**, to mention just a few.

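The datatypes with a **_** appended are the NumPy counterparts of the base Python datatypes (in NumPy 2.0 and later, `float_` and `complex_` were removed in favor of `float64` and `complex128`). The `dtype` parameter sets the datatype of the array that a NumPy function creates. When the type cannot be inferred from the data, as with `np.ones` or `np.zeros`, the default is `float64`. Also, a NumPy array holds elements of a single type, so NumPy infers one common type for all the values passed in.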

Let’s explore a bit with NumPy datatypes:

In [14]:
# Ints
my_ints = np.array([3, 7, 9, 11])
print(my_ints.dtype)

# Floats
my_floats = np.array([3., 7., 9., 11.])
print(my_floats.dtype)

# Mixed ints and floats - everything is upcast to float
my_array = np.array([3., 7., 9, 11])
print(my_array.dtype)

# Manually assigning datatypes
my_array = np.array([3, 7, 9, 11], dtype="float64")
print(my_array.dtype)

int64
float64
float64
float64
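
To make the type rules above a little more concrete, here is a minimal sketch of type promotion, explicit conversion with `astype`, and per-element memory use. The array names are made up for this illustration.

```python
import numpy as np

ints = np.array([3, 7, 9, 11])
floats = np.array([3.5, 7.2, 9.9, 11.0])

# Combining an integer array with a float array promotes the result to float
mixed = ints + floats
print(mixed.dtype)

# astype returns a converted copy; float -> int drops the fractional part
truncated = floats.astype(np.int32)
print(truncated)

# Smaller datatypes use fewer bytes per element
small = np.array([3, 7, 9, 11], dtype=np.int8)
print(small.itemsize, floats.itemsize)
```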


## Indexing + fancy indexing (1-D)

We can index a single element of a NumPy 1-D array similar to how we index a Python list.

In [15]:
# Create a random numpy 1-D array
my_array = np.random.rand(10)
print(my_array)

# Index the first element
print(my_array[0])

# Index the last element
print(my_array[-1])

[0.79509962 0.70397551 0.76151385 0.77682588 0.5540919  0.56043872
 0.36320874 0.43293778 0.9524822  0.869757  ]
0.7950996212629662
0.8697569987639644


Fancy indexing in NumPy is an advanced mechanism for selecting array elements with an array of integers or an array of booleans. Indexing with a boolean array is also called *masking*.

### Boolean Mask

Let’s index all the even integers in the array using a boolean mask.

In [17]:
# Create 10 random integers from 1 to 19 (the upper bound, 20, is exclusive)
my_array = np.random.randint(1, 20, 10)
print(my_array)

# Index all even integers in the array using a boolean mask
print(my_array[my_array % 2 == 0])

[14 16 10  7  5 10  1  1  7 11]
[14 16 10 10]


Observe that the code `my_array % 2 == 0` outputs an array of booleans.


In [18]:
my_array % 2 == 0

array([ True,  True,  True, False, False,  True, False, False, False,
       False])
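
Boolean masks can be combined and can also be used for assignment. The sketch below reuses the values printed above as a fixed array (named `values` here for illustration) so that the results are reproducible.

```python
import numpy as np

values = np.array([14, 16, 10, 7, 5, 10, 1, 1, 7, 11])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
even_and_large = values[(values % 2 == 0) & (values > 10)]
print(even_and_large)

# A mask on the left-hand side overwrites only the selected elements
clipped = values.copy()
clipped[clipped > 10] = 10
print(clipped)

# True counts as 1 and False as 0, so summing a mask counts the matches
even_count = np.sum(values % 2 == 0)
print(even_count)
```

Note that Python's `and` and `or` keywords do not work element-wise on arrays, which is why `&` and `|` are used here.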

### Integer Mask

Let’s select all elements at odd indices (1, 3, 5, 7, 9) in the array.

In [19]:
my_array[np.arange(1,10,2)]

array([16,  7, 10,  1, 11])

Remember that array indices start at 0, so the second element, 16, is at index 1.


### Slicing a 1-D Array

Slicing a NumPy array is also similar to slicing a Python list.

In [20]:
my_array = np.array([14,  9,  3, 19, 16,  1, 16,  5, 13,  3])
print(my_array)

# Slice the first 2 elements
print(my_array[:2])

# slice the last 3 elements
print(my_array[-3:])

[14  9  3 19 16  1 16  5 13  3]
[14  9]
[ 5 13  3]
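
One difference from Python lists is worth knowing: a basic slice of a NumPy array is a view onto the same data, while fancy (integer-array) indexing returns a copy. A minimal sketch, with illustrative names:

```python
import numpy as np

original = np.array([14, 9, 3, 19, 16])

view = original[:2]          # basic slice: shares memory with `original`
picked = original[[0, 1]]    # integer-array indexing: an independent copy

# Change the original and compare
original[0] = -1
print(view)      # the view sees the change
print(picked)    # the copy does not

print(np.shares_memory(original, view))
print(np.shares_memory(original, picked))
```

If an independent slice is needed, calling `.copy()` on it is the usual approach.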


## Basic Math Operations on Arrays: Universal Functions


The core power of NumPy is in its highly optimized vectorized functions for various mathematical, arithmetic, and string operations. In NumPy, the functions that operate element by element are called universal functions (ufuncs). We’ll explore a few basic arithmetic operations with NumPy 1-D arrays.

In [22]:
# Create an array of even numbers between 2 and 10
my_array = np.arange(2,11,2)

# Sum of array elements
print(np.sum(my_array)) # or my_array.sum()

# Square root
print(np.sqrt(my_array))

# Log
print(np.log(my_array))

# Exponent
print(np.exp(my_array))

30
[1.41421356 2.         2.44948974 2.82842712 3.16227766]
[0.69314718 1.38629436 1.79175947 2.07944154 2.30258509]
[7.38905610e+00 5.45981500e+01 4.03428793e+02 2.98095799e+03
 2.20264658e+04]
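
As a rough illustration of what "vectorized" means, the sketch below checks that a universal function gives the same result as an explicit Python loop, and that the arithmetic operators are shorthand for ufuncs. The second array `b` is an assumption added for the example.

```python
import numpy as np

a = np.arange(2, 11, 2)          # [ 2  4  6  8 10]
b = np.array([1, 2, 3, 4, 5])

# The + operator on arrays is shorthand for the np.add ufunc
same_as_add = np.array_equal(a + b, np.add(a, b))
print(same_as_add)

# A ufunc applies one operation to every element, like this loop, but in compiled code
looped = np.array([x ** 0.5 for x in a])
same_as_loop = np.allclose(looped, np.sqrt(a))
print(same_as_loop)

# Ufuncs also offer reductions; np.add.reduce gives the same total as np.sum
total = np.add.reduce(a)
print(total)
```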


## Higher-Dimensional Arrays

As we’ve seen earlier, the strength of NumPy is its ability to construct and manipulate n-dimensional arrays with highly optimized (i.e., vectorized) operations. Previously, we covered the creation of 1-D arrays (or vectors) in NumPy to get a feel of how NumPy works.

This section will now consider working with 2-D and 3-D arrays. 2-D arrays are ideal for storing data for analysis. Structured data is usually represented in a grid of rows and columns. And even when data is not necessarily represented in this format, it is often transformed into a tabular form before doing any data analytics or machine learning. Each column represents a feature or attribute and each row an observation.

Also, other data forms like images are adequately represented using 3-D arrays. A colored image is composed of $n \times n$ pixel intensity values with a color depth of three for the red, green, and blue (RGB) color profiles.

### Creating 2-D Arrays (Matrices)

Let us construct a simple 2-D array.

In [23]:
# Construct a 2-D array
my_2D = np.array([[2,4,6],[8,10,12]])
print(my_2D)

# Check the number of dimensions
print(my_2D.ndim)

# Get the shape of the 2-D array - this example has 2 rows and 3 columns: (r, c)
print(my_2D.shape)

[[ 2  4  6]
 [ 8 10 12]]
2
(2, 3)


Let’s explore common methods in practice for creating 2-D NumPy arrays, which are also matrices.

In [24]:
# Create a 3x3 array of ones
print(np.ones([3,3]))

# Create a 3x3 array of zeros
print(np.zeros([3,3]))

# Create a 3x3 array of a particular scalar - full(shape, fill_value)
print(np.full([3,3], 2))

# Create a 3x3, empty uninitialized array
print(np.empty([3,3]))

# Create a 4x4 identity matrix - i.e., a matrix with 1's on its diagonal
print(np.eye(4)) # or np.identity(4)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[2 2 2]
 [2 2 2]
 [2 2 2]]
[[1.e-323 1.e-323 1.e-323]
 [1.e-323 1.e-323 1.e-323]
 [1.e-323 1.e-323 1.e-323]]
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


###  Creating 3-D Arrays


Let’s construct a basic 3-D array.

In [25]:
# Construct a 3-D array
my_3D = np.array([[[2,4,6],[8,10,12]],
                  [[1,2,3],[7,9,11]]])
print(my_3D) 

# Check the number of dimensions
print(my_3D.ndim)

# Get the shape of the 3-D array - this example has 2 pages, 2 rows and 3 columns: (p, r, c)
print(my_3D.shape)

[[[ 2  4  6]
  [ 8 10 12]]

 [[ 1  2  3]
  [ 7  9 11]]]
3
(2, 2, 3)


We can also create 3-D arrays with methods such as **ones**, **zeros**, **full**, and **empty** by passing the configuration for `[page, row, columns]` into the shape parameter of the methods. For example:

In [26]:
# Create a 2-page, 3x3 array of ones
print(np.ones([2,3,3]))

# Create a 2-page, 3x3 array of zeros
print(np.zeros([2,3,3]))

[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]
[[[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]

 [[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]]


### Indexing/Slicing of Matrices

Let’s see some examples of indexing and slicing 2-D arrays. The concept extends nicely from doing the same with 1-D arrays.

In [27]:
# Create a 3x3 array containing random numbers from the standard normal distribution
my_2D = np.random.randn(3,3) 

# Select a particular cell (or element) from a 2-D array.
print(my_2D[1,1])    # In this case, the cell at the 2nd row and 2nd column

# Slice the last 2 columns
print(my_2D[:,1:3])

# Slice the first 2 rows and columns
print(my_2D[0:2, 0:2]) 

-0.6621639758466638
[[ 0.69324994  0.63780168]
 [-0.66216398 -0.4319984 ]
 [-0.34869876 -0.03932028]]
[[ 0.379065    0.69324994]
 [ 0.97688821 -0.66216398]]
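
Because the array above is random, the printed values differ on every run. The following sketch repeats the same kinds of selections on a small fixed array (called `grid` here purely for illustration) so the results can be checked by eye.

```python
import numpy as np

grid = np.arange(1, 10).reshape(3, 3)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

second_row = grid[1]          # a single index selects a whole row
first_col = grid[:, 0]        # ':' keeps every row, 0 picks the first column
corner = grid[-1, -1]         # negative indices count from the end

# A boolean mask built from one column can select entire rows
rows_kept = grid[grid[:, 0] > 1]

print(second_row)
print(first_col)
print(corner)
print(rows_kept)
```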


## Matrix Operations: Linear Algebra

Linear algebra is a convenient and powerful system for manipulating a set of data features and is one of the strong points of NumPy. Linear algebra is a crucial component of machine learning and deep learning research and implementation of learning algorithms. NumPy has vectorized routines for various matrix operations. Let’s go through a few of them.


### Matrix Multiplication (Dot Product)


First, let’s create random integers using the function `np.random.randint(low, high=None, size=None)`, which returns random integers from low (inclusive) to high (exclusive).

In [28]:
# Create two 3x3 matrices of random integers from 1 to 49 (50 is exclusive)
A = np.random.randint(1, 50, size=[3,3])
B = np.random.randint(1, 50, size=[3,3])

# Print the arrays
print(A) 
print(B)

[[44 21 15]
 [27  6 43]
 [32 30 35]]
[[34 28 22]
 [22 44 39]
 [ 9 27 20]]


We can use the following routines for matrix multiplication: `np.matmul(a, b)`, or the equivalent `a @ b` operator, which is available from Python 3.5 onward. Using `a @ b` is preferred. Remember that when multiplying matrices, the inner matrix dimensions must agree. For example, if A is an $m \times n$ matrix and B is an $n \times p$ matrix, the product of the matrices will be an $m \times p$ matrix.

![](https://i.ibb.co/NynHpc9/matrix-multiplication.png)

In [29]:
# Multiply the two matrices A and B (dot product)
A @ B    # or np.matmul(A,B)

array([[2093, 2561, 2087],
       [1437, 2181, 1688],
       [2063, 3161, 2574]])
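
The shape rule is easier to see with non-square matrices. Here is a small sketch, using arrays of ones named `M` and `N` for illustration, of a valid product and of what happens when the inner dimensions disagree.

```python
import numpy as np

M = np.ones((2, 3))   # 2 x 3
N = np.ones((3, 4))   # 3 x 4

# Inner dimensions (3 and 3) agree, so the result is 2 x 4
product = M @ N
print(product.shape)

# Swapping the operands gives inner dimensions 4 and 2, which do not agree
mismatch_error = None
try:
    N @ M
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```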

### Element-Wise Operations

Element-wise matrix operations combine two matrices one pair of corresponding elements at a time. The operation can be an addition, subtraction, division, or multiplication (which is commonly called the Hadamard product). The matrices must be of the same shape. Note that while a matrix is of shape $m \times n$, a vector is of shape $n \times 1$. These concepts easily apply to vectors as well.

![](https://i.ibb.co/q13WnHr/element-wise-operation.png)

In [30]:
# Hadamard multiplication of A and B
print(A * B) 

# add A and B
print(A + B)

# subtract A from B
print(B - A)

# divide A by B
print(A / B) 

[[1496  588  330]
 [ 594  264 1677]
 [ 288  810  700]]
[[78 49 37]
 [49 50 82]
 [41 57 55]]
[[-10   7   7]
 [ -5  38  -4]
 [-23  -3 -15]]
[[1.29411765 0.75       0.68181818]
 [1.22727273 0.13636364 1.1025641 ]
 [3.55555556 1.11111111 1.75      ]]
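
A common source of confusion is the difference between `*` and `@`. The sketch below contrasts the two on a pair of small matrices chosen for illustration.

```python
import numpy as np

P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[0, 1],
              [1, 0]])

# Element-wise (Hadamard) product: multiply matching positions
hadamard = P * Q
print(hadamard)

# Matrix product: rows of P combined with columns of Q
matrix_product = P @ Q
print(matrix_product)
```

Here `Q` swaps the two columns of `P` under `@`, while under `*` it simply zeroes the diagonal entries.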


###  Scalar Operation


A matrix can be acted upon by a scalar (i.e., a single numeric entity) in the same element-wise fashion. This time the scalar operates upon each element of the matrix or vector.

![](https://i.ibb.co/Gx4X50X/scalar-operations.png)

In [31]:
# Multiply A by a scalar, 0.5
print(A * 0.5) 

# Add A and a scalar, 0.5
print(A + 0.5)

# Subtract a scalar 0.5 from B
print(B - 0.5)

# Divide A by a scalar, 0.5
print(A / 0.5)

[[22.  10.5  7.5]
 [13.5  3.  21.5]
 [16.  15.  17.5]]
[[44.5 21.5 15.5]
 [27.5  6.5 43.5]
 [32.5 30.5 35.5]]
[[33.5 27.5 21.5]
 [21.5 43.5 38.5]
 [ 8.5 26.5 19.5]]
[[88. 42. 30.]
 [54. 12. 86.]
 [64. 60. 70.]]


### Matrix Transposition

Transposition is a vital matrix operation that swaps the rows and columns of a matrix: the element at row $i$, column $j$ moves to row $j$, column $i$. The transpose of a matrix is denoted as $A^T$. Observe that the diagonal elements remain unchanged.

![](https://i.ibb.co/0tsh9WS/matrix-transponse.png)

In [32]:
A = np.array([[15, 29, 24],
              [ 5, 23, 26],
              [30, 14, 44]])
# Transpose A
A.T   # or A.transpose()

array([[15,  5, 30],
       [29, 23, 14],
       [24, 26, 44]])
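
A few properties of the transpose can be checked numerically. This is a minimal sketch; the matrix `B` is an assumption added for the example, while `A` is the matrix used above.

```python
import numpy as np

A = np.array([[15, 29, 24],
              [ 5, 23, 26],
              [30, 14, 44]])
B = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# Transposing a 3 x 2 matrix gives a 2 x 3 matrix
print(B.T.shape)

# The transpose of a product reverses the order: (AB)^T = B^T A^T
product_rule_holds = np.array_equal((A @ B).T, B.T @ A.T)
print(product_rule_holds)

# The diagonal is unchanged by transposition
diagonal_unchanged = np.array_equal(np.diag(A), np.diag(A.T))
print(diagonal_unchanged)

# A.T is a view on the same data, not a copy
print(np.shares_memory(A, A.T))
```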

###  The Inverse of a Matrix


An $m \times m$ matrix A (also called a square matrix) has an inverse if there is another matrix B such that A times B results in the identity matrix, also of shape $m \times m$. This matrix B is called the inverse of A and is denoted as $A^{-1}$. This relationship is formally written as

$$ A{A}^{-1}={A}^{-1}A=I $$

However, not all matrices have an inverse. A matrix with an inverse is called a *nonsingular* or *invertible* matrix, while those without an inverse are known as *singular*.

**Note**: A square matrix is a matrix that has the same number of rows and columns.

Let’s use NumPy to get the inverse of a matrix. Some linear algebra modules are found in a sub-module of NumPy called `linalg`.

In [33]:
A = np.array([[15,29,24],
              [5,23,26],
              [30,14,44]])
# Find the inverse of A
np.linalg.inv(A) 

array([[ 0.05848375, -0.08483755,  0.01823105],
       [ 0.05054152, -0.00541516, -0.02436823],
       [-0.05595668,  0.05956679,  0.01805054]])

NumPy also implements the *Moore-Penrose pseudoinverse*, which generalizes the inverse to singular and non-square matrices. For an invertible matrix the pseudoinverse coincides with the ordinary inverse, as the `pinv` call on A shows here.

In [34]:
# Using pinv()
np.linalg.pinv(A) 

array([[ 0.05848375, -0.08483755,  0.01823105],
       [ 0.05054152, -0.00541516, -0.02436823],
       [-0.05595668,  0.05956679,  0.01805054]])
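
The defining relationship $AA^{-1}=I$ can be verified numerically, and a singular matrix shows where `inv` and `pinv` part ways. The following is a sketch under the assumption of a simple rank-1 matrix `S` invented for this example.

```python
import numpy as np

A = np.array([[15, 29, 24],
              [ 5, 23, 26],
              [30, 14, 44]])
A_inv = np.linalg.inv(A)

# Floating-point results are compared with allclose rather than ==
print(np.allclose(A @ A_inv, np.eye(3)))
print(np.allclose(np.linalg.pinv(A), A_inv))

# S is singular: its second row is twice its first
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("LinAlgError:", err)

# The pseudoinverse still exists and satisfies S @ S_pinv @ S == S
S_pinv = np.linalg.pinv(S)
print(np.allclose(S @ S_pinv @ S, S))
```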

## Reshaping

A NumPy array can be restructured to take on a different shape, as long as the total number of elements stays the same. Let’s convert a 1-D array to an $m \times n$ matrix.

In [35]:
# Make 20 elements evenly spaced between 0 and 5
a = np.linspace(0,5,20)
print(a) 

# Observe that a is a 1-D array
print(a.shape)

# Reshape into a 5 x 4 matrix
A = a.reshape(5, 4)
print(A) 

# The vector a has been reshaped into a 5 by 4 matrix A
print(A.shape) 

[0.         0.26315789 0.52631579 0.78947368 1.05263158 1.31578947
 1.57894737 1.84210526 2.10526316 2.36842105 2.63157895 2.89473684
 3.15789474 3.42105263 3.68421053 3.94736842 4.21052632 4.47368421
 4.73684211 5.        ]
(20,)
[[0.         0.26315789 0.52631579 0.78947368]
 [1.05263158 1.31578947 1.57894737 1.84210526]
 [2.10526316 2.36842105 2.63157895 2.89473684]
 [3.15789474 3.42105263 3.68421053 3.94736842]
 [4.21052632 4.47368421 4.73684211 5.        ]]
(5, 4)
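
Two details of `reshape` are handy in practice: one dimension can be given as `-1` and NumPy works it out, and a shape with the wrong number of elements is rejected. A minimal sketch on the same 20-element array:

```python
import numpy as np

a = np.linspace(0, 5, 20)

# -1 asks NumPy to infer that dimension from the total number of elements
five_rows = a.reshape(5, -1)
two_cols = a.reshape(-1, 2)
print(five_rows.shape)
print(two_cols.shape)

# ravel flattens back to 1-D
flat = five_rows.ravel()
print(flat.shape)

# 3 x 7 = 21 elements does not match 20, so this raises a ValueError
reshape_error = None
try:
    a.reshape(3, 7)
except ValueError as err:
    reshape_error = err
    print("ValueError:", err)
```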


### Reshape vs. Resize Method

NumPy arrays have both a `reshape` and a `resize` method. Calling `a.reshape(...)` returns an array with the new shape and leaves the shape of `a` untouched, whereas `a.resize(...)` changes `a` in place and returns `None`. Let’s see an example.

In [36]:
# Generate 9 elements evenly spaced between 0 and 5
a = np.linspace(0,5,9)
print(a)

# The original shape
print(a.shape)

# Call the reshape method
print(a.reshape(3,3))

# The original array maintained its shape
print(a.shape)

# Call the resize method - resize does not return an array
print(a.resize(3,3))

# The resize method has changed the shape of the original array
print(a.shape)

[0.    0.625 1.25  1.875 2.5   3.125 3.75  4.375 5.   ]
(9,)
[[0.    0.625 1.25 ]
 [1.875 2.5   3.125]
 [3.75  4.375 5.   ]]
(9,)
None
(3, 3)
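
Two further points are easy to miss. The array returned by `reshape` usually shares its data with the original, and the module-level function `np.resize` behaves differently from the `resize` method: it returns a new array and repeats the data if the new shape is larger. A small sketch, with illustrative names:

```python
import numpy as np

a = np.linspace(0, 5, 9)

# The reshaped array is a view here: writing to it also changes `a`
reshaped = a.reshape(3, 3)
print(np.shares_memory(a, reshaped))
reshaped[0, 0] = 99.0
print(a[0])

# np.resize (the function) returns a new array and leaves its input alone
b = np.array([1, 2, 3])
grown = np.resize(b, (2, 3))   # the data is repeated to fill the larger shape
print(grown)
print(b.shape)
```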


### Stacking Arrays

NumPy has methods for concatenating arrays (also called stacking). The methods `hstack` and `vstack` are used to stack several arrays along the horizontal and vertical axis, respectively.

In [37]:
# Create two 3x3 matrices of random integers from 1 to 49 (50 is exclusive)
A = np.random.randint(1, 50, size=[3,3])
B = np.random.randint(1, 50, size=[3,3])
# Print out the arrays
print(A)
print(B)

[[21 37 48]
 [42 19 21]
 [22 25 27]]
[[39 21 35]
 [27 14 21]
 [47  8 45]]


Let’s stack A and B horizontally using `hstack`. To use `hstack`, the arrays must have the same number of rows. Also, the arrays to be stacked are passed as a tuple to the `hstack` method.

In [38]:
# Arrays are passed as tuple to hstack
np.hstack((A,B)) 

array([[21, 37, 48, 39, 21, 35],
       [42, 19, 21, 27, 14, 21],
       [22, 25, 27, 47,  8, 45]])

To stack A and B vertically using `vstack`, the arrays must have the same number of columns. The arrays to be stacked are also passed as a tuple to the `vstack` method.

In [39]:
# Arrays are passed as a tuple to vstack
np.vstack((A,B)) 

array([[21, 37, 48],
       [42, 19, 21],
       [22, 25, 27],
       [39, 21, 35],
       [27, 14, 21],
       [47,  8, 45]])
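
For 2-D arrays, `hstack` and `vstack` correspond to `np.concatenate` along axis 1 and axis 0. The sketch below checks this on two small fixed matrices (chosen for illustration) and also shows `np.stack`, which joins arrays along a new axis instead.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# hstack joins along the columns (axis=1), vstack along the rows (axis=0)
h_matches = np.array_equal(np.hstack((A, B)), np.concatenate((A, B), axis=1))
v_matches = np.array_equal(np.vstack((A, B)), np.concatenate((A, B), axis=0))
print(h_matches, v_matches)

# np.stack creates a new leading axis: two 2x2 matrices become one 2x2x2 array
stacked = np.stack((A, B))
print(stacked.shape)
```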

## Broadcasting

NumPy has an elegant mechanism for arithmetic operations on arrays with different dimensions or shapes. As an example, when a scalar is added to a vector (or 1-D array), the scalar value is conceptually broadcast, or stretched, across the elements of the array and added element-wise.

![](https://i.ibb.co/Gp9zvCC/broadcasting-vector.png)

Matrices with different shapes can be broadcast to perform arithmetic operations by stretching the dimensions of the smaller array. Broadcasting is another vectorized operation for speeding up matrix processing. However, not all arrays with different shapes can be broadcast. NumPy compares the shapes starting from the trailing (rightmost) axis, and each pair of axis sizes must either be equal or one of them must be 1.
In the example that follows, the matrices A and B have the same number of rows, but B has only 1 column. Hence, the single column of B is stretched across the columns of A and the cells are added element-wise.

The following figure illustrates this.

![](https://i.ibb.co/JRLyrBQ/broadcasting-matrix.png)

In [40]:
# Create a 4 X 3 matrix of random integers from 1 to 9 (10 is exclusive)
A = np.random.randint(1, 10, [4, 3])
print(A)

# Create a 4 X 1 matrix of random integers from 1 to 9 (10 is exclusive)
B = np.random.randint(1, 10, [4, 1])
print(B) 

# Add A and B
print(A + B) 

[[6 8 1]
 [1 2 1]
 [9 2 8]
 [6 6 3]]
[[9]
 [2]
 [5]
 [9]]
[[15 17 10]
 [ 3  4  3]
 [14  7 13]
 [15 15 12]]


The example that follows cannot be broadcast and will raise *ValueError: operands could not be broadcast together with shapes (4,3) (4,2)*, because the trailing axes of A and B have sizes 3 and 2, which are neither equal nor 1.

In [42]:
A = np.random.randint(1, 10, [4, 3])
B = np.random.randint(1, 10, [4, 2])
# A + B 
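
The broadcasting rule can be explored without random data. The sketch below uses arrays of ones and simple ranges (names are illustrative) and assumes NumPy 1.20 or later for `np.broadcast_shapes`.

```python
import numpy as np

A = np.ones((4, 3))
col = np.arange(4).reshape(4, 1)   # shape (4, 1): stretched across the columns
row = np.arange(3)                 # shape (3,): stretched across the rows

print((A + col).shape)
print((A + row).shape)

# Ask NumPy for the resulting shape without doing any arithmetic
result_shape = np.broadcast_shapes((4, 3), (4, 1))
print(result_shape)

# Trailing sizes 3 and 2 are neither equal nor 1, so this fails
broadcast_error = None
try:
    A + np.ones((4, 2))
except ValueError as err:
    broadcast_error = err
    print("ValueError:", err)

# A length-4 vector does not line up with (4, 3) until it gets a trailing axis
v = np.arange(4)
fixed = A + v[:, np.newaxis]
print(fixed.shape)
```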