# COMS 4771 - Machine Learning, Fall 2023

## NumPy Basics

NumPy's main object is the homogeneous multidimensional array (ndarray): a grid of elements that all share the same type. Mathematical operations on it are typically much faster than on a regular Python list. Below is a short introduction to basic NumPy functions and properties.

In [1]:
import numpy as np
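As a small illustration of what "homogeneous" means, the sketch below builds two arrays from Python lists (the names `ints` and `mixed` are chosen only for this example). Every element of an array shares one dtype, so a list mixing integers and floats is stored entirely as floats.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])     # ints and floats -> every element stored as float64

print(ints.dtype.kind)            # 'i' (signed integer; the exact width depends on the platform)
print(mixed.dtype)                # float64
```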

### Initializing Numpy arrays

In [2]:
a = np.array([1, 2, 3, 4, 5])
b = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], dtype = float)
c = np.array([[0, 5], [1, 6], [2, 7], [3, 8], [4, 9]], dtype = int)

In [3]:
np.zeros(5)            # a vector/1D array of zeros with size of 5

array([0., 0., 0., 0., 0.])

In [4]:
np.ones((3, 4))        # a matrix/2D array of ones with shape 3x4 

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [5]:
np.random.rand(3,2)    # a 3x2 matrix with random entries from a uniform distribution over [0, 1)

array([[0.69520009, 0.48675928],
       [0.09465856, 0.05482368],
       [0.83264285, 0.94659103]])

In [6]:
np.full((2, 3), 4)     # a 2x3 matrix filled with 4

array([[4, 4, 4],
       [4, 4, 4]])

In [7]:
np.arange(0, 10)       # a vector of evenly spaced values ranging from 0 to 9

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

### Getting properties of a Numpy array

In [8]:
b.shape              # shape of the matrix: (rows, columns)

(2, 5)

In [9]:
b.size               # total number of elements in the matrix

10

In [10]:
b.ndim               # number of dimensions

2

In [11]:
b.dtype              # data type of the elements

dtype('float64')

### Numpy array basic operations

NumPy supports element-wise array operations (e.g. addition, subtraction, multiplication and division) as well as other useful functions.

In [12]:
x = np.arange(0, 5)
y = np.arange(5, 10)

In [13]:
x + y                 # element-wise addition

array([ 5,  7,  9, 11, 13])

In [14]:
x - y                 # element-wise subtraction

array([-5, -5, -5, -5, -5])

In [15]:
x * y                 # element-wise multiplication

array([ 0,  6, 14, 24, 36])

In [16]:
x / y                 # element-wise division

array([0.        , 0.16666667, 0.28571429, 0.375     , 0.44444444])

In [17]:
x == y                # element-wise comparison

array([False, False, False, False, False])

In [18]:
x.max()               # max value in x  

4

In [19]:
x.min()               # min value in x  

0

In [20]:
x.sum()               # sum of the values in x

10

In [21]:
x.mean()              # mean of x

2.0

The above are the most basic operations and functions for NumPy arrays. There are many more, such as <i>ndarray.flatten()</i>, <i>numpy.squeeze()</i> and <i>numpy.expand_dims()</i>, which manipulate the dimensions of an array, as well as other functions for different purposes.
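A minimal sketch of the three dimension-manipulating functions just mentioned, applied to a small array `m` invented for this example:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)         # shape (2, 3)

flat = m.flatten()                     # copy collapsed to 1D, shape (6,)
expanded = np.expand_dims(m, axis=0)   # new leading axis, shape (1, 2, 3)
squeezed = np.squeeze(expanded)        # length-1 axes removed, back to (2, 3)

print(flat.shape, expanded.shape, squeezed.shape)   # (6,) (1, 2, 3) (2, 3)
```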

Be careful when performing operations on arrays with different shapes. Instead of raising an error, NumPy will sometimes broadcast: it treats the smaller array as if it were stretched along its length-1 (or missing) dimensions to match the larger one.

For example:

In [22]:
x1 = np.full((3,2), 2)
print(x1)

[[2 2]
 [2 2]
 [2 2]]


In [23]:
x2 = np.expand_dims(np.arange(1,4), axis=1)
print(x2)

[[1]
 [2]
 [3]]


In [24]:
print(x1 * x2)

[[2 2]
 [4 4]
 [6 6]]


Although x1 and x2 differ in shape, NumPy handles the operation sensibly: each row of x1 is multiplied by the single element in the corresponding row of x2.
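This behavior is called broadcasting. One way to picture it is that x2 is stretched along its length-1 axis until it has the shape of x1. The sketch below makes that stretching explicit with <i>numpy.broadcast_to()</i>; the name `x2_stretched` is introduced only for illustration, and NumPy does not actually copy the data when it broadcasts.

```python
import numpy as np

x1 = np.full((3, 2), 2)
x2 = np.expand_dims(np.arange(1, 4), axis=1)    # shape (3, 1)

# Make the implicit stretching explicit: repeat the single column of x2 across 2 columns
x2_stretched = np.broadcast_to(x2, x1.shape)    # read-only view, shape (3, 2)
print(x2_stretched)

# The broadcast product equals the product with the explicitly stretched array
print(np.array_equal(x1 * x2, x1 * x2_stretched))   # True
```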

However, this does not always work. For example, adding (element-wise) a 2x3 array to a 3x2 array raises a ValueError, as you would expect, because the shapes are incompatible.
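The failing case can be reproduced with two small arrays (`p` and `q` are names made up for this sketch):

```python
import numpy as np

p = np.ones((2, 3))
q = np.ones((3, 2))

try:
    p + q                        # shapes (2, 3) and (3, 2) cannot be broadcast together
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Transposing one operand makes the shapes match
print((p + q.T).shape)           # (2, 3)
```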

If you are not sure how an operation behaves, double-check it on a small example. Shape bugs are not always obvious and can cost a lot of debugging time later.

#### Exercise 1

Write a one-line function to compute the Euclidean distance between the two vectors x and y given below. Check your function's result by hand.

\begin{align}
    ED(\vec{x},\vec{y})= \sqrt{\sum_{i=1}^n (x_i-y_i)^2}
\end{align}

In [31]:
x = np.arange(0, 5)
y = np.arange(5, 10)

# You might want to use np.sqrt(), np.square() and np.sum()
def ED(a, b):
    return np.sqrt(np.sum(np.square(a - b)))   # square the differences, sum them, take the root
    
print(ED(x,y))

11.180339887498949
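One possible way to do the check, sketched with a separately named function `euclidean_distance` (a hypothetical name, so it does not clash with the exercise solution): compare against the value worked out by hand and against <i>numpy.linalg.norm()</i>, which computes the same quantity for the difference vector.

```python
import numpy as np

def euclidean_distance(a, b):
    # Same formula as in the exercise, written against the function's own arguments
    return np.sqrt(np.sum(np.square(a - b)))

x = np.arange(0, 5)
y = np.arange(5, 10)

# By hand: every difference is -5, so the sum of squares is 5 * 25 = 125
by_hand = np.sqrt(125)
print(np.isclose(euclidean_distance(x, y), by_hand))                 # True
print(np.isclose(euclidean_distance(x, y), np.linalg.norm(x - y)))   # True
```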


### Numpy matrix operations

A two-dimensional matrix in linear algebra can be expressed as a 2D array in NumPy.

\begin{align}
    \mathbf{X} = \begin{pmatrix} 
        1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 
    \end{pmatrix} 
    \quad \text{and} \quad 
    \mathbf{Y} = \begin{pmatrix} 
        1 & 3 & 2 \\ 5 & 2 & 1 \\ 2 & 3 & 1 
    \end{pmatrix}
\end{align}

Matrices X and Y can be expressed in NumPy as:

In [33]:
X = np.array([[1,2,3],[4,5,6],[7,8,9]])
Y = np.array([[1,3,2],[5,2,1],[2,3,1]])

Below are the common and basic Linear Algebra operations:

In [27]:
np.dot(X,Y)                        # Matrix product of X and Y; different from X*Y, which is element-wise multiplication

array([[17, 16,  7],
       [41, 40, 19],
       [65, 64, 31]])
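To make the difference concrete, here is a short sketch comparing the matrix product with the element-wise product. It also uses the `@` operator, which for 2D arrays gives the same result as <i>numpy.dot()</i>.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Y = np.array([[1, 3, 2], [5, 2, 1], [2, 3, 1]])

matrix_product = X @ Y          # rows of X combined with columns of Y
elementwise = X * Y             # multiplies matching entries only

print(np.array_equal(matrix_product, np.dot(X, Y)))   # True
print(elementwise[0])                                  # [1 6 6]
```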

In [36]:
print(X)
np.linalg.det(X)                   # Determinant of X

[[1 2 3]
 [4 5 6]
 [7 8 9]]


0.0

In [40]:
np.linalg.inv(Y)                   # Inverse matrix of Y; X is not invertible

array([[-0.08333333,  0.25      , -0.08333333],
       [-0.25      , -0.25      ,  0.75      ],
       [ 0.91666667,  0.25      , -1.08333333]])
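A quick sanity check, as a sketch: multiplying Y by its computed inverse should give the identity matrix up to floating-point rounding, and the rank of X confirms why X has no inverse.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Y = np.array([[1, 3, 2], [5, 2, 1], [2, 3, 1]])

Y_inv = np.linalg.inv(Y)

# Y times its inverse should be the identity, up to floating-point rounding
print(np.allclose(Y @ Y_inv, np.eye(3)))   # True

# X has only 2 linearly independent rows (row 1 + row 3 = 2 * row 2), so it is singular
print(np.linalg.matrix_rank(X))            # 2
```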

In [41]:
np.transpose(X)                    # Transpose of X

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [42]:
np.eye(3)                          # A 3x3 identity matrix

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [43]:
arr = np.reshape(range(9),(3,3))   # Reshape the 1D row (length 9) to a 3x3 matrix
print(arr)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


Double-check the result whenever you use <b><i>numpy.reshape()</i></b> and are not fully sure of the outcome. It is one of the most used functions for manipulating matrices in machine learning applications. Reading the official documentation is highly recommended: https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
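Three properties of reshape that are worth checking on a small example (the array `v` is invented for this sketch): elements are filled row by row by default, one dimension can be given as -1 and inferred, and the total number of elements must stay the same.

```python
import numpy as np

v = np.arange(6)                # [0 1 2 3 4 5]

r = v.reshape(2, 3)             # filled row by row (C order) by default
print(r)
# [[0 1 2]
#  [3 4 5]]

c = v.reshape(3, -1)            # -1 lets NumPy infer the missing dimension (here 2)
print(c.shape)                  # (3, 2)

# The element count must be preserved: 6 elements cannot fill a 4x2 matrix
try:
    v.reshape(4, 2)
    failed = False
except ValueError:
    failed = True
print(failed)                   # True
```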

Other useful matrix manipulation functions are <i>numpy.delete()</i> and <i>numpy.concatenate()</i>, which remove rows or columns from a matrix and join arrays along a row or column axis, respectively.
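A brief sketch of both functions on a made-up 2x2 matrix `M`. The `axis` argument selects rows (0) or columns (1), and both functions return new arrays rather than modifying their input.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

with_row = np.concatenate((M, np.array([[5, 6]])), axis=0)     # append a row, shape (3, 2)
with_col = np.concatenate((M, np.array([[9], [9]])), axis=1)   # append a column, shape (2, 3)
no_first_col = np.delete(with_col, 0, axis=1)                  # drop the column at index 0

print(with_row)
print(with_col)
print(no_first_col)
```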

#### Exercise 2

We have matrix <b>A</b> as below
\begin{align}
    \begin{pmatrix} 
        1 & 2 & 3 \\ 4 & 5 & 6 
    \end{pmatrix} 
\end{align}

Vector <b>B</b> as $\begin{pmatrix} 7 & 8 & 9\end{pmatrix}$

And vector <b>C</b> as $\begin{pmatrix} 10 & 11 & 12\end{pmatrix}$

Create the matrix and vectors above in Numpy and try the below tasks:

(a) Add <b>B</b> as the 3rd row of <b>A</b>

(b) Add <b>C</b> as the 4th column of the resulting matrix from (a)

(c) Remove the first column from the resulting matrix from (b)

In [50]:
A = np.array([[1,2,3],[4,5,6]])
B = np.array([7,8,9])
C = np.array([10,11,12])

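M_a = np.vstack((A, B))                     # (a) stack B below A as a new row
print(M_a)

M_b = np.hstack((M_a, C.reshape(3, 1)))     # (b) turn C into a 3x1 column and append it on the right
print(M_b)

M_c = M_b[:, 1:]                            # (c) keep every row, drop the column at index 0
print(M_c)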

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[ 1  2  3 10]
 [ 4  5  6 11]
 [ 7  8  9 12]]
[[ 2  3 10]
 [ 5  6 11]
 [ 8  9 12]]
