In preparation for the new material on artificial neural networks, we review the basic concepts covered in term 1.

# basic data structures

Python has several built-in data structures that you can use to store and organize data. These data structures include lists, tuples, sets, and dictionaries.

Lists are ordered collections of items. They are mutable, which means that you can change the items in a list after it is created. You can create a list by enclosing a comma-separated sequence of items in square brackets []. For example:

In [1]:
my_list = [1, 2, 3, 4]

Tuples are also ordered collections of items, but they are immutable, which means that you cannot modify the items in a tuple after it is created. You can create a tuple by enclosing a comma-separated sequence of items in parentheses (). For example:

In [2]:
my_tuple = (1, 2, 3, 4)

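Sets are unordered collections of unique items. They are mutable and are created with the built-in set() function or by enclosing a comma-separated sequence of items in curly braces {}. Note that empty braces {} create an empty dictionary, so an empty set must be written as set(). For example: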

In [3]:
my_set = {1, 2, 3, 4}

Dictionaries are collections of key-value pairs; since Python 3.7 they preserve insertion order. They are mutable and are created with the built-in dict() function or by enclosing a comma-separated sequence of key-value pairs in curly braces {}. Each key is separated from its value by a colon :. For example:

In [4]:
my_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
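
The differences between the four containers can be checked directly. The following is a minimal sketch; the variable names (items, point, unique, empty, ages) are illustrative and not part of the original notebook.

```python
# Lists are mutable: items can be changed or appended in place.
items = [1, 2, 3, 4]
items[0] = 10
items.append(5)

# Tuples are immutable: assigning to an item raises TypeError.
point = (1, 2, 3, 4)
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Sets keep only unique items; duplicates are dropped.
unique = set([1, 2, 2, 3, 3, 3])

# Empty braces create a dict, not a set.
empty = {}

# Dictionaries support lookup and insertion by key.
ages = {'a': 1, 'b': 2}
ages['c'] = 3

print(items, tuple_changed, unique, type(empty).__name__, ages)
```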

There are several ways to store a matrix in Python. One way is to use a list of lists, where each inner list represents a row of the matrix. For example, here's how you could create a 2x2 matrix using a list of lists:

In [5]:
matrix = [[1, 2], [3, 4]]
matrix[1][0]  # Output: 3

3


Another option is to use a NumPy array, which is a multidimensional array object provided by the NumPy library. NumPy arrays are more efficient for storing and manipulating large matrices. Here's how you could create a 2x2 matrix using a NumPy array:

In [7]:
import numpy as np

matrix = np.array([[1, 2], [3, 4]])
matrix[1, 0]  # Output: 3

3

A tensor is a multi-dimensional array of data and can therefore be created with the np.array function. For example, a 2x2x2 tensor:

In [8]:
tensor = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
tensor[1,0,1]

6
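
The number of axes and the size along each axis can be read from the ndim and shape attributes, and slicing selects sub-arrays along any axis. The following is a small sketch using the same tensor; the names first_block and last_column are illustrative.

```python
import numpy as np

tensor = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(tensor.ndim)   # number of axes
print(tensor.shape)  # size along each axis

# Indexing the first axis returns a 2x2 matrix.
first_block = tensor[0]

# A colon keeps an axis whole; here we pick index 1 on the last axis.
last_column = tensor[:, :, 1]
print(first_block)
print(last_column)
```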

# computation

In Python, you can perform matrix multiplication using the dot function from the NumPy library.

In [9]:
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

result = np.dot(matrix_a, matrix_b)

print(result)  # Output: [[19 22], [43 50]]


[[19 22]
 [43 50]]
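
For 2-D arrays, the @ operator and np.matmul compute the same product as np.dot. The sketch below reuses the matrices from the cell above and also shows that the order of the operands matters; the result_* names are illustrative.

```python
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# Three equivalent ways to multiply two 2-D arrays.
result_dot = np.dot(matrix_a, matrix_b)
result_at = matrix_a @ matrix_b
result_matmul = np.matmul(matrix_a, matrix_b)

# Matrix multiplication is not commutative in general.
result_reversed = matrix_b @ matrix_a

print(result_at)
print(result_reversed)
```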


In NumPy, you can perform element-wise multiplication of two arrays using the * operator:

In [10]:
import numpy as np

array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])

result = array_a * array_b
print(result)  # Output: [4 10 18]


[ 4 10 18]


You can also use the multiply function to perform element-wise multiplication of two arrays:

In [11]:
result = np.multiply(array_a, array_b)
print(result)  # Output: [4 10 18]


[ 4 10 18]
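
Element-wise multiplication and matrix multiplication give different results on the same 2-D arrays, which is a common source of bugs. The following sketch contrasts the two and shows that a scalar is applied to every element; the variable names are illustrative.

```python
import numpy as np

matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# Element-wise product: each entry is multiplied by the entry in the same position.
elementwise = matrix_a * matrix_b

# Matrix product: rows of matrix_a are combined with columns of matrix_b.
matrix_product = matrix_a @ matrix_b

# A scalar is broadcast to every element of the array.
scaled = 2 * matrix_a

print(elementwise)
print(matrix_product)
print(scaled)
```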


# non-linear activation functions

The sigmoid function is a non-linear activation function that maps its input to values between 0 and 1.

The ReLU function is another commonly used activation function that maps negative input values to 0 and leaves positive input values unchanged.

The softmax function is a generalization of the logistic function that maps a K-dimensional vector of real values to a K-dimensional vector of values between 0 and 1 that sum to 1. It is often used as an activation function in the output layer of a multi-class classification model.

In [12]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    # Subtract the maximum value from each element to prevent overflow
    x = x - np.max(x)
    # Calculate the exponent of each element
    exp_x = np.exp(x)
    # Return the normalized exponents
    return exp_x / np.sum(exp_x)


In [13]:
x = np.array([-1, 2, 3])
y = sigmoid(x)
z = relu(x)
w = softmax(x)

In [14]:
y, z, w

(array([0.26894142, 0.88079708, 0.95257413]),
 array([0, 2, 3]),
 array([0.01321289, 0.26538793, 0.72139918]))
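
The softmax above normalizes over the whole array, so applied to a 2-D batch it would mix the rows together. A possible variant for batches, assuming each row holds the scores of one sample, normalizes along the last axis. The helper name softmax_rows is hypothetical and not part of the original notebook.

```python
import numpy as np

def softmax_rows(x):
    # Hypothetical helper: softmax applied independently to each row.
    # Subtract the row-wise maximum to prevent overflow in np.exp.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(shifted)
    # keepdims=True keeps the sum as a column so it broadcasts over the row.
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

logits = np.array([[-1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])
probs = softmax_rows(logits)

print(probs)
print(probs.sum(axis=-1))
```

The second row shows why the maximum is subtracted: without the shift, np.exp(1000.0) would overflow.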

# loss functions

Cross entropy loss is a common loss function used in supervised learning problems, particularly in classification tasks. It measures the difference between the predicted probability distribution and the true probability distribution of the target classes.

Here's how you can define the cross entropy loss function in Python:

In [15]:
import numpy as np

def cross_entropy_loss(y_pred, y_true):
    # Clip the predicted probabilities to prevent log(0) errors
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    # Calculate the cross entropy loss
    loss = -np.sum(y_true * np.log(y_pred))
    return loss

y_pred = np.array([[0.1, 0.3, 0.6]])
y_true = np.array([[0, 1, 0]])

loss = cross_entropy_loss(y_pred, y_true)
print(loss)  # Output: 1.2039728043259361

1.2039728043259361
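
The function above sums the loss over everything it is given. When a batch contains several samples, a common choice is to average the per-sample losses instead. The sketch below assumes one-hot targets with one sample per row; the name mean_cross_entropy and the eps parameter are illustrative.

```python
import numpy as np

def mean_cross_entropy(y_pred, y_true, eps=1e-7):
    # Hypothetical helper: cross entropy averaged over the rows of a batch.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Sum over classes (axis 1) to get one loss value per sample.
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return per_sample.mean()

y_pred = np.array([[0.1, 0.3, 0.6],
                   [0.8, 0.1, 0.1]])
y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])

loss = mean_cross_entropy(y_pred, y_true)
print(loss)
```

With one-hot targets, each per-sample loss reduces to the negative log of the probability assigned to the correct class.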


In [16]:
x = np.array([0.1,0.5,0.4])
y = np.array([[0.1,0.5,0.4]])
z = np.array([[[0.1,0.5,0.4]]])

In [17]:
x.shape, y.shape, z.shape

((3,), (1, 3), (1, 1, 3))

In [18]:
z.size

3

# gradient descent

Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (J). The algorithm works by iteratively updating the parameters in the direction opposite to the gradient of the cost function with respect to the parameters, with the step size controlled by a learning rate.

Here's a simple implementation of the gradient descent algorithm in Python, in which the gradient is approximated numerically with central differences:

In [19]:
import numpy as np

def gradient_descent(f, x0, lr=0.1, num_iter=100):
    x = x0.copy()
    for i in range(num_iter):
        # Calculate the gradient of the cost function
        grad = gradient(f, x)
        # Update the parameters
        x -= lr * grad
    return x

def gradient(f, x):
    # Set a small value for h
    h = 1e-7
    # Calculate the gradient
    grad = np.zeros_like(x)
    for i in range(x.size):
        # Save the current value of x[i]
        tmp = x[i]
        # Calculate f(x + h)
        x[i] = tmp + h
        fxh1 = f(x)
        # Calculate f(x - h)
        x[i] = tmp - h
        fxh2 = f(x)
        # Restore the value of x[i]
        x[i] = tmp
        # Approximate the gradient
        grad[i] = (fxh1 - fxh2) / (2 * h)
    return grad

def f(x):
    return np.sum(x*x)

x0 = np.array([1.0, 5.0])
gradient_descent(f,x0,lr=0.1,num_iter=50)

array([1.42724769e-05, 7.13623846e-05])
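
A numerical gradient is easy to get wrong, so it helps to compare it with a case where the answer is known. For f(x) = sum(x*x) the exact gradient is 2x, and each gradient descent step multiplies x by (1 - 2*lr). The sketch below checks both facts; numerical_gradient is an illustrative re-implementation of the central-difference idea, not the function from the cell above.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    # Central-difference approximation, one coordinate at a time.
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

def f(x):
    return np.sum(x * x)

x0 = np.array([1.0, 5.0])

# Compare the numerical gradient with the exact gradient 2x.
numeric = numerical_gradient(f, x0)
analytic = 2 * x0
print(np.allclose(numeric, analytic))

# Closed form after 50 steps with lr=0.1: x0 * (1 - 2*lr)**50.
expected = (1 - 2 * 0.1) ** 50 * x0
print(expected)
```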

Instead of approximating the gradient numerically, you can use PyTorch to perform automatic differentiation. PyTorch is a popular deep learning library that provides automatic differentiation through a system called autograd.

To use autograd in PyTorch, you need to define a torch.Tensor object and set its requires_grad attribute to True. This tells PyTorch to keep track of the operations performed on the tensor and compute the gradient when necessary.
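
Before turning to gradient descent, here is a minimal sketch of autograd on its own, assuming PyTorch is installed. It computes the gradient of sum(x*x), which is 2x, and shows that gradients accumulate across calls to backward unless they are reset. The names first_grad and second_grad are illustrative.

```python
import torch

# A tensor whose operations are tracked by autograd.
x = torch.tensor([1.0, 2.0], requires_grad=True)

# Forward pass, then backward pass to populate x.grad.
y = torch.sum(x * x)
y.backward()
first_grad = x.grad.clone()
print(first_grad)

# A second backward pass adds to x.grad instead of replacing it.
y2 = torch.sum(x * x)
y2.backward()
second_grad = x.grad.clone()
print(second_grad)
```

This accumulation is why a training loop typically resets the gradient after each update.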


Here's an example of how you can use automatic differentiation to implement the gradient descent algorithm in PyTorch:

In [1]:
import torch

torch.cuda.is_available()

False

In [21]:
import torch
import numpy as np

# Set the device to use for computations
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def gradient_descent(f, x0, lr=0.1, num_iter=100):
    # Convert x0 to a tensor with requires_grad=True
    x = torch.tensor(x0, requires_grad=True, device=device)
    for i in range(num_iter):
        # Calculate the gradient of the cost function
        y = f(x)
        y.backward()
        # Update the parameters
        with torch.no_grad():
            x -= lr * x.grad
        # Zero the gradients
        x.grad.zero_()
    return x.detach().cpu().numpy()

# Define the cost function
def cost_function(x):
    return torch.sum(x*x)

# Set the initial values of the parameters
x0 = np.array([1, 2],dtype=float)

# Set the learning rate
lr = 0.1

# Set the number of iterations
num_iter = 100

# Find the optimal values of the parameters
x = gradient_descent(cost_function, x0, lr, num_iter)

print(x) 



[2.03703598e-10 4.07407195e-10]


## remark 1 

In PyTorch, the torch.no_grad context manager is used to temporarily disable gradient calculation. When gradient calculation is disabled, PyTorch will not track the operations performed within the context, and the gradients of tensors will not be updated.

This is needed when updating the parameters of a model in place, as in the gradient_descent function above. The update itself should not be recorded in the computation graph, and PyTorch raises a RuntimeError if a leaf tensor that requires gradients is modified in place outside torch.no_grad. Disabling tracking also avoids unnecessary overhead.
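
The effect of torch.no_grad on an in-place parameter update can be seen in a small experiment. The sketch below assumes PyTorch is installed; the flag raised is illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
torch.sum(x * x).backward()  # x.grad is now 2x

# Outside no_grad, an in-place update of a leaf tensor that
# requires gradients is rejected by PyTorch.
try:
    x -= 0.1 * x.grad
    raised = False
except RuntimeError:
    raised = True
print(raised)

# Inside no_grad, the same update is allowed and is not tracked.
with torch.no_grad():
    x -= 0.1 * x.grad

print(x)
print(x.requires_grad)  # the tensor still requires gradients afterwards
```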

## remark 2

The gradient_descent function returns the result of calling the detach, cpu, and numpy methods on the optimized tensor x.

The detach method returns a tensor that shares the same data but is detached from the computation graph and does not require gradient calculation. It is needed here because calling numpy on a tensor that requires gradients raises an error. It also means the returned value does not keep a reference to the computation graph.

The cpu method is used to move the tensor from the GPU (if it is on the GPU) to the CPU. This is required before converting to a NumPy array, because NumPy only works with data in CPU memory.

The numpy method is used to convert the tensor to a NumPy array. This can be useful if you want to use the tensor with other Python libraries that do not support PyTorch tensors.

Alternatively, you could return the tensor x directly without calling these methods. However, the tensor would still be a PyTorch tensor that requires gradient calculation and is stored on the GPU (if applicable), which might not be what you want.
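
The role of detach in this chain can be checked with a short experiment. The sketch below assumes PyTorch is installed and runs on the CPU; the names raised and result are illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Converting a tensor that requires gradients directly to NumPy fails.
try:
    x.numpy()
    raised = False
except RuntimeError:
    raised = True
print(raised)

# Detaching first, then moving to the CPU, makes the conversion valid.
result = x.detach().cpu().numpy()
print(type(result).__name__, result)

# The detached tensor no longer requires gradients; x itself still does.
print(x.detach().requires_grad, x.requires_grad)
```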