# **Lecture 6 Neural Networks**

## Recap: Logistic Regression

Before we dive into neural networks, let's briefly recap what you've learned about logistic regression.

### A. Logistic Regression Basics

- **Binary Classification**: Logistic regression is a supervised learning algorithm used for binary classification problems, where the goal is to classify data into one of two classes (e.g., spam or not spam, fraud or not fraud).
- **Sigmoid Function**: Logistic regression uses the sigmoid function to map a weighted sum of the input features to a probability between 0 and 1. The model is: $P(Y=1|X) = \frac{1}{1 + e^{-(w_1X_1 + w_2X_2 + \ldots + w_nX_n + b)}}$
- **Cost Function**: The cost function in logistic regression is typically the log loss or cross-entropy loss, which measures the difference between predicted probabilities and actual labels.
- **Gradient Descent**: To train a logistic regression model, we use gradient descent to minimize the cost function and find the optimal parameters $\theta^*$ (the weights $w_1, \ldots, w_n$ and the bias $b$) that make accurate predictions.
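
To make the recap concrete, here is a minimal NumPy sketch of the four pieces above: the sigmoid, the log loss, and batch gradient descent on a binary classification problem. The toy data (two Gaussian blobs), the learning rate, the step count, and the helper names `sigmoid`, `log_loss`, and `fit_logistic` are all assumptions made for illustration, not something fixed by the lecture.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-12):
    """Mean cross-entropy between labels y (0 or 1) and probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit_logistic(X, y, lr=0.1, n_steps=2000):
    """Batch gradient descent on the log loss; returns (w, b, loss history)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    losses = []
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)               # P(Y=1|X) for every sample
        losses.append(log_loss(y, p))
        grad_w = X.T @ (p - y) / n_samples   # gradient of the loss w.r.t. w
        grad_b = np.mean(p - y)              # gradient of the loss w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses

# Toy data (made up for illustration): two Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b, losses = fit_logistic(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(f"final loss = {losses[-1]:.4f}, training accuracy = {accuracy:.2f}")
```

With all parameters initialized to zero, every prediction starts at 0.5, so the first recorded loss is $\ln 2$; gradient descent then lowers it from there.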

### B. Limitations of Logistic Regression

While logistic regression is a powerful algorithm, it has some limitations:

* It is only suitable for binary classification; other tasks require more powerful learning algorithms.
* It can only model a linear relationship between the features and the log-odds of the target, so its decision boundary is linear.
* It may not perform well when dealing with complex, non-linear data.

## Introduction to Neural Networks

Now, let's move on to the exciting world of neural networks, which can address some of the limitations of logistic regression.

Data:

![link text](https://sandipanweb.files.wordpress.com/2017/11/data1.png)

Ref: https://sandipanweb.wordpress.com/2017/11/25/some-deep-learning-with-python-tensorflow-and-keras/

This data set is actually linearly separable.

![link text](https://sandipanweb.files.wordpress.com/2017/11/kernel.png)

To classify these data points, we can use linear classifiers such as logistic regression models, support vector machines, etc.

However, in many, if not most, real-life applications, we have data that looks like this:

![link text](https://sandipanweb.files.wordpress.com/2017/11/flower.png)

This data set is not linearly separable, so no linear classifier can split the two classes. We need more powerful models to perform the same classification task.


### What Are Neural Networks?

Neural networks are a class of machine learning models inspired by the structure and function of the human brain. They consist of interconnected nodes (neurons) organized into layers. Neural networks are used for various tasks, including classification, regression, and even more complex tasks like image recognition and natural language processing.

### Basic Components of a Neural Network

1. Input Layer: The input layer receives the features or data points. Each neuron in the input layer represents a feature.

2. Hidden Layers: Between the input and output layers, there can be one or more hidden layers. These layers contain neurons that process the input data using weights and activation functions.

3. Output Layer: The output layer provides the final prediction. The number of neurons in the output layer depends on the problem (e.g., one neuron for binary classification).

4. Weights and Bias: Each connection between neurons is associated with a weight. These weights are adjusted during training to learn the underlying patterns in the data. A bias term is also used to shift the output of a neuron.

5. Activation Functions: Activation functions introduce non-linearity into the model. Common activation functions include the sigmoid, ReLU (Rectified Linear Unit), and softmax functions.
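
The three activation functions named above can be sketched in a few lines of NumPy. The second half of the block illustrates why the non-linearity matters: without an activation between them, two stacked layers of weights and biases collapse into a single linear layer. The layer sizes and the random inputs are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash each entry into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Keep positive entries, set negative entries to zero."""
    return np.maximum(0.0, z)

def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])
print("sigmoid:", sigmoid(z))
print("relu:   ", relu(z))
print("softmax:", softmax(z))

# Two layers with no activation in between are equivalent to one linear layer:
# (x W1 + b1) W2 + b2 = x (W1 W2) + (b1 W2 + b2)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
x = rng.normal(size=(5, 3))

two_layers = (x @ W1 + b1) @ W2 + b2
one_layer = x @ (W1 @ W2) + (b1 @ W2 + b2)
collapses = np.allclose(two_layers, one_layer)
print("stacked linear layers collapse to one:", collapses)
```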





![link text](https://miro.medium.com/v2/resize:fit:1199/1*N8UXaiUKWurFLdmEhEHiWg.jpeg)
Ref: https://miro.medium.com/v2/resize:fit:1199/1*N8UXaiUKWurFLdmEhEHiWg.jpeg




### Why Do Deep Neural Networks Work?

0. Many tasks can be viewed as functions. For example, image classification is a function that maps the input image to the class it belongs to.
1. [**Universal Approximation Theorem**](https://en.wikipedia.org/wiki/Universal_approximation_theorem) for functions: a neural network can approximate any continuous function on a compact domain to arbitrary accuracy, given a sufficient number of neurons in the hidden layers.
2. **Universal Approximation Theorem for Operators** (Chen \& Chen, IEEE Trans. Neural Netw., 1995)
> Suppose that $\sigma$ is a continuous non-polynomial function, $X$ is a Banach Space, $K_1 \subset X, K_2 \subset \mathbb{R}^d$ are two compact sets in $X$ and $\mathbb{R}^d$, respectively, $V$ is a compact set in $C\left(K_1\right), G$ is a nonlinear continuous operator, which maps $V$ into $C\left(K_2\right)$. Then for any $\epsilon>0$, there are positive integers $n, p$, $m$, constants $c_i^k, \xi_{i j}^k, \theta_i^k, \zeta_k \in \mathbb{R}, w_k \in \mathbb{R}^d, x_j \in K_1, i=1, \ldots, n$, $k=1, \ldots, p, j=1, \ldots, m$, such that
$$
|G(u)(y)-\sum_{k=1}^p \underbrace{\sum_{i=1}^n c_i^k \sigma\left(\sum_{j=1}^m \xi_{i j}^k u\left(x_j\right)+\theta_i^k\right)}_{\text {branch }} \underbrace{\sigma\left(w_k \cdot y+\zeta_k\right)}_{\text {trunk }}|<\epsilon
$$
holds for all $u \in V$ and $y \in K_2$.
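
For comparison, the function version in item 1 can be written in the same notation. The following is a sketch of one standard single-hidden-layer form, not the only statement of the theorem: if $\sigma$ is a continuous non-polynomial function, $K \subset \mathbb{R}^d$ is compact, and $f \in C(K)$, then for any $\epsilon>0$ there are a positive integer $n$ and constants $c_i, \theta_i \in \mathbb{R}$, $w_i \in \mathbb{R}^d$, $i=1, \ldots, n$, such that

$$
\left|f(x)-\sum_{i=1}^n c_i \sigma\left(w_i \cdot x+\theta_i\right)\right|<\epsilon
$$

holds for all $x \in K$. The operator version above has the same shape: the trunk term $\sigma(w_k \cdot y+\zeta_k)$ plays the role of the hidden neurons acting on $y$, while the branch term supplies coefficients that depend on the input function $u$ through its samples $u(x_j)$.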

## Forward Process


The process of making predictions in a neural network is called the forward process. It involves the following steps:
- Input Propagation: The input values are propagated through the network from the input layer to the output layer. Each neuron's output is computed based on its weighted sum of inputs and activation function.

- Activation Function: The weighted sum is passed through the activation function to introduce non-linearity.

- Output Prediction: The final output is produced by the output layer, which may represent probabilities for classification problems, the location of a certain object in an image, a sentence, etc.
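
The steps above can be sketched for a network with one hidden layer. This is a minimal illustration under assumed choices: 2 input features, 4 hidden neurons with ReLU, and a single sigmoid output neuron for binary classification. The weights are random, so the printed probabilities are meaningless until the network is trained.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Forward pass: input -> hidden layer (ReLU) -> output layer (sigmoid)."""
    W1, b1, W2, b2 = params
    z1 = X @ W1 + b1   # weighted sum of inputs for each hidden neuron
    a1 = relu(z1)      # activation function adds non-linearity
    z2 = a1 @ W2 + b2  # weighted sum for the output neuron
    return sigmoid(z2) # probability of class 1 for each sample

rng = np.random.default_rng(0)
n_features, n_hidden = 2, 4  # layer sizes chosen for illustration
params = (
    rng.normal(size=(n_features, n_hidden)),  # W1
    np.zeros(n_hidden),                       # b1
    rng.normal(size=(n_hidden, 1)),           # W2
    np.zeros(1),                              # b2
)

X = rng.normal(size=(5, n_features))  # a batch of 5 made-up samples
probs = forward(X, params)
print(probs.shape)  # one probability per sample
print(probs.ravel())
```

Each row of `X` is one sample, so the whole batch is pushed through the network with two matrix multiplications rather than a loop over neurons.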

## Learning in Neural Networks

- Forward Propagation: During forward propagation, the input data is fed into the network, and predictions are made.

- Backpropagation: After making predictions, the network's performance is evaluated using a cost function, similar to logistic regression. Backpropagation is the process of calculating gradients of the cost function with respect to the weights and biases in the network. These gradients are used to update the weights and improve the model's performance.
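
As a sketch of how forward propagation and backpropagation fit together, the block below trains a one-hidden-layer network on the XOR problem, a classic data set that is not linearly separable. The architecture (4 tanh hidden neurons, one sigmoid output), the cross-entropy cost, the learning rate, and the number of steps are assumptions chosen for illustration; the gradients in `backward` follow from applying the chain rule layer by layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Return the hidden activations and the predicted probabilities."""
    W1, b1, W2, b2 = params
    a1 = np.tanh(X @ W1 + b1)
    p = sigmoid(a1 @ W2 + b2)
    return a1, p

def loss_fn(X, y, params, eps=1e-12):
    """Mean cross-entropy cost, as in logistic regression."""
    _, p = forward(X, params)
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def backward(X, y, params):
    """Backpropagation: gradients of the cost w.r.t. W1, b1, W2, b2."""
    W1, b1, W2, b2 = params
    a1, p = forward(X, params)
    n = X.shape[0]
    dz2 = (p - y) / n                      # cost gradient at the output pre-activation
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1.0 - a1 ** 2)   # chain rule through tanh
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    return dW1, db1, dW2, db2

# XOR: the label is 1 exactly when the two inputs differ.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
params = [rng.normal(scale=0.5, size=(2, 4)), np.zeros(4),
          rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)]

lr = 0.5  # learning rate (assumed)
losses = []
for _ in range(5000):
    losses.append(loss_fn(X, y, params))           # forward propagation + cost
    grads = backward(X, y, params)                 # backpropagation
    params = [param - lr * grad for param, grad in zip(params, grads)]  # update

print(f"initial loss = {losses[0]:.4f}, final loss = {losses[-1]:.4f}")
print("predictions:", forward(X, params)[1].ravel().round(3))
```

A useful sanity check for any hand-written `backward` function is to compare one of its entries against a finite-difference estimate of the same derivative computed from `loss_fn`.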

## Illustration

Additional details and derivations were discussed during the class. If you missed the class, it is your responsibility to ensure that you understand how a basic feedforward neural network operates.

Next week, we will start coding with PyTorch, a Python machine learning framework.