# Structure Of Neural Network

**Introduction to Neural Networks: Part 1:** You will learn the architecture of a neural network and the working principles of artificial neural networks, including information flow in feedforward networks.

**Introduction to Neural Networks: Part 2:** You will learn to implement neural networks using Keras (a high-level, user-friendly interface) with TensorFlow (a low-level workhorse library) as the back end. You will be able to build a neural network, modify its architecture, and optimise and train it.

### In this Module
In this module, you will learn about Artificial Neural Networks (ANNs), which are arguably among the most sophisticated and cutting-edge models in machine learning. Inspired by the structure of the human brain, neural networks have established a reputation for successfully learning complex tasks such as object recognition in images, automatic speech recognition (ASR), machine translation, image captioning, and video classification.

 

### In this session
To begin with, you will get an intuitive idea about neural networks. The following topics will be covered in this session:

- Structure of ANNs, inspired by the human brain
- Perceptron: A simple idea as the basis for larger neural networks
- Workings of artificial neurons
- Structure and topology of neural networks
- Hyperparameters and simplifying assumptions
 

### Prerequisites
As the main prerequisites for this session, you must have a basic understanding of the concepts of vectors, matrix multiplication, derivatives, and partial derivatives and must have completed the previous courses on statistics and ML.


## Biological to Artificial Neuron
 

As the name Artificial Neural Networks (ANNs) suggests, the design of ANNs is inspired by the human brain. Although not as powerful as the brain (yet), artificial neural networks are among the most powerful learning models in the field of machine learning.


In the past few years, deep artificial neural networks have proven to perform surprisingly well on complex tasks such as speech recognition (converting speech into text), machine translation, and image and video classification. Such models are also commonly called deep learning models.


Let’s begin our journey into deep learning with an introduction to artificial neural networks.

## Deep Learning Applications
1. Image Recognition
2. Image tagging and video analysis
3. Auto text generation
4. Annotations for text and video
5. Speech recognition
6. Grammar change recommendations
7. Translating text
8. Automating games
9. Search text and draw inferences from it

Artificial neural networks are said to be inspired by the structure of the human brain. Let’s first learn about the basic structure of the brain and the anatomy of a neuron and understand how information travels through neurons.

![1.png](attachment:bef143a5-9a4e-49f2-b6d2-ecce2b853aa5.png)

In simple words, a biological neuron works as follows: it receives signals through its dendrites, and these signals are either amplified or inhibited as they pass through the axon to the dendrites of other neurons.

To summarise, the main bottleneck in using neural networks is the availability of abundant training data. Neural networks find applications across various domains such as images and videos (computer vision), text and speech. Note that the words ‘deep learning’ and ‘neural networks’ are often used interchangeably.


Also, artificial neural networks are a collection of many simple devices called artificial neurons. The network ‘learns’ to conduct certain tasks, such as recognising a cat, by training the neurons to ‘fire’ in a certain way when given a particular input, such as a cat. In other words, the network learns to inhibit or amplify the input signals to perform a certain task, such as recognising a cat, speaking a word or identifying a tree.  


In the next segment, you will study the basics of a perceptron. The perceptron was one of the earliest proposed models for learning simple classification tasks, which later became the fundamental building block of artificial neural networks.



## Perceptron
 

In this segment, you will study the basics of a simple device called the perceptron, which was the first step towards creating the large neural networks that we have developed today. 

Let's take an example to understand how a perceptron works.

Consider a sushi place you plan to visit this Saturday. There are various factors that would affect this decision, such as:

- The distance between the sushi place and your home
- The cost of the food they serve there
- The number of people accompanying you

You make the decision based on several such factors. Also, **each decision factor has a different ‘weight’**; for example, the distance of the place might be more important than the number of people accompanying you.

 

Perceptrons work in a similar manner. They take some signals as inputs and perform a set of simple calculations to arrive at a decision. 

![2.png](attachment:678ca66d-60c4-4de5-a9c9-a39a03bf2050.png)




A **perceptron** acts as a tool that enables you to make a decision based on multiple factors. Each decision factor holds a different ‘weight’; for example, your neighbor, Rohit, may consider the distance to the sushi place to be more important than the other two factors. Similarly, perceptrons take such different factors as input signals, attach a certain weight based on the importance they give to the corresponding factors, and perform basic operations to decide what to do.

In other terms, the perceptron takes a weighted sum of multiple inputs (with bias) as the cumulative input and applies an output function on the cumulative input to get the output, which then assists in making a decision. You can observe the cumulative input in the formula given below:

$$\text{Cumulative Input} = w_{1}x_{1} + w_{2}x_{2} + w_{3}x_{3} + b$$

where the $x_{i}$'s represent the inputs, the $w_{i}$'s represent the weights associated with the inputs, and $b$ represents the bias.

Soon, you will be talking about everything in terms of vectors and matrices. So, let's start using these terms from now. Let’s say $w$ and $x$ are vectors representing weights and inputs as follows (note that, by default, a vector is assumed to be a column vector):

$$w = \begin{bmatrix} w_{1} \\ w_{2} \\ \vdots \\ w_{k} \end{bmatrix}, \quad x = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{k} \end{bmatrix}$$

A neat and concise way to represent the weighted sum of $w$ and $x$ is using the **dot product** of the transpose of the weight vector, $w^{T}$, and the input vector $x$. Let’s understand this concept of taking the dot product of the transpose of the weight vector and the input vector.

The transpose of $w$ is $w^{T} = \begin{bmatrix} w_{1} & w_{2} & \cdots & w_{k} \end{bmatrix}$, a row vector of size 1 x k. Taking the dot product of $w^{T}$ with $x$, we get the following:

$$w^{T} \cdot x = \begin{bmatrix} w_{1} & w_{2} & \cdots & w_{k} \end{bmatrix} \cdot \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{k} \end{bmatrix} = w_{1}x_{1} + w_{2}x_{2} + \cdots + w_{k}x_{k}$$

After adding the bias to $w^{T} \cdot x$, you will get the following equation:

$$\text{Cumulative Input} = w^{T} \cdot x + b = w_{1}x_{1} + w_{2}x_{2} + \cdots + w_{k}x_{k} + b$$

For the three decision factors in the sushi example ($k = 3$), this reduces to $w_{1}x_{1} + w_{2}x_{2} + w_{3}x_{3} + b$, the same expression as before.

We then apply the step function to the cumulative input. According to the step function, if the cumulative input is greater than 0, then the output is 1/yes; otherwise, it is 0/no. So, in Rohit’s case, if the output is 1 upon applying the step function to the cumulative input, then he would like to visit the sushi place on the upcoming Saturday.

Now, you have a basic understanding of a perceptron.
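The perceptron computation described above can be sketched in a few lines of NumPy. This is only a minimal illustration: the input values, weights and bias below are hypothetical numbers picked by hand for the sushi example, and the function names `step` and `perceptron` are ours, not part of any library.

```python
import numpy as np

def step(z):
    """Step function: 1 if the cumulative input is greater than 0, else 0."""
    return 1 if z > 0 else 0

def perceptron(x, w, b):
    """Apply the step function to the cumulative input w^T x + b."""
    cumulative_input = np.dot(w, x) + b
    return step(cumulative_input)

# Hypothetical values for the three decision factors
# (distance, cost, number of people), each scaled to the range 0..1.
x = np.array([0.2, 0.5, 1.0])
# Hypothetical weights: distance and cost count against going,
# company counts in favour.
w = np.array([-2.0, -1.0, 1.5])
b = 0.5

decision = perceptron(x, w, b)
print(decision)  # 1 means "go", 0 means "do not go"
```

With these assumed numbers the cumulative input is positive, so the perceptron outputs 1. Lowering the bias far enough flips the decision to 0 without touching the weights.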

In this segment, you will understand how a single artificial neuron works, i.e., how it converts inputs into outputs. You will also understand the topology or structure of large neural networks. Let’s get started by understanding the basic structure of an artificial neuron.

![3.png](attachment:866c4b6e-2ec5-4cdd-a719-852039cb8815.png)

Here, the $x_{i}$'s represent the inputs, the $w_{i}$'s represent the weights associated with the inputs, and $b$ represents the bias of the neuron.

A neuron is quite similar to a perceptron. However, in perceptrons, the commonly used activation/output function is the step function, whereas the neurons in ANNs typically use other non-linear activation functions (the sigmoid function is one common example).
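To make the difference concrete, here is a small sketch of a single artificial neuron. It assumes the sigmoid as the activation function purely for illustration (the text above does not fix a particular choice), and the inputs, weights and bias are hypothetical values.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    """Output of one neuron: activation applied to the cumulative input w^T x + b."""
    return activation(np.dot(w, x) + b)

# Hypothetical inputs, weights and bias (illustrative only).
x = np.array([0.2, 0.5, 1.0])
w = np.array([-2.0, -1.0, 1.5])
b = 0.5

output = neuron(x, w, b)
print(output)
```

Unlike the step function, which returns only 0 or 1, this neuron returns a graded value between 0 and 1.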

Next, let’s understand how large neural networks are designed using multiple individual neurons.

![4.png](attachment:16dc71f8-a2b3-462c-a7ab-142a14fc20d3.png)

![5.png](attachment:25410793-b25c-4456-9d15-5551d23b8201.png)


Multiple artificial neurons in a neural network are arranged in different layers. The first layer is known as the **input layer**, and the last layer is called the **output layer**. The layers in between these two are the **hidden layers**.

The number of neurons in the input layer is equal to the number of attributes in the data set, and the **number of neurons in the output layer is determined by the number of classes of the target variable** (for a classification problem).

For a **regression problem**, the number of neurons in the output layer would be 1 (a numeric variable). Take a look at the image given below to understand the topology of neural networks in the case of classification and regression problems.

![Topology of neural networks for classification and regression problems](https://d35ev2v1xsdze0.cloudfront.net/4f0413fe-8c2b-4742-ad4f-1977feed2c60-usvgy4ec.png)

Note that the number of hidden layers, the number of neurons in each hidden layer, and the activation functions used in the neural network change according to the problem, and these details determine the topology or structure of the neural network. We will discuss this in the subsequent segments.

**What do we need to specify in order to completely describe a neural network?**

![6.png](attachment:1de6bf6d-f6de-4048-8799-abfcaca33e13.png)

![7.png](attachment:11a903c9-6a5f-4900-a019-084717eb59f5.png)

![8.png](attachment:f6558d40-3b19-4977-a916-fde78340ff07.png)


So far, you have understood the basic structure of artificial neural networks. To summarise, there are six main elements that must be specified for any neural network. They are as follows:

- Input layer
- Output layer
- Hidden layers
- Network topology or structure
- Weights and biases
- Activation functions
  
You might have some questions, such as ‘How do we decide the number of neurons in a layer?’ or ‘How are weights and biases determined?’. You will be able to answer these questions in the next few segments, wherein you will learn about each of these specifications in depth.
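As a rough illustration of how these specifications fit together, the sketch below writes down a small network in NumPy. Everything specific here is an assumption made for the example: the layer sizes (4 inputs, hidden layers of 5 and 3 neurons, 2 outputs), the random initial weights, the zero biases, and the use of the sigmoid in every layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Topology (assumed): input layer, two hidden layers, output layer.
layer_sizes = [4, 5, 3, 2]

# Weights and biases: one matrix and one vector per pair of consecutive layers.
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def sigmoid(z):
    """Activation function (assumed to be the same in every layer)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Pass an input vector through each layer in turn."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

x = rng.normal(size=layer_sizes[0])  # one input vector with 4 features
output = forward(x)

print([W.shape for W in weights])  # [(5, 4), (3, 5), (2, 3)]
print(output.shape)                # (2,)
```

Note how the shape of each weight matrix follows directly from the number of neurons in the two layers it connects. In practice the weights and biases are learnt during training rather than left at their initial values.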

## Inputs and Outputs of a Neural Network

As you learnt in the previous segment, **the number of neurons in the input layer is determined by the input given to the network, and the number of neurons in the output layer is equal to the number of classes** (for a classification task) or is one (for a regression task). Now, let’s take a look at some examples to understand the inputs and outputs of ANNs better.

 

Let’s get started by understanding the inputs and outputs of an ANN.

### Input Layer

The most important thing to note is that inputs can only be numeric. For different types of input data, you need to use different ways to convert the inputs into a numeric form. The most commonly used inputs for ANNs are as follows:

- **Structured data**: Tabular data with multiple features, of the kind used in standard machine learning algorithms, can be used as input for training ANNs. Such data can be stored in **CSV files, MAT files, Excel files, etc.** This is highly convenient because the input to an ANN is usually given as a numeric feature vector. Such structured data eases the process of feeding the input into the ANN.
- **Text data**: For text data, you can use a **one-hot vector** or **word embeddings** corresponding to a certain word. For example, in one-hot vector encoding, if the vocabulary size is |V|, then you can represent the word $w_{n}$ as a one-hot vector of size |V| with '1' at the nth element and all other elements being zero. The problem with the one-hot representation is that, usually, the vocabulary size |V| is huge, in the tens of thousands at least; hence, it is often better to use word embeddings, which are a lower-dimensional representation of each word.

The one-hot encoded array of the digits 0–9 will look as shown below (here, `one_hot` is a small NumPy helper assumed for illustration).

```python
import numpy as np

def one_hot(values):
    # Assumed helper (the original snippet does not define one_hot):
    # row i of the identity matrix is the one-hot vector for class i.
    return np.eye(values.max() + 1)[values]

data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(data.shape)  # (10,)
one_hot(data)
```

```
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
```

- **Images**: Images are naturally represented as arrays of numbers and can thus be fed into the network directly. These numbers are the **raw pixels of an image**. ‘Pixel’ is short for ‘picture element’. In images, pixels are arranged in rows and columns (an array of pixel elements). The figure given below shows the image of a handwritten 'zero' in the MNIST data set (black and white) and its corresponding representation in NumPy as an array of numbers. The pixel values are high where the intensity is high, i.e., the color is bright, while the values are low in the black regions.

![Handwritten zero from the MNIST data set and its representation as a NumPy array](https://d35ev2v1xsdze0.cloudfront.net/6415e9eb-b985-4269-855e-efd89c738755-gm8e37ux.png)

In a neural network, each **pixel** of the input image is a **feature**. For example, the image provided above is an 18 x 18 array. Hence, it will be fed as a **vector of size 324** into the network. Note that the image given above is **black and white** (also called a grayscale image), and thus, each pixel has only one **‘channel’**. If it were a **colored image**, called an RGB (Red, Green and Blue) image, each pixel would have **three channels**, one each for red, green, and blue. Hence, the number of neurons in the input layer would be 18 x 18 x 3 = 972. The three channels of an RGB image are shown below.

![The three channels of an RGB image](https://d35ev2v1xsdze0.cloudfront.net/1b7c3d98-b4ba-42cc-8d5d-281cb8d3c0a5-a0q24uik.png)

- **Speech**: In the case of a speech/voice input, the basic input unit is in the form of **phonemes**. These are the distinct units of speech in any language. The speech signal is in the form of waves, and to convert these waves into numeric inputs, you need to use Fourier Transforms (you do not need to worry about this, as it involves specialised mathematics that will not be covered in this course). Note that the input after conversion should be numeric so that you are able to feed it into a neural network.
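The arithmetic for image inputs (18 x 18 = 324 and 18 x 18 x 3 = 972) can be checked with a short NumPy sketch. The arrays below are placeholder images filled with zeros, used only to show the shapes; they are not the MNIST image from the figure.

```python
import numpy as np

# Placeholder images (all zeros), assumed only to demonstrate the shapes.
gray_image = np.zeros((18, 18))     # one channel per pixel
rgb_image = np.zeros((18, 18, 3))   # three channels per pixel

# Flatten each image into a single feature vector for the input layer.
gray_vector = gray_image.reshape(-1)
rgb_vector = rgb_image.reshape(-1)

print(gray_vector.shape)  # (324,)
print(rgb_vector.shape)   # (972,)
```

The length of the flattened vector is the number of neurons needed in the input layer.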

### Output Layer

Now that you have learnt how to feed input vectors into neural networks, let’s understand how the output layers are specified.

Depending on the nature of the given task, the **outputs of neural networks can either be** in the form of classes (if it is a **classification problem**) or numeric (if it is a **regression problem**). 


## Softmax Function

One of the commonly used output functions is the **softmax function for classification**. Take a look at the graphical representation of the softmax function shown below.

![Graphical representation of the softmax function](https://d35ev2v1xsdze0.cloudfront.net/d5ff776f-35ef-45a5-86e3-aa4fd5782b5c-2l6isee2.png)

A softmax output is similar to what we get from a multiclass logistic function, commonly used to compute the probability of an output belonging to one of multiple classes. It is given by the following formula:

$$p_{i} = \frac{e^{w_{i} \cdot x'}}{\sum_{t=0}^{c-1} e^{w_{t} \cdot x'}}$$

where $c$ is the number of classes or neurons in the output layer, $x'$ is the input to the output layer (coming from the previous layers in the network), and the $w_{i}$'s are the weights associated with the inputs.

Suppose the output layer of a network has 3 neurons and all of them have the same input $x'$. The weights associated with them are represented as $w_{0}$, $w_{1}$ and $w_{2}$. In such a case, the probabilities of the input belonging to each of the classes are expressed as follows:

$$p_{0} = \frac{e^{w_{0} \cdot x'}}{e^{w_{0} \cdot x'} + e^{w_{1} \cdot x'} + e^{w_{2} \cdot x'}}$$

$$p_{1} = \frac{e^{w_{1} \cdot x'}}{e^{w_{0} \cdot x'} + e^{w_{1} \cdot x'} + e^{w_{2} \cdot x'}}$$

$$p_{2} = \frac{e^{w_{2} \cdot x'}}{e^{w_{0} \cdot x'} + e^{w_{1} \cdot x'} + e^{w_{2} \cdot x'}}$$

Also, it is evident from these expressions that $p_{0} + p_{1} + p_{2} = 1$ and that $p_{0}, p_{1}, p_{2} \in (0, 1)$.

Now, try to answer the question given below.
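The three-class softmax can also be tried out numerically. The sketch below is a minimal NumPy version under stated assumptions: the input vector `x_prime` and the weight matrix `W` (one row per output neuron) are hypothetical values, and subtracting the maximum score before exponentiating is a common numerical-stability step that we add here; it does not change the resulting probabilities.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of scores w_i . x' into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # stability trick; result is unchanged
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical input to the output layer and weights for 3 output neurons.
x_prime = np.array([1.0, 2.0])
W = np.array([[0.5, -0.2],   # w_0
              [0.1, 0.3],    # w_1
              [-0.4, 0.6]])  # w_2

scores = W @ x_prime         # one score w_i . x' per class
p = softmax(scores)

print(p)        # p_0, p_1, p_2
print(p.sum())  # equals 1 up to floating-point rounding
```

The class with the largest score receives the largest probability, which is why the index of the maximum of `p` is usually taken as the predicted class.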


**Question 1 (single selection):** Suppose the output layer has 4 neurons, and all of them have the same input $x'$. The weights associated with them are represented as $w_{0}$, $w_{1}$, $w_{2}$ and $w_{3}$, respectively. What will be the expression for $p_{3}$?

- A. $\frac{e^{w_{0} \cdot x'}}{e^{w_{0} \cdot x'} + e^{w_{1} \cdot x'} + e^{w_{2} \cdot x'} + e^{w_{3} \cdot x'}}$
- B. $\frac{e^{w_{1} \cdot x'}}{e^{w_{0} \cdot x'} + e^{w_{1} \cdot x'} + e^{w_{2} \cdot x'} + e^{w_{3} \cdot x'}}$
- C. $\frac{e^{w_{2} \cdot x'}}{e^{w_{0} \cdot x'} + e^{w_{1} \cdot x'} + e^{w_{2} \cdot x'} + e^{w_{3} \cdot x'}}$
- D. $\frac{e^{w_{3} \cdot x'}}{e^{w_{0} \cdot x'} + e^{w_{1} \cdot x'} + e^{w_{2} \cdot x'} + e^{w_{3} \cdot x'}}$

**Question 2** (single selection)

Suppose we have two classes (0 and 1) in the output, where $p_{0}$ is the probability of getting class 0 as the output and $p_{1}$ is the probability of getting class 1 as the output. In the softmax output layer, if the minimum value of $p_{0}$ is 0.5, then what is the range of $p_{1}$?

- 0 to 1
- 0 to 0.5
- 0 to 0.25
- 0.5 to 1



So, we have seen the softmax function as a commonly used output function in multiclass classification. Now, let’s understand how the **softmax function translates to the sigmoid function** in the special case of binary classification.

![10.png](attachment:c81ba6ab-8cea-4beb-b17b-880aea20080b.png)

<p>In the case of a <strong>sigmoid output</strong>, there is only <strong>one neuron</strong> in the output layer: if there are two classes with probabilities <img alt="Equation" data-latex="p_{0}" src="https://latex.upgrad.com/render?formula=p_%7B0%7D" style="vertical-align: middle;display: inline;">&nbsp;and <img alt="Equation" data-latex="p_{1}" src="https://latex.upgrad.com/render?formula=p_%7B1%7D" style="vertical-align: middle;display: inline;">, we know that <img alt="Equation" data-latex="p_{0}+p_{1} = 1" src="https://latex.upgrad.com/render?formula=p_%7B0%7D%2Bp_%7B1%7D%20%3D%201" style="vertical-align: middle;display: inline;">. Hence, we need to compute only one of <img alt="Equation" data-latex="p_{0}" src="https://latex.upgrad.com/render?formula=p_%7B0%7D" style="vertical-align: middle;display: inline;">&nbsp;and <img alt="Equation" data-latex="p_{1}" src="https://latex.upgrad.com/render?formula=p_%7B1%7D" style="vertical-align: middle;display: inline;">; the other is obtained by subtracting it from 1. In other words, the sigmoid function is just a special case of the softmax function (since binary classification is a special case of multiclass classification).<br>In fact, we can derive the sigmoid function from the softmax function, as shown below. Let's assume that the softmax layer has two neurons with the following outputs:<br>&nbsp;</p><p style="text-align: center;"><img alt="Equation" data-latex="p_{0}=\frac{e^{w_{0}{x}'}}{e^{w_{0}.{x}'}+e^{w_{1}{x}'}},
p_{1}=\frac{e^{w_{1}{x}'}}{e^{w_{0}.{x}'}+e^{w_{1}{x}'}}" src="https://latex.upgrad.com/render?formula=p_%7B0%7D%3D%5Cfrac%7Be%5E%7Bw_%7B0%7D%7Bx%7D'%7D%7D%7Be%5E%7Bw_%7B0%7D.%7Bx%7D'%7D%2Be%5E%7Bw_%7B1%7D%7Bx%7D'%7D%7D%2C%0Ap_%7B1%7D%3D%5Cfrac%7Be%5E%7Bw_%7B1%7D%7Bx%7D'%7D%7D%7Be%5E%7Bw_%7B0%7D.%7Bx%7D'%7D%2Be%5E%7Bw_%7B1%7D%7Bx%7D'%7D%7D"></p><p><br>Consider only&nbsp;<img alt="Equation" data-latex="p_{1}" src="https://latex.upgrad.com/render?formula=p_%7B1%7D" style="vertical-align: middle;display: inline;">&nbsp;and divide both the numerator and the denominator by the numerator. We can now rewrite <img alt="Equation" data-latex="p_{1}" src="https://latex.upgrad.com/render?formula=p_%7B1%7D" style="vertical-align: middle;display: inline;">&nbsp;as:<br>&nbsp;</p><p style="text-align: center;"><img alt="Equation" data-latex="p_{1}=\frac{1}{1+\frac{e^{w_{0}.{x}'}}{e^{w_{1}.{x}'}}}=\frac{1}{1+e^{(w_{0}-w_{1}).{x}'}}" src="https://latex.upgrad.com/render?formula=p_%7B1%7D%3D%5Cfrac%7B1%7D%7B1%2B%5Cfrac%7Be%5E%7Bw_%7B0%7D.%7Bx%7D'%7D%7D%7Be%5E%7Bw_%7B1%7D.%7Bx%7D'%7D%7D%7D%3D%5Cfrac%7B1%7D%7B1%2Be%5E%7B(w_%7B0%7D-w_%7B1%7D).%7Bx%7D'%7D%7D"></p><p><br>And, if we write <img alt="Equation" data-latex="w_{1} - w_{0}" src="https://latex.upgrad.com/render?formula=w_%7B1%7D%20-%20w_%7B0%7D" style="vertical-align: middle;display: inline;">&nbsp;as a single weight vector <img alt="Equation" data-latex="w" src="https://latex.upgrad.com/render?formula=w" style="vertical-align: middle;display: inline;">, the exponent becomes -w·x′ and we get the sigmoid function, 1 / (1 + e<sup>-w·x′</sup>). Voila!</p></div></div>
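
The derivation above can also be checked numerically. The sketch below is only an illustration: the input and weight vectors are made-up values, and the helper names `softmax` and `sigmoid` are our own, not part of any library.

```python
import numpy as np

def softmax(z):
    """Softmax over a 1-D array of scores (shifted by the max for numerical stability)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical input and weight vectors, chosen only for illustration.
x = np.array([0.5, -1.2, 2.0])
w0 = np.array([0.3, 0.8, -0.5])
w1 = np.array([-0.1, 0.4, 0.9])

# [p0, p1] from a two-neuron softmax output layer.
p = softmax(np.array([w0 @ x, w1 @ x]))

# A single sigmoid neuron with w = w1 - w0.
p1_sigmoid = sigmoid((w1 - w0) @ x)

print(p[1], p1_sigmoid)  # the two values should agree
```

Whatever values are chosen for the vectors, the second softmax output and the sigmoid output should match up to floating-point error.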




**Question**

Consider a neural network with three output neurons for classification. The input vector is $x' = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$, and the weights are $w_{0} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$, $w_{1} = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$ and $w_{2} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$. What are the values of $p_{0}$, $p_{1}$ and $p_{2}$, up to three decimal places?

- 0.2, 0.3, 0.5
- 0.017, 0.047, 0.936
- 0.21, 0.31, 0.51
- 0.002, 0.034, 0.964
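
Once you have attempted the question above by hand, you may want to verify your answer with a few lines of NumPy. This is a minimal sketch: stacking the weight vectors as the rows of a matrix `W` is our own choice for convenience.

```python
import numpy as np

# Input vector and weight vectors from the question above.
x = np.array([2.0, 1.0, 1.0])
W = np.array([
    [1.0, 1.0, -1.0],   # w0
    [2.0, 0.0, -1.0],   # w1
    [1.0, 2.0, 2.0],    # w2
])

# Score of each output neuron: the dot product w_i . x'
z = W @ x

# Softmax: exponentiate the scores and normalise so they sum to 1.
# Subtracting the max score does not change the result; it only avoids overflow.
exp_z = np.exp(z - z.max())
p = exp_z / exp_z.sum()

print(np.round(p, 3))
```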

Now that you have understood how the **output is obtained from the softmax function** and how different types of inputs are fed into the ANN, let's learn how to **define inputs and outputs for image recognition** on the famous MNIST data set for multiclass classification.

![11.png](attachment:b39ea53c-1ee4-40df-aafa-abdd97379024.png)

There are various problems you will face while trying to recognise handwritten text using an algorithm, including:

- Noise in the image
- The orientation of the text
- Non-uniformity in the spacing of text
- Non-uniformity in handwriting 


The MNIST data set takes care of some of these problems, as the digits are written in a box. The main problem left for the network to handle is the non-uniformity in handwriting. Since the **images in the MNIST data set are 28 × 28 pixels, the input layer has 784 neurons (each neuron takes 1 pixel as an input) and the output layer has 10 neurons (each giving the probability that the input image belongs to one of the 10 digit classes)**. The image is classified into the class with the highest probability in the output layer. 
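
To make the input and output sizes concrete, here is a rough sketch in NumPy. It does not load the real MNIST data: the image is random noise standing in for one digit, the weights are random and untrained, and the network has no hidden layers, so the prediction is meaningless. Only the shapes matter here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one greyscale MNIST image (random values, not real data).
image = rng.random((28, 28))

# Flatten the 28 x 28 grid into a vector of 784 inputs, one per pixel.
x = image.reshape(-1)

# Hypothetical untrained parameters: 10 output neurons, each with 784 weights.
W = rng.normal(scale=0.01, size=(10, 784))
b = np.zeros(10)

# Softmax output layer: one probability per digit class (0 to 9).
z = W @ x + b
exp_z = np.exp(z - z.max())
p = exp_z / exp_z.sum()

# The image is assigned to the class with the highest probability.
predicted_digit = int(np.argmax(p))
print(x.shape, p.shape, predicted_digit)
```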

 

To revise what we have learnt in this segment, the **softmax function stated above is a general case for multiclass classification**. It is a commonly used output layer activation function for classification. You learnt how to feed input data into an ANN and obtain the output from it. In the next segment, we will move on to defining the building blocks of a neural network, which will help you understand the workings of a neuron and how to build its network.





## Workings of a Single Neuron

In this segment, you will learn how to define the input, the processing of this input and the corresponding output of a single neuron. We will look in detail at the structure and working of an artificial neuron.

Now that you have seen how inputs are fed into a neuron and how outputs are obtained using activation functions, let’s reiterate the concepts with a short summary.
![12.png](attachment:65eee1ac-4221-4ae3-a7bf-864914050714.png)

In the image above, you can see the three inputs to the neuron; their weighted sum, along with the bias, is fed into the neuron to give the calculated result as the output.

To summarise, each input is multiplied by its corresponding weight, and the sum of these products, together with the bias, forms the cumulative input that is fed into the neuron. An activation function is then applied to the cumulative input to obtain the output of the neuron. We have seen some of the activation functions such as softmax and sigmoid in the previous segment. We will explore other types of activation functions in the next segment. These functions apply non-linearity to the cumulative input to enable the neural network to identify complex non-linear patterns present in the data set.


A more detailed representation of how the cumulative input is turned into the output is given below.

![13.png](attachment:7e22f4ae-4801-4b9e-aff2-4057c6b801db.png)


![14.png](attachment:43e9949d-0ded-48c3-98df-36b4a8be7fd6.png)

In the image above, z is the cumulative input: the dot product of the weights and the inputs, plus the bias. The larger the magnitude of a weight, the more its input contributes to z.
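
The computation performed by a single neuron can be sketched in a few lines. The input, weight and bias values below are invented for illustration, the function names are our own, and sigmoid is assumed as the activation function.

```python
import numpy as np

def cumulative_input(x, w, b):
    """z = w . x + b: dot product of weights and inputs, plus the bias."""
    return np.dot(w, x) + b

def neuron_output(x, w, b):
    """Apply a sigmoid activation to the cumulative input."""
    z = cumulative_input(x, w, b)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values: three inputs, three weights and one bias.
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.5])
b = 0.0

z = cumulative_input(x, w, b)   # 0.5*1 + (-1.0)*2 + 0.5*3 + 0
a = neuron_output(x, w, b)
print(z, a)
```

With these particular numbers the weighted terms cancel out, so z is 0 and the sigmoid output sits exactly at its midpoint.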

![15.png](attachment:3cce4c23-ba68-4e83-8087-0e3eec816543.png)

In this segment, you saw how a neuron takes an input and performs some operations on it to give the output. The output is obtained through an activation function. In the next segment, we will explore some popular activation functions.


## Different Activation Functions

As mentioned in one of the previous segments, in the case of ANNs, the activation functions are non-linear. In this segment, you will learn about these non-linear activation functions. 

The image provided below shows the graphical representation of a linear function and one of the possible representations of a non-linear function.

![16.png](attachment:5ae42669-4c48-4fca-a922-b89f758bab36.png)


The activation functions introduce non-linearity in the network, thereby enabling the network to solve highly complex problems. Problems tackled with neural networks require the ANN to recognise complex patterns and trends in the given data set. If we do not introduce non-linearity, the output will be a linear function of the input vector, no matter how many layers the network has, and the network will not be able to capture the more complex patterns present in the data set. 
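
A quick way to convince yourself of this is to stack two layers without any activation function and check that they collapse into a single linear layer. The layer sizes and random parameters below are arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with NO activation function (arbitrary sizes: 3 -> 4 -> 2).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Output of the two-layer linear network.
two_layer = W2 @ (W1 @ x + b1) + b2

# The same mapping written as ONE linear layer:
# W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W_eq = W2 @ W1
b_eq = W2 @ b1 + b2
one_layer = W_eq @ x + b_eq

print(np.allclose(two_layer, one_layer))
```

However many linear layers are stacked, the same algebra applies, which is why a non-linear activation is needed between layers.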

 

For example, as we can see in the image below, we sometimes have data in non-linear shapes such as circular or elliptical. If you want to classify the two circles into two groups, a linear model will not be able to do this, but a neural network with multiple neurons and non-linear activation functions can help you achieve this.

![17.png](attachment:348f9989-3c02-49b3-ae46-3673349d39cd.png)

Let’s learn about the various types and properties of common activation functions and understand how to choose the correct activation function.

While choosing activation functions, you should generally ensure that they are:

- Non-linear,
- Continuous, and
- Monotonically non-decreasing (for example, ReLU is flat for negative inputs and increasing for positive inputs).
  
The different commonly used activation functions are represented below.

![18.png](attachment:9e1ee779-fe5a-435f-8403-84be80f731bf.png)

The features of these activation functions are as follows:

<ol><li><strong>Sigmoid</strong>: When this type of function is applied, the output is bound between 0 and 1 and is not centred around zero (it is centred around 0.5). A sigmoid activation function is usually used when we want to limit the magnitude of the outputs we get from a neural network and ensure that this magnitude does not blow up.</li><li><strong>Tanh (Hyperbolic Tangent)</strong>: When this type of function is applied, the output is bound between -1 and 1 and is centred around zero, unlike the sigmoid function, whose output is centred around 0.5 and is always positive.</li><li><strong>ReLU (Rectified Linear Unit)</strong>: The output of this activation function equals the input when the input is positive and is zero when the input is negative. It is cheap to compute and, in practice, often allows the network to converge quickly. However, because both its output and its gradient are zero for negative inputs, it does not help the network to learn when the values are negative.</li><li><strong>Leaky ReLU (Leaky Rectified Linear Unit)</strong>: This activation function is similar to ReLU. However, it enables the neural network to learn even when the values are negative. When the input to the function is negative, it dampens the magnitude, i.e., the input is multiplied by a small positive factor (often called epsilon) that is less than one. On the other hand, when the input is positive, the function is linear and gives the input value as the output. We can tune this factor to control how much ‘learning emphasis’ is given to negative values.</li></ol>
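
The four functions described above can be written compactly in NumPy. This is an illustrative sketch rather than a library implementation: the function names are our own, and the default slope of 0.01 for Leaky ReLU is an assumed, commonly seen value.

```python
import numpy as np

def sigmoid(z):
    """Output bound between 0 and 1, centred around 0.5."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Output bound between -1 and 1, centred around 0."""
    return np.tanh(z)

def relu(z):
    """Passes positive inputs through unchanged; outputs 0 for negative inputs."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but negative inputs are scaled by a small factor alpha (assumed 0.01)."""
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, 0.0, 2.0])
for fn in (sigmoid, tanh, relu, leaky_relu):
    print(fn.__name__, fn(z))
```

Comparing the outputs for the negative input shows the difference between ReLU, which returns zero, and Leaky ReLU, which returns a small negative value.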

Next, let's see how to compute the output of a neuron, given the inputs, weights, biases and the **sigmoid activation function**.

![19.png](attachment:6321bf03-0463-4ba2-bb11-6d79f85569af.png)


Having explored the key components in building the architecture of ANNs, let's now understand how neural networks are trained and used to make predictions. In the next segment, you will learn about the hyperparameters and parameters of neural networks.


## Parameters and Hyperparameters of Neural Network
 

Neural networks require rigorous training, but what does it mean to train neural networks? What are the parameters that the network learns during training, and what are the hyperparameters that you (as the network designer) need to specify beforehand?

 

**Recall that models such as linear regression and logistic regression are trained by learning their coefficients, i.e., the task is to find the values of the coefficients that minimise a cost function.**

 
**Neural networks are no different; they are trained on weights and biases.**

 

In this segment, you will be introduced to the parameters that are learned during neural network training. You will also develop a broad understanding of how the learning algorithm works.


![20.png](attachment:81f4323b-94c0-4fa4-9ee0-091dcac9141e.png)

During training, the neural network learning algorithm fits candidate models (different settings of the weights and biases) to the training data and selects the one that predicts best. The learning algorithm runs with a fixed set of **hyperparameters**, which are specified before training. Some of the important hyperparameters are given below; the first two decide the network structure, and the last two control the training process:

- Number of layers
- Number of neurons in the input, hidden and output layers
- Learning rate (the step size taken each time we update the weights and biases of an ANN)
- Number of epochs (the number of times the entire training data set passes through the neural network)

The purpose of training is to obtain the optimum weights and biases, which form the **parameters** of the network.

**Note**: You will learn about hyperparameters such as learning rate and the number of epochs in the subsequent session. In this session, we will focus on the number of layers and the number of neurons in each layer.

The notations that you will come across going forward are as follows:

1. **W** represents the weight matrix.
2. **b** stands for bias.
3. **x** represents the input.
4. **y** represents the ground truth label.
5. **p** represents the probability vector of the predicted output for the classification problem. $h^{L}$ represents the predicted output for the regression problem (where L represents the number of layers).
6. **h** also represents the output of the hidden layers with the appropriate superscript. The output of the **second neuron in the nth hidden layer** is denoted by $h^{n}_{2}$.
7. **z** represents the accumulated input to a layer. The accumulated input to the **third neuron of the nth hidden layer** is $z^{n}_{3}$.
8. The bias of the **first neuron of the third layer** is represented as $b^{3}_{1}$.
9. The superscript represents the layer number. The weight matrix **connecting the first hidden layer to the second hidden layer** is denoted by $W^{2}$.
10. The subscript represents the index of the individual neuron in a given layer. The weight **connecting the first neuron of the first hidden layer to the third neuron of the second hidden layer** is denoted by $w^{2}_{31}$.

Having understood these notations, let’s reinforce them with the network shown below.

<p style="text-align: center;"><img class="image-editor" maxwidth="100%" src="https://d35ev2v1xsdze0.cloudfront.net/f1ee6d66-adab-46ee-8f1c-bc64d13cca8e-jaifm7pp.png"></p>

You might want to look at how the inputs of the first data point $x_{1}$ are represented. This will help you in answering questions on the notation.

So far, you have come across simple neural networks and have computed their outputs, but real-world neural networks can be large and highly complex. Therefore, you will need some simplifying assumptions to make them easier to understand. You will learn about these assumptions in the next segment.
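
To connect the notation to something concrete, the sketch below builds the parameters of a small, hypothetical network and counts them. The layer sizes are invented for illustration. We also assume that each weight matrix stores one row per neuron in the receiving layer and one column per neuron in the sending layer, which matches the subscript order of the weight notation above; the segment itself does not state a storage layout.

```python
import numpy as np

# Hypothetical network: 4 inputs, hidden layers of 5 and 3 neurons, 2 outputs.
layer_sizes = [4, 5, 3, 2]

rng = np.random.default_rng(0)

# W[l] connects layer l-1 to layer l; b[l] holds one bias per neuron in layer l.
# Assumed layout: rows index the receiving neuron, columns the sending neuron.
W = {l: rng.normal(size=(layer_sizes[l], layer_sizes[l - 1]))
     for l in range(1, len(layer_sizes))}
b = {l: np.zeros(layer_sizes[l]) for l in range(1, len(layer_sizes))}

# Weight from the 1st neuron of hidden layer 1 to the 3rd neuron of hidden layer 2.
# Python indices start at 0, so subscript "31" becomes [2, 0].
w2_31 = W[2][2, 0]

# Parameters are all the weights and biases; the layer sizes are hyperparameters.
n_params = sum(W[l].size + b[l].size for l in W)
print(W[2].shape, n_params)
```

For these sizes, the count is (5×4 + 5) + (3×5 + 3) + (2×3 + 2) = 51 learnable parameters.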


## Assumptions for Simplifying Neural Network

Since large neural networks can potentially have extremely complex structures, certain assumptions are made to simplify the way in which information flows in them. 



![22.png](attachment:d5c837d7-39d4-463e-8ced-05b3c1a956e7.png)

![23.png](attachment:fab710f4-e9b0-4c50-8eb2-9c37f5b51d12.png)




![21.png](attachment:e7ce09ab-c308-4ee6-af55-a82244d63c60.png)


To summarise, commonly used neural network architectures make the following simplifying assumptions:

<ol><li>The neurons in an ANN are <strong>arranged in layers</strong>, and these layers are arranged <strong>sequentially</strong>.</li><li>The neurons within the same layer <strong>do not interact</strong> with each other.</li><li>The inputs are fed into the network through the <strong>input layer</strong>, and the outputs are sent out from the <strong>output layer</strong>.</li><li>Neurons in <strong>consecutive layers</strong> are <strong>densely connected</strong>, i.e., all neurons in layer l are connected to all neurons in layer l+1.</li><li>Every neuron in the neural network has a <strong>bias</strong> value associated with it, and each interconnection has a <strong>weight</strong> associated with it.</li><li>All neurons in a particular hidden layer use the <strong>same activation function</strong>. Different hidden layers can use different activation functions.</li></ol>
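
Under these assumptions, information flows through the network one layer at a time. The following sketch shows such a feedforward pass; the layer sizes, the random untrained weights, and the choice of ReLU for the hidden layer and softmax for the output layer are all assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def feedforward(x, weights, biases):
    """Pass x through sequential, densely connected layers."""
    h = x
    # Hidden layers: every neuron in a layer uses the same activation (ReLU here).
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    # Output layer: softmax turns the final scores into class probabilities.
    return softmax(weights[-1] @ h + biases[-1])

rng = np.random.default_rng(0)
sizes = [4, 5, 3]  # hypothetical: 4 inputs, 5 hidden neurons, 3 output classes

# One weight per connection between consecutive layers, one bias per neuron.
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

p = feedforward(rng.normal(size=4), weights, biases)
print(p)
```

Because each layer depends only on the output of the previous one, the whole forward pass reduces to a simple loop over the layers.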

This brings us to the end of this session. In the next segment, we will quickly summarise what you learnt in this session.

## Summary

<div class="text_component" data-testid="online-editor-content"><p dir="ltr">Let’s take a quick look at what you have learnt in this session.</p><p dir="ltr">Some important points can be summarised as follows:</p><ol><li aria-level="1" dir="ltr"><p dir="ltr" role="presentation">Firstly, you understood the limitations of preliminary machine learning and how deep learning can be used to build complex models.</p></li><li aria-level="1" dir="ltr"><p dir="ltr" role="presentation">Next, you saw how the architecture of ANNs draws inspiration from the human brain.</p></li><li aria-level="1" dir="ltr"><p dir="ltr" role="presentation">You also learnt about the basic functioning of a perceptron.</p></li><li aria-level="1" dir="ltr"><p dir="ltr" role="presentation">Further, you learnt about the basic building block of ANNs: Neurons. The structure of an artificial neuron is shown below.<br>&nbsp;</p></li></ol><p dir="ltr" style="text-align: center;"><img data-height="252" data-width="490" height="252" src="https://images.upgrad.com/fc71715a-6e27-407b-a4f3-7ce52760acf8-NN_structure.png" width="490"></p><p dir="ltr">Here, ‘a’ represents the inputs, ‘w’ represents the weights associated with the inputs, and ‘b’ represents the bias of the neuron.</p><ol start="5"><li aria-level="1" dir="ltr"><p dir="ltr" role="presentation">You then learnt about the architecture of ANNs, including the topology, the parameters (weights and biases) on which the neural network is trained and the hyperparameters.</p></li><li aria-level="1" dir="ltr"><p dir="ltr" role="presentation">ANNs only take numerical inputs. Hence, you need to convert all types of data into a numeric format so that neural networks can process it.</p></li><li aria-level="1" dir="ltr"><p dir="ltr" role="presentation">Next, you were introduced to the most common activation functions such as sigmoid, ReLU, Leaky ReLU and tanh, which are shown below.<br>&nbsp;</p></li></ol><p dir="ltr" role="presentation" style="text-align: center;"><img class="image-editor" maxwidth="100%" src="https://d35ev2v1xsdze0.cloudfront.net/bc645ea0-abdd-4b9c-9405-3b46d51de4f5-a2hwrx8b.png"></p><ol start="8"><li aria-level="1" dir="ltr"><p dir="ltr" role="presentation">Some simplifying assumptions in the architecture of ANNs are as follows.</p><ol><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">The neurons in an ANN are <strong>arranged in layers</strong>, and these layers are arranged <strong>sequentially</strong>.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">The neurons within the same layer <strong>do not interact</strong> with each other.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">The inputs are fed into the network through the <strong>input layer</strong>, and the outputs are sent out from the <strong>output layer</strong>.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">Neurons in <strong>consecutive layers</strong> are <strong>densely connected</strong>, i.e., all neurons in layer l are connected to all neurons in layer l+1.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">Every neuron in the neural network has a <strong>bias</strong> value associated with it, and each interconnection has a <strong>weight</strong> associated with it.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">All neurons in a particular hidden layer use the <strong>same activation function</strong>.<br>&nbsp;</p></li></ol></li><li aria-level="1" dir="ltr"><p dir="ltr" role="presentation">Finally, you fixed the following notations:</p><ol><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">W represents the weight matrix.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">b stands for bias.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">x represents input.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">y represents the ground truth label.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">p represents the probability vector of the predicted output for the classification problem.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">h represents the output of the hidden layers, and&nbsp;<img alt="Equation" data-latex="h^{L}" src="https://latex.upgrad.com/render?formula=h%5E%7BL%7D" style="vertical-align: middle;display: inline;"> represents the output prediction for the regression problem.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">z represents the cumulative input fed into each neuron of a layer.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">The superscript represents the layer number.</p></li><li aria-level="2" dir="ltr"><p dir="ltr" role="presentation">The subscript represents the index of each individual neuron in a layer.</p></li></ol></li></ol><p>In the next segment, you will attempt some graded questions to test your understanding of the topics covered in this session. All the best!</p></div>
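The neuron structure and activation functions recapped in this summary can be sketched in a few lines of NumPy. This is only an illustration: the input, weight and bias values are made up, the helper names are hypothetical, and the Leaky ReLU slope `alpha=0.01` is an assumed value rather than one taken from the session.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(0.0, z)


def leaky_relu(z, alpha=0.01):
    # alpha is an assumed small slope for negative inputs.
    return np.where(z > 0, z, alpha * z)


def neuron_output(a, w, b, activation=sigmoid):
    """Output of a single neuron: activation(w . a + b)."""
    z = np.dot(w, a) + b  # cumulative input to the neuron
    return activation(z)


# Illustrative values, chosen so that the cumulative input is about 0.
a = np.array([1.0, -2.0, 0.5])   # inputs
w = np.array([0.4, 0.1, -0.6])   # weights
b = 0.1                          # bias

print(neuron_output(a, w, b))                      # sigmoid: close to 0.5
print(neuron_output(a, w, b, activation=np.tanh))  # tanh: close to 0
```

Swapping the `activation` argument is all it takes to try a different activation function on the same cumulative input.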

## Graded Questions

![24.png](attachment:e8df9faa-079d-4109-9512-7a686e3076d5.png)

# Introduction to Feedforward Neural Network

Welcome to the second session on Feedforward Neural Networks. 

 

In the previous session, you understood the architecture of neural networks and how it was inspired by the structure of the human brain. You also learnt about the working of an artificial neuron, the hyperparameters and parameters of neural networks and various simplifying assumptions.

 

In this session, you will learn how information flows in a neural network from the input layer to the output layer to enable the neural network to make a prediction. The information flow in this direction is often called feedforward. You will also learn how to assess the performance of a neural network.
 

### In this session
The following topics will be covered:

- Information flow from the input layer to the output layer
- Regression and classification feedforward methods
- Working of neural networks
- Loss function
 

### Prerequisites
As the main prerequisites for this session, you must have a basic understanding of the concepts of vectors, matrix multiplication, derivatives and partial derivatives and must have completed the previous courses on Statistics and ML.

## Flow of Information Between Layers

In the previous session, you learnt about the structure, topology, and hyperparameters of neural networks along with some simplifying assumptions of neural networks. In this segment, you will understand how information flows from one layer to the next one in a neural network.  

In artificial neural networks, the output from one layer is used as input to the next layer. Such networks are called **feedforward neural networks**. This means that there are no loops in the network, i.e., information is always fed forward, never fed backward. Let’s start by understanding the feedforward mechanism between two consecutive layers. For simplicity, in the next video, the professor will use the input layer and the first hidden layer to demonstrate how information flows between any two layers.

<div class="text_component" data-testid="text-component"><p style="text-align: center;">As seen in the video, an image of a subset of the neural network is shown below:<br><br><br><img class="image-editor" maxwidth="100%" src="https://d35ev2v1xsdze0.cloudfront.net/3f330a8a-c58a-4e92-ab80-7e18ca42f18f-92ookk12.png"></p><p>As you learnt in the previous session, the weight matrix between layer 0 (input layer) and layer 1 (the first hidden layer) is denoted by <img alt="Equation" data-latex="W" src="https://latex.upgrad.com/render?formula=W" style="vertical-align: middle;display: inline;">. The product of the matrix&nbsp;<img alt="Equation" data-latex="W" src="https://latex.upgrad.com/render?formula=W" style="vertical-align: middle;display: inline;"> and the input vector&nbsp;<img alt="Equation" data-latex="x_{i}" src="https://latex.upgrad.com/render?formula=x_%7Bi%7D" style="vertical-align: middle;display: inline;">, added to the bias vector&nbsp;<img alt="Equation" data-latex="b" src="https://latex.upgrad.com/render?formula=b" style="vertical-align: middle;display: inline;">, i.e.,&nbsp;<img alt="Equation" data-latex="W.x_{i}+b" src="https://latex.upgrad.com/render?formula=W.x_%7Bi%7D%2Bb" style="vertical-align: middle;display: inline;">, acts as the cumulative input&nbsp;<img alt="Equation" data-latex="z" src="https://latex.upgrad.com/render?formula=z" style="vertical-align: middle;display: inline;"> to layer 1. 
The activation function is applied to this cumulative input&nbsp;<img alt="Equation" data-latex="z" src="https://latex.upgrad.com/render?formula=z" style="vertical-align: middle;display: inline;"> to compute the output&nbsp;<img alt="Equation" data-latex="h" src="https://latex.upgrad.com/render?formula=h" style="vertical-align: middle;display: inline;"> of layer 1.&nbsp;</p><p>Let’s take the above-mentioned example and perform matrix multiplication to get a vectorised method to compute the output of layer 1 from the inputs of layer 0.</p><p>Here, the following input is given:</p><p style="text-align: center;"><img alt="Equation" data-latex="x^{i}=\begin{bmatrix}
x_{1}\\ 
x_{2}\\ 
x_{3}
\end{bmatrix}" src="https://latex.upgrad.com/render?formula=x%5E%7Bi%7D%3D%5Cbegin%7Bbmatrix%7D%0Ax_%7B1%7D%5C%5C%20%0Ax_%7B2%7D%5C%5C%20%0Ax_%7B3%7D%0A%5Cend%7Bbmatrix%7D"></p><p>The dimensions of the input are (3,1).<br>There are two neurons in the first hidden layer. Hence, the cumulative input <img alt="Equation" data-latex="z^{1}" src="https://latex.upgrad.com/render?formula=z%5E%7B1%7D" style="vertical-align: middle;display: inline;"> will be given as:</p><p style="text-align: center;"><img alt="Equation" data-latex="z^{1}=\begin{bmatrix}
z_{1}^{1}\\ 
z_{2}^{1}
\end{bmatrix}" src="https://latex.upgrad.com/render?formula=z%5E%7B1%7D%3D%5Cbegin%7Bbmatrix%7D%0Az_%7B1%7D%5E%7B1%7D%5C%5C%20%0Az_%7B2%7D%5E%7B1%7D%0A%5Cend%7Bbmatrix%7D"></p><p>Also, the weight matrix will be of dimension 2x3 and is represented as follows:</p><p style="text-align: center;"><img alt="Equation" data-latex="W^{1}= \begin{bmatrix}
w_{11}^{1}&amp;w_{12}^{1}  &amp;w_{13}^{1} \\ 
w_{21}^{1} &amp; w_{22}^{1} &amp;w_{23}^{1} 
\end{bmatrix}" src="https://latex.upgrad.com/render?formula=W%5E%7B1%7D%3D%20%5Cbegin%7Bbmatrix%7D%0Aw_%7B11%7D%5E%7B1%7D%26w_%7B12%7D%5E%7B1%7D%20%20%26w_%7B13%7D%5E%7B1%7D%20%5C%5C%20%0Aw_%7B21%7D%5E%7B1%7D%20%26%20w_%7B22%7D%5E%7B1%7D%20%26w_%7B23%7D%5E%7B1%7D%20%0A%5Cend%7Bbmatrix%7D"></p><p><strong>NOTE:</strong> The notation of a neuron's weight in a particular layer is represented as:</p><p style="text-align: center;"><img data-height="314" data-width="314" height="314" src="https://images.upgrad.com/5cdbcd44-0058-4945-b15c-9a98e1da7927-notation.png" width="314"></p><p>And, the bias vector can be represented as follows:&nbsp;</p><p style="text-align: center;"><img alt="Equation" data-latex="b^{1}= \begin{bmatrix}
b_{1}^{1}\\ 
b_{2}^{1}
\end{bmatrix}" src="https://latex.upgrad.com/render?formula=b%5E%7B1%7D%3D%20%5Cbegin%7Bbmatrix%7D%0Ab_%7B1%7D%5E%7B1%7D%5C%5C%20%0Ab_%7B2%7D%5E%7B1%7D%0A%5Cend%7Bbmatrix%7D"></p><p>The matrix representation of obtaining <img alt="Equation" data-latex="z_{1}^{1}" src="https://latex.upgrad.com/render?formula=z_%7B1%7D%5E%7B1%7D" style="vertical-align: middle;display: inline;"> is given below.&nbsp;</p><p style="text-align: center;"><img class="image-editor" maxwidth="100%" src="https://d35ev2v1xsdze0.cloudfront.net/3b22f6a3-e085-4ef3-8019-e9b57816b5d7-prlm3zln.png"></p><p style="text-align: center;"><img alt="Equation" data-latex="z_{1}^{1}=w_{11}x_{1}+w_{12}x_{2}+w_{13}x_{3}+b_{1}" src="https://latex.upgrad.com/render?formula=z_%7B1%7D%5E%7B1%7D%3Dw_%7B11%7Dx_%7B1%7D%2Bw_%7B12%7Dx_%7B2%7D%2Bw_%7B13%7Dx_%7B3%7D%2Bb_%7B1%7D"></p><p>Here, <img alt="Equation" data-latex="z_{1}^{1}" src="https://latex.upgrad.com/render?formula=z_%7B1%7D%5E%7B1%7D" style="vertical-align: middle;display: inline;"> is obtained by taking a dot product of the input vector and the corresponding weights. The same goes for obtaining the value of&nbsp;<img alt="Equation" data-latex="z_{2}^{1}" src="https://latex.upgrad.com/render?formula=z_%7B2%7D%5E%7B1%7D" style="vertical-align: middle;display: inline;">. Hence, we get:</p><p style="text-align: center;"><img alt="Equation" data-latex="z_{2}^{1}=w_{21}x_{1}+w_{22}x_{2}+w_{23}x_{3}+b_{2}" src="https://latex.upgrad.com/render?formula=z_%7B2%7D%5E%7B1%7D%3Dw_%7B21%7Dx_%7B1%7D%2Bw_%7B22%7Dx_%7B2%7D%2Bw_%7B23%7Dx_%7B3%7D%2Bb_%7B2%7D"></p><p>The two equations can be written as a matrix multiplication as given below:</p><p style="text-align: center;"><img alt="Equation" data-latex="\begin{bmatrix}
z_{1}^{1}\\ 
z_{2}^{1}
\end{bmatrix} = 
\begin{bmatrix}
w_{11}^{1} &amp;w_{12}^{1}  &amp; w_{13}^{1}\\ 
 w_{21}^{1}&amp;w_{22}^{1}  &amp; w_{23}^{1}
\end{bmatrix} 
\begin{bmatrix}
x_{1}\\ 
x_{2}\\
x_{3}
\end{bmatrix}
+
\begin{bmatrix}
b_{1}^{1}\\ 
b_{2}^{1}
\end{bmatrix} 
=
\begin{bmatrix}
w_{11}^{1}x_{1}+w_{12}^{1}x_{2}+w_{13}^{1}x_{3}+b_{1}^{1}\\ 
w_{21}^{1}x_{1}+w_{22}^{1}x_{2}+w_{23}^{1}x_{3}+b_{2}^{1}
\end{bmatrix}" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%0Az_%7B1%7D%5E%7B1%7D%5C%5C%20%0Az_%7B2%7D%5E%7B1%7D%0A%5Cend%7Bbmatrix%7D%20%3D%20%0A%5Cbegin%7Bbmatrix%7D%0Aw_%7B11%7D%5E%7B1%7D%20%26w_%7B12%7D%5E%7B1%7D%20%20%26%20w_%7B13%7D%5E%7B1%7D%5C%5C%20%0A%20w_%7B21%7D%5E%7B1%7D%26w_%7B22%7D%5E%7B1%7D%20%20%26%20w_%7B23%7D%5E%7B1%7D%0A%5Cend%7Bbmatrix%7D%20%0A%5Cbegin%7Bbmatrix%7D%0Ax_%7B1%7D%5C%5C%20%0Ax_%7B2%7D%5C%5C%0Ax_%7B3%7D%0A%5Cend%7Bbmatrix%7D%0A%2B%0A%5Cbegin%7Bbmatrix%7D%0Ab_%7B1%7D%5E%7B1%7D%5C%5C%20%0Ab_%7B2%7D%5E%7B1%7D%0A%5Cend%7Bbmatrix%7D%20%0A%3D%0A%5Cbegin%7Bbmatrix%7D%0Aw_%7B11%7D%5E%7B1%7Dx_%7B1%7D%2Bw_%7B12%7D%5E%7B1%7Dx_%7B2%7D%2Bw_%7B13%7D%5E%7B1%7Dx_%7B3%7D%2Bb_%7B1%7D%5E%7B1%7D%5C%5C%20%0Aw_%7B21%7D%5E%7B1%7Dx_%7B1%7D%2Bw_%7B22%7D%5E%7B1%7Dx_%7B2%7D%2Bw_%7B23%7D%5E%7B1%7Dx_%7B3%7D%2Bb_%7B2%7D%5E%7B1%7D%0A%5Cend%7Bbmatrix%7D"></p><p>The next step is to apply the activation function to the <img alt="Equation" data-latex="z^{1}" src="https://latex.upgrad.com/render?formula=z%5E%7B1%7D" style="vertical-align: middle;display: inline;"> vector to obtain the output <img alt="Equation" data-latex="h^{1}" src="https://latex.upgrad.com/render?formula=h%5E%7B1%7D" style="vertical-align: middle;display: inline;">. As mentioned in the video, the activation function is applied to each element of the vector. Thus, the final output <img alt="Equation" data-latex="h^{1}" src="https://latex.upgrad.com/render?formula=h%5E%7B1%7D" style="vertical-align: middle;display: inline;"> of layer 1 is:</p><p style="text-align: center;"><img alt="Equation" data-latex="h^{1}=\begin{bmatrix}
h_{1}^{1}\\ 
h_{2}^{1}
\end{bmatrix} = 
\sigma (W^{1}.x^{i}+b^{1})
=
\begin{bmatrix}
\sigma (w_{11}^{1}x_{1}+w_{12}^{1}x_{2}+w_{13}^{1}x_{3}+b_{1}^{1})\\ 
\sigma (w_{21}^{1}x_{1}+w_{22}^{1}x_{2}+w_{23}^{1}x_{3}+b_{2}^{1})
\end{bmatrix}" src="https://latex.upgrad.com/render?formula=h%5E%7B1%7D%3D%5Cbegin%7Bbmatrix%7D%0Ah_%7B1%7D%5E%7B1%7D%5C%5C%20%0Ah_%7B2%7D%5E%7B1%7D%0A%5Cend%7Bbmatrix%7D%20%3D%20%0A%5Csigma%20(W%5E%7B1%7D.x%5E%7Bi%7D%2Bb%5E%7B1%7D)%0A%3D%0A%5Cbegin%7Bbmatrix%7D%0A%5Csigma%20(w_%7B11%7D%5E%7B1%7Dx_%7B1%7D%2Bw_%7B12%7D%5E%7B1%7Dx_%7B2%7D%2Bw_%7B13%7D%5E%7B1%7Dx_%7B3%7D%2Bb_%7B1%7D%5E%7B1%7D)%5C%5C%20%0A%5Csigma%20(w_%7B21%7D%5E%7B1%7Dx_%7B1%7D%2Bw_%7B22%7D%5E%7B1%7Dx_%7B2%7D%2Bw_%7B23%7D%5E%7B1%7Dx_%7B3%7D%2Bb_%7B2%7D%5E%7B1%7D)%0A%5Cend%7Bbmatrix%7D"></p><p>As Gunnvant mentioned,&nbsp;<img alt="Equation" data-latex="\sigma" src="https://latex.upgrad.com/render?formula=%5Csigma" style="vertical-align: middle;display: inline;"> acts as a vector function, i.e., it is applied element-wise to a vector.</p><p>This completes the forward propagation of a single data point through one layer of the network.</p><p><br>To summarise, the steps involved in computing the output of the&nbsp;<img alt="Equation" data-latex="i^{th}" src="https://latex.upgrad.com/render?formula=i%5E%7Bth%7D" style="vertical-align: middle;display: inline;"> neuron in layer&nbsp;<img alt="Equation" data-latex="l" src="https://latex.upgrad.com/render?formula=l" style="vertical-align: middle;display: inline;">&nbsp;are as follows:</p><ul><li>Multiply each row of the weight matrix with the output from the previous layer to obtain the weighted sum of inputs from the previous layer.</li><li>Convert the weighted sum into the cumulative input by adding the bias vector.</li><li>Apply the activation function <img alt="Equation" data-latex="\sigma (x)" src="https://latex.upgrad.com/render?formula=%5Csigma%20%28x%29" style="vertical-align: middle;display: inline;">&nbsp;to the cumulative input to obtain the output vector&nbsp;<img alt="Equation" data-latex="h" src="https://latex.upgrad.com/render?formula=h" style="vertical-align: middle;display: inline;">.&nbsp;</li></ul><p>With this premise, let’s study feedforward in a small 
neural network in the next segment.<br>&nbsp;</p><p>Before you proceed further, spend some time answering the question next.<br>&nbsp;</p></div>
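The three steps summarised above can be sketched in NumPy for the 3-input, 2-neuron layer used in this segment. The weight, bias and input values below are illustrative placeholders (the segment gives only symbols), and `layer_forward` is a hypothetical helper name.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def layer_forward(W, x, b, activation=sigmoid):
    """Compute h = activation(W.x + b) for one dense layer."""
    z = W @ x + b          # weighted sum plus bias: the cumulative input
    return activation(z)   # activation is applied element-wise


# Illustrative values only.
W1 = np.array([[0.1, 0.2, 0.3],
               [0.4, 0.5, 0.6]])       # shape (2, 3): 2 neurons, 3 inputs
x = np.array([[1.0], [2.0], [3.0]])    # shape (3, 1)
b1 = np.array([[0.0], [0.0]])          # shape (2, 1)

h1 = layer_forward(W1, x, b1)
print(h1.shape)  # (2, 1)
```

The shapes mirror the text: a (2, 3) weight matrix times a (3, 1) input gives a (2, 1) cumulative input, one entry per neuron in layer 1.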


## Questions

![25.png](attachment:4e87dbea-40d2-4cc2-be3e-59203ab8ec0f.png)

## Forward Pass - Demonstration

In the previous segment, you saw how the output of the next layer is calculated, given the inputs from the previous layer. In this segment, you will learn about the flow of data through different layers in a step-by-step fashion using an example in which we intend to calculate the price of a house, given its size and the number of rooms in it. You may want to use pen and paper to do the calculations yourself for better understanding.

<div class="MuiBox-root css-1bi8ut6"><div class="text_component" data-testid="text-component"><p dir="ltr">To reiterate, the problem statement is to predict the price of houses, given the size of the houses and the number of rooms available.&nbsp;</p><table align="center" border="1" cellpadding="1" cellspacing="1"><tbody><tr><td><b id="docs-internal-guid-de8afd1c-7fff-6a81-61d6-9922bc3e8bf4">Number of Rooms</b></td><td><b id="docs-internal-guid-221b62d2-7fff-7fe8-778e-4b1d1cdd14a2">House Size (sq. ft.)</b></td><td><b id="docs-internal-guid-69f7bdf0-7fff-9a49-8e16-f049acd90883">Price ($)</b></td></tr><tr><td>3</td><td><p>1,340</p></td><td><p>313,000</p></td></tr><tr><td>5</td><td>3,650</td><td><p>2,384,000</p></td></tr><tr><td>3</td><td>1,930</td><td><p>342,000</p></td></tr><tr><td>3</td><td>2,000</td><td><p>420,000</p></td></tr><tr><td>4</td><td>1,940</td><td><p>550,000</p></td></tr><tr><td>2</td><td>880</td><td>490,000</td></tr></tbody></table><p>In this case, we first standardise the inputs and the output for these 6 observations using the formula&nbsp;<img alt="Equation" data-latex="\frac{(obs-mean)}{std. deviation}" src="https://latex.upgrad.com/render?formula=%5Cfrac%7B(obs-mean)%7D%7Bstd.%20deviation%7D">, applied to each column separately. So, we get the table given below.</p>





<table align="center" border="1" cellpadding="1" cellspacing="1"><tbody><tr><td><b id="docs-internal-guid-0b64f2c0-7fff-757a-692e-1dd785795876">Std. Number of Rooms</b></td><td><b id="docs-internal-guid-1fc304ad-7fff-8721-47fe-ced887d2668d">Std. House Size</b></td><td><b id="docs-internal-guid-9c527696-7fff-98cc-4a74-5067b9c28b68">Std. Price</b></td></tr><tr><td>-0.32</td><td>-0.66</td><td>-0.54</td></tr><tr><td>1.61</td><td>1.80</td><td>2.03</td></tr><tr><td>-0.32</td><td>-0.03</td><td>-0.51</td></tr><tr><td>-0.32</td><td>0.05</td><td>-0.41</td></tr><tr><td>0.65</td><td>-0.02</td><td>-0.25</td></tr><tr><td>-1.29</td><td>-1.15</td><td>-0.32</td></tr></tbody></table>
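The scaling step can be sketched as follows for the number-of-rooms column. The session does not say which standard deviation is used, so the choice of the sample standard deviation (`ddof=1`) is an assumption here; it appears to reproduce the tabulated values. The helper name `standardise` is hypothetical.

```python
import numpy as np

# Number of rooms for the six observations in the table.
rooms = np.array([3, 5, 3, 3, 4, 2], dtype=float)


def standardise(values):
    """Return (obs - mean) / std. deviation for a 1-D array."""
    # Assumption: sample standard deviation (ddof=1), which the text
    # does not confirm.
    return (values - values.mean()) / values.std(ddof=1)


std_rooms = np.round(standardise(rooms), 2)
print(std_rooms)
```

The same function can be applied to the house-size and price columns, one column at a time.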
    
<p>As you saw in the video, we want to build a neural network that will predict the price of a house, given two input attributes: number of rooms and house size. Let’s start with the structure of the neural network that we will consider for this case. We have an input layer with two input nodes, <img alt="Equation" data-latex="x_{1}" src="https://latex.upgrad.com/render?formula=x_%7B1%7D" style="vertical-align: middle;display: inline;">&nbsp;and <img alt="Equation" data-latex="x_{2}" src="https://latex.upgrad.com/render?formula=x_%7B2%7D" style="vertical-align: middle;display: inline;">, one hidden layer with two nodes and a sigmoid activation function, and finally an output layer with one node and a linear activation function (since this is a regression problem), as shown below.<br>&nbsp;</p><p style="text-align: center;"><b><img data-height="823" data-width="1219" height="405.0861361771944" src="https://images.upgrad.com/cac6cf66-e689-41a2-b8bb-d24615da1003-forward pass.png" width="600"></b></p><p>Now, to understand how the data moves forward in the network to enable the neural network to make predictions, we will initialise the weights and biases with random values. We recommend that you keep a pen and paper handy for practising the computations that will be performed further. The intention is that as this network gets trained, the weights and biases will be updated as per the data such that the predicted output will eventually be the same or at least similar to the actual output.</p><p>Let’s start by initialising the weights and biases to the following values:</p><p><strong>Layer 1:</strong></p><p style="text-align: center;"><img alt="Equation" data-latex="W^{1}=\begin{bmatrix} w_{11}^{1}&amp;w_{12}^{1}\\ w_{21}^{1}&amp;w_{22}^{1} \end{bmatrix}=\begin{bmatrix} 0.2&amp;0.15\\ 0.5&amp;0.6 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=W%5E%7B1%7D%3D%5Cbegin%7Bbmatrix%7D%20w_%7B11%7D%5E%7B1%7D%26w_%7B12%7D%5E%7B1%7D%5C%5C%20w_%7B21%7D%5E%7B1%7D%26w_%7B22%7D%5E%7B1%7D%20%5Cend%7Bbmatrix%7D%3D%5Cbegin%7Bbmatrix%7D%200.2%260.15%5C%5C%200.5%260.6%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;"></p><p style="text-align: center;"><img alt="Equation" data-latex="b^{1}=\begin{bmatrix} b_{1}^{1}\\ b_{2}^{1} \end{bmatrix}=\begin{bmatrix} 0.1\\ 0.25 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=b%5E%7B1%7D%3D%5Cbegin%7Bbmatrix%7D%20b_%7B1%7D%5E%7B1%7D%5C%5C%20b_%7B2%7D%5E%7B1%7D%20%5Cend%7Bbmatrix%7D%3D%5Cbegin%7Bbmatrix%7D%200.1%5C%5C%200.25%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;"></p><p><strong>Layer 2:</strong></p><p style="text-align: center;"><img alt="Equation" data-latex="W^{2}=\begin{bmatrix} w_{11}^{2}&amp;w_{12}^{2} \end{bmatrix}=\begin{bmatrix} 0.3&amp;0.2 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=W%5E%7B2%7D%3D%5Cbegin%7Bbmatrix%7D%20w_%7B11%7D%5E%7B2%7D%26w_%7B12%7D%5E%7B2%7D%20%5Cend%7Bbmatrix%7D%3D%5Cbegin%7Bbmatrix%7D%200.3%260.2%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;"></p><p style="text-align: center;"><img alt="Equation" data-latex="b^{2}=\begin{bmatrix} b_{1}^{2} \end{bmatrix}=\begin{bmatrix} 0.4 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=b%5E%7B2%7D%3D%5Cbegin%7Bbmatrix%7D%20b_%7B1%7D%5E%7B2%7D%20%5Cend%7Bbmatrix%7D%3D%5Cbegin%7Bbmatrix%7D%200.4%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;"></p><p>Remember, the superscript denotes the layer to which a parameter belongs and the subscript denotes the node in that particular layer.&nbsp;</p><p>To showcase the step-by-step computation of the output, let’s take the first example as the input vector:&nbsp;<br>&nbsp;</p><p style="text-align: center;"><img alt="Equation" data-latex="X^{1}= 
\begin{bmatrix}
x_{1}\\ 
x_{2}
\end{bmatrix} 
=
\begin{bmatrix}
-0.32\\ 
-0.66
\end{bmatrix}" src="https://latex.upgrad.com/render?formula=X%5E%7B1%7D%3D%20%0A%5Cbegin%7Bbmatrix%7D%0Ax_%7B1%7D%5C%5C%20%0Ax_%7B2%7D%0A%5Cend%7Bbmatrix%7D%20%0A%3D%0A%5Cbegin%7Bbmatrix%7D%0A-0.32%5C%5C%20%0A-0.66%0A%5Cend%7Bbmatrix%7D"></p><p style="text-align: center;"><img data-height="650" data-width="1010" height="386.13861386138615" src="https://images.upgrad.com/42efdb90-399e-4baf-8446-fa4c48b2c209-layer1_node1.png" width="600"></p><p style="text-align: center;"><strong>Layer 1: Node 1</strong></p><p>Let’s compute the output from the first node in layer 1.</p><p>Computing the cumulative input for the first node of the hidden layer:</p><p style="text-align: center;"><img alt="Equation" data-latex="z_{1}^{1}=w_{11}^{1}x_{1}+w_{12}^{1}x_{2}+b_{1}^{1} = 0.2 * (-0.32)+0.15 * (-0.66) + 0.1 = -0.063" src="https://latex.upgrad.com/render?formula=z_%7B1%7D%5E%7B1%7D%3Dw_%7B11%7D%5E%7B1%7Dx_%7B1%7D%2Bw_%7B12%7D%5E%7B1%7Dx_%7B2%7D%2Bb_%7B1%7D%5E%7B1%7D%20%3D%200.2%20*%20(-0.32)%2B0.15%20*%20(-0.66)%20%2B%200.1%20%3D%20-0.063"></p><p>Applying the sigmoid activation function to obtain the output from the first node:</p><p style="text-align: center;"><img alt="Equation" data-latex="h_{1}^{1}=\sigma (-0.063) = \frac{1}{1+e^{-z_{1}^{1}}}= \frac{1}{1+e^{-(-0.063)}}=0.484" src="https://latex.upgrad.com/render?formula=h_%7B1%7D%5E%7B1%7D%3D%5Csigma%20%28-0.063%29%20%3D%20%5Cfrac%7B1%7D%7B1%2Be%5E%7B-z_%7B1%7D%5E%7B1%7D%7D%7D%3D%20%5Cfrac%7B1%7D%7B1%2Be%5E%7B-%28-0.063%29%7D%7D%3D0.484" style="vertical-align: middle;display: inline;"></p><p style="text-align: center;"><img data-height="650" data-width="1010" height="386.13861386138615" src="https://images.upgrad.com/6bd2a503-f948-4182-acf1-639a5b2c653e-layer1_node2.png" width="600"></p><p style="text-align: center;"><strong>Layer 1: Node 2</strong></p><p>Next, let’s compute the output from the second node in layer 1 by following a similar process.</p><p>Computing the cumulative input for the second node of the hidden 
layer:</p><p style="text-align: center;"><img alt="Equation" data-latex="z_{2}^{1}=w_{21}^{1}x_{1}+w_{22}^{1}x_{2}+b_{2}^{1} = 0.5 * (-0.32)+0.6 * (-0.66) + 0.25 = -0.306" src="https://latex.upgrad.com/render?formula=z_%7B2%7D%5E%7B1%7D%3Dw_%7B21%7D%5E%7B1%7Dx_%7B1%7D%2Bw_%7B22%7D%5E%7B1%7Dx_%7B2%7D%2Bb_%7B2%7D%5E%7B1%7D%20%3D%200.5%20*%20(-0.32)%2B0.6%20*%20(-0.66)%20%2B%200.25%20%3D%20-0.306"></p><p>Applying the sigmoid activation function to get the output from the second node, we get:</p><p style="text-align: center;"><img alt="Equation" data-latex="h_{2}^{1}=\sigma (-0.306) = \frac{1}{1+e^{-z_{2}^{1}}}= \frac{1}{1+e^{-(-0.306)}}=0.424" src="https://latex.upgrad.com/render?formula=h_%7B2%7D%5E%7B1%7D%3D%5Csigma%20(-0.306)%20%3D%20%5Cfrac%7B1%7D%7B1%2Be%5E%7B-z_%7B2%7D%5E%7B1%7D%7D%7D%3D%20%5Cfrac%7B1%7D%7B1%2Be%5E%7B-(-0.306)%7D%7D%3D0.424"></p><p>Each of these individual operations can be done together using <strong>matrix multiplication.&nbsp;</strong><br>We have the input vector <img alt="Equation" data-latex="X^{1}" src="https://latex.upgrad.com/render?formula=X%5E%7B1%7D" style="vertical-align: middle;display: inline;">, the weight matrix&nbsp;<img alt="Equation" data-latex="W^{1}" src="https://latex.upgrad.com/render?formula=W%5E%7B1%7D" style="vertical-align: middle;display: inline;">&nbsp;and the bias vector <img alt="Equation" data-latex="b^{1}" src="https://latex.upgrad.com/render?formula=b%5E%7B1%7D" style="vertical-align: middle;display: inline;">&nbsp;with the following values:</p><p style="text-align: center;"><img alt="Equation" data-latex="X^{1}=\begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix}=\begin{bmatrix} -0.32\\ -0.66 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=X%5E%7B1%7D%3D%5Cbegin%7Bbmatrix%7D%20x_%7B1%7D%5C%5C%20x_%7B2%7D%20%5Cend%7Bbmatrix%7D%3D%5Cbegin%7Bbmatrix%7D%20-0.32%5C%5C%20-0.66%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;"></p><p style="text-align: center;"><img alt="Equation" 
data-latex="W^{1}=\begin{bmatrix} w_{11}^{1}&amp;w_{12}^{1}\\ w_{21}^{1}&amp;w_{22}^{1} \end{bmatrix}=\begin{bmatrix} 0.2&amp;0.15\\ 0.5&amp;0.6 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=W%5E%7B1%7D%3D%5Cbegin%7Bbmatrix%7D%20w_%7B11%7D%5E%7B1%7D%26w_%7B12%7D%5E%7B1%7D%5C%5C%20w_%7B21%7D%5E%7B1%7D%26w_%7B22%7D%5E%7B1%7D%20%5Cend%7Bbmatrix%7D%3D%5Cbegin%7Bbmatrix%7D%200.2%260.15%5C%5C%200.5%260.6%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;"></p><p style="text-align: center;"><img alt="Equation" data-latex="b^{1}=\begin{bmatrix} b_{1}^{1}\\ b_{2}^{1} \end{bmatrix}=\begin{bmatrix} 0.1\\ 0.25 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=b%5E%7B1%7D%3D%5Cbegin%7Bbmatrix%7D%20b_%7B1%7D%5E%7B1%7D%5C%5C%20b_%7B2%7D%5E%7B1%7D%20%5Cend%7Bbmatrix%7D%3D%5Cbegin%7Bbmatrix%7D%200.1%5C%5C%200.25%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;"></p><p>We know that:</p><p style="text-align: center;"><img alt="Equation" data-latex="h^{1}=\begin{bmatrix}
h_{1}^{1}\\ 
h_{2}^{1}
\end{bmatrix}=\sigma (W^{1}.X^{1}+b^{1})" src="https://latex.upgrad.com/render?formula=h%5E%7B1%7D%3D%5Cbegin%7Bbmatrix%7D%0Ah_%7B1%7D%5E%7B1%7D%5C%5C%20%0Ah_%7B2%7D%5E%7B1%7D%0A%5Cend%7Bbmatrix%7D%3D%5Csigma%20(W%5E%7B1%7D.X%5E%7B1%7D%2Bb%5E%7B1%7D)"></p><p style="text-align: center;"><img alt="Equation" data-latex="h^{1}=\sigma (\begin{bmatrix}
w_{11}^{1}x_{1}+w_{12}^{1}x_{2}+b_{1}^{1}\\ 
w_{21}^{1}x_{1}+w_{22}^{1}x_{2}+b_{2}^{1}
\end{bmatrix})" src="https://latex.upgrad.com/render?formula=h%5E%7B1%7D%3D%5Csigma%20(%5Cbegin%7Bbmatrix%7D%0Aw_%7B11%7D%5E%7B1%7Dx_%7B1%7D%2Bw_%7B12%7D%5E%7B1%7Dx_%7B2%7D%2Bb_%7B1%7D%5E%7B1%7D%5C%5C%20%0Aw_%7B21%7D%5E%7B1%7Dx_%7B1%7D%2Bw_%7B22%7D%5E%7B1%7Dx_%7B2%7D%2Bb_%7B2%7D%5E%7B1%7D%0A%5Cend%7Bbmatrix%7D)"></p><p style="text-align: center;"><img alt="Equation" data-latex="h^{1}=\sigma (\begin{bmatrix} 0.2*(-0.32)+0.15*(-0.66)+0.1\\ 0.5*(-0.32)+0.6*(-0.66)+0.25 \end{bmatrix}) =\sigma (\begin{bmatrix} -0.063\\ -0.306 \end{bmatrix})" src="https://latex.upgrad.com/render?formula=h%5E%7B1%7D%3D%5Csigma%20%28%5Cbegin%7Bbmatrix%7D%200.2%2A%28-0.32%29%2B0.15%2A%28-0.66%29%2B0.1%5C%5C%200.5%2A%28-0.32%29%2B0.6%2A%28-0.66%29%2B0.25%20%5Cend%7Bbmatrix%7D%29%20%3D%5Csigma%20%28%5Cbegin%7Bbmatrix%7D%20-0.063%5C%5C%20-0.306%20%5Cend%7Bbmatrix%7D%29" style="vertical-align: middle;display: inline;"></p><p style="text-align: center;"><img alt="Equation" data-latex="h^{1}=\begin{bmatrix} 0.484\\ 0.424 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=h%5E%7B1%7D%3D%5Cbegin%7Bbmatrix%7D%200.484%5C%5C%200.424%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;"></p><p>Now that we have the outputs for the two neurons in the hidden layer, we can calculate the final output.</p><p style="text-align: center;"><strong><img data-height="562" data-width="1039" height="324.54282964388835" src="https://images.upgrad.com/cf425cf8-b12b-406c-bff8-45e688ee0e0e-layer2_node1.png" width="600"></strong></p><p><strong>Layer 2 (Output layer): Node 1</strong></p><p>Moving on to the output layer with the linear activation function, we first compute the cumulative input to the neuron:</p><p style="text-align: center;"><img alt="Equation" data-latex="z_{1}^{2} = w_{11}^{2}h_{1}^{1}+w_{12}^{2}h_{2}^{1}+b_{1}^{2} = 0.3 * 0.484 + 0.2 * 0.424 + 0.4 = 0.63" 
src="https://latex.upgrad.com/render?formula=z_%7B1%7D%5E%7B2%7D%20%3D%20w_%7B11%7D%5E%7B2%7Dh_%7B1%7D%5E%7B1%7D%2Bw_%7B12%7D%5E%7B2%7Dh_%7B2%7D%5E%7B1%7D%2Bb_%7B1%7D%5E%7B2%7D%20%3D%200.3%20*%200.484%20%2B%200.2%20*%200.424%20%2B%200.4%20%3D%200.63"></p><p>Since this is a regression problem, we have considered the activation function as the <strong>linear activation</strong> function, i.e., the input is sent as the output without any modification. Hence, the output is the same as the cumulative input:</p><p style="text-align: center;"><img alt="Equation" data-latex="h_{1}^{2} = z_{1}^{2} = 0.63" src="https://latex.upgrad.com/render?formula=h_%7B1%7D%5E%7B2%7D%20%3D%20z_%7B1%7D%5E%7B2%7D%20%3D%200.63" style="vertical-align: middle;display: inline;"></p><p>This value of 0.63 is the <strong>prediction </strong>that the neural network makes in the first forward pass.</p><p>The <strong>matrix multiplication method</strong> will give us the same output as shown below:</p><p style="text-align: center;"><img alt="Equation" data-latex="h^{2}=(W^{2}h^{1}+b^{2}) = (\begin{bmatrix} w_{11}^{2} &amp; w_{12}^{2} \end{bmatrix}\begin{bmatrix} h_{1}^{1}\\ h_{2}^{1} \end{bmatrix})+ b^{2}" src="https://latex.upgrad.com/render?formula=h%5E%7B2%7D%3D%28W%5E%7B2%7Dh%5E%7B1%7D%2Bb%5E%7B2%7D%29%20%3D%20%28%5Cbegin%7Bbmatrix%7D%20w_%7B11%7D%5E%7B2%7D%20%26%20w_%7B12%7D%5E%7B2%7D%20%5Cend%7Bbmatrix%7D%5Cbegin%7Bbmatrix%7D%20h_%7B1%7D%5E%7B1%7D%5C%5C%20h_%7B2%7D%5E%7B1%7D%20%5Cend%7Bbmatrix%7D%29%2B%20b%5E%7B2%7D" style="vertical-align: middle;display: inline;"></p><p style="text-align: center;"><img alt="Equation" data-latex="h^{2}=(W^{2}h^{1}+b^{2}) = (\begin{bmatrix} 0.3 &amp; 0.2 \end{bmatrix}\begin{bmatrix} 0.484\\ 0.424 \end{bmatrix})+ 0.4" 
src="https://latex.upgrad.com/render?formula=h%5E%7B2%7D%3D%28W%5E%7B2%7Dh%5E%7B1%7D%2Bb%5E%7B2%7D%29%20%3D%20%28%5Cbegin%7Bbmatrix%7D%200.3%20%26%200.2%20%5Cend%7Bbmatrix%7D%5Cbegin%7Bbmatrix%7D%200.484%5C%5C%200.424%20%5Cend%7Bbmatrix%7D%29%2B%200.4" style="vertical-align: middle;display: inline;"></p><p style="text-align: center;"><img alt="Equation" data-latex="h^{2}=[0.63]" src="https://latex.upgrad.com/render?formula=h%5E%7B2%7D%3D%5B0.63%5D" style="vertical-align: middle;display: inline;"></p><p>Hence, performing the forward pass through the neural network using the input as [-0.32, -0.66] gives us the <strong>output </strong>as <strong>0.63</strong>. The prediction is very different from the actual value of -0.54, but this is to be expected because we initialised the neural network with random weights and biases. As we train the neural network, we will update these parameters and get better predictions through multiple iterations. In the upcoming session, we will cover this process in depth.&nbsp;</p><p>This was a demonstration of how information flows forward in a neural network from the input to the output, i.e., the forward pass to make a prediction.&nbsp;<br>&nbsp;</p><p>In the next segment, we will introduce a concise algorithm that can be used for any feedforward neural network.&nbsp;<br>&nbsp;</p><p>Before you proceed further, spend some time answering the question next.<br>&nbsp;</p></div></div>
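
The forward pass worked out above can be reproduced in a few lines of NumPy. This is a minimal sketch rather than the course's reference code: the weights, biases and input are the values used in the calculation above, while the variable names (`W1`, `b1`, `W2`, `b2`) and the `sigmoid` helper are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Input and parameters taken from the worked example above
x = np.array([-0.32, -0.66])          # standardised rooms, house size
W1 = np.array([[0.2, 0.15],
               [0.5, 0.6]])           # hidden-layer weights
b1 = np.array([0.1, 0.25])            # hidden-layer biases
W2 = np.array([[0.3, 0.2]])           # output-layer weights
b2 = np.array([0.4])                  # output-layer bias

z1 = W1 @ x + b1                      # cumulative input to the hidden layer
h1 = sigmoid(z1)                      # hidden-layer output
h2 = W2 @ h1 + b2                     # linear activation: output equals cumulative input

print(np.round(z1, 3), np.round(h1, 3), np.round(h2, 2))
```

Rounded to the precision used in the text, the printed values should match the hand calculation: cumulative inputs of about -0.063 and -0.306, hidden outputs of about 0.484 and 0.424, and a prediction of about 0.63.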

## Questions

![26.png](attachment:c84d65fc-cbe2-4648-8f22-8425e9e3cfe4.png)


## Feedforward Algorithm

<div class="text_component" data-testid="online-editor-content"><p>Having understood how information flows in the network for a regression problem, let’s write the <strong>pseudocode for a feedforward pass</strong> through the network for a single data point <img alt="Equation" data-latex="x_{i}" src="https://latex.upgrad.com/render?formula=x_%7Bi%7D" style="vertical-align: middle;display: inline;">.</p><p><br>The pseudocode for a feedforward pass is given below:</p><ol><li>We initialise the variable&nbsp;<img alt="Equation" data-latex="h^{0}" src="https://latex.upgrad.com/render?formula=h%5E%7B0%7D" style="vertical-align: middle;display: inline;"> as the input:&nbsp;<img alt="Equation" data-latex="h^{0}=x_{i}" src="https://latex.upgrad.com/render?formula=h%5E%7B0%7D%3Dx_%7Bi%7D" style="vertical-align: middle;display: inline;"></li><li>We loop through each of the layers, computing the corresponding output for each layer, i.e., <img alt="Equation" data-latex="h^{l}" src="https://latex.upgrad.com/render?formula=h%5E%7Bl%7D" style="vertical-align: middle;display: inline;">.&nbsp;<br>For l in [1,2,......,L]:&nbsp;<img alt="Equation" data-latex="h^{l}=\sigma (W^{l}.h^{l-1}+b^{l})" src="https://latex.upgrad.com/render?formula=h%5E%7Bl%7D%3D%5Csigma%20%28W%5E%7Bl%7D.h%5E%7Bl-1%7D%2Bb%5E%7Bl%7D%29" style="vertical-align: middle;display: inline;"></li><li>We compute the prediction p by applying an activation function to the output from the previous layer, i.e., we apply a function to <img alt="Equation" data-latex="h^{L}" src="https://latex.upgrad.com/render?formula=h%5E%7BL%7D" style="vertical-align: middle;display: inline;">, as shown below. &nbsp;<img alt="Equation" data-latex="p=f(h^{L})" src="https://latex.upgrad.com/render?formula=p%3Df%28h%5E%7BL%7D%29" style="vertical-align: middle;display: inline;"></li></ol><p>There are some important things to notice here. In both the regression and classification problems, the same algorithm is used until the last step. 
In the final step, in the classification problem, p defines the probability vector, which gives the probability of the data point belonging to a particular class among different possible classes or categories. In the regression problem, p represents the predicted output obtained, which we will normally refer to as <img alt="Equation" data-latex="h^{L}" src="https://latex.upgrad.com/render?formula=h%5E%7BL%7D" style="vertical-align: middle;display: inline;">.&nbsp;</p><p>Let’s discuss the classification problem. We use the <strong>softmax output</strong>, defined in an earlier session, which gives us the probability vector <img alt="Equation" data-latex="p_{i}" src="https://latex.upgrad.com/render?formula=p_%7Bi%7D" style="vertical-align: middle;display: inline;"> of an input belonging to one of the multiple output classes (c):</p><p style="text-align: center;"><img alt="Equation" data-latex="\begin{bmatrix} p_{i1} \\ . \\ p_{ic} \end{bmatrix}" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%20p_%7Bi1%7D%20%5C%5C%20.%20%5C%5C%20p_%7Bic%7D%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;"></p><p>As per our understanding of the softmax function, we know that&nbsp;<img alt="Equation" data-latex="p_{ij}=\frac{e^{w_{j}h^{L}}}{\sum_{t=1}^{c}e^{w_{t}h^{L}}}" src="https://latex.upgrad.com/render?formula=p_%7Bij%7D%3D%5Cfrac%7Be%5E%7Bw_%7Bj%7Dh%5E%7BL%7D%7D%7D%7B%5Csum_%7Bt%3D1%7D%5E%7Bc%7De%5E%7Bw_%7Bt%7Dh%5E%7BL%7D%7D%7D" style="vertical-align: middle;display: inline;"></p><p>j = [1,2,......,c] and c =&nbsp;number of classes.</p><p>Note that calculating&nbsp;<img alt="Equation" data-latex="p_{ij}=\frac{e^{w_{j}h^{L}}}{\sum_{t=1}^{c}e^{w_{t}h^{L}}}" src="https://latex.upgrad.com/render?formula=p_%7Bij%7D%3D%5Cfrac%7Be%5E%7Bw_%7Bj%7Dh%5E%7BL%7D%7D%7D%7B%5Csum_%7Bt%3D1%7D%5E%7Bc%7De%5E%7Bw_%7Bt%7Dh%5E%7BL%7D%7D%7D" style="vertical-align: middle;display: inline;">&nbsp;is often called&nbsp;<strong>normalising </strong>the vector&nbsp;<img
alt="Equation" data-latex="p_{i}" src="https://latex.upgrad.com/render?formula=p_%7Bi%7D" style="vertical-align: middle;display: inline;">.</p><p>Hence, the complete feedforward algorithm for the <strong>classification problem</strong> becomes:</p><ol><li><img alt="Equation" data-latex="h^{0} = x_{i}" src="https://latex.upgrad.com/render?formula=h%5E%7B0%7D%20%3D%20x_%7Bi%7D" style="vertical-align: middle;display: inline;"></li><li>For&nbsp; l in [1,2,....,L]:&nbsp;<img alt="Equation" data-latex="h^{l}=\sigma (W^{l}.h^{l-1}+b^{l})" src="https://latex.upgrad.com/render?formula=h%5E%7Bl%7D%3D%5Csigma%20%28W%5E%7Bl%7D.h%5E%7Bl-1%7D%2Bb%5E%7Bl%7D%29" style="vertical-align: middle;display: inline;"></li><li><img alt="Equation" data-latex="p_{i}=e^{W^{o}.h^{L}}" src="https://latex.upgrad.com/render?formula=p_%7Bi%7D%3De%5E%7BW%5E%7Bo%7D.h%5E%7BL%7D%7D" style="vertical-align: middle;display: inline;"></li><li><img alt="Equation" data-latex="p_{i} = normalise(p_{i})" src="https://latex.upgrad.com/render?formula=p_%7Bi%7D%20%3D%20normalise%28p_%7Bi%7D%29" style="vertical-align: middle;display: inline;"></li></ol><p>The classification feedforward algorithm has been used extensively in industries such as finance, healthcare and travel. In the finance industry, for example, credit card companies use it to categorise customer applications for credit cards as ‘Good’, ‘Bad’ or ‘Needing further analysis’. For this, they consider factors such as annual salary, outstanding debts and age. 
These can be the features in the input vector that is fed into a neural network, which then predicts which category the customer belongs to.&nbsp;<br>&nbsp;</p><p>For the <strong>regression problem</strong>, we can <strong>skip the third and fourth steps</strong>, i.e., computing the probability and normalising the ‘predicted output vector’ p, because in a regression problem, the output is <img alt="Equation" data-latex="h^{L}" src="https://latex.upgrad.com/render?formula=h%5E%7BL%7D" style="vertical-align: middle;display: inline;">,i.e., the value we obtain from the single output node, and we usually compare the output obtained from the ANN directly with the ground truth. We do not need to perform any further operations on the predicted output to get probabilities in a regression problem.<br>&nbsp;</p><p>Note that <img alt="Equation" data-latex="W^{o}" src="https://latex.upgrad.com/render?formula=W%5E%7Bo%7D" style="vertical-align: middle;display: inline;"> (the weights of the output layer) can also be written as <img alt="Equation" data-latex="W^{L+1}" src="https://latex.upgrad.com/render?formula=W%5E%7BL%2B1%7D" style="vertical-align: middle;display: inline;">.</p><h2>Comprehension based Questions</h2><p><br>Let’s try to implement the same algorithm for a classification problem and answer a few questions. Given below is the representation of an ANN. 
​</p><p style="text-align: center;"><b id="docs-internal-guid-56cc04da-7fff-56b1-4b91-04befa939ff8"><img data-height="412" data-width="1226" height="201.63132137030996" src="https://images.upgrad.com/c7494c88-9c4b-4ba1-a840-c722141b5810-Screenshot 2022-02-12 at 3.59.08 PM (1).png" width="600"></b></p><p>We have the last weight matrix <img alt="Equation" data-latex="W^{3}" src="https://latex.upgrad.com/render?formula=W%5E%7B3%7D" style="vertical-align: middle;display: inline;"> as <img alt="Equation" data-latex="W^{O}" src="https://latex.upgrad.com/render?formula=W%5E%7BO%7D" style="vertical-align: middle;display: inline;">. The output layer classifies the input into one of these three labels: 1, 2 or 3. The first neuron outputs the probability for label 1, the second neuron outputs the probability for label 2 and the third neuron outputs the probability for label 3.<br>​<br>Now, answer the questions given below.<br><br>The primary goal in machine learning is to get the predicted output to be the same or as close to the ground truth output as possible. We have seen the feedforward algorithm and learnt how to compute each element in an ANN. Now, we want to train the neural network to get the predicted output as close as possible to the actual output. In order to do this, in the next segment, we will discuss the Loss function, which quantifies the difference between the predicted output and the actual output.&nbsp;</p><p>Before you proceed further, spend some time answering the question next.<br>&nbsp;</p></div>

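The four steps of the classification feedforward algorithm above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: the hidden layers use the sigmoid activation, the layer sizes and randomly generated weights are hypothetical, and the function and variable names (`feedforward_classify`, `W_out`) are invented for this example.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward_classify(x, weights, biases, W_out):
    """Forward pass for one data point, followed by a softmax output."""
    h = x                                  # step 1: h^0 = x_i
    for W, b in zip(weights, biases):      # step 2: h^l = sigma(W^l . h^(l-1) + b^l)
        h = sigmoid(W @ h + b)
    p = np.exp(W_out @ h)                  # step 3: p_i = e^(W^o . h^L)
    return p / p.sum()                     # step 4: normalise p_i

# Hypothetical network: 3 inputs -> 4 hidden -> 2 hidden -> 3 classes
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [rng.normal(size=4), rng.normal(size=2)]
W_out = rng.normal(size=(3, 2))

p = feedforward_classify(rng.normal(size=3), weights, biases, W_out)
print(p, p.sum())
```

The returned vector has one entry per class, and its entries are positive and sum to 1. For a regression problem, the last two steps would be dropped and `h` returned directly. Practical implementations usually subtract the largest score before exponentiating to avoid numerical overflow; that refinement is left out here to keep the code aligned with the steps in the text.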
# Loss Function

<div class="MuiBox-root css-1bi8ut6"><div class="text_component" data-testid="text-component"><p><strong>Loss Function - Part 1</strong><br>&nbsp;</p><p>Now that we know how to calculate the predicted output from a neural network when given an input, we want to check if the neural network predicted it correctly. We will revisit the calculations we had done in the previous segment on the housing price prediction problem.</p><table align="center" border="1" cellpadding="1" cellspacing="1"><tbody><tr><td><b id="docs-internal-guid-e1125ce9-7fff-73c3-d0d0-1e72de26a2aa">Std. Number of Rooms</b></td><td><b id="docs-internal-guid-d198c133-7fff-8539-1e7c-60074353277a">Std. House Size (sq. ft)</b></td><td><b id="docs-internal-guid-a713e806-7fff-9a13-63c9-4dac8180970d">Predicted Price</b></td><td><b id="docs-internal-guid-23e9e876-7fff-ae65-d09a-740ea9a42c12">Actual Price</b></td></tr><tr><td>-0.32</td><td>-0.66</td><td>0.63</td><td>-0.54</td></tr></tbody></table><p>As you can see in the table above, the predicted price is not the same or even close to the actual price. So, we want to know how wrong the prediction of the neural network is and want to quantify this error in the prediction. A loss function or cost function will help us quantify such errors.</p><p>A <strong>loss function</strong> or <strong>cost function</strong> is a function that maps an<strong> event</strong> or <strong>values </strong>of one or more variables onto a<strong> real number</strong> intuitively, representing some ‘cost’ associated with the ‘event’, as shown below:&nbsp;</p><p style="text-align: center;"><img alt="Equation" data-latex="L(y,\hat{y}) = f:(y,\hat{y}) →R" src="https://latex.upgrad.com/render?formula=L%28y%2C%5Chat%7By%7D%29%20%3D%20f%3A%28y%2C%5Chat%7By%7D%29%20%E2%86%92R" style="vertical-align: middle;display: inline;"></p><p>Neural networks minimise the error in the prediction by optimising the loss function with respect to the parameters in the network. 
In other words, this optimisation is done by adjusting the weights and biases. We will see how this adjustment is done in subsequent sessions. For now, we will concentrate on how to compute the loss.&nbsp;</p><p>In the case of regression, the most commonly used loss function is <strong>MSE/RSS.</strong></p><p>In the case of classification, the most commonly used loss function is <strong>Cross Entropy/Log Loss.</strong></p><p>Let’s consider the regression problem where we predict the house price, given the number of rooms and the size of the house. Here, we will use the RSS method to calculate the loss.</p></div></div>

<div class="MuiBox-root css-1bi8ut6"><div class="text_component" data-testid="text-component"><table align="center" border="1" cellpadding="1" cellspacing="1"><tbody><tr><td><b id="docs-internal-guid-a627edfc-7fff-4e86-ef07-a077b89e62de">Std. Number of Rooms</b></td><td><b id="docs-internal-guid-51b9f502-7fff-5efd-936c-978559c99b9a">Std. House Size (sq. ft.)</b></td><td><b id="docs-internal-guid-7cc29ef6-7fff-7357-3d78-a8b5b290489a">Predicted Price</b></td><td><b id="docs-internal-guid-3c0d696a-7fff-8195-3421-847107b79a5c">Actual Price</b></td></tr><tr><td>-0.32</td><td>-0.66</td><td>0.63</td><td>-0.54</td></tr></tbody></table><p style="text-align: center;"><img data-height="559" data-width="1038" height="323.12138728323697" src="https://images.upgrad.com/a16dfeda-2390-4b96-bee8-f28c0d04f8d0-loss.png" width="600"></p><p>In this example, we get a prediction 0.63, but the expected output is -0.54. Let’s calculate the loss using RSS:&nbsp;<br><img alt="Equation" data-latex="Loss(L)=\frac{1}{2}(actual-predicted)^{2} =\frac{1}{2}(-0.54-0.63)^{2} =0.68445" src="https://latex.upgrad.com/render?formula=Loss%28L%29%3D%5Cfrac%7B1%7D%7B2%7D%28actual-predicted%29%5E%7B2%7D%20%3D%5Cfrac%7B1%7D%7B2%7D%28-0.54-0.63%29%5E%7B2%7D%20%3D0.68445" style="vertical-align: middle;display: inline;"></p><p>As given above, the MSE is the mean square error of all the samples in the given data. This gives us a quantified method of measuring how well the neural network is predicting the output.&nbsp;</p></div></div>
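
The loss calculation above is easy to check in code. The sketch below assumes the per-sample loss with the factor of 1/2 used in the text; the function names `squared_error_loss` and `mean_squared_error` are chosen for this example only.

```python
import numpy as np

def squared_error_loss(actual, predicted):
    """Per-sample loss used in the text: 1/2 * (actual - predicted)^2."""
    return 0.5 * (actual - predicted) ** 2

def mean_squared_error(actual, predicted):
    """Mean of the squared errors over all samples."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))

# Values from the housing example: actual price -0.54, predicted price 0.63
loss = squared_error_loss(-0.54, 0.63)
print(round(loss, 5))
```

With the values from the table, the per-sample loss evaluates to about 0.68445, matching the hand calculation.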


![27.png](attachment:42cce1e5-1245-402c-9dd0-646e6f44a4b4.png)

Now, let’s take a look at the loss function for the classification problem. In the next video, you will learn how to quantify the loss for a classification problem.

![28.png](attachment:eba47488-9f63-453f-bcd6-7940931c1301.png)

<div class="text_component" data-testid="text-component"><p>Note: The second neuron of the second layer is incorrectly denoted as&nbsp;<img data-latex="f({z_1}^2)" alt="Equation" src="https://latex.upgrad.com/render?formula=f(%7Bz_1%7D%5E2)">. Instead, it should be <img data-latex="f({z_2}^2)" alt="Equation" src="https://latex.upgrad.com/render?formula=f(%7Bz_2%7D%5E2)">, as shown below:</p><p><img src="https://d35ev2v1xsdze0.cloudfront.net/b019dd53-6706-4afa-b788-b90221cf6409-image.jpg" class="image-editor" maxwidth="100%"></p><p>&nbsp;</p><p>Now that we have learnt about the forward pass and the loss function for regression and classification problems, we know that given any input and its actual output, we can assess the behaviour of the neural network.&nbsp;</p><p>&nbsp;</p><p>Let’s now attempt a few questions based on this topic and then proceed to the next segment to understand how neural networks are trained in order to minimise the loss.&nbsp;</p><p>&nbsp;</p><p>Before you proceed further, spend some time answering the question next.<br>&nbsp;</p></div>

## Question

![29.png](attachment:65c958e7-8169-4f79-ab7b-f90e07f7b2ca.png)

# What Is Learning in Neural Networks - Part 1

In this segment, you will understand how neural networks are trained. Recall that the training task is to **compute the optimal weights and biases** by **minimising some cost function**. Let's start with a quick recap on defining the training task.

<div class="MuiBox-root css-1ernbou"><div class="MuiBox-root css-1xzog2f" id="switch-player-content"></div><div class="MuiBox-root css-j7qwjs" data-testid="switch-player">
    
    
    
<div class="MuiBox-root css-17j5yzz"><div class="MuiBox-root css-pori7h"></div></div><div class="MuiBox-root css-1bi8ut6"><div class="text_component" data-testid="text-component"><p>The task of training neural networks is similar to that of other ML models such as linear regression and logistic regression. The <strong>cost</strong> (or the <strong>loss</strong>) measures how far the predicted output (the output from the last layer) is from the actual output, and we have to tune the parameters <img alt="Equation" data-latex="w" src="https://latex.upgrad.com/render?formula=w" style="vertical-align: middle;display: inline;">&nbsp;and&nbsp;<img alt="Equation" data-latex="b" src="https://latex.upgrad.com/render?formula=b" style="vertical-align: middle;display: inline;"> such that <strong>the total cost is minimised. &nbsp;</strong></p><p><br>The loss function for a regression model can be given as follows:</p><p style="text-align: center;"><img alt="Equation" data-latex="Loss(L)=RSS=\sum (actual-h^{L})^{2}\\Loss (L)= f(W,b)" src="https://latex.upgrad.com/render?formula=Loss%28L%29%3DRSS%3D%5Csum%20%28actual-h%5E%7BL%7D%29%5E%7B2%7D%5C%5CLoss%20%28L%29%3D%20f%28W%2Cb%29" style="vertical-align: middle;display: inline;"></p><p>To start training a neural network, we randomly initialise the weights at the outset.</p><p>An important point to note is that if the data is large (which is often the case), the loss calculation itself can become quite involved. For example, if you have a million data points, they will be fed into the network (in batches), the output will be calculated using feedforward, and the loss/cost <img alt="Equation" data-latex="L_{i}" src="https://latex.upgrad.com/render?formula=L_%7Bi%7D" style="vertical-align: middle;display: inline;"> (for the <img alt="Equation" data-latex="i^{th}" src="https://latex.upgrad.com/render?formula=i%5E%7Bth%7D" style="vertical-align: middle;display: inline;"> data point) will be calculated. 
The total loss is the sum of losses of all the individual data points. Hence:&nbsp;</p><p style="text-align: center;"><img alt="Equation" data-latex="Total \:loss=L=L_{1}+L_{2}+L_{3}+........+L_{1000000}" src="https://latex.upgrad.com/render?formula=Total%20%5C%3Aloss%3DL%3DL_%7B1%7D%2BL_%7B2%7D%2BL_%7B3%7D%2B........%2BL_%7B1000000%7D" style="vertical-align: middle;display: inline;"></p><p>The total loss L is a function of <img alt="Equation" data-latex="w" src="https://latex.upgrad.com/render?formula=w" style="vertical-align: middle;display: inline;">'s and <img alt="Equation" data-latex="b" src="https://latex.upgrad.com/render?formula=b" style="vertical-align: middle;display: inline;">'s. Once the total loss is computed, the weights and biases are updated (in the direction of decreasing loss). In other words, L is <strong>minimised with respect to the <strong><img alt="Equation" data-latex="w" src="https://latex.upgrad.com/render?formula=w" style="vertical-align: middle;display: inline;"></strong>'s and <strong><img alt="Equation" data-latex="b" src="https://latex.upgrad.com/render?formula=b" style="vertical-align: middle;display: inline;"></strong>’s.</strong></p><p><br>Note that, as you will see shortly, we minimise the average loss rather than the total loss. Because the average loss is just the total loss divided by the number of data points, minimising one also minimises the other.</p><p><br>This can be done using any optimisation routine such as <strong>gradient descent.&nbsp;</strong></p><p><br>The parameter being optimised is updated iteratively in the direction of reducing cost according to the following rule:</p><p style="text-align: center;"><img alt="Equation" data-latex="W_{new}=W_{old}-\alpha \frac{\partial L}{\partial W}" src="https://latex.upgrad.com/render?formula=W_%7Bnew%7D%3DW_%7Bold%7D-%5Calpha%20%5Cfrac%7B%5Cpartial%20L%7D%7B%5Cpartial%20W%7D" style="vertical-align: middle;display: inline;"></p><p>The same can be written for biases. 
Note that weights and biases are often collectively represented by one matrix called W. Going forward,&nbsp;<img alt="Equation" data-latex="W" src="https://latex.upgrad.com/render?formula=W" style="vertical-align: middle;display: inline;"> will, by default, refer to the matrix of all weights and biases.</p><p><br>The main challenge is that&nbsp;<img alt="Equation" data-latex="W" src="https://latex.upgrad.com/render?formula=W" style="vertical-align: middle;display: inline;"> is a huge matrix, and thus, the total loss L as a function of&nbsp;<img alt="Equation" data-latex="W" src="https://latex.upgrad.com/render?formula=W" style="vertical-align: middle;display: inline;"> is a complex function.</p></div></div><div class="MuiBox-root css-0"></div><div class="MuiBox-root css-0"></div></div></div>
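
To make the update rule above concrete, here is a small sketch of gradient descent applied only to the output neuron of the housing example. It is a deliberate simplification: the hidden-layer outputs are held fixed at the values from the forward pass, the learning rate `alpha` and the number of steps are hypothetical choices, and the per-sample loss with the factor of 1/2 is assumed. Updating the hidden-layer weights as well requires backpropagation, which is not shown here.

```python
import numpy as np

# Fixed hidden-layer outputs and target from the housing example
h = np.array([0.484, 0.424])
y = -0.54

# Initial output-layer parameters from the forward pass
w = np.array([0.3, 0.2])
b = 0.4
alpha = 0.1  # hypothetical learning rate

def loss(w, b):
    """Per-sample loss: 1/2 * (actual - predicted)^2."""
    return 0.5 * (y - (w @ h + b)) ** 2

initial_loss = loss(w, b)
losses = []
for step in range(50):
    error = (w @ h + b) - y     # predicted minus actual
    grad_w = error * h          # dL/dw for the loss above
    grad_b = error              # dL/db for the loss above
    w = w - alpha * grad_w      # W_new = W_old - alpha * dL/dW
    b = b - alpha * grad_b      # same rule for the bias
    losses.append(loss(w, b))

print(initial_loss, losses[-1])
```

Each step moves the parameters against the gradient, so the loss recorded in `losses` decreases from one step to the next for this choice of `alpha`. A learning rate that is too large can make the loss increase instead.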


## Training A Network
Let's watch the next video to understand how to deal with this complexity.

![30.png](attachment:76dc61a0-9764-4468-9887-98a2c8199cc7.png)

As you learnt in the video above, the loss function for a very small and simple neural network can be quite complex. The best way to minimise this complex loss function is by using gradient descent.

 

Let us next summarise what you learnt in this session.

 

Before you proceed further, spend some time answering the question next.


## Question

![31.png](attachment:28ffcc1e-e1e9-4e0a-a2db-9cf9ffaadb03.png)


## Summary
 

In this session, you learnt how information flows from the input layer to the output layer in Artificial Neural Networks (feedforward). You studied feedforward for a regression problem based on the housing price prediction problem statement. You also learnt how to specify the dimensions and representations of the weight matrices, biases, inputs and outputs, etc., of the various layers. 


You developed an understanding of how **feedforward can be done in a vectorised form**. 


In order to train a neural network, you need to optimise the weights and biases of the network, and you need to use optimisation techniques such as gradient descent to do this.


In the next module, you will learn about the process of training a neural network using backpropagation.


## Graded Questions

![32.png](attachment:d26a4b82-7d14-40f8-967f-361a1e5f172a.png)


# Perceptron as a Classifier

Now that you understand the design of a perceptron, think about how it can be used for simple learning tasks. To start with, consider a simple binary classification task and spend a few minutes thinking about how the perceptron can work as a classifier.

![33.png](attachment:95f66241-7b1c-491a-8a6d-775879c30445.png)
 

In the following lecture, you will understand how the perceptron can act as a classifier.

<div class="text_component" data-testid="text-component"><p>You saw how the perceptron works as a classifier. The weights represent the importance of the corresponding feature for classification. You might have also noticed that the professor has used a <strong>sign function</strong>. The 'sign function' is similar to the step function&nbsp;- it&nbsp;outputs +1 when the input is greater than 0 and -1&nbsp;otherwise. In a binary classification setting, +1 and -1 represent the two classes.</p><p dir="ltr">This is a simple exercise that will help you better understand how a perceptron works.&nbsp;</p><p dir="ltr">Consider the decision of whether to go&nbsp;to the sushi place being taken by a perceptron model. You have the following factors affecting the decision to go/not go: Distance,&nbsp;Cost and Company. These three variables are&nbsp;inputs to the perceptron. Suppose the inputs can be only 0/1 and the weights you assign to each&nbsp;variable&nbsp;add up&nbsp;to 1.</p><p dir="ltr">A sample set of weights can be <img alt="Equation" data-latex="\begin{bmatrix} 0.5\\ 0.3\\ 0.2 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%200.5%5C%5C%200.3%5C%5C%200.2%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;">.</p><p dir="ltr">For each of the inputs, the rules for deciding 1 and 0 are as follows - these are arbitrary mappings that you have decided to make your model simpler:</p><table border="1" cellpadding="1" cellspacing="1"><tbody><tr><td><strong>Factor</strong></td><td><strong>1</strong></td><td><strong>0</strong></td></tr><tr><td>Distance</td><td>&lt; 8 km</td><td>&gt; = 8 km</td></tr><tr><td>Cost</td><td>=&lt; Rs 2000 for two&nbsp;</td><td>&gt; Rs 2000 for two</td></tr><tr><td>Company</td><td>&gt; 2 friends</td><td>&lt; 2 friends</td></tr></tbody></table><p dir="ltr">Assume that the <strong>bias</strong> value is -0.7.&nbsp;The sushi place is 5 km away and 3 of your friends are ready to accompany you. 
Also, the cost for 2 is INR 2500.</p><p dir="ltr">From this exercise, you would have realised that when the weighted sum of inputs,&nbsp;<img alt="Equation" data-latex="w_1x_1 +w_2x_2 + w_3x_3" src="https://latex.upgrad.com/render?formula=w_1x_1%20%2Bw_2x_2%20%2B%20w_3x_3" style="vertical-align: middle;display: inline;">, crosses a <strong>threshold</strong> (0.7 here), you decide to go to the restaurant; otherwise, you do not go.</p><p dir="ltr">In the next segment, you will understand how a perceptron can perform binary classification in detail.</p><p dir="ltr">Before you proceed further, spend some time answering the question next.<br>&nbsp;</p></div>
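
The sushi exercise above can be written as a tiny perceptron. This sketch uses the weights, bias and decision rules from the text and the sign function as described there (+1 only when the input is strictly greater than 0). Because the scenario lands exactly on the threshold, the weights are stored as exact fractions (an implementation choice made here) so that floating-point rounding cannot tip the result. The variable names are hypothetical.

```python
from fractions import Fraction

def sign(z):
    """Sign function as described in the text: +1 if z > 0, otherwise -1."""
    return 1 if z > 0 else -1

# Weights for Distance, Cost and Company, and the bias, as exact fractions
weights = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]
bias = Fraction(-7, 10)

# Scenario from the text
distance_km, cost_for_two, friends = 5, 2500, 3

# Map each factor to 0/1 using the rules in the table
x = [
    int(distance_km < 8),       # Distance: 1 if under 8 km
    int(cost_for_two <= 2000),  # Cost: 1 if at most Rs 2000 for two
    int(friends > 2),           # Company: 1 if more than 2 friends
]

z = sum(w * xi for w, xi in zip(weights, x)) + bias
decision = sign(z)
print(x, z, decision)
```

Here the inputs map to `[1, 0, 1]`, so the weighted sum is 0.7 and the cumulative input after adding the bias is exactly 0. Under the strict "greater than 0" convention, a sum that only equals the threshold does not cross it, so the perceptron outputs -1 (do not go).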

## Binary Classification using Perceptron

In the previous segment, you learned the design of a perceptron. In this segment, you will learn how perceptrons can be trained to perform certain tasks. But first, let's formally define the problem statement and fix some notations we'll be using throughout this session.

![34.png](attachment:2463f34c-84f7-41cc-b726-173d425116c6.png)

<div class="text_component" data-testid="text-component"><p dir="ltr">&nbsp;We need to find the correct <img alt="Equation" data-latex="w" src="https://latex.upgrad.com/render?formula=w" style="vertical-align: middle;display: inline;">&nbsp;and <img alt="Equation" data-latex="b" src="https://latex.upgrad.com/render?formula=b" style="vertical-align: middle;display: inline;">&nbsp;such that&nbsp;<img alt="Equation" data-latex="w^T" src="https://latex.upgrad.com/render?formula=w%5ET" style="vertical-align: middle;display: inline;">.<img alt="Equation" data-latex="x" src="https://latex.upgrad.com/render?formula=x" style="vertical-align: middle;display: inline;">&nbsp; +&nbsp;<img alt="Equation" data-latex="b" src="https://latex.upgrad.com/render?formula=b" style="vertical-align: middle;display: inline;">&nbsp; &gt; 0 &nbsp;for all points where <img alt="Equation" data-latex="y = +1" src="https://latex.upgrad.com/render?formula=y%20%3D%20%2B1" style="vertical-align: middle;display: inline;"> and&nbsp; &nbsp;&nbsp;<img alt="Equation" data-latex="w^T" src="https://latex.upgrad.com/render?formula=w%5ET" style="vertical-align: middle;display: inline;">.<img alt="Equation" data-latex="x" src="https://latex.upgrad.com/render?formula=x" style="vertical-align: middle;display: inline;">&nbsp; +&nbsp;<img alt="Equation" data-latex="b" src="https://latex.upgrad.com/render?formula=b" style="vertical-align: middle;display: inline;">&nbsp; &lt; 0 for points where&nbsp;<img alt="Equation" data-latex="y = -1." src="https://latex.upgrad.com/render?formula=y%20%3D%20-1." 
style="vertical-align: middle;display: inline;"></p><p>Note that the step function used is defined as follows:</p><p><img alt="Equation" data-latex="y = 1" src="https://latex.upgrad.com/render?formula=y%20%3D%201" style="vertical-align: middle;display: inline;">&nbsp;if&nbsp;<img alt="Equation" data-latex="x &gt; 0" src="https://latex.upgrad.com/render?formula=x%20%3E%200" style="vertical-align: middle;display: inline;"></p><p><img alt="Equation" data-latex="y = -1" src="https://latex.upgrad.com/render?formula=y%20%3D%20-1" style="vertical-align: middle;display: inline;">&nbsp;if&nbsp;<img alt="Equation" data-latex="x &lt; = 0" src="https://latex.upgrad.com/render?formula=x%20%3C%20%3D%200" style="vertical-align: middle;display: inline;"></p></div>


![35.png](attachment:ce92a6dd-d9b7-4854-ae41-69f66e9d7280.png)

<div class="text_component" data-testid="text-component"><p dir="ltr">So we see that a certain set&nbsp;<img alt="Equation" data-latex="(w,b)" src="https://latex.upgrad.com/render?formula=%28w%2Cb%29" style="vertical-align: middle;display: inline;">&nbsp;is a valid separator if <img alt="Equation" data-latex="y(w^T.x+b)" src="https://latex.upgrad.com/render?formula=y%28w%5ET.x%2Bb%29" style="vertical-align: middle;display: inline;">&nbsp;&gt; 0 <strong>for all the data points</strong> and not a valid separator if <img alt="Equation" data-latex="y(w^T.x+b)" src="https://latex.upgrad.com/render?formula=y%28w%5ET.x%2Bb%29" style="vertical-align: middle;display: inline;">&nbsp;&lt; 0 for <strong>any one</strong> of the data points.</p>
<p dir="ltr">Let's now work through some questions to make these concepts concrete. Say you have the following data points with their corresponding ground truth values.</p>
<table border="1" cellpadding="1" cellspacing="1"><tbody><tr><td><p>Data points,&nbsp;<img alt="Equation" data-latex="x" src="https://latex.upgrad.com/render?formula=x" style="vertical-align: middle;display: inline;"></p></td><td>Ground Truth,&nbsp;<img alt="Equation" data-latex="y" src="https://latex.upgrad.com/render?formula=y" style="vertical-align: middle;display: inline;"></td></tr><tr><td>(0,3)</td><td>1</td></tr><tr><td>(5,9)</td><td>1</td></tr><tr><td>(-1,-2)</td><td>-1</td></tr></tbody></table>
<p dir="ltr">Note that the vector&nbsp;<img alt="Equation" data-latex="\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%20x_1%5C%5C%20x_2%5C%5C%20x_3%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;">&nbsp;can be represented as&nbsp;(<img alt="Equation" data-latex="x_1" src="https://latex.upgrad.com/render?formula=x_1" style="vertical-align: middle;display: inline;">, <img alt="Equation" data-latex="x_2" src="https://latex.upgrad.com/render?formula=x_2" style="vertical-align: middle;display: inline;">, <img alt="Equation" data-latex="x_3" src="https://latex.upgrad.com/render?formula=x_3" style="vertical-align: middle;display: inline;">).</p></div>
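
The separator condition can be checked mechanically. The following is a minimal sketch in plain Python, assuming that a point lying exactly on the line (a product of 0) does not count as correctly separated. The function name `is_valid_separator` and the two candidate separators are hypothetical and chosen only for illustration.

```python
# Data points and ground-truth labels from the table above.
points = [(0, 3), (5, 9), (-1, -2)]
labels = [1, 1, -1]


def is_valid_separator(w, b, points, labels):
    """Return True if y * (w^T x + b) > 0 for every data point."""
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Assumption: a product of exactly 0 is treated as not separated.
        if y * score <= 0:
            return False
    return True


# Hypothetical candidates (w, b), not taken from the course material.
print(is_valid_separator((1, 1), 0, points, labels))   # True
print(is_valid_separator((-1, 0), 0, points, labels))  # False
```

The second candidate fails because the point (0,3) lies exactly on its line, so the product is 0 rather than positive.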

![36.png](attachment:8a14a8b7-49f5-443e-ab69-15d90b973f4b.png)

### Homogeneous coordinates 

Before we move on, let us first tweak our representation a little and switch to homogeneous coordinates, which will help us formulate the perceptron solution more neatly.

![37.png](attachment:fc02bfa2-0d8b-46df-9ef9-10ebeb9d791c.png)

<div class="text_component" data-testid="text-component"><p>So what homogeneous coordinates mean is as follows:</p><p><img data-latex="x" alt="Equation" src="https://latex.upgrad.com/render?formula=x">&nbsp;earlier represented as this&nbsp;<img data-latex="\begin{bmatrix}
x_1\\ 
x_2\\ 
.\\ 
.\\ 
x_d
\end{bmatrix}" alt="Equation" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%0Ax_1%5C%5C%20%0Ax_2%5C%5C%20%0A.%5C%5C%20%0A.%5C%5C%20%0Ax_d%0A%5Cend%7Bbmatrix%7D">&nbsp;transforms to this&nbsp;<img data-latex="\begin{bmatrix}
x_1\\ 
x_2\\ 
.\\ 
.\\ 
x_d\\
1
\end{bmatrix}" alt="Equation" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%0Ax_1%5C%5C%20%0Ax_2%5C%5C%20%0A.%5C%5C%20%0A.%5C%5C%20%0Ax_d%5C%5C%0A1%0A%5Cend%7Bbmatrix%7D"></p><p>&nbsp;</p><p>w&nbsp;earlier represented as this&nbsp;<img data-latex="\begin{bmatrix}
w_1\\ 
w_2\\ 
.\\ 
.\\ 
w_d
\end{bmatrix}" alt="Equation" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%0Aw_1%5C%5C%20%0Aw_2%5C%5C%20%0A.%5C%5C%20%0A.%5C%5C%20%0Aw_d%0A%5Cend%7Bbmatrix%7D">&nbsp;transforms to this&nbsp;<img data-latex="\begin{bmatrix}
w_1\\ 
w_2\\ 
.\\ 
.\\ 
w_d\\
b
\end{bmatrix}" alt="Equation" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%0Aw_1%5C%5C%20%0Aw_2%5C%5C%20%0A.%5C%5C%20%0A.%5C%5C%20%0Aw_d%5C%5C%0Ab%0A%5Cend%7Bbmatrix%7D"></p><p>This new representation does not state the bias term explicitly, but it still includes it: the bias becomes the last component of the weight vector and is multiplied by the constant 1 appended to every input.</p><p>So you have understood how we use homogeneous coordinates to represent the perceptron more concisely. This will help us illustrate some of the more complex tasks a set of perceptrons can perform. Let's look at them in the next segment.</p><p>Before you proceed further, spend some time answering the question next.</p></div>
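
To see that nothing is lost in this change of representation, here is a small sketch in plain Python. It compares the original weighted sum plus bias with the single dot product in homogeneous coordinates. The helper names and the sample values of `x`, `w` and `b` are assumptions made up for this illustration.

```python
def to_homogeneous_input(x):
    """Append the constant 1 to an input vector."""
    return tuple(x) + (1,)


def to_homogeneous_weights(w, b):
    """Append the bias b as the last component of the weight vector."""
    return tuple(w) + (b,)


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical values, chosen only for illustration.
x = (2, 4)
w = (3, -1)
b = 5

original = dot(w, x) + b  # 3*2 + (-1)*4 + 5
homogeneous = dot(to_homogeneous_weights(w, b), to_homogeneous_input(x))

print(original, homogeneous)  # 7 7
```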

## Question

![38.png](attachment:6e1a26e6-a158-4253-8212-8e5f0c89bf65.png)

# Perceptrons - Training

<div class="text_component" data-testid="text-component"><p>To summarise, Rosenblatt suggested an elegant&nbsp;iterative solution to train the perceptron (i.e. to learn the weights):</p><p style="text-align: center;"><img class="image-editor" maxwidth="100%" src="https://d35ev2v1xsdze0.cloudfront.net/1c50bad0-fe66-451d-9e64-e9b0d39e1558-9sbziu6i.png"></p>
<p dir="ltr">where <img alt="Equation" data-latex="{y_i}_t.{x_i}_t" src="https://latex.upgrad.com/render?formula=%7By_i%7D_t.%7Bx_i%7D_t" style="vertical-align: middle;display: inline;"> is the error term. It is important to note here that <img alt="Equation" data-latex="{x_i}_t" src="https://latex.upgrad.com/render?formula=%7Bx_i%7D_t" style="vertical-align: middle;display: inline;"> in this iterative procedure is a&nbsp;<strong>misclassified data point</strong> and <img alt="Equation" data-latex="{y_i}_t" src="https://latex.upgrad.com/render?formula=%7By_i%7D_t" style="vertical-align: middle;display: inline;">&nbsp;is the corresponding true label. Also, note that the dot in <img alt="Equation" data-latex="{y_i}_t.{x_i}_t" src="https://latex.upgrad.com/render?formula=%7By_i%7D_t.%7Bx_i%7D_t" style="vertical-align: middle;display: inline;"> is not a&nbsp;dot&nbsp;product; it is the multiplication of the vector by the scalar label.&nbsp;Let's try to understand the intuition behind this with an example.</p>
<p>Consider the following figure with 6 data points and the separator. Each data point is identified by a number. The blue points belong to the class '-1' and the orange points belong to the class&nbsp;'+1'.</p><p>The coordinates of the points and the labels are given in the following table:</p>
<table border="1" cellpadding="1" cellspacing="1"><tbody><tr><td>Data Points</td><td><img alt="Equation" data-latex="x_1" src="https://latex.upgrad.com/render?formula=x_1" style="vertical-align: middle;display: inline;"></td><td><img alt="Equation" data-latex="x_2" src="https://latex.upgrad.com/render?formula=x_2" style="vertical-align: middle;display: inline;"></td><td>True Label (y)</td><td>Homogeneous coordinates</td></tr><tr><td>0</td><td>1</td><td>0</td><td>1</td><td>(1,0,1)</td></tr><tr><td>1</td><td>3</td><td>1</td><td>1</td><td>(3,1,1)</td></tr><tr><td>2</td><td>4</td><td>2</td><td>1</td><td>(4,2,1)</td></tr><tr><td>3</td><td>0</td><td>1</td><td>-1</td><td>(0,1,1)</td></tr><tr><td>4</td><td>1</td><td>6</td><td>-1</td><td>(1,6,1)</td></tr><tr><td>5</td><td>2</td><td>4</td><td>-1</td><td>(2,4,1)</td></tr></tbody></table>
<p>Please note that the last column has the homogeneous coordinates of the data points.</p><p>The initial classifier is (3, -1, 0), which when expressed&nbsp;algebraically is&nbsp;<img alt="Equation" data-latex="3x_1-1x_2 = 0" src="https://latex.upgrad.com/render?formula=3x_1-1x_2%20%3D%200" style="vertical-align: middle;display: inline;">.</p><p style="text-align: center;"><img class="image-editor" maxwidth="100%" src="https://d35ev2v1xsdze0.cloudfront.net/7238c30c-f0db-4168-aabb-6e8c853c3d12-sro3hidd.png"></p>
<p>Let's start with the first iteration. The misclassified data point is the data point '5': (2,4,1).</p><p>Hence in the formula&nbsp;<img alt="Equation" data-latex="w_{t+1} = w_{t}" src="https://latex.upgrad.com/render?formula=w_%7Bt%2B1%7D%20%3D%20w_%7Bt%7D" style="vertical-align: middle;display: inline;">&nbsp;+&nbsp;<img alt="Equation" data-latex="{y_i}_t.{x_i}_t" src="https://latex.upgrad.com/render?formula=%7By_i%7D_t.%7Bx_i%7D_t" style="vertical-align: middle;display: inline;">,&nbsp;<img alt="Equation" data-latex="{x_i}_t" src="https://latex.upgrad.com/render?formula=%7Bx_i%7D_t" style="vertical-align: middle;display: inline;">&nbsp;is (2,4,1) and&nbsp;<img alt="Equation" data-latex="{y_i}_t" src="https://latex.upgrad.com/render?formula=%7By_i%7D_t" style="vertical-align: middle;display: inline;">&nbsp;is the true label '-1'.</p>
<p>We get&nbsp;<img alt="Equation" data-latex="w_{1} =" src="https://latex.upgrad.com/render?formula=w_%7B1%7D%20%3D" style="vertical-align: middle;display: inline;"> <img alt="Equation" data-latex="\begin{bmatrix} 3\\-1 \\ 0 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%203%5C%5C-1%20%5C%5C%200%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;">&nbsp;+ (-1)*&nbsp;<img alt="Equation" data-latex="\begin{bmatrix} 2\\4 \\ 1 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%202%5C%5C4%20%5C%5C%201%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;">&nbsp;=&nbsp;<img alt="Equation" data-latex="\begin{bmatrix} 1\\-5 \\ -1 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%201%5C%5C-5%20%5C%5C%20-1%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;"></p>
<p><img alt="Equation" data-latex="w_{1} =" src="https://latex.upgrad.com/render?formula=w_%7B1%7D%20%3D" style="vertical-align: middle;display: inline;">&nbsp;(1, -5, -1), which is&nbsp;<img alt="Equation" data-latex="1x_1-5x_2 = 1" src="https://latex.upgrad.com/render?formula=1x_1-5x_2%20%3D%201" style="vertical-align: middle;display: inline;">, shown in the figure below.</p><p style="text-align: center;"><img class="image-editor" maxwidth="100%" src="https://d35ev2v1xsdze0.cloudfront.net/0c585653-7989-4d46-a61b-ba8c8093e2cf-mqy7t7bf.png"></p>
<p>You have seen how we performed the first iteration to get&nbsp;<img alt="Equation" data-latex="w_{1}" src="https://latex.upgrad.com/render?formula=w_%7B1%7D" style="vertical-align: middle;display: inline;">. Notice that the line moves in the right direction, though it misclassifies two orange points now (and passes through one).</p><p>Now answer the following questions to get&nbsp;<img alt="Equation" data-latex="w_{2}" src="https://latex.upgrad.com/render?formula=w_%7B2%7D" style="vertical-align: middle;display: inline;">.</p>
<p>Updating with the misclassified data point '2': (4,2,1), whose true label is '+1', gives&nbsp;<img alt="Equation" data-latex="w_2" src="https://latex.upgrad.com/render?formula=w_2" style="vertical-align: middle;display: inline;">&nbsp;=&nbsp;<img alt="Equation" data-latex="\begin{bmatrix} 5\\-3 \\ 0 \end{bmatrix}" src="https://latex.upgrad.com/render?formula=%5Cbegin%7Bbmatrix%7D%205%5C%5C-3%20%5C%5C%200%20%5Cend%7Bbmatrix%7D" style="vertical-align: middle;display: inline;">, which is&nbsp;<img alt="Equation" data-latex="5x_1-3x_2 = 0" src="https://latex.upgrad.com/render?formula=5x_1-3x_2%20%3D%200" style="vertical-align: middle;display: inline;">. This line classifies all six data points correctly.</p><p style="text-align: center;"><img class="image-editor" maxwidth="100%" src="https://d35ev2v1xsdze0.cloudfront.net/3a987d34-74ad-43e8-9696-83e36b805880-5rnrjxc5.png"></p>
<p>This is a simple way to understand the intuition behind the algorithm. You can go through&nbsp;the mathematics of the proof in the&nbsp;additional reading section.</p><p>You have seen how a perceptron performs binary classification, but wouldn't it be amazing if these simple devices could do something more complex? Let's see how a group of perceptrons can do multiclass classification in the next segment.</p><p dir="ltr"><b>Additional Readings:&nbsp;</b></p><ul dir="ltr"><li>Please find the proof of the perceptron learning algorithm&nbsp;<a href="https://www.cse.iitb.ac.in/~shivaram/teaching/old/cs344+386-s2017/resources/classnote-1.pdf" target="_blank">here</a>.</li></ul></div>
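
The iterations above can be traced with a short script. This is only a sketch in plain Python, not a reference implementation. The update rule is the one given in the text, but the rule for choosing which misclassified point to use is an assumption: the text does not specify it, so the sketch picks the point with the smallest value of y times the dot product of w and x, which happens to reproduce the weights of the worked example. The names `train_perceptron` and `max_updates` are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def train_perceptron(points, labels, w, max_updates=100):
    """Apply w <- w + y * x on a misclassified point until none remain."""
    history = [tuple(w)]
    for _ in range(max_updates):
        margins = [y * dot(w, x) for x, y in zip(points, labels)]
        # Assumption: use the most badly misclassified point for the update.
        worst = min(range(len(points)), key=lambda i: margins[i])
        if margins[worst] > 0:
            break  # every point is on the correct side of the line
        x, y = points[worst], labels[worst]
        w = tuple(wi + y * xi for wi, xi in zip(w, x))
        history.append(w)
    return w, history


# Homogeneous coordinates and true labels from the table above.
points = [(1, 0, 1), (3, 1, 1), (4, 2, 1), (0, 1, 1), (1, 6, 1), (2, 4, 1)]
labels = [1, 1, 1, -1, -1, -1]

w_final, history = train_perceptron(points, labels, (3, -1, 0))
print(history)  # [(3, -1, 0), (1, -5, -1), (5, -3, 0)]
```

A different selection rule, such as taking the first misclassified point in table order, would also be a legitimate perceptron update but would pass through different intermediate weights.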

# Multiclass Classification using Perceptrons

Until now, you have seen how a perceptron performs binary classification. But if that were the only task a perceptron (or a collection of them) could do, we wouldn’t have cared much about them. It turns out that they can do much more complex things, such as **multiclass classification**. Let’s see how a set of perceptrons can perform more complex tasks.

<div class="MuiBox-root css-1bi8ut6"><div class="text_component" data-testid="text-component"><p dir="ltr">We have seen how a network of perceptrons can act as a <strong>universal function approximator</strong>. A single layer of perceptrons combined with an AND gate encloses a region in a polygon, and combining multiple such AND outputs with an OR gate encloses regions in multiple polygons. In the most extreme case, this can be extended to finding a polygon for every single data point.</p><p dir="ltr">Let's now test what you have understood in the previous few lectures.</p><p dir="ltr">In the following figure, there are two classes shown in two colours.</p><p dir="ltr" style="text-align: center;"><img class="image-editor" maxwidth="100%" src="https://d35ev2v1xsdze0.cloudfront.net/9573fcfe-3851-4df1-96a9-e5259014dde7-1jw9jtx8.png"></p><p dir="ltr">You know that a single perceptron is a binary classifier and that it can be defined in any of the ways shown below. You are free to decide which one to use to answer the following questions.</p><p dir="ltr" style="text-align: center;"><img class="image-editor" maxwidth="100%" src="https://d35ev2v1xsdze0.cloudfront.net/e46af55e-0a46-4cb9-a29b-8d88a9effc82-1t2qwky3.png"></p><p dir="ltr">That brings us to the end of understanding how&nbsp;networks of simple perceptrons can act as universal function approximators.<br><br>Before you proceed further, spend some time answering the question next.</p></div></div>
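
The AND/OR construction described above can be sketched with a few perceptrons in plain Python. This is an illustrative sketch under stated assumptions, not the course's implementation: it uses the step function with outputs +1 and -1, and the two squares, their edge weights and all function names are hypothetical. Each edge of a polygon is one perceptron in homogeneous form, an AND perceptron fires only if a point is on the inner side of every edge, and an OR perceptron fires if the point is inside at least one polygon.

```python
def step(z):
    """Step function: +1 if z > 0, otherwise -1."""
    return 1 if z > 0 else -1


def perceptron(weights, inputs):
    """Weights are in homogeneous form (w_1, ..., w_d, b)."""
    extended = tuple(inputs) + (1,)
    return step(sum(w * x for w, x in zip(weights, extended)))


def and_gate(signals):
    """Outputs +1 only if all n signals (each +1 or -1) are +1."""
    n = len(signals)
    return perceptron((1,) * n + (-(n - 1),), signals)


def or_gate(signals):
    """Outputs +1 if at least one of the n signals is +1."""
    n = len(signals)
    return perceptron((1,) * n + (n - 1,), signals)


# Hypothetical polygons: each tuple is one edge perceptron (w_1, w_2, b).
SQUARE_A = [(1, 0, 0), (-1, 0, 1), (0, 1, 0), (0, -1, 1)]   # 0 < x1 < 1, 0 < x2 < 1
SQUARE_B = [(1, 0, -2), (-1, 0, 3), (0, 1, 0), (0, -1, 1)]  # 2 < x1 < 3, 0 < x2 < 1


def classify(point):
    """Return +1 if the point lies inside either square, else -1."""
    inside = [
        and_gate([perceptron(edge, point) for edge in polygon])
        for polygon in (SQUARE_A, SQUARE_B)
    ]
    return or_gate(inside)


print(classify((0.5, 0.5)))  # 1
print(classify((2.5, 0.5)))  # 1
print(classify((1.5, 0.5)))  # -1
```

Adding more polygons only means adding more edge lists to the tuple passed through the AND layer, which is the sense in which the construction can be extended to enclose arbitrarily many regions.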