### 1.3 Vectorizing Logistic Regression
You will be using multiple one-vs-all logistic regression models to build a multi-class classifier. Since there are 10 classes, you will need to train 10 separate logistic regression classifiers. To make this training efficient, it is important to ensure that your code is well vectorized. In this section, you will implement a vectorized version of logistic regression that does not employ any `for` loops. 

#### 1.3.1 Vectorizing the cost function

We will begin by writing a vectorized version of the cost function. Recall that in (unregularized) logistic regression, the cost function is:
$$J(\theta) = \frac{1}{m}\sum^{m}_{i=1}\left[-y^{(i)}\log(h_\theta(x^{(i)}))-(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$
To compute each element in the summation, we have to compute $h_\theta(x^{(i)})$ for every example $i$, where $h_\theta(x^{(i)})= g(\theta^Tx^{(i)})$ and $g(z)=\frac{1}{1+e^{-z}}$ is the sigmoid function. It turns out that we can compute this quickly for all our examples by using matrix multiplication. Let us define $X$ and $\theta$ as
$$
X=\left[ \begin{array}{c}
-(x^{(1)})^T-  \\
-(x^{(2)})^T-   \\
 \vdots      \\
-(x^{(m)})^T-   \\
 \end{array} \right]
\qquad \text{and} \qquad 
\theta = \left[\begin{array}{c}
\theta_0 \\
\theta_1 \\
\vdots \\
\theta_n \\
\end{array} \right].
$$

Then, by computing the matrix product $X\theta$, we have

$$
X  \theta = \left[ \begin{array}{c}
-(x^{(1)})^T\theta -  \\
-(x^{(2)})^T\theta -   \\
 \vdots      \\
-(x^{(m)})^T\theta -   \\
 \end{array} \right]
= \left[ \begin{array}{c}
-\theta^T(x^{(1)})-  \\
-\theta^T(x^{(2)})-   \\
 \vdots      \\
-\theta^T(x^{(m)})-   \\
 \end{array} \right].
$$

In the last equality, we used the fact that $a^Tb = b^Ta$ if $a$ and $b$ are vectors. This allows us to compute the products $\theta^Tx^{(i)}$ for all our examples $i$ in one line of code.
Your job is to write the unregularized cost function in the file <b>lr_cost_function</b>. Your implementation should use the strategy presented above to calculate $\theta^Tx^{(i)}$. You should also use a vectorized approach for the rest of the cost function. A fully vectorized version of <b>lr_cost_function</b> should not contain any loops.
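
As a rough illustration of the strategy above, the following sketch computes $X\theta$ with a single matrix product and then evaluates the cost without loops. It assumes NumPy, a design matrix `X` that already contains the bias column, and labels stored as a 1-D array of 0s and 1s. The names `sigmoid`, `unregularized_cost`, and the toy data are hypothetical and are not the signature expected for <b>lr_cost_function</b>.

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def unregularized_cost(theta, X, y):
    """Vectorized unregularized logistic regression cost (illustrative helper).

    X: (m, n+1) matrix whose rows are the examples, bias column included.
    theta: (n+1,) parameter vector.
    y: (m,) vector of 0/1 labels.
    """
    m = y.shape[0]
    h = sigmoid(X @ theta)  # all m hypotheses from one matrix product
    # The dot products below perform the sum over the m examples.
    return (-y @ np.log(h) - (1.0 - y) @ np.log(1.0 - h)) / m

# Made-up data: 3 examples, a bias column plus 2 features.
X_demo = np.array([[1.0, 0.5, -1.2],
                   [1.0, -0.3, 0.8],
                   [1.0, 1.5, 0.1]])
y_demo = np.array([1.0, 0.0, 1.0])
theta_zero = np.zeros(3)

cost_at_zero = unregularized_cost(theta_zero, X_demo, y_demo)
print(cost_at_zero)
```

With $\theta = 0$ every hypothesis equals $0.5$, so the cost should equal $\log 2$ regardless of the data, which makes a convenient sanity check.
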
#### 1.3.2 Vectorizing the gradient
<div align="justify"> <div style="text-indent: 25px">Recall that the gradient of the (unregularized) logistic regression cost is a vector where the $j^{th}$ element is defined as</div></div>
$$ \frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum^{m}_{i=1}{\left((h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j\right)}$$

To vectorize this operation over the dataset, we start by writing out all the partial derivatives explicitly for all $\theta_j$,
$$
\left[ \begin{array}{c}
    \frac{\partial J}{\partial \theta_0} \\
    \frac{\partial J}{\partial \theta_1} \\
    \frac{\partial J}{\partial \theta_2} \\
    \vdots \\
    \frac{\partial J}{\partial \theta_n} \\
\end{array}\right] = \frac{1}{m}  
\left[ \begin{array}{c}
    \sum^{m}_{i=1}{(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_0} \\
    \sum^{m}_{i=1}{(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_1} \\
    \sum^{m}_{i=1}{(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_2} \\
    \vdots \\
    \sum^{m}_{i=1}{(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_n} \\
\end{array}\right]  
    =\frac{1}{m} \sum^{m}_{i=1}{(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}}  
    =\frac{1}{m} X^T(h_\theta(x) - y),
$$ 

where
$$
h_\theta(x) - y =
\left[\begin{array}{c}
    h_\theta(x^{(1)}) - y^{(1)} \\ 
    h_\theta(x^{(2)}) - y^{(2)} \\
    \vdots \\
    h_\theta(x^{(m)}) - y^{(m)}
\end{array}\right].
$$

Note that $x^{(i)}$ is a vector, while $(h_\theta(x^{(i)}) - y^{(i)})$ is a scalar (single number). To understand the last step of the derivation, let $\beta_i = (h_\theta(x^{(i)}) - y^{(i)})$ and observe that:
$$
\sum_{i}{\beta_ix^{(i)}} = 
    \left[\begin{array}{cccc}
        \vert &\vert& &\vert \\
        x^{(1)} & x^{(2)} &\cdots & x^{(m)}\\
        \vert &\vert& & \vert 
    \end{array}\right]  
    \left[\begin{array}{c}
        \beta_1 \\
        \beta_2 \\
        \vdots \\
        \beta_m
    \end{array}\right] = X^T\beta, \text{ where the values } \beta_i = (h_\theta(x^{(i)}) - y^{(i)}).
$$
The expression above allows us to compute all the partial derivatives without any loops. If you are comfortable with linear algebra, we encourage you to work through the matrix multiplications above to convince yourself that the vectorized version does the same computations. You should now implement the equation above to compute the correct vectorized gradient. Once you are done, complete the function <b>lr_cost_function</b> by implementing the gradient.
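
The identity $\sum_i \beta_i x^{(i)} = X^T\beta$ can be checked numerically. The sketch below, assuming NumPy and randomly generated toy data, compares a loop-based gradient with the vectorized form $\frac{1}{m}X^T(h_\theta(x) - y)$. The function names `gradient_vectorized` and `gradient_loop` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def gradient_vectorized(theta, X, y):
    """Gradient as (1/m) * X^T (h - y), with no explicit loops."""
    m = y.shape[0]
    beta = sigmoid(X @ theta) - y  # (m,) vector of per-example errors
    return X.T @ beta / m

def gradient_loop(theta, X, y):
    """Reference version: accumulate beta_i * x^(i) one example at a time."""
    m, num_params = X.shape
    grad = np.zeros(num_params)
    for i in range(m):
        beta_i = sigmoid(X[i] @ theta) - y[i]  # scalar error for example i
        grad += beta_i * X[i]
    return grad / m

# Made-up data: 5 examples, a bias column plus 3 random features.
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 3))])
y_demo = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
theta_demo = rng.normal(size=4)

grad_vec = gradient_vectorized(theta_demo, X_demo, y_demo)
grad_ref = gradient_loop(theta_demo, X_demo, y_demo)
grads_match = np.allclose(grad_vec, grad_ref)
print(grads_match)
```

The loop version is kept only as a reference for comparison; the vectorized version is the one that avoids loops.
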
<p style="border:3px; border-style:solid; border-color:#000000; padding: 1em;">
<b>Debugging Tip</b>: Vectorizing code can sometimes be tricky. One common strategy for debugging is to print out the sizes of the matrices you are working with (for example, via the <code>shape</code> attribute if you are using NumPy arrays). For example, given a data matrix $X$ of size $100 \times 20$ (100 examples, 20 features) and $\theta$, a vector with dimensions $20 \times 1$, you can observe that $X\theta$ is a valid multiplication operation, while $\theta X$ is not. Furthermore, if you have a non-vectorized version of your code, you can compare the output of your vectorized code and non-vectorized code to make sure that they produce the same outputs.
</p>  
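
The dimension check described in the tip might look like the following sketch, which assumes NumPy arrays and uses zero-filled placeholders with the sizes from the example ($100 \times 20$ and $20 \times 1$).

```python
import numpy as np

# Placeholder arrays with the sizes from the tip; the values are irrelevant here.
X = np.zeros((100, 20))     # 100 examples, 20 features
theta = np.zeros((20, 1))   # 20 x 1 parameter vector

print(X.shape, theta.shape)

product = X @ theta         # valid: (100, 20) times (20, 1)
print(product.shape)

# Reversing the order is not a valid matrix product: (20, 1) times (100, 20).
try:
    theta @ X
    reversed_ok = True
except ValueError:
    reversed_ok = False
print(reversed_ok)
```
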

#### 1.3.3 Vectorizing regularized logistic regression
Now that you have implemented vectorized logistic regression, you will add regularization to the cost function. Recall that for regularized logistic regression, the cost function is defined as 
$$J(\theta) = \frac{1}{m}\sum^{m}_{i=1}{[-y^{(i)}\log{(h_\theta(x^{(i)}))} - (1 - y^{(i)})\log{(1 - h_\theta(x^{(i)}))} ]}
+ \frac{\lambda}{2m} \sum^{n}_{j=1}{\theta^2_j}$$

Note that you should not regularize $\theta_0$, which is used for the bias term.  
Correspondingly, the partial derivative of regularized logistic regression cost for $\theta_j$ is defined as
$$
\left.\begin{array}{lr}
    \frac{\partial J(\theta)}{\partial\theta_0} = \frac{1}{m}\sum^{m}_{i=1}{(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_0} & 
    \text{, for } j=0 \\
    \frac{\partial J(\theta)}{\partial\theta_j} = \left(\frac{1}{m}\sum^{m}_{i=1}{(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j}\right) + 
    \frac{\lambda}{m}\theta_j & \text{, for } j \geq 1
\end{array}\right.
$$

Now modify your code in <b>lr_cost_function</b> to account for regularization.
Once again, you should not put any loops into your code.
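
One possible way to add the regularization terms without loops is to work with a copy of $\theta$ whose first entry is set to zero, so that $\theta_0$ drops out of both the penalty and the gradient correction. The sketch below assumes NumPy and 1-D arrays for `theta` and `y`. The name `regularized_cost_and_grad`, the parameter `lam` (standing in for $\lambda$), and the toy data are hypothetical, not the interface required for <b>lr_cost_function</b>.

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost_and_grad(theta, X, y, lam):
    """Regularized cost and gradient, fully vectorized (illustrative helper)."""
    m = y.shape[0]
    h = sigmoid(X @ theta)

    theta_reg = theta.copy()
    theta_reg[0] = 0.0  # the bias term theta_0 is not regularized

    cost = (-y @ np.log(h) - (1.0 - y) @ np.log(1.0 - h)) / m
    cost += lam / (2.0 * m) * (theta_reg @ theta_reg)

    grad = X.T @ (h - y) / m + (lam / m) * theta_reg
    return cost, grad

# Made-up data: 3 examples, a bias column plus 2 features.
X_demo = np.array([[1.0, 0.5, -1.2],
                   [1.0, -0.3, 0.8],
                   [1.0, 1.5, 0.1]])
y_demo = np.array([1.0, 0.0, 1.0])
theta_demo = np.array([0.5, -1.0, 2.0])

cost_plain, grad_plain = regularized_cost_and_grad(theta_demo, X_demo, y_demo, 0.0)
cost_reg, grad_reg = regularized_cost_and_grad(theta_demo, X_demo, y_demo, 1.0)
print(cost_reg - cost_plain)
print(grad_reg - grad_plain)
```

Comparing the results for `lam = 0` and `lam = 1` shows that only the entries for $j \geq 1$ change in the gradient, in line with the two cases of the formula above.
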