## Logistic Regression (C1W2L02)

Given an input $x$, we want the model to output:

$$
\hat{y} = P(y = 1 \mid x)
$$

- $x$ is the input feature vector.
- $\hat{y}$ is the probability predicted by the model. It is the model's estimate, not the true value.
- $P(y = 1 \mid x)$ is the conditional probability, i.e., the probability that $y$ equals 1 given $x$.
- $y$ is the true label (0 or 1), the ground truth. This value comes from the dataset; it is not a parameter of the model.

Example to illustrate $y$:

| Pixel (R) | y |
| --------- | - |
| 255       | 1 |
| 240       | 1 |
| 210       | 1 |
| 180       | 0 |
| 90        | 0 |

In this simple dataset, the input feature is the red channel value of a pixel (from 0 to 255), and the label $y$ indicates whether the pixel belongs to the positive class (1) or not (0), for example whether it is bright or dark.

The raw linear output $z = w^T x + b$ can be any real number, so we need a way to map it to a value that can be interpreted as a probability between 0 and 1. This is done with the sigmoid function:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

Our model can be represented as:
$$
\hat{y} = \sigma(w^T x + b)
$$

The sigmoid function itself is fixed: it receives an input $z$ and outputs a value between 0 and 1. The parameters we can adjust are $w$ (weights) and $b$ (bias), which determine $z = w^T x + b$ before the sigmoid is applied.
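As a minimal sketch of the model above, the snippet below applies $\sigma(w^T x + b)$ to the red-channel values from the example table. The function names `sigmoid` and `predict`, the scaling of pixel values to $[0, 1]$, and the values of `w` and `b` are illustrative assumptions picked by hand; they are not learned parameters.

```python
import math

def sigmoid(z):
    """Map any real number z to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Return y_hat = sigmoid(w^T x + b) for a feature vector x."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical parameters chosen by hand for illustration (not trained).
w = [10.0]
b = -7.5

# Red-channel values from the table, scaled to [0, 1] (an assumption).
pixels = [255, 240, 210, 180, 90]
predictions = [predict([p / 255.0], w, b) for p in pixels]

for p, y_hat in zip(pixels, predictions):
    print(f"R={p:3d}  y_hat={y_hat:.3f}")
```

With these hand-picked values, the three brightest pixels get $\hat{y}$ above 0.5 and the two darkest get $\hat{y}$ below 0.5, which agrees with the labels in the table. Changing `w` or `b` shifts where that crossover happens; the sigmoid itself never changes.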

## Logistic Regression Cost Function (C1W2L03)

A loss function measures how well the model's prediction matches the true label. It quantifies the difference between the predicted probability $\hat{y}$ and the actual label $y$ for one example; the cost function averages this loss over the training set.

A common choice is the quadratic (squared error) cost function, but as Andrew Ng explains in the lecture, it is not ideal for logistic regression because it can lead to a non-convex optimization problem, where gradient descent may fail to find the global optimum.
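To make the non-convexity concern concrete, the sketch below compares squared error with the logistic loss on a single training example with one weight and no bias ($x = 1$, $y = 1$, both chosen only for illustration). It estimates curvature with a finite-difference second derivative; a convex function never has negative curvature. The helper names are hypothetical, and checking two points is an illustration, not a proof.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(w, x=1.0, y=1.0):
    """Squared error of the prediction sigmoid(w * x) against label y."""
    return (sigmoid(w * x) - y) ** 2

def log_loss(w, x=1.0, y=1.0):
    """Logistic loss of the prediction sigmoid(w * x) against label y."""
    y_hat = sigmoid(w * x)
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)

def curvature(f, w, h=0.1):
    """Central finite-difference estimate of the second derivative of f at w."""
    return (f(w + h) - 2 * f(w) + f(w - h)) / (h * h)

for w in (-3.0, 3.0):
    print(f"w={w:+.1f}  squared error: {curvature(squared_error, w):+.4f}  "
          f"log loss: {curvature(log_loss, w):+.4f}")
```

In this example the squared error has negative curvature at $w = -3$ and positive curvature at $w = 3$, so it is not convex in $w$, while the logistic loss has positive curvature at both points.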

Instead, we use the logistic loss (also known as log loss or binary cross-entropy loss):
$$
f(\hat{y},y) = -y \log(\hat{y}) - (1-y) \log(1-\hat{y})
$$

If $y = 1$, only the first term of the equation matters; if $y = 0$, only the second term matters.
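Since $y$ is either 0 or 1, the same loss can be written case by case:

$$
f(\hat{y}, y) =
\begin{cases}
-\log(\hat{y}) & \text{if } y = 1 \\
-\log(1 - \hat{y}) & \text{if } y = 0
\end{cases}
$$

In both cases the loss is close to 0 when $\hat{y}$ is close to $y$, and it grows without bound as $\hat{y}$ approaches the wrong extreme.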

Let's say $y = 1$ and $\hat{y} = 0.7$:

$$
f(0.7, 1) = -1 \cdot \log(0.7) - (1-1) \cdot \log(1-0.7)
$$
$$
= -1 \cdot \log(0.7) - 0 \cdot \log(0.3) 
$$

Here the second term drops out because it is multiplied by 0.

Then, taking $\log$ to be the natural logarithm:

$$
f(0.7, 1) = -\log(0.7)
$$
$$
\approx 0.3567
$$
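The worked example can be checked numerically. This is a small sketch, assuming the natural logarithm and a hypothetical helper named `logistic_loss`; it is undefined when `y_hat` is exactly 0 or 1, a case this sketch does not handle.

```python
import math

def logistic_loss(y_hat, y):
    """Binary cross-entropy for one example, using the natural logarithm."""
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)

# Same prediction, evaluated against each possible true label.
print(round(logistic_loss(0.7, 1), 4))  # 0.3567
print(round(logistic_loss(0.7, 0), 4))  # 1.204
```

The same prediction $\hat{y} = 0.7$ costs more when the true label is 0, because the model put most of its probability on the wrong class.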

