# Gradient Descent Optimization in Linear Regression

## Introduction
Hello and welcome to another session on "Regression and Gradient Descent." In this lesson, we will build the gradient descent algorithm and apply it to a linear regression problem. Although linear regression has a closed-form solution, gradient descent often scales better to large datasets and carries over to more complex models that have no closed-form solution.

## The Concept of Gradient Descent
Gradient descent is an iterative optimization algorithm for minimizing a function, usually a loss function that quantifies the disparity between predicted and actual results. The goal is to find the parameters that minimize the loss. Gradient descent approaches a minimum by repeatedly stepping in the direction of steepest descent, which is the direction of the negative gradient. For this to work, the target function must be differentiable.

### Taking Steps with Gradient Descent
Gradient descent takes its name from how it works: it descends the function by following the negative gradient. It proceeds in iterative steps as follows:

1. **Choose random values for initial parameters.**
2. **Calculate the cost (a measure of the difference between actual and predicted values).**
3. **Compute the gradient (the partial derivatives of the cost with respect to each parameter, which point in the direction of steepest increase).**
4. **Update the parameters by stepping against the gradient.**
5. **Repeat steps 2 to 4 until an acceptable error rate is reached or the maximum iterations are exhausted.**
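
The five steps above can be sketched on a one-parameter problem. The function `f(w) = (w - 3)^2`, the name `minimize_quadratic`, and the values of `alpha`, `tol`, and `max_iter` are illustrative assumptions, not part of the regression setup used later in this lesson.

```python
def minimize_quadratic(w0, alpha=0.1, max_iter=1000, tol=1e-8):
    """Minimize f(w) = (w - 3)^2 with plain gradient descent (illustrative)."""
    w = w0                          # Step 1: initial parameter value
    for _ in range(max_iter):       # Step 5: repeat, up to max_iter times
        cost = (w - 3.0) ** 2       # Step 2: evaluate the cost
        if cost < tol:              # Step 5: stop once the cost is small enough
            break
        grad = 2.0 * (w - 3.0)      # Step 3: derivative of the cost at w
        w = w - alpha * grad        # Step 4: step against the gradient
    return w

w_min = minimize_quadratic(w0=-5.0)
print(w_min)  # close to 3, the minimizer of f
```

The same loop structure carries over to linear regression; only the cost and its gradient change.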

A vital component of gradient descent is the learning rate, which determines the size of each step toward the optimum. If the learning rate is too high, we may overshoot the minimum or even diverge; if it is too low, convergence may take too long.
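
To see both failure modes, here is a minimal sketch on `f(w) = w^2`, whose gradient is `2w` and whose minimum is at 0. The helper name `distance_after` and the three learning rates are assumptions chosen for illustration.

```python
def distance_after(alpha, steps=20, w0=1.0):
    """Run gradient descent on f(w) = w^2 and return the final |w|."""
    w = w0
    for _ in range(steps):
        w = w - alpha * 2.0 * w   # gradient of w^2 is 2w
    return abs(w)

for alpha in (0.01, 0.4, 1.1):
    print(f"alpha={alpha}: |w| after 20 steps = {distance_after(alpha):.3g}")
```

With these assumed values, the smallest rate is still far from 0 after 20 steps, the middle rate is essentially at 0, and the largest rate moves away from the minimum on every step.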

## Implementing Gradient Descent in Python: The Cost Function
Let's implement gradient descent from scratch. We need two functions: one that calculates the cost and one that calculates the gradient and applies it to update our parameters. We'll also cap the number of iterations so that the computation halts after a predefined number of steps.

The cost function is as follows:

\[
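J(X, y, \theta) = \frac{1}{m} \sum_{i=1}^{m} (x_i \cdot \theta - y_i)^2
\]

Where:
- \( J \) is the cost,
- \( X \) is the data matrix, and \( x_i \) is its \( i \)-th row,
- \( y \) are the actual values, and \( y_i \) is the \( i \)-th one,
- \( \theta \) are the parameters,
- \( m \) is the number of training examples (the length of \( y \)).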

This is the calculation of the mean squared error.

```python
import numpy as np

def cost(X, y, theta):
    m = len(y)  # Number of training examples
    predictions = X.dot(theta)  # Predicted value for every row of X
    mse = (1/m) * np.sum(np.square(predictions - y))  # Compute mean squared error
    return mse
```
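
As a quick sanity check, the cost can be evaluated on a tiny dataset where the answer is easy to verify by hand. The three data points below are made up for illustration, and the function is repeated so the snippet runs on its own.

```python
import numpy as np

def cost(X, y, theta):
    """Mean squared error, as defined in this lesson."""
    m = len(y)
    predictions = X.dot(theta)
    return (1 / m) * np.sum(np.square(predictions - y))

# Three points that lie exactly on y = 1 + 2x (first column of X is the bias term)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([[1.0], [3.0], [5.0]])

perfect = cost(X, y, np.array([[1.0], [2.0]]))  # parameters that fit exactly
zeros = cost(X, y, np.zeros((2, 1)))            # all-zero parameters

# Shape pitfall: a 1-D y broadcasts against the (3, 1) predictions
bad_shape = (X.dot(np.zeros((2, 1))) - y.ravel()).shape

print(perfect, zeros, bad_shape)
```

The exact fit gives a cost of 0, and the all-zero parameters give (1 + 9 + 25) / 3. The last line shows why `y` is kept as a column vector throughout: with a 1-D `y`, NumPy broadcasting produces a 3 by 3 array of differences instead of 3 residuals, and the cost would silently be wrong.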

## Implementing Gradient Descent in Python: The Gradient Descent
Next, for the gradient descent function, we follow the gradient descent update rule:

\[
\theta := \theta - \alpha \frac{1}{m} X^T \cdot (X \cdot \theta - y)
\]

Where:
- \( \alpha \) is the learning rate,
- \( X^T \) is the transpose of the data.

Note that the exact gradient of the mean squared error carries a factor of 2, that is, \( \frac{2}{m} X^T \cdot (X \cdot \theta - y) \). Because this constant only rescales the step, we absorb it into the learning rate \( \alpha \) for simplicity.
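
For reference, here is where the factor of 2 comes from, written in matrix form. The symbol \( \alpha' \) is introduced only for this illustration and stands for the learning rate before the 2 is absorbed.

\[
J(\theta) = \frac{1}{m} (X \cdot \theta - y)^T (X \cdot \theta - y)
\]

\[
\nabla_\theta J(\theta) = \frac{2}{m} X^T \cdot (X \cdot \theta - y)
\]

\[
\theta := \theta - \alpha' \frac{2}{m} X^T \cdot (X \cdot \theta - y) = \theta - \alpha \frac{1}{m} X^T \cdot (X \cdot \theta - y), \quad \alpha = 2\alpha'
\]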

```python
def gradient_descent(X, y, theta, alpha, iterations):
    m = len(y)
    cost_history = np.zeros(iterations)
    theta_history = np.zeros((iterations, theta.shape[0]))  # One row of parameters per iteration
    for i in range(iterations):  # Run a fixed number of iterations
        prediction = np.dot(X, theta)  # Matrix multiplication between X and theta
        theta = theta - (1/m) * alpha * (X.T.dot(prediction - y))  # Gradient update rule
        theta_history[i, :] = theta.ravel()  # Store the updated parameters
        cost_history[i] = cost(X, y, theta)  # Cost after the update
    return theta, cost_history, theta_history
```
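
The function above always runs for the full number of iterations. Step 5 of the algorithm also mentions stopping once the error is acceptable, so here is one possible sketch of that idea: stop when the cost improves by less than a tolerance. The name `gradient_descent_tol`, the `tol` parameter, and the test data are assumptions for illustration, not part of the lesson's implementation.

```python
import numpy as np

def gradient_descent_tol(X, y, theta, alpha, max_iter, tol=1e-12):
    """Gradient descent that stops early when the cost stops improving."""
    m = len(y)
    prev_cost = np.inf
    for i in range(max_iter):
        residual = X.dot(theta) - y
        theta = theta - (alpha / m) * X.T.dot(residual)    # same update rule as above
        current = np.mean(np.square(X.dot(theta) - y))     # MSE after the update
        if prev_cost - current < tol:   # tiny improvement (or the cost went up): stop
            return theta, i + 1
        prev_cost = current
    return theta, max_iter              # iteration cap reached

# Noise-free data on the line y = 1 + 2x, so the true parameters are known
x = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
X_demo = np.c_[np.ones((len(x), 1)), x]
y_demo = 1.0 + 2.0 * x

theta_fit, steps_used = gradient_descent_tol(X_demo, y_demo, np.zeros((2, 1)), 0.5, 10000)
print(theta_fit.ravel(), steps_used)
```

Note that this check also stops when the cost increases, which usually signals a learning rate that is too high; a stricter version could report that case separately.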

## Applying Gradient Descent to Linear Regression
Let's apply our gradient descent function to a simple linear regression problem. With a single input feature, linear regression has the form:

\[
y = ax + b
\]

Where:
- \( a \) and \( b \) are the parameters \( \theta \) that we need to learn.
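
To connect this form with the matrix notation used in the cost function, the prediction can be written as a row of inputs times the parameter vector. This sketch assumes the column of ones is placed first, so the intercept \( b \) is the first entry of \( \theta \):

\[
\hat{y} = ax + b = \begin{bmatrix} 1 & x \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix}
\]

Stacking one such row per training example gives the matrix product \( X \cdot \theta \) with \( \theta = (b, a)^T \).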

The following data is generated from this form with \( a = 3 \) and \( b = 4 \), plus Gaussian noise.

```python
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

lr = 0.01  # Learning Rate
n_iter = 1000  # Max number of iterations
theta = np.random.randn(2, 1)  # Randomly initialized parameters
X_b = np.c_[np.ones((len(X), 1)), X]  # Prepend a column of ones so theta[0] acts as the intercept b
theta, cost_history, theta_history = gradient_descent(X_b, y, theta, lr, n_iter)  # Gradient Descent
```
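
Since linear regression also has a direct solution, one way to check the result is to compare it with a least-squares fit from `np.linalg.lstsq`. The sketch below is self-contained: it uses a compact copy of the update rule without the history arrays, and a fixed random seed, which is an assumption added for reproducibility.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, iterations):
    """Compact copy of the lesson's update rule (histories omitted)."""
    m = len(y)
    for _ in range(iterations):
        theta = theta - (alpha / m) * X.T.dot(X.dot(theta) - y)
    return theta

rng = np.random.default_rng(0)                  # fixed seed (assumption)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))
X_b = np.c_[np.ones((len(X), 1)), X]            # column of ones for the intercept

theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)  # direct least-squares solution
theta0 = rng.standard_normal((2, 1))                # shared starting point

for lr in (0.01, 0.1):
    theta_gd = gradient_descent(X_b, y, theta0, lr, 1000)
    gap = np.max(np.abs(theta_gd - theta_ls))
    print(f"lr={lr}: max gap to least squares = {gap:.2e}")
```

On data like this, the run with a learning rate of 0.01 may still be some distance from the least-squares answer after 1000 iterations, while the larger rate closes the gap. More iterations with the smaller rate would have a similar effect.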

## Lesson Summary and Practice
Congratulations! You have implemented the gradient descent algorithm and applied it to linear regression. We covered the theory, walked through the math behind the cost function and the gradient descent update rule, and brought these concepts to life by coding in Python.

It is now time to practice and solidify what you have learned. In the upcoming exercises, challenge yourself with different problems and experiment with varying parameters like the learning rate. Enjoy your journey into the world of gradients!