# Lesson 4: Understanding Logistic Regression and Its Implementation Using Gradient Descent

## Introduction
Welcome to our new lesson on Logistic Regression and its implementation using the Gradient Descent technique. Now that you are familiar with the fundamentals of Regression Analysis and with how Gradient Descent optimizes regression models, we'll turn to a different kind of problem: Classification. Regression Analysis is suited to predicting continuous variables, but predicting categories, such as whether an email is spam or not, calls for tools designed for the job. Logistic Regression is one of them.

In this lesson, we'll walk through the core concepts behind Logistic Regression, focusing on its key components: the Sigmoid function and the Log-Likelihood. We'll then use Python to build a simple Logistic Regression model trained with Gradient Descent. By the end of this lesson, you will have broadened your theoretical understanding of another vital machine learning concept and strengthened your practical Python coding skills.

## Classification: From Linear Regression to Logistic Regression
So far, we've dealt with tasks where a continuous output is predicted from one or more input variables; these are known as regression tasks. Another category, classification tasks, aims to predict a categorical outcome. The categories are often binary, like "spam"/"not spam" for an email or "malignant"/"benign" for a tumor. The regression models we've studied so far are a poor fit here: a linear model's output is unbounded, so it can produce values such as 1.4 or -0.3 that cannot be read as the probability of an email being spam. Enter Logistic Regression, a classification algorithm that predicts the probability of a binary outcome.

## Sigmoid Function: The Heart of Logistic Regression
Linear Regression uses a linear combination of the inputs directly as its prediction. Logistic Regression computes the same kind of linear combination (the raw model output), then passes it through the sigmoid function, which maps any real number into the range between 0 and 1 so the result can be interpreted as a probability.

The Sigmoid function is defined as:

\[
S(x) = \frac{1}{1 + e^{-x}}
\]

We can implement it like this:

```python
import numpy as np

def sigmoid(z):
    # Works element-wise on scalars and NumPy arrays
    return 1 / (1 + np.exp(-z))
```

It looks like this:

![Sigmoid Function](https://via.placeholder.com/800x200.png)

For large positive inputs, \(S(x)\) is close to 1; for large negative inputs, it is close to 0; and \(S(0) = 0.5\). Because its output always lies between 0 and 1, the Sigmoid function is a natural fit when we want to classify emails into two categories: "spam" or "not-spam".
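A quick numerical check of this behavior, as a minimal sketch: the `sigmoid` function is re-declared so the snippet runs on its own, and the input values are arbitrary. It also confirms the symmetry \(S(-x) = 1 - S(x)\), which follows directly from the definition.

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function: maps any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

inputs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
outputs = sigmoid(inputs)

# Large negative inputs land near 0, large positive inputs near 1
print(np.round(outputs, 4))

# Symmetry check: S(-x) == 1 - S(x)
print(np.allclose(sigmoid(-inputs), 1 - outputs))
```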

## Understanding Logistic Regression
The mathematical form of Logistic Regression can be expressed as follows:

\[
P(Y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\]

Where:
- \(P(Y = 1 \mid x)\) is the probability of event \(Y = 1\) given \(x\).
- \(\beta_0\) and \(\beta_1\) are parameters of the model.
- \(x\) is the input variable.
- \(\beta_0 + \beta_1 x\) is the linear combination of parameters and feature(s).

Log-Likelihood plays a role in Logistic Regression similar to that of the Least Squares method in Linear Regression: it defines what the "best" parameters are. Maximum likelihood estimation chooses the parameters under which the observations we collected are most probable. In Logistic Regression, we maximize the log-likelihood (the logarithm of that probability), which is equivalent to minimizing the cost function described below.
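For \(m\) training examples with labels \(y_i \in \{0, 1\}\) and predicted probabilities \(\hat{p}_i = P(Y = 1 \mid x_i)\), the log-likelihood and the average cost \(J\) are related as follows (a standard result, written out here for reference):

\[
\ell(\beta_0, \beta_1) = \sum_{i=1}^{m} \left[ y_i \log(\hat{p}_i) + (1 - y_i) \log(1 - \hat{p}_i) \right], \qquad J(\beta_0, \beta_1) = -\frac{1}{m}\,\ell(\beta_0, \beta_1)
\]

Each \(\hat{p}_i\) depends on \(\beta_0\) and \(\beta_1\) through the sigmoid, and since \(J\) is just \(\ell\) scaled by \(-1/m\), maximizing the log-likelihood and minimizing the average cost give the same parameters.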

## The Cost Function in Logistic Regression
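We've seen the least squares cost function in Linear Regression. In Logistic Regression, the cost function is defined differently: plugging the sigmoid into a squared-error cost would make the cost non-convex in the parameters, so Logistic Regression uses the negative log-likelihood (also called log loss or cross-entropy) instead.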

The cost function for a single training instance can be expressed as:

\[
- \left[ y \log(\hat{p}) + (1 - y) \log(1 - \hat{p}) \right]
\]

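Where \(\hat{p}\) is the predicted probability that the instance is positive and \(y \in \{0, 1\}\) is its true label. The overall cost is the average of this quantity over all training instances, which is what the code below computes.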

We can implement it like this:

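```python
import numpy as np

def cost_function(h, y, eps=1e-15):
    h = np.clip(h, eps, 1 - eps)  # avoid log(0) when h is exactly 0 or 1
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()  # average log loss
```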

Let's plot it:

![Cost Function](https://via.placeholder.com/800x200.png)

This function makes sense because \(-\log(t)\) approaches 0 as \(t\) approaches 1 and grows without bound as \(t\) approaches 0. For a positive instance (\(y = 1\)), the cost reduces to \(-\log(\hat{p})\): it is close to 0 when the predicted probability is near 1 and becomes very large when the model confidently predicts a probability near 0. The negative case (\(y = 0\)) mirrors this with \(-\log(1 - \hat{p})\).

A separate question is how to turn a predicted probability into a class label. By convention, a probability of 0.5 or more is assigned to Category 1 and anything below 0.5 to Category 0, but this threshold is a choice, not a property of the model, and can be adjusted to the problem at hand (for example, raised for a spam filter where flagging a legitimate email is costly).
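To see how sharply confident mistakes are penalized, here is a small sketch that evaluates the per-instance cost for a positive instance (\(y = 1\)) at a few predicted probabilities. The helper name `instance_cost` is illustrative and not part of the lesson's code.

```python
import numpy as np

def instance_cost(p_hat, y):
    # Cross-entropy cost for one prediction p_hat and one label y in {0, 1}
    return -(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

for p_hat in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"y=1, p_hat={p_hat:<4}: cost={instance_cost(p_hat, 1):.3f}")
```

The cost rises from about 0.01 at \(\hat{p} = 0.99\) to about 4.6 at \(\hat{p} = 0.01\), and a negative instance predicted at 0.9 costs exactly as much as a positive instance predicted at 0.1.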

## Implementing Logistic Regression with Gradient Descent
Because the Logistic Regression cost is convex, it has no local minima other than the global one, so Gradient Descent with a suitable learning rate steadily lowers the cost toward its minimum (for functions in general, Gradient Descent is not guaranteed to find a global minimum). We use it to find the parameter values that yield the smallest cost. Here's a simple Python implementation of a Logistic Regression model:

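```python
import numpy as np

# Uses sigmoid() and cost_function() defined earlier in this lesson
def logistic_regression(X, y, num_iterations, learning_rate):
    # Add a column of ones so theta[0] acts as the intercept
    intercept = np.ones((X.shape[0], 1))
    X = np.concatenate((intercept, X), axis=1)

    # Initialize all weights to zero
    theta = np.zeros(X.shape[1])

    for i in range(num_iterations):
        # Predicted probabilities with the current weights
        z = np.dot(X, theta)
        h = sigmoid(z)
        # Gradient of the average log loss: X^T (h - y) / m
        gradient = np.dot(X.T, (h - y)) / y.size
        theta -= learning_rate * gradient

        # Periodically report the loss, computed with the updated weights
        if i % 10000 == 0:
            loss = cost_function(sigmoid(np.dot(X, theta)), y)
            print(f'Iteration {i}: loss {loss:.6f}')

    return theta
```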

In this code:
- The `sigmoid()` function applies the sigmoid element-wise to its input.
- The `cost_function()` computes the average log loss from the predicted probabilities `h` and the true labels `y`.
- The `logistic_regression()` function prepends an intercept column to `X`, then repeatedly computes predictions, the gradient of the cost, and a weight update, and finally returns the learned weights `theta`.

Given numeric features extracted from emails (for example, word counts) and 0/1 labels, this function can train a Logistic Regression model that classifies emails as "spam" or "not-spam."
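The update step relies on the fact that the gradient of the average cost with respect to the weights is \(\frac{1}{m} X^\top \left(S(X\theta) - y\right)\), where \(m\) is the number of examples and \(S\) is applied element-wise; this is what `np.dot(X.T, (h - y)) / y.size` computes. A minimal sketch that checks this formula against a central finite-difference approximation on small random data (the data and the helper names `cost`, `analytic_gradient`, and `numeric_gradient` are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    # Average cross-entropy cost for parameters theta
    h = sigmoid(X @ theta)
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

def analytic_gradient(theta, X, y):
    # Same expression as in logistic_regression(): X^T (h - y) / m
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / y.size

def numeric_gradient(theta, X, y, eps=1e-6):
    # Central differences: perturb one parameter at a time
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])  # intercept column + 2 features
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

# Largest absolute disagreement between the two gradients (should be tiny)
print(np.max(np.abs(analytic_gradient(theta, X, y) - numeric_gradient(theta, X, y))))
```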

## Applying Logistic Regression with Gradient Descent
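Now we can define the prediction functions: `predict_prob()` returns the probability of the positive class, and `predict()` turns it into a class label using a threshold (0.5 by default):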

```python
import numpy as np

def predict_prob(X, theta):
    # Add the same intercept column used during training
    intercept = np.ones((X.shape[0], 1))
    X = np.concatenate((intercept, X), axis=1)
    return sigmoid(np.dot(X, theta))

def predict(X, theta, threshold=0.5):
    # Boolean array: True where the predicted probability reaches the threshold
    return predict_prob(X, theta) >= threshold
```
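Putting the pieces together, here is a minimal end-to-end sketch on a tiny, made-up one-feature dataset (not real email data): it trains with `logistic_regression()`, then classifies new points with `predict()` at two thresholds. The functions are repeated so the snippet runs on its own, with the loss printing removed; `add_intercept` is a hypothetical helper introduced here for brevity, and the learning rate and iteration count are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def add_intercept(X):
    # Prepend a column of ones so theta[0] acts as the intercept
    return np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)

def logistic_regression(X, y, num_iterations, learning_rate):
    # Same update rule as in the lesson, without the loss printing
    X = add_intercept(X)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iterations):
        h = sigmoid(X @ theta)
        theta -= learning_rate * (X.T @ (h - y)) / y.size
    return theta

def predict_prob(X, theta):
    return sigmoid(add_intercept(X) @ theta)

def predict(X, theta, threshold=0.5):
    return predict_prob(X, theta) >= threshold

# Made-up, perfectly separable data: one feature, label 1 for positive values
X_train = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

theta = logistic_regression(X_train, y_train, num_iterations=1000, learning_rate=0.1)
print("theta:", theta)
print("training predictions:", predict(X_train, theta).astype(int))

X_new = np.array([[-0.5], [0.25], [2.0]])
print("P(Y=1):", predict_prob(X_new, theta).round(3))
print("threshold 0.5:", predict(X_new, theta))
print("threshold 0.9:", predict(X_new, theta, threshold=0.9))
```

Raising the threshold can only turn positive predictions into negative ones, never the reverse. Because this toy data is perfectly separable, the weights keep growing the longer you train (a steeper sigmoid always lowers the cost); on real, overlapping data they settle at finite values.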

## Lesson Summary and Practice
That wraps up our lesson on the fundamentals of Logistic Regression and its Python implementation using Gradient Descent. Throughout this lesson, we've highlighted the differences between regression and classification tasks, introduced Logistic Regression as a classification algorithm, and elaborated on the components that define it.

You'll have ample opportunities to refine these skills in our forthcoming practice exercises. Remember, the more you practice, the more fluent you'll become. So, practice away and have fun doing it!


## Sigmoid Function: From Input to Probability

Here's the modified code, which adds the `Close` column to the selected features. Comparing its Mean Squared Error (MSE) with that of the version without `Close` shows how much the feature affects the model:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Load dataset
tesla = load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# Convert 'Date' column to datetime format
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Adding technical indicators
tesla_df['SMA_5'] = tesla_df['Adj Close'].rolling(window=5).mean()
tesla_df['SMA_10'] = tesla_df['Adj Close'].rolling(window=10).mean()
tesla_df['EMA_5'] = tesla_df['Adj Close'].ewm(span=5, adjust=False).mean()
tesla_df['EMA_10'] = tesla_df['Adj Close'].ewm(span=10, adjust=False).mean()

# Drop NaN values created by moving averages
tesla_df.dropna(inplace=True)

# Features and target selection including the 'Close' column
features = tesla_df[['Close', 'Open', 'High', 'Low', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']].values
target = tesla_df['Adj Close'].values

# Standardizing features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(features_scaled, target, test_size=0.25, random_state=42)

# Train the model
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error with 'Close' feature:", mse)

# Visualizing the predictions
plt.figure(figsize=(10, 6))
plt.scatter(range(len(y_test)), y_test, label='Actual', alpha=0.7)
plt.scatter(range(len(y_test)), predictions, label='Predicted', alpha=0.7)
plt.title('Actual vs Predicted Values')
plt.xlabel('Sample Index')
plt.ylabel('Value')
plt.legend()
plt.show()
```

### Changes Made:
- **Included `Close` in Feature Selection**: Added `'Close'` to the features list in the `features` variable.
- **Modified MSE Output**: Updated the print statement to specify that the MSE includes the `Close` feature.

Now, when you run this code, it will standardize the `Close` column along with the other features and use it in training the model. This will help you determine the impact of the `Close` feature on the model's performance by observing the change in the Mean Squared Error (MSE).
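The `SMA_*` and `EMA_*` columns above come from pandas' `rolling(...).mean()` and `ewm(span=..., adjust=False).mean()`. As a small sketch on a made-up price series: with `adjust=False`, the EMA follows the recursion \(\text{EMA}_t = \alpha \cdot \text{price}_t + (1 - \alpha) \cdot \text{EMA}_{t-1}\) with \(\alpha = 2 / (\text{span} + 1)\), seeded with the first price, and the loop below reproduces it by hand.

```python
import numpy as np
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 11.5, 13.0, 14.0, 13.5])  # made-up prices

sma_3 = prices.rolling(window=3).mean()          # first 2 values are NaN
ema_3 = prices.ewm(span=3, adjust=False).mean()

# Manual EMA with alpha = 2 / (span + 1), seeded with the first price
alpha = 2 / (3 + 1)
manual = [prices.iloc[0]]
for price in prices.iloc[1:]:
    manual.append(alpha * price + (1 - alpha) * manual[-1])

print(pd.DataFrame({"price": prices, "SMA_3": sma_3, "EMA_3": ema_3, "manual_EMA_3": manual}))
```

The leading `NaN`s produced by `rolling` are why the code calls `dropna()` before training.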

## Implementing the Sigmoid Function

🚀 Let's fix that bug so we can get the Gradient Boosting model running smoothly!

### Identifying the Issue:
The bug is in the line where the model is being trained:

```python
model.fit(X_test, y_test)
```

In the `fit` method, the model is supposed to learn from the training data, but here, it's incorrectly using the testing data (`X_test`, `y_test`) instead of the training data (`X_train`, `y_train`).

### Fixing the Issue:
We need to replace `X_test` and `y_test` with `X_train` and `y_train` in the `model.fit()` method.

Here’s the corrected code:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Load dataset
tesla = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# Convert 'Date' to datetime format
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Calculate SMAs and EMAs
tesla_df['SMA_5'] = tesla_df['Adj Close'].rolling(window=5).mean()
tesla_df['SMA_10'] = tesla_df['Adj Close'].rolling(window=10).mean()
tesla_df['EMA_5'] = tesla_df['Adj Close'].ewm(span=5, adjust=False).mean()
tesla_df['EMA_10'] = tesla_df['Adj Close'].ewm(span=10, adjust=False).mean()

# Drop NaN values
tesla_df.dropna(inplace=True)

# Features and target selection
features = tesla_df[['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']].values
target = tesla_df['Adj Close'].values

# Standardizing features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(features_scaled, target, test_size=0.25, random_state=42)

# Instantiate and fit the Gradient Boosting Regressor
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)  # Use training data here

# Predict and evaluate the model
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)

# Visualizing the actual vs predicted values
plt.figure(figsize=(10, 6))
plt.scatter(range(len(y_test)), y_test, label='Actual', alpha=0.7)
plt.scatter(range(len(y_test)), predictions, label='Predicted', alpha=0.7)
plt.title('Actual vs Predicted Values')
plt.xlabel('Sample Index')
plt.ylabel('Value')
plt.legend()
plt.show()
```

### Summary:
- **Bug**: The model was being trained on the testing data instead of the training data.
- **Fix**: Corrected the `model.fit()` method to use `X_train` and `y_train` for training.

Now the code should run correctly and give you the Mean Squared Error (MSE) along with a visualization of actual vs predicted values. 🚀

## Evaluating Spam Filter Accuracy with Logistic Regression

Let's complete the code to standardize the features, train the Gradient Boosting Regressor model, and evaluate the model's performance.

Here's the completed code:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import datasets
import matplotlib.pyplot as plt

# Load dataset
tesla = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# Convert 'Date' column to datetime format
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Calculate SMA and EMA
tesla_df['SMA_5'] = tesla_df['Adj Close'].rolling(window=5).mean()
tesla_df['SMA_10'] = tesla_df['Adj Close'].rolling(window=10).mean()
tesla_df['EMA_5'] = tesla_df['Adj Close'].ewm(span=5, adjust=False).mean()
tesla_df['EMA_10'] = tesla_df['Adj Close'].ewm(span=10, adjust=False).mean()

# Drop NaN values created by moving averages
tesla_df.dropna(inplace=True)

# Select features and target
features = tesla_df[['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']].values
target = tesla_df['Adj Close'].values

# Standardize features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(features_scaled, target, test_size=0.25, random_state=42)

# Instantiate and fit the model
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)  # Fit the model with training data

# Predict and evaluate error
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)

# Plotting predictions vs actual values
plt.figure(figsize=(10, 6))
plt.scatter(range(len(y_test)), y_test, label='Actual', alpha=0.7)
plt.scatter(range(len(y_test)), predictions, label='Predicted', alpha=0.7)
plt.title('Actual vs Predicted Values')
plt.xlabel('Sample Index')
plt.ylabel('Value')
plt.legend()
plt.show()
```

### Explanation of the Completed Code:
1. **Standardize Features**:
   - `features_scaled = scaler.fit_transform(features)` standardizes the features to have a mean of 0 and a standard deviation of 1.

2. **Train the Model**:
   - `model.fit(X_train, y_train)` fits the Gradient Boosting Regressor model using the training data.

3. **Predict and Evaluate**:
   - `predictions = model.predict(X_test)` generates predictions based on the test data.
   - The Mean Squared Error (MSE) is calculated using `mean_squared_error(y_test, predictions)` to evaluate the model's performance.

4. **Visualization**:
   - The scatter plot compares the actual vs. predicted values, providing a visual representation of the model's accuracy.
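For reference, `StandardScaler` standardizes each column independently as \(z = (x - \text{mean}) / \text{std}\), using the population standard deviation (`ddof=0`). A minimal NumPy sketch of the same computation on a made-up feature matrix:

```python
import numpy as np

# Made-up feature matrix: rows are samples, columns are features (e.g., price, volume)
features = np.array([[10.0, 1000.0],
                     [12.0, 1500.0],
                     [11.0,  500.0],
                     [13.0, 2000.0]])

mean = features.mean(axis=0)   # per-column mean
std = features.std(axis=0)     # per-column population std (ddof=0)
features_scaled = (features - mean) / std

print(features_scaled.round(3))
print("column means:", features_scaled.mean(axis=0).round(6))
print("column stds: ", features_scaled.std(axis=0).round(6))
```

With scikit-learn installed, `StandardScaler().fit_transform(features)` should produce the same values.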

Now your model is fully operational and ready to make predictions! 🚀

## Adding the Gradient to Logistic Regression

Let's complete the code and make sure it works as intended. I'll guide you through filling in the missing parts.

Here's the complete code:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Load dataset
tesla = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# Convert the 'Date' column to datetime format
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Calculate technical indicators: Simple Moving Average (SMA), Exponential Moving Average (EMA), and Volatility
tesla_df['SMA_5'] = tesla_df['Adj Close'].rolling(window=5).mean()
tesla_df['EMA_5'] = tesla_df['Adj Close'].ewm(span=5, adjust=False).mean()
tesla_df['Volatility'] = tesla_df['Close'].rolling(window=5).std()

# Drop NaN values created by indicators
tesla_df.dropna(inplace=True)

# Selecting features and target
# Configure all the features you need for training the model (excluding the target 'Adj Close')
features = tesla_df[['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'EMA_5', 'Volatility']].values

# Configure the target - 'Adj Close' column
target = tesla_df['Adj Close'].values

# Standardize features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features_scaled, target, test_size=0.25, random_state=42)

# Instantiate the Gradient Boosting Regressor and fit the model to the training data
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)

# Predict the values for X_test and calculate the Mean Squared Error (MSE)
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)

# Visualizing the actual vs predicted values
plt.figure(figsize=(10, 6))
plt.scatter(range(len(y_test)), y_test, label='Actual', alpha=0.7)
plt.scatter(range(len(y_test)), predictions, label='Predicted', alpha=0.7)
plt.title('Actual vs Predicted Values')
plt.xlabel('Sample Index')
plt.ylabel('Value')
plt.legend()
plt.show()
```

### Explanation of the Completed Parts:
1. **Features Selection**:
   - `features = tesla_df[['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'EMA_5', 'Volatility']].values`
   - These are the selected features for the model. Notice that we didn't include the target variable ('Adj Close') in the features.

2. **Target Selection**:
   - `target = tesla_df['Adj Close'].values`
   - The target variable is the 'Adj Close' column, which we want to predict.

3. **Standardizing Features**:
   - `features_scaled = scaler.fit_transform(features)`
   - This standardizes the features to ensure that they have a mean of 0 and a standard deviation of 1.

4. **Model Training**:
   - `model = GradientBoostingRegressor(random_state=42)`
   - `model.fit(X_train, y_train)`
   - We train the Gradient Boosting Regressor model using the training data.

5. **Prediction and Evaluation**:
   - `predictions = model.predict(X_test)`
   - `mse = mean_squared_error(y_test, predictions)`
   - We predict the test data and calculate the Mean Squared Error to evaluate model performance.
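The Mean Squared Error returned by `mean_squared_error` is the average squared difference between actual and predicted values, \(\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\). A quick check with made-up numbers:

```python
import numpy as np

y_true = np.array([250.0, 260.0, 255.0, 270.0])   # made-up actual prices
y_pred = np.array([248.0, 263.0, 255.0, 266.0])   # made-up predictions

errors = y_true - y_pred          # 2, -3, 0, 4
mse = np.mean(errors ** 2)        # (4 + 9 + 0 + 16) / 4
print("MSE:", mse)
```

Here the MSE is 7.25. Because errors are squared, the single miss of 4 contributes more than half of the total.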

Now you're ready to run the code and get insights from the model's predictions. 🚀

Let's complete the full Gradient Boosting Model training workflow using Tesla stock data. I'll guide you through each step with the necessary code.

Here's the full solution:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import numpy as np
import datasets

# Load dataset
tesla = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# Convert 'Date' column to datetime format
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Add technical indicators
tesla_df['Price_Diff'] = tesla_df['Adj Close'] - tesla_df['Adj Close'].shift(1)
tesla_df['Volatility'] = tesla_df['Adj Close'].rolling(window=5).std()
tesla_df['Momentum'] = tesla_df['Adj Close'] - tesla_df['Adj Close'].shift(5)
tesla_df['Log_Price'] = np.log(tesla_df['Adj Close'])

# Drop rows with NaN values
tesla_df.dropna(inplace=True)

# Select features and target variable
features = tesla_df[['Price_Diff', 'Volatility', 'Momentum', 'Log_Price']].values
target = tesla_df['Adj Close'].values

# Standardize the feature values
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(features_scaled, target, test_size=0.25, random_state=42)

# Instantiate the Gradient Boosting Regressor and fit it to the training data
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)

# Make predictions and calculate the Mean Squared Error (MSE)
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)

# Visualize actual vs predicted values using scatter plots
plt.figure(figsize=(10, 6))
plt.scatter(range(len(y_test)), y_test, label='Actual', alpha=0.7)
plt.scatter(range(len(y_test)), predictions, label='Predicted', alpha=0.7)
plt.title('Actual vs Predicted Values')
plt.xlabel('Sample Index')
plt.ylabel('Value')
plt.legend()
plt.show()
```

### Explanation of Each Step:

1. **Convert 'Date' to Datetime Format**:
   - Ensures that the 'Date' column is properly formatted for any potential time-series analysis.

2. **Add Technical Indicators**:
   - `Price_Diff`: Measures the difference in the adjusted close price from the previous day.
   - `Volatility`: Calculates the standard deviation of the adjusted close price over the last 5 days.
   - `Momentum`: Measures the difference in the adjusted close price compared to 5 days ago.
   - `Log_Price`: Takes the logarithm of the adjusted close price, useful for stabilizing variance.

3. **Drop Rows with NaN Values**:
   - Ensures that there are no missing values in the data before training the model.

4. **Select Features and Target Variable**:
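   - The selected features are the ones we added above, and the target is the 'Adj Close' price. Note that every one of these features is computed from the same day's 'Adj Close' (`Log_Price` is a direct transform of it), so the model effectively sees the target; a very low MSE here says little about forecasting ability. Shifting the features by one day, e.g., with `.shift(1)`, avoids this.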

5. **Standardize Feature Values**:
   - Standardizes the feature values to have a mean of 0 and a standard deviation of 1. Tree-based models such as gradient boosting are largely insensitive to feature scaling, so this step is optional here, though it does no harm.

6. **Split the Dataset**:
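   - Splits the data into training and test sets, using 25% of the rows for testing. `train_test_split` shuffles rows by default (`shuffle=True`), so with time-series data the model is trained on days that come after some test days; passing `shuffle=False` keeps the rows in order, which for date-sorted data means testing on the most recent days.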

7. **Train the Gradient Boosting Model**:
   - The model is trained using the training data.

8. **Make Predictions and Evaluate**:
   - Predictions are made on the test set, and the Mean Squared Error (MSE) is calculated to evaluate the model's performance.

9. **Visualize Actual vs Predicted Values**:
   - A scatter plot is used to visually compare the actual values against the predicted values.
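To make the four indicators concrete, here is a small sketch that computes them on a made-up price series using the same pandas calls as the code above (the series is illustrative, not Tesla data):

```python
import numpy as np
import pandas as pd

adj_close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0])  # made-up prices

indicators = pd.DataFrame({
    "Adj Close": adj_close,
    "Price_Diff": adj_close - adj_close.shift(1),     # change vs. previous day
    "Volatility": adj_close.rolling(window=5).std(),  # 5-day sample std (ddof=1)
    "Momentum": adj_close - adj_close.shift(5),       # change vs. 5 days earlier
    "Log_Price": np.log(adj_close),
})
print(indicators)
print(indicators.dropna())  # only rows where every indicator is defined
```

Because `Momentum` looks back five days, the first five rows contain `NaN` and are removed by `dropna()`, which is why the real dataset also loses its earliest rows.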

You can now run the code and analyze the results! 🚀