# Lesson 4: Feature Importance in Gradient Boosting Models


Hello and welcome! Today's lesson focuses on **Feature Importance in Gradient Boosting Models**. We will explore how to determine which features in our dataset are most influential in predicting Tesla ($TSLA) stock prices. By understanding the importance of features, we can refine our models and make more informed trading decisions.

## 📝 Revision of Previous Steps

Before diving into feature importance, let's quickly revise the previous steps to ensure we have a solid foundation.

### Data Preparation and Feature Engineering:

```python
import pandas as pd
import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load TSLA dataset
tesla = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# Convert Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Feature Engineering: adding technical indicators as features
tesla_df['SMA_5'] = tesla_df['Adj Close'].rolling(window=5).mean()
tesla_df['SMA_10'] = tesla_df['Adj Close'].rolling(window=10).mean()
tesla_df['EMA_5'] = tesla_df['Adj Close'].ewm(span=5, adjust=False).mean()
tesla_df['EMA_10'] = tesla_df['Adj Close'].ewm(span=10, adjust=False).mean()

# Drop NaN values created by moving averages
tesla_df.dropna(inplace=True)

# Select features and target
features = tesla_df[['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']].values
target = tesla_df['Adj Close'].values

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.25, random_state=42)

# Standardizing features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

### Model Training:

```python
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate and fit the model
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)
```

## 🌟 Understanding Feature Importance

### What is Feature Importance?

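Feature importance refers to techniques that assign a score to each input feature based on how much it contributes to predicting the target variable. In a Gradient Boosting model, a feature's importance reflects how much its splits reduce the training error across all of the boosted decision trees. scikit-learn exposes these impurity-based scores through the `feature_importances_` attribute, normalized so that they sum to 1.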
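
To make the idea concrete, here is a minimal sketch on synthetic data rather than the TSLA dataset. The names `X_demo`, `y_demo`, and `demo_model` and the coefficients are assumptions chosen for illustration: the target depends strongly on the first column, weakly on the second, and not at all on the third.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (an assumption for illustration, not the TSLA dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
# Target: strong dependence on column 0, weak on column 1, none on column 2
y_demo = 5.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=300)

demo_model = GradientBoostingRegressor(random_state=42)
demo_model.fit(X_demo, y_demo)

demo_importance = demo_model.feature_importances_
print(demo_importance)        # one score per column
print(demo_importance.sum())  # scores are normalized to sum to 1
```

The first column should receive by far the largest score, which mirrors how the target was constructed.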

### Why is Feature Importance Useful?

Understanding feature importance helps:

- Identify and select the most influential features, potentially simplifying the model.
- Gain insights into the factors driving your predictions.
- Improve model interpretability and trustworthiness.

### 🧠 Computing Feature Importance in Gradient Boosting

Once the Gradient Boosting model is trained, we can easily access the feature importances. Let's walk through the steps:

```python
# Compute feature importance
feature_importance = model.feature_importances_

# Create a DataFrame for better visualization of feature names alongside their importance
feature_names = ['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']
feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importance})

# Sort features by importance
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)

# Print feature importances with names
print("Feature importance:\n", feature_importance_df)
```

**Output:**

```text
Feature importance:
   Feature    Importance
3   Close  9.447889e-01
1    High  3.668675e-02
0    Open  9.142875e-03
2     Low  8.464037e-03
6  SMA_10  4.800413e-04
7   EMA_5  2.992652e-04
8  EMA_10  1.326235e-04
5   SMA_5  5.195267e-06
4  Volume  3.363300e-07
```

Here's what each step is doing:

1. `model.feature_importances_`: Extracts the feature importance scores from the trained Gradient Boosting model.
2. `feature_names = [...]`: Defines a list of feature names for better readability.
3. `feature_importance_df = pd.DataFrame(...)`: Creates a DataFrame that links feature names with their respective importance scores.
4. `feature_importance_df.sort_values(...)`: Sorts the DataFrame by feature importance in descending order for better interpretation.

## 📊 Visualizing Feature Importance

Visualizing the importance of features helps interpret the results more effectively. We'll use Matplotlib to create a bar chart:

```python
import matplotlib.pyplot as plt

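# Reverse the sorted order: barh draws the first row at the bottom,
# so reversing puts the most important feature at the top of the chart
feature_importance_df = feature_importance_df.iloc[::-1]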

# Plotting feature importance
plt.figure(figsize=(10,6))
plt.barh(feature_importance_df['Feature'], feature_importance_df['Importance'])
plt.title('Feature Importances')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.show()
```

The resulting horizontal bar chart shows one bar per feature, with bar length proportional to the importance score, so the dominant features stand out at a glance. With scores as skewed as the ones printed above, the bars for the least important features will be barely visible.

## 🧐 Interpreting the Results

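By examining the feature importance values and plot, you can determine which features have the most impact on the model's predictions. In the output above, `Close` accounts for roughly 94% of the total importance and `High` is a distant second at about 3.7%, so the model relies almost entirely on the unadjusted closing price to predict `Adj Close`. The moving averages and `Volume` contribute almost nothing. Keep in mind that these scores describe what the model uses, not what causes the stock's movement.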
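
The interpretation can be made quantitative by computing cumulative shares. The sketch below reuses the importance scores printed earlier in this lesson; the variable names `printed_importance` and `cumulative_share` are illustrative.

```python
import pandas as pd

# Importance scores copied from the output printed earlier in this lesson
printed_importance = pd.Series({
    'Close': 9.447889e-01,
    'High': 3.668675e-02,
    'Open': 9.142875e-03,
    'Low': 8.464037e-03,
    'SMA_10': 4.800413e-04,
    'EMA_5': 2.992652e-04,
    'EMA_10': 1.326235e-04,
    'SMA_5': 5.195267e-06,
    'Volume': 3.363300e-07,
})

# Running total of importance, from most to least important feature
cumulative_share = printed_importance.sort_values(ascending=False).cumsum()
print(cumulative_share)
```

Here `Close` and `High` together already cover more than 98% of the total, which is a useful sanity check before deciding which features to keep.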

### 🛠️ Insights and Next Steps:

- **Focus on Key Features**: Emphasize the most important features in further analysis and model tuning.
- **Feature Selection**: Consider removing less important features to simplify the model.
- **Model Interpretation**: Use feature importance insights to explain model predictions to stakeholders.
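
As a rough illustration of the feature selection idea, the sketch below drops low-importance columns and retrains. It uses synthetic data, and the cutoff of `0.01` is a hypothetical value chosen for the example, not a recommended threshold.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): two informative columns, two noise columns
rng = np.random.default_rng(1)
X_sel = rng.normal(size=(400, 4))
y_sel = 4.0 * X_sel[:, 0] + 2.0 * X_sel[:, 1] + rng.normal(scale=0.1, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y_sel, test_size=0.25, random_state=42)

full_model = GradientBoostingRegressor(random_state=42).fit(X_tr, y_tr)

# Keep only the columns whose importance exceeds the hypothetical cutoff
threshold = 0.01
keep = np.flatnonzero(full_model.feature_importances_ > threshold)

reduced_model = GradientBoostingRegressor(random_state=42).fit(X_tr[:, keep], y_tr)

print("Kept columns:", keep)
print("R^2 with all features:", full_model.score(X_te, y_te))
print("R^2 with kept features:", reduced_model.score(X_te[:, keep], y_te))
```

If the two scores are close, the dropped features were not contributing much, and the simpler model may be preferable.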

## 🏁 Lesson Summary

In this lesson, you learned what feature importance means in Gradient Boosting models and applied it to predicting Tesla ($TSLA) stock prices. You computed feature importances, visualized them using a bar chart, and interpreted the results to gain actionable insights.

Understanding which features influence your model's predictions is crucial for refining your models and making informed trading decisions. Up next, practice these concepts to solidify your understanding and enhance your skillset in machine learning for financial trading.

**Great job!**

## Remove Volume Feature for Better Analysis
To exclude the `Volume` feature from both the `features` array and `feature_names` array in your code for feature importance analysis, you can modify the code as follows:

```python
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor
import matplotlib.pyplot as plt

# Logging level setup
import logging
logging.getLogger('datasets').setLevel(logging.ERROR)

# Load TSLA dataset
tesla = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# Convert Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Feature Engineering: adding technical indicators as features
tesla_df['SMA_5'] = tesla_df['Adj Close'].rolling(window=5).mean()
tesla_df['SMA_10'] = tesla_df['Adj Close'].rolling(window=10).mean()
tesla_df['EMA_5'] = tesla_df['Adj Close'].ewm(span=5, adjust=False).mean()
tesla_df['EMA_10'] = tesla_df['Adj Close'].ewm(span=10, adjust=False).mean()

# Drop NaN values created by moving averages
tesla_df.dropna(inplace=True)

# Select features and target
features = tesla_df[['Open', 'High', 'Low', 'Close', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']].values  # Exclude 'Volume'
target = tesla_df['Adj Close'].values

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.25, random_state=42)

# Standardizing features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Instantiate and fit the model
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)

# Compute feature importance
feature_importance = model.feature_importances_

# Create a DataFrame for better visualization of feature names alongside their importance
feature_names = ['Open', 'High', 'Low', 'Close', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']  # Adjusted feature names
feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importance})

# Sort features by importance
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)

# Print feature importances with names
print("Feature Importance:\n", feature_importance_df)

# Plotting feature importance
plt.figure(figsize=(10, 6))
plt.bar(range(len(feature_importance)), feature_importance)
plt.title('Feature Importances')
plt.xlabel('Feature')
plt.ylabel('Importance')
plt.xticks(range(len(feature_importance)), feature_names, rotation=45)
plt.show()
```

In this modified code:

- The `Volume` feature is removed from the `features` array by excluding it from the selection.
- The `feature_names` array is adjusted to exclude `'Volume'`.
- The rest of the code remains unchanged, ensuring that the model is trained and feature importance is computed based on the updated feature set. 

This adjustment allows you to analyze the importance of features other than `Volume` in predicting stock prices.
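
One way to avoid editing two lists by hand is to define the feature names once and build the array from them. The sketch below uses a toy DataFrame with made-up values; `all_features` and `excluded` are illustrative names.

```python
import pandas as pd

# Toy frame with the same column names as the lesson (values are made up)
toy_df = pd.DataFrame({
    'Open': [1.0, 2.0, 3.0],
    'High': [1.5, 2.5, 3.5],
    'Low': [0.5, 1.5, 2.5],
    'Close': [1.2, 2.2, 3.2],
    'Volume': [100, 200, 300],
})

all_features = ['Open', 'High', 'Low', 'Close', 'Volume']
excluded = {'Volume'}

# Derive the name list once, then build the feature array from it
feature_names = [name for name in all_features if name not in excluded]
features = toy_df[feature_names].values

print(feature_names)   # ['Open', 'High', 'Low', 'Close']
print(features.shape)  # (3, 4)
```

Because the array is selected with `feature_names`, the names and the importance scores cannot drift out of order.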

## Feature Importance Calculation in Gradient Boosting Models

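The original code for this exercise skipped a step used throughout this lesson: standardizing the features with `StandardScaler` before fitting the model. Gradient boosting builds decision trees that split on feature thresholds, so it is largely insensitive to feature scale and the omission is unlikely to change the results much. The step is restored here to keep the pipeline consistent with the rest of the lesson and reusable with scale-sensitive models. Here is the version with the step included: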

```python
import pandas as pd
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor
import matplotlib.pyplot as plt

# Load TSLA dataset
tesla = load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# Convert Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Feature Engineering: adding technical indicators as features
tesla_df['SMA_5'] = tesla_df['Adj Close'].rolling(window=5).mean()
tesla_df['SMA_10'] = tesla_df['Adj Close'].rolling(window=10).mean()
tesla_df['EMA_5'] = tesla_df['Adj Close'].ewm(span=5, adjust=False).mean()
tesla_df['EMA_10'] = tesla_df['Adj Close'].ewm(span=10, adjust=False).mean()

# Drop NaN values created by moving averages
tesla_df.dropna(inplace=True)

# Select features and target
features = tesla_df[['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']].values
target = tesla_df['Adj Close'].values

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.25, random_state=42)

# Standardizing features (the step that was missing)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Instantiate and fit the model
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)

# Compute feature importance
feature_importance = model.feature_importances_

# Create a DataFrame for better visualization of feature names alongside their importance
feature_names = ['Open', 'High', 'Low', 'Close', 'Volume', 'SMA_5', 'SMA_10', 'EMA_5', 'EMA_10']
feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importance})

# Sort features by importance
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)

# Print feature importances with names
print("Feature importance:\n", feature_importance_df)

# Plotting feature importance
plt.figure(figsize=(10, 6))
plt.bar(range(len(feature_importance)), feature_importance)
plt.title('Feature Importances')
plt.xlabel('Feature')
plt.ylabel('Importance')
plt.xticks(range(len(feature_names)), feature_names, rotation='vertical')
plt.show()
```

### Summary of the Changes:
- **Added Standardization Step:** Standardizing the features with `StandardScaler` is added before fitting the model. This puts all features on a comparable scale and keeps the preprocessing consistent with earlier steps, although tree-based gradient boosting itself does not require it.

### Conclusion:
With standardization restored, the pipeline matches the rest of the lesson. Do not expect a noticeable accuracy gain from this step alone: decision trees split on thresholds, so rescaling a feature does not change which splits are available, and the feature importances should be essentially the same with or without scaling.
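
One way to check how much scaling matters for a tree-based model is to fit the same regressor on raw and standardized copies of the data and compare the importances. The sketch below does this on synthetic data with columns on very different scales; the data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic integer-valued data on very different scales (an assumption for illustration)
rng = np.random.default_rng(2)
X_raw = rng.integers(0, 100, size=(300, 3)).astype(float) * np.array([1.0, 1000.0, 0.01])
y_raw = 3.0 * X_raw[:, 0] + 0.002 * X_raw[:, 1] + rng.normal(scale=1.0, size=300)

X_scaled = StandardScaler().fit_transform(X_raw)

# Same model and seed, fitted once on raw and once on standardized features
imp_raw = GradientBoostingRegressor(random_state=42).fit(X_raw, y_raw).feature_importances_
imp_scaled = GradientBoostingRegressor(random_state=42).fit(X_scaled, y_raw).feature_importances_

print("Raw features:   ", imp_raw)
print("Scaled features:", imp_scaled)
print("Largest difference:", np.abs(imp_raw - imp_scaled).max())
```

The two sets of importances should agree to within floating-point noise, because standardization preserves the ordering of values within each column and trees only care about that ordering.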

## Calculate and Visualize Feature Importance in Gradient Boosting Model

The code below fills in the missing parts of the exercise to calculate and visualize feature importance in a Gradient Boosting model, this time using Bollinger Bands as features:

```python
import pandas as pd
import datasets
import warnings
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor
import matplotlib.pyplot as plt

# Suppressing warnings
warnings.filterwarnings("ignore", category=UserWarning, module='pandas')

# Load TSLA dataset
tesla = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# Convert Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Feature Engineering: adding Bollinger Bands as features
tesla_df['Rolling_Mean'] = tesla_df['Adj Close'].rolling(window=20).mean()
tesla_df['Bollinger_High'] = tesla_df['Rolling_Mean'] + 2 * tesla_df['Adj Close'].rolling(window=20).std()
tesla_df['Bollinger_Low'] = tesla_df['Rolling_Mean'] - 2 * tesla_df['Adj Close'].rolling(window=20).std()

# Drop NaN values created by rolling calculations
tesla_df.dropna(inplace=True)

# Select features and target
features = tesla_df[['Open', 'High', 'Low', 'Close', 'Bollinger_High', 'Bollinger_Low']].values
target = tesla_df['Adj Close'].values

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.25, random_state=42)

# Standardizing features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# TODO: Instantiate and fit the Gradient Boosting model
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)

# TODO: Compute feature importance
feature_importance = model.feature_importances_

# Create a DataFrame for better visualization of feature names alongside their importance
feature_names = ['Open', 'High', 'Low', 'Close', 'Bollinger_High', 'Bollinger_Low']
feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importance})

# TODO: Sort features by importance
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)

# Print feature importances with names
print("Feature importance:\n", feature_importance_df)

# Plotting feature importance
plt.figure(figsize=(10, 6))
plt.bar(range(len(feature_importance)), feature_importance)
plt.title('Feature Importances')
plt.xlabel('Feature')
plt.ylabel('Importance')
plt.xticks(range(len(feature_names)), feature_names, rotation=45)
plt.show()
```

### Explanation of the Filled Parts:
1. **Instantiate and fit the Gradient Boosting model:**
   ```python
   model = GradientBoostingRegressor(random_state=42)
   model.fit(X_train, y_train)
   ```
   - Here, we instantiate the `GradientBoostingRegressor` model with a random state for reproducibility.
   - We then fit the model to the training data (`X_train`, `y_train`).

2. **Compute feature importance:**
   ```python
   feature_importance = model.feature_importances_
   ```
   - After fitting the model, we extract the feature importances using the `feature_importances_` attribute of the trained model.

3. **Sort features by importance:**
   ```python
   feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)
   ```
   - To better visualize the importance of each feature, we sort the DataFrame `feature_importance_df` by the 'Importance' column in descending order.

### Conclusion:
This code now calculates and visualizes the importance of features in predicting stock prices using a Gradient Boosting model, providing insights into which features contribute most to the model's predictions.
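
To see what the Bollinger Band columns contain, here is a minimal sketch on a toy price series. The prices are made up, and a 3-row window is used only to keep the example short, whereas the exercise uses `window=20`.

```python
import pandas as pd

# Toy price series (values are made up)
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])
window = 3

# Compute the rolling statistics once and reuse them for both bands
rolling_mean = prices.rolling(window=window).mean()
rolling_std = prices.rolling(window=window).std()

bands = pd.DataFrame({
    'Price': prices,
    'Rolling_Mean': rolling_mean,
    'Bollinger_High': rolling_mean + 2 * rolling_std,
    'Bollinger_Low': rolling_mean - 2 * rolling_std,
})

print(bands)
print("Rows lost to dropna():", len(bands) - len(bands.dropna()))  # window - 1 = 2
```

The first `window - 1` rows are NaN because a full window is not yet available, which is why the exercise calls `dropna()` before selecting features.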

## Compute and Visualize Feature Importance in Gradient Boosting Model

Let's go through the steps to compute and visualize feature importance using the Tesla dataset and a Gradient Boosting model. The code below fills in each TODO from the exercise.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor
import matplotlib.pyplot as plt
import datasets

# Load TSLA dataset
tesla = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla['train'])

# TODO: Convert the Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# TODO: Add new technical indicators to the DataFrame (Momentum, Daily Return, High-Low Difference)
# Momentum_5: The change in the adjusted close price over the past 5 days
tesla_df['Momentum_5'] = tesla_df['Adj Close'].diff(5)

# Daily_Return: The daily percentage change in the adjusted close price
tesla_df['Daily_Return'] = tesla_df['Adj Close'].pct_change()

# High_Low_Diff: The difference between the highest and lowest prices of the day
tesla_df['High_Low_Diff'] = tesla_df['High'] - tesla_df['Low']

# TODO: Drop NaN values generated by the indicators
tesla_df.dropna(inplace=True)

# TODO: Select features and target for model training
# The target will be the 'Adj Close' column
features = tesla_df[['Open', 'High', 'Low', 'Close', 'Momentum_5', 'Daily_Return', 'High_Low_Diff']].values
target = tesla_df['Adj Close'].values

# TODO: Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.25, random_state=42)

# TODO: Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# TODO: Train a Gradient Boosting Regressor model
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)

# TODO: Compute and visualize feature importance
feature_importance = model.feature_importances_

# Create a DataFrame for better visualization of feature names alongside their importance
feature_names = ['Open', 'High', 'Low', 'Close', 'Momentum_5', 'Daily_Return', 'High_Low_Diff']
feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importance})

# Sort features by importance
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)

# Print feature importances with names
print("Feature importance:\n", feature_importance_df)

# Plotting feature importance
plt.figure(figsize=(10, 6))
plt.bar(range(len(feature_importance)), feature_importance)
plt.title('Feature Importances')
plt.xlabel('Feature')
plt.ylabel('Importance')
plt.xticks(range(len(feature_names)), feature_names, rotation=45)
plt.show()
```

### Explanation of the Added Steps:
1. **Date Conversion:**
   ```python
   tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])
   ```
   - Converts the 'Date' column to a `datetime` type for easier manipulation.

2. **Technical Indicators:**
   - **Momentum_5:** Measures the change in the adjusted close price over the past 5 days.
     ```python
     tesla_df['Momentum_5'] = tesla_df['Adj Close'].diff(5)
     ```
   - **Daily_Return:** Calculates the daily percentage change in the adjusted close price.
     ```python
     tesla_df['Daily_Return'] = tesla_df['Adj Close'].pct_change()
     ```
   - **High_Low_Diff:** Computes the difference between the highest and lowest prices of the day.
     ```python
     tesla_df['High_Low_Diff'] = tesla_df['High'] - tesla_df['Low']
     ```

3. **Dropping NaNs:**
   ```python
   tesla_df.dropna(inplace=True)
   ```
   - Drops rows with NaN values generated by the new indicators.

4. **Model Training and Feature Importance:**
   - After standardizing the features, we train the `GradientBoostingRegressor` model and compute the feature importances.
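
The three indicators can be checked by hand on a small example. The sketch below uses a toy DataFrame with made-up prices to show what each column contains and how many rows `dropna()` removes.

```python
import pandas as pd

# Toy data (values are made up) to show what each indicator produces
toy = pd.DataFrame({
    'Adj Close': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 18.0],
    'High': [10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 18.5],
    'Low': [9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 17.0],
})

toy['Momentum_5'] = toy['Adj Close'].diff(5)         # price minus the price 5 rows earlier
toy['Daily_Return'] = toy['Adj Close'].pct_change()  # fractional change from the previous row
toy['High_Low_Diff'] = toy['High'] - toy['Low']      # intraday range

print(toy)
print("Rows before dropna():", len(toy))          # 7
print("Rows after dropna():", len(toy.dropna()))  # 2, because diff(5) leaves 5 leading NaNs
```

Note that `diff(5)` and `pct_change()` count rows, so on real price data the lag is measured in trading days rather than calendar days.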

### Conclusion:
This code calculates and visualizes the importance of various technical indicators and other features in predicting Tesla stock prices. The feature importances help us understand which features have the most impact on the model's predictions, aiding in better decision-making and feature engineering in future models.