<a href="https://colab.research.google.com/github/williamkilpatrick/Reinforcement-Learning-for-Financial-Trading/blob/main/Transformer_PPO_Trading_Agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Notebook Summary: Reinforcement Learning for Financial Trading

This notebook implements a reinforcement learning-based trading strategy using a Transformer model and Gymnasium (the maintained fork of OpenAI Gym).

**At a basic level, the notebook does the following:**

1. **Sets up the environment:** Installs necessary libraries and checks for GPU availability.
2. **Loads and preprocesses financial data:** Reads historical market data from a CSV file, scales numerical features, and engineers features for the model. The data is then split into training and testing sets.
3. **Defines a Transformer model:** Creates a custom Transformer model architecture specifically designed for financial trading. This model will learn to predict buy/sell signals based on market data patterns.
4. **Defines a trading environment:** Sets up a simulated trading environment using Gymnasium, allowing the agent to interact with the market and receive rewards based on its trading decisions.
5. **Trains a PPO agent:** Trains a Proximal Policy Optimization (PPO) agent within the trading environment. The agent learns by interacting with the environment and adjusting its actions to maximize profits.
6. **Tests the trained agent:** Loads the trained agent and tests its performance in the simulated trading environment, evaluating its ability to make profitable trades.
7. **Displays the final results:** Shows the agent's final balance and the total number of trades executed during the test, providing an overall assessment of its trading strategy.

**In essence, this notebook aims to develop and evaluate a reinforcement learning-based trading agent that learns to make profitable trading decisions from historical market data.** The Transformer model is trained as a supervised Buy/Sell classifier to capture patterns in the data, while the PPO agent is trained separately with Stable-Baselines3's `MlpPolicy` on the environment's per-bar observations. In the code as written, the PPO agent does not consume the Transformer's output. The simulated trading environment provides a safe and controlled setting for the agent to experiment and improve its strategy.

## Install Packages

**Summary:** This cell sets up the environment by installing the required dependencies: `stable-baselines3` for the PPO implementation and `shimmy` for environment API compatibility. The version specifier is quoted because an unquoted `>` would be treated by the shell as output redirection.

In [1]:
!pip install stable-baselines3 "shimmy>=2.0"

## Importing Necessary Libraries

This cell imports essential libraries for data science, machine learning, and reinforcement learning tasks in the context of financial analysis.

**Steps:**

1. **Import Core Libraries:**
   - `import torch`: Imports the PyTorch library, a powerful framework for deep learning and numerical computation.
   - `import torch.nn as nn`: Imports the `nn` module from PyTorch, which provides building blocks for neural networks.
   - `import numpy as np`: Imports NumPy, a fundamental library for numerical operations and array handling.
   - `import pandas as pd`: Imports pandas, a library for data manipulation and analysis using DataFrames.
   - `from sklearn.preprocessing import StandardScaler`: Imports `StandardScaler` from scikit-learn, used for feature scaling.
   - `from torch.utils.data import Dataset, DataLoader`: Imports `Dataset` and `DataLoader` from PyTorch, used for creating and handling datasets.
   - `import gymnasium as gym`: Imports the Gymnasium library, a maintained fork of the OpenAI Gym framework. This library provides tools for developing and comparing reinforcement learning algorithms.
   - `from gymnasium import spaces`: Imports the `spaces` module from Gymnasium. This module is used to define the action and observation spaces for reinforcement learning environments.
   - `from stable_baselines3 import PPO`: Imports PPO (Proximal Policy Optimization), a popular reinforcement learning algorithm, from the `stable_baselines3` library.
   - `import shimmy`: Imports `shimmy`, which provides compatibility wrappers so that environments written for other APIs (such as legacy OpenAI Gym) can be used through the Gymnasium interface.

**Summary:** This cell imports a collection of libraries that provide the necessary tools and functionalities for tasks such as data handling, preprocessing, model building, training, and reinforcement learning.

In [1]:
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from torch.utils.data import Dataset, DataLoader
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO
import shimmy

## Checking for GPU Availability

This cell checks if a GPU is available for computation and prints the device being used.

**Steps:**

1. **Device Detection:**
   - `device = torch.device("cuda" if torch.cuda.is_available() else "cpu")`: This line uses the `torch.cuda.is_available()` function to determine if a CUDA-enabled GPU is present.
   - If a GPU is available, it assigns the device to "cuda"; otherwise, it defaults to "cpu".

2. **Output:**
   - `print(f"Using device: {device}")`: Prints a message indicating whether the code will be running on a GPU ("cuda") or CPU ("cpu").

**Summary:** This cell checks for the availability of a GPU and sets the device accordingly, ensuring that the code utilizes the appropriate hardware for computation. This can significantly speed up training and execution, especially for computationally intensive tasks like deep learning.

In [2]:
# Check for GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


## Data Loading and Preprocessing

This cell prepares the financial data for the machine learning model.

**Steps:**

1. **Libraries:** Relies on `pandas`, `numpy`, and `StandardScaler`, which were imported earlier; nothing new is imported here.

2. **Load Data:** Reads a CSV file containing historical market data into a pandas DataFrame and replaces missing values with 0.

3. **Data Preprocessing with `preprocess_data` Function:**
    - **Scaling:** Uses `StandardScaler` to standardize the 16 numerical features (Open, High, Low, Close, Volume, etc.). The scaler is fitted on the full dataset before the train/test split, so statistics from the test period influence the scaled training data.
    - **Feature Engineering:** Creates input (X) and output (y) data:
        - **Input (X):** Sequences of historical data points (length defined by `seq_length`, default 200) representing past market behavior.
        - **Output (y):** Binary labels: 1 (Buy) if the scaled close of the bar immediately after the window is higher than the window's last close, otherwise 0 (Sell).
    - **Output:** Returns prepared X and y as NumPy arrays.

4. **Prepare Dataset:**
    - Calls `preprocess_data` to create X and y.
    - Splits the data chronologically into training (first 80%) and testing (last 20%) sets.

**Summary:** This cell loads the data, fills missing values, scales the features, builds windowed inputs and labels, and splits the data for training and testing the model.

In [3]:
# Load Data
df = pd.read_csv('/content/ES - 1 min.csv')
df.fillna(0, inplace=True)

def preprocess_data(df, seq_length=200):
    scaler = StandardScaler()
    features = ['Open', 'High', 'Low', 'Close', 'Volume', 'Range', '%K(Fast)(14)', '%D(Fast)(14,3)',
                'VWMA(C,20)', '%K(Fast)(9)', '%D(Fast)(9,3)', '%K(Fast)(40)', '%D(Fast)(40,4)',
                '%K(Full)(60,10)', '%D(Full)(60,10,10)', 'ATR(14)']

    df[features] = scaler.fit_transform(df[features])

    X, y = [], []
    for i in range(len(df) - seq_length - 1):
        X.append(df[features].iloc[i:i+seq_length].values)
        future_close = df['Close'].iloc[i + seq_length]
        current_close = df['Close'].iloc[i + seq_length - 1]
        y.append(1 if future_close > current_close else 0)  # 1 = Buy, 0 = Sell

    return np.array(X), np.array(y)

# Prepare dataset
X, y = preprocess_data(df)
train_size = int(0.8 * len(X))
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
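To make the windowing and labelling concrete, the following is a minimal sketch on a six-bar toy series. The helper `make_windows` and the toy arrays are illustrative assumptions, not part of the notebook; the loop mirrors the logic of `preprocess_data` but skips the scaling step.

```python
import numpy as np

def make_windows(features, close, seq_length):
    """Build (window, label) pairs the same way preprocess_data does."""
    X, y = [], []
    for i in range(len(features) - seq_length - 1):
        # Window covers rows i .. i+seq_length-1
        X.append(features[i:i + seq_length])
        # Label compares the bar right after the window with the window's last bar
        future_close = close[i + seq_length]
        current_close = close[i + seq_length - 1]
        y.append(1 if future_close > current_close else 0)
    return np.array(X), np.array(y)

# Made-up closes and a second made-up feature column
close = np.array([10.0, 11.0, 10.5, 12.0, 11.5, 13.0])
features = np.stack([close, close * 2], axis=1)

X_toy, y_toy = make_windows(features, close, seq_length=3)
print(X_toy.shape)  # (2, 3, 2): two windows, three bars each, two features
print(y_toy)        # [1 0]
```

Because the loop stops at `len(features) - seq_length - 1`, six bars with a window of three yield two samples rather than three.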

## Creating and Loading Data with PyTorch

This cell defines a custom dataset class, `FuturesDataset`, that wraps the windowed arrays as PyTorch tensors. The data loaders that use it are created in the next cell.

**Steps:**

1. **Define `FuturesDataset`:**
   - This class inherits from `torch.utils.data.Dataset` and is designed to handle the financial data.
   - It takes the input features (`X`) and target labels (`y`) as NumPy arrays and converts them into PyTorch tensors (`float32` for features, `long` for labels).
   - It implements the `__len__` and `__getitem__` methods to enable iterating over the dataset and accessing individual data points.

2. **Use with Data Loaders:**
   - Instances of this class are passed to `DataLoader` in the next cell to build `train_loader` and `test_loader`.

**Summary:** This cell prepares the data for use with PyTorch by defining a custom dataset class. Wrapping the arrays this way allows the data loaders in the next cell to batch and load the data efficiently during training and testing.

In [4]:
class FuturesDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]
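As a quick illustration of how the class behaves, the sketch below repeats the class definition so that it runs on its own and indexes a single sample. The random arrays are made-up stand-ins for the real windows; their shape (10 windows of 200 bars with 16 features) is an assumption chosen to match the defaults used in this notebook.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class FuturesDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# Made-up data: 10 windows, 200 bars each, 16 features per bar
X_toy = np.random.rand(10, 200, 16)
y_toy = np.random.randint(0, 2, size=10)

ds = FuturesDataset(X_toy, y_toy)
window, label = ds[0]
print(len(ds))                     # 10
print(window.shape, window.dtype)  # torch.Size([200, 16]) torch.float32
print(label.dtype)                 # torch.int64
```

The `long` label dtype matters later because `nn.CrossEntropyLoss` expects integer class indices as targets.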

## Creating Data Loaders for Efficient Data Handling

This code snippet creates two data loaders, `train_loader` and `test_loader`, using PyTorch's `DataLoader` class. These data loaders are essential for efficiently feeding data to the machine learning model during training and evaluation.

**Purpose:**

The primary purpose of this code is to:

1. **Prepare data for training and testing:** It takes the training data (`X_train`, `y_train`) and testing data (`X_test`, `y_test`) and packages them into iterable objects called data loaders.
2. **Enable batch processing:** Data loaders divide the data into smaller batches, allowing the model to process data in chunks, which is more memory-efficient and often faster.
3. **Optimize data loading:** It uses multiple worker processes (`num_workers=4`) to load data in parallel, speeding up the data loading process, especially for large datasets.
4. **Improve GPU utilization:** It sets `pin_memory=True` to optimize data transfer between the CPU and GPU, potentially reducing data transfer overhead.

**Data Loaders:**

- **`train_loader`:**
    - Used for training the model.
    - Shuffles the data (`shuffle=True`) to randomize the order of samples in each epoch, improving model generalization.
- **`test_loader`:**
    - Used for evaluating the model's performance.
    - Does not shuffle the data (`shuffle=False`) to ensure consistent evaluation across different runs.

**Parameters:**

- `FuturesDataset(X_train, y_train)` and `FuturesDataset(X_test, y_test)`: Instances of the custom dataset class defined in the previous cell, which encapsulate the training and testing data.
- `batch_size=64`: Specifies the number of samples in each batch.
- `shuffle=True` (for `train_loader`) and `shuffle=False` (for `test_loader`): Controls whether the data is shuffled within each epoch.
- `num_workers=4`: Sets the number of worker processes to use for data loading.
- `pin_memory=True`: Enables memory pinning, which can speed up data transfer to the GPU.

**Summary:**

This code snippet creates data loaders for training and testing, enabling efficient and optimized data handling during the machine learning process. It utilizes batching, shuffling, parallel data loading, and memory pinning to streamline data processing and potentially improve model training speed and performance.

In [5]:
# Use multiple workers for faster data loading
train_loader = DataLoader(FuturesDataset(X_train, y_train), batch_size=64, shuffle=True, num_workers=4, pin_memory=True)
test_loader = DataLoader(FuturesDataset(X_test, y_test), batch_size=64, shuffle=False, num_workers=4, pin_memory=True)
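The batching behavior described above can be checked on a small stand-in dataset. This is a minimal sketch under stated assumptions: it uses `TensorDataset` with random tensors instead of `FuturesDataset`, and `num_workers=0` to keep the example lightweight, so it does not exercise parallel loading or memory pinning.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Made-up data: 150 samples, 200 bars each, 16 features per bar
features = torch.randn(150, 200, 16)
labels = torch.randint(0, 2, (150,))
toy_loader = DataLoader(TensorDataset(features, labels),
                        batch_size=64, shuffle=False, num_workers=0)

# The last batch is smaller when the sample count is not a multiple of 64
batch_sizes = [x_batch.shape[0] for x_batch, _ in toy_loader]
print(batch_sizes)                             # [64, 64, 22]
print(len(toy_loader) == math.ceil(150 / 64))  # True
```

Passing `drop_last=True` to `DataLoader` would discard the final partial batch instead; the notebook keeps it.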

## Defining the Trading Transformer Model

This cell defines the architecture of a Transformer model specifically designed for financial trading.

**Steps:**

1. **Class Definition:**
   - `TradingTransformer(nn.Module)`: A custom class inheriting from PyTorch's `nn.Module` is defined to represent the trading model.

2. **Initialization (`__init__`)**:
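    - `input_dim`: Specifies the dimensionality of the input data. It is also used as the encoder's `d_model`, so it must be divisible by `num_heads`.
    - `num_heads`: Sets the number of attention heads in the Transformer encoder layers (default: 4).
    - `num_layers`: Determines the number of stacked encoder layers (default: 2).
    - `hidden_dim`: Sets the size of the fully connected layer (`self.fc`) applied after the encoder (default: 128). The feed-forward width inside each encoder layer is left at PyTorch's default.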
    - **Layers:**
        - `nn.TransformerEncoderLayer`: Creates a Transformer encoder layer with specified parameters.
        - `nn.TransformerEncoder`: Stacks multiple encoder layers to form the main Transformer encoder.
        - `nn.Linear`: Defines fully connected layers for the input and output.
        - `nn.ReLU`: Introduces a ReLU activation function for non-linearity.

3. **Forward Pass (`forward`)**:
    - Defines how the data flows through the model during inference or training.
    - `x`: Input tensor representing the financial data sequence.
    - Processing:
        - `self.transformer_encoder(x)`: Passes the input through the Transformer encoder.
        - `x[:, -1, :]`: Selects the output from the last time step of the encoder.
        - `self.relu(self.fc(x))`: Applies a fully connected layer and ReLU activation.
        - `self.output_layer(x)`: Produces the final output: two logits, one per class (0 = Sell, 1 = Buy).

**Summary:** This cell constructs a Transformer-based neural network model designed for financial trading. It defines the model's architecture, including the layers, activation functions, and how data is processed during both training and inference. The model learns patterns and relationships in financial data to make trading decisions.

In [6]:
# Transformer Model optimized for GPU
class TradingTransformer(nn.Module):
    def __init__(self, input_dim, num_heads=4, num_layers=2, hidden_dim=128):
        super(TradingTransformer, self).__init__()
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=input_dim, nhead=num_heads, batch_first=True)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_layers)
        self.fc = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.output_layer = nn.Linear(hidden_dim, 2)  # Buy/Sell classification

    def forward(self, x):
        x = self.transformer_encoder(x)
        x = x[:, -1, :]  # Take the last time step's output
        x = self.relu(self.fc(x))
        return self.output_layer(x)
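The tensor shapes through the forward pass can be traced with a dummy batch. The sketch below rebuilds the same layer stack from standard PyTorch modules rather than reusing the class, and assumes 16 input features and 200-bar windows as in the preprocessing defaults; the random input is a placeholder, not market data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same building blocks as TradingTransformer, with d_model = 16 and 4 heads
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 2))

x = torch.randn(8, 200, 16)  # (batch, sequence length, features)

encoder.eval()  # disable dropout for a deterministic shape check
with torch.no_grad():
    encoded = encoder(x)           # one 16-dim vector per time step
    last_step = encoded[:, -1, :]  # keep only the final time step
    logits = head(last_step)       # two logits per sample

print(encoded.shape)    # torch.Size([8, 200, 16])
print(last_step.shape)  # torch.Size([8, 16])
print(logits.shape)     # torch.Size([8, 2])
```

Only the last time step feeds the classifier head, so the prediction relies on attention to carry information from earlier bars into that final position.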

## Training and Saving the Trading Transformer Model

This cell focuses on training the defined Trading Transformer model and saving it for later use.

**Steps:**

1. **Model Initialization and Setup:**
   - `model = TradingTransformer(input_dim=X.shape[2]).to(device)`: Creates an instance of the `TradingTransformer` model, setting the input dimension based on the data shape (`X.shape[2]`). It then moves the model to the appropriate device (GPU if available, otherwise CPU).
   - `optimizer = torch.optim.Adam(model.parameters(), lr=0.001)`: Initializes an Adam optimizer to update the model's parameters during training. The learning rate is set to 0.001.
   - `criterion = nn.CrossEntropyLoss()`: Defines the loss function to measure the difference between the model's predictions and the actual target labels. CrossEntropyLoss is commonly used for classification tasks.
   - `scaler = torch.amp.GradScaler()`: Creates a gradient scaler for Automatic Mixed Precision (AMP) training, which can improve throughput and reduce memory usage on compatible GPUs.

2. **Training Loop (`train_model`)**:
   - `model.train()`: Sets the model to training mode, which enables training-only behavior such as the dropout inside the Transformer encoder layers.
   - Iterates through epochs (specified by `epochs` parameter, default 25):
     - Iterates through batches of data from the `train_loader`:
       - `X_batch, y_batch = X_batch.to(device), y_batch.to(device)`: Moves the input data and target labels to the appropriate device.
       - `optimizer.zero_grad()`: Resets the gradients of the optimizer before each iteration.
       - **Forward Pass and Loss Calculation:**
         - `with torch.amp.autocast(device_type=device.type)`: Enables automatic mixed precision for the forward pass on the selected device.
         - `outputs = model(X_batch)`: Passes the input batch through the model to get predictions.
         - `loss = criterion(outputs, y_batch)`: Calculates the loss between predictions and actual labels.
       - **Backward Pass and Optimization:**
         - `scaler.scale(loss).backward()`: Scales the loss and performs backpropagation to calculate gradients.
         - `scaler.step(optimizer)`: Updates the model's parameters using the optimizer.
         - `scaler.update()`: Updates the GradScaler state.
       - `total_loss += loss.item()`: Accumulates the loss for the epoch.
     - Prints the average loss for the epoch.

3. **Saving the Model:**
   - `torch.save(model.state_dict(), 'trading_transformer.pth')`: Saves the trained model's state dictionary (containing model weights and biases) to a file named 'trading_transformer.pth'. This allows you to load and reuse the trained model later.

**Summary:** This cell trains the Trading Transformer model using the provided training data and saves the trained model's weights to a file. It sets up the necessary components for training, including the optimizer, loss function, and data loaders. The training loop iterates through epochs and batches, updating the model's parameters to minimize the loss. Finally, it saves the trained model for future use.

In [7]:
# Move model to GPU
model = TradingTransformer(input_dim=X.shape[2]).to(device)

# Optimizer and loss function
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Training with Mixed Precision (AMP)
scaler = torch.amp.GradScaler()

def train_model(model, train_loader, epochs=25):
    model.train()
    for epoch in range(epochs):
        total_loss = 0
        for X_batch, y_batch in train_loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)

            optimizer.zero_grad()

            with torch.amp.autocast(device_type=device.type):
                outputs = model(X_batch)
                loss = criterion(outputs, y_batch)

            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

            total_loss += loss.item()

        print(f'Epoch {epoch+1}, Loss: {total_loss/len(train_loader):.4f}')

# Train the model
train_model(model, train_loader)

# Save model
torch.save(model.state_dict(), 'trading_transformer.pth')

Epoch 1, Loss: 0.6942
Epoch 2, Loss: 0.6897
Epoch 3, Loss: 0.6880
Epoch 4, Loss: 0.6887
Epoch 5, Loss: 0.6880
Epoch 6, Loss: 0.6853
Epoch 7, Loss: 0.6877
Epoch 8, Loss: 0.6850
Epoch 9, Loss: 0.6862
Epoch 10, Loss: 0.6838
Epoch 11, Loss: 0.6833
Epoch 12, Loss: 0.6816
Epoch 13, Loss: 0.6781
Epoch 14, Loss: 0.6814
Epoch 15, Loss: 0.6778
Epoch 16, Loss: 0.6796
Epoch 17, Loss: 0.6761
Epoch 18, Loss: 0.6756
Epoch 19, Loss: 0.6718
Epoch 20, Loss: 0.6727
Epoch 21, Loss: 0.6697
Epoch 22, Loss: 0.6696
Epoch 23, Loss: 0.6713
Epoch 24, Loss: 0.6671
Epoch 25, Loss: 0.6726
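
For context when reading the log above: a two-class classifier that always outputs equal probabilities for both classes has a cross-entropy loss of ln 2. The short calculation below is an illustrative check, not part of the notebook, and gives that reference value for comparison.

```python
import math

# Cross-entropy of a 50/50 prediction on a binary target: -ln(0.5) = ln(2)
baseline = math.log(2)
print(f"{baseline:.4f}")  # 0.6931
```

The logged loss moves from 0.6942 in the first epoch to roughly 0.67 by the end, so it stays close to this reference value throughout the 25 epochs.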


## Defining the Trading Environment

This cell defines a custom trading environment called `TradingEnv`, which simulates the interactions of a trading agent with the financial market. It subclasses Gymnasium's `gym.Env` to structure the environment, allowing the agent to learn through reinforcement learning.

**Steps:**

1. **Environment Initialization (`__init__`)**:
   - `df`: The DataFrame containing the market data used for the simulation.
   - `initial_balance`: The starting capital for the trading agent.
   - `tick_size`: The minimum price movement of the asset being traded.
   - `tick_value`: The profit or loss associated with a one-tick price change.
   - Other variables like `balance`, `current_step`, `position`, and `total_trades` are initialized to track the agent's state.
   - **Action Space:** A `Discrete(3)` space with the actions 0 = Hold, 1 = Buy, and 2 = Sell.
   - **Observation Space:** A `Box` space with one `float32` value per DataFrame column (prices, volume, and indicators), bounded by each column's minimum and maximum.

2. **Environment Methods:**
   - `reset()`: Resets the environment to its initial state for a new episode.
   - `_next_observation()`: Returns the current market state to the agent.
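   - `step(action)`: Advances one bar, applies the agent's action, and returns the new observation, the reward, the `terminated` and `truncated` flags, and an info dictionary (Gymnasium's five-value return). When a position is closed, the reward and balance change are computed from the most recent one-bar change in `Close`, converted to ticks and multiplied by `tick_value`. The entry price is not tracked, and the same sign is applied when closing long and short positions.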
   - `render()`: Displays information about the environment, such as the current balance and position.

**Handling Potential Issues:**
- The code includes mechanisms to handle potential issues with the data:
    - **Infinite values:** It replaces `inf` and `-inf` with `NaN` so they can be treated as missing values.
    - **Forward Filling:** It then forward fills (`ffill`) the resulting missing values with the last valid observation.

**Summary:**

This cell creates a custom trading environment that simulates market dynamics and allows a trading agent to interact with it. It defines the actions the agent can take, the information it receives, and how the environment's state changes in response to the agent's actions. This environment serves as a crucial component for training a reinforcement learning agent for financial trading.

In [8]:
class TradingEnv(gym.Env):
    def __init__(self, df, initial_balance=10000, tick_size=0.25, tick_value=12.50):
        super(TradingEnv, self).__init__()

        self.df = df.reset_index(drop=True)
        self.initial_balance = initial_balance
        self.balance = initial_balance
        self.current_step = 0
        self.tick_size = tick_size
        self.tick_value = tick_value
        self.position = 0  # 1 = long, -1 = short, 0 = no position
        self.total_trades = 0

        # Define action & observation space
        self.action_space = spaces.Discrete(3)  # 0 = Hold, 1 = Buy, 2 = Sell

        # Fix NaN and infinite values
        self.df.replace([np.inf, -np.inf], np.nan, inplace=True)
        self.df = self.df.ffill()

        # Fix observation space (explicit float32 conversion)
        self.observation_space = spaces.Box(
            low=self.df.min().values.astype(np.float32),
            high=self.df.max().values.astype(np.float32),
            shape=(len(self.df.columns),),
            dtype=np.float32
        )

    def reset(self, seed=None, options=None):
        """Reset the environment for a new episode."""
        super().reset(seed=seed)  # Gymnasium requires this
        self.balance = self.initial_balance
        self.current_step = 0
        self.position = 0
        self.total_trades = 0
        return self._next_observation(), {}  # Gymnasium requires (obs, info)

    def _next_observation(self):
        """Return the next market state."""
        return self.df.iloc[self.current_step].values.astype(np.float32)

    def step(self, action):
        """Execute the action and return new state, reward, done."""
        self.current_step += 1
        if self.current_step >= len(self.df) - 1:
            return self._next_observation(), 0, True, False, {}  # Done if out of data

        prev_close = self.df.iloc[self.current_step - 1]['Close']
        new_close = self.df.iloc[self.current_step]['Close']
        price_change = (new_close - prev_close) / self.tick_size  # Price movement in ticks
        profit_loss = price_change * self.tick_value  # P&L per tick

        reward = 0  # Default reward

        if action == 1:  # Buy
            if self.position == 0:
                self.position = 1
                self.total_trades += 1
            elif self.position == -1:  # Closing a short
                self.balance += profit_loss
                reward = profit_loss
                self.position = 0

        elif action == 2:  # Sell
            if self.position == 0:
                self.position = -1
                self.total_trades += 1
            elif self.position == 1:  # Closing a long
                self.balance += profit_loss
                reward = profit_loss
                self.position = 0

        # Fix termination conditions for Gymnasium
        terminated = self.current_step >= len(self.df) - 1  # If episode naturally ends
        truncated = self.balance <= 0  # If forced to end due to balance loss

        return self._next_observation(), reward, terminated, truncated, {}

    def render(self):
        print(f'Step: {self.current_step}, Balance: {self.balance:.2f}, Position: {self.position}')
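
The tick arithmetic used in `step` can be worked through in isolation. The sketch below is an assumption-laden alternative, not the environment's actual logic: the helper `ticks_pnl` is hypothetical, it measures profit from an explicit entry price (which `TradingEnv` does not track), and it flips the sign for short positions. The prices are made up.

```python
def ticks_pnl(entry_price, exit_price, position, tick_size=0.25, tick_value=12.50):
    """Dollar P&L for one contract held from entry to exit.

    position: 1 for long, -1 for short.
    """
    ticks = (exit_price - entry_price) / tick_size  # price move in ticks
    return position * ticks * tick_value            # shorts gain when price falls

# A one-point rise is 4 ticks of 0.25, worth 4 * 12.50 = 50.00 per contract
print(ticks_pnl(5000.00, 5001.00, position=1))   # 50.0
print(ticks_pnl(5000.00, 5001.00, position=-1))  # -50.0
```

Using a helper like this inside the environment would require storing the entry price when a position is opened, which changes the reward signal the agent trains on.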


## Training and Saving the PPO Agent

This cell focuses on training a Proximal Policy Optimization (PPO) agent within the defined trading environment and saving the trained agent for later use.

**Steps:**

1. **Environment and Agent Setup:**
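    - `env = TradingEnv(df)`: Creates an instance of the `TradingEnv` environment from the market data (`df`). This cell reloads the CSV, fills missing values with 0, and drops the `Date` column before building the environment, so the observations are the unscaled values of the remaining columns.
    - `model = PPO("MlpPolicy", env, verbose=1, device="cpu")`: Creates a PPO agent using the `MlpPolicy` (Multi-Layer Perceptron policy), specifying the environment, enabling verbose output (`verbose=1`) to track training progress, and running on the CPU. This reassigns the name `model`, which previously referred to the Transformer.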

2. **Training the Agent:**
    - `model.learn(total_timesteps=100000)`: Trains the PPO agent for a specified number of timesteps (100,000 in this case). During training, the agent interacts with the environment, learns from its experiences, and updates its policy to maximize rewards (profit in the trading context).

3. **Saving the Trained Agent:**
    - `model.save("ppo_trading_agent")`: Saves the trained PPO agent to a file named "ppo_trading_agent". This allows you to load and reuse the trained agent later without retraining.

**Summary:**

This cell sets up a trading environment, creates a PPO reinforcement learning agent, trains the agent within the environment to learn a profitable trading strategy, and saves the trained agent for future use. The PPO algorithm is a popular choice for reinforcement learning due to its stability and efficiency. Saving the trained agent enables you to apply the learned strategy to new market data or continue training from where it left off.

In [9]:
# Load dataset
df = pd.read_csv("/content/ES - 1 min.csv")
df.fillna(0, inplace=True)
df = df.drop(columns=["Date"])

env = TradingEnv(df)

# Create PPO agent
model = PPO("MlpPolicy", env, verbose=1, device="cpu")

# Train the model
model.learn(total_timesteps=100000)

# Save trained model
model.save("ppo_trading_agent")

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
-----------------------------
| time/              |      |
|    fps             | 1069 |
|    iterations      | 1    |
|    time_elapsed    | 1    |
|    total_timesteps | 2048 |
-----------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 3.56e+03     |
|    ep_rew_mean          | 4.11e+03     |
| time/                   |              |
|    fps                  | 828          |
|    iterations           | 2            |
|    time_elapsed         | 4            |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 0.0046434654 |
|    clip_fraction        | 0.0115       |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.09        |
|    explained_variance   | -0.000218    |
|    learning_rate        | 0.0003       |
|    los

## Testing the Trained PPO Agent

This cell focuses on loading and testing the trained PPO agent within the trading environment to evaluate its performance.

**Steps:**

1. **Load Trained Agent:**
   - `model = PPO.load("ppo_trading_agent")`: Loads the previously trained PPO agent from the file "ppo_trading_agent".

2. **Test Environment Setup:**
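   - `obs, _ = env.reset()`: Resets the trading environment to its initial state, preparing it for a new trading episode. Gymnasium's `reset()` returns an `(observation, info)` pair, and the info dictionary is discarded here.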
   - `done = False`: Initializes a flag variable `done` to `False`, indicating that the trading episode is not yet finished.

3. **Trading Loop:**
   - `while not done:`: Enters a loop that continues as long as the trading episode is not finished (i.e., `done` is `False`).
     - `action, _states = model.predict(obs)`: Uses the loaded PPO agent to predict the best action to take based on the current observation (`obs`) of the environment.
     - `env.step(action)`: Executes the predicted action in the environment and returns Gymnasium's five values: a new observation (`obs`), a reward signal (`reward`), the `terminated` and `truncated` flags that signal the end of the episode, and an info dictionary.
     - `env.render()`: Displays information about the environment's current state, such as the agent's balance and position.

**Summary:**

This cell loads the trained PPO agent and runs a trading simulation with its learned policy in the same `env` instance that was used for training, so the results are in-sample. It repeatedly observes the environment, predicts an action, executes it, and receives feedback. The `env.render()` function prints the step, balance, and position after each action, which shows how the agent's decisions change its balance. This testing process gives a first check of the trained agent in a simulated trading scenario.
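
An out-of-sample check would need separate data for training and testing. The following is a minimal sketch of a chronological split, assuming a hypothetical helper `chronological_split` and a made-up ten-row frame; it is not part of the notebook.

```python
import pandas as pd

def chronological_split(df, train_frac=0.8):
    """Split rows in time order: the first part for training, the rest for evaluation."""
    split_idx = int(train_frac * len(df))
    train_df = df.iloc[:split_idx].reset_index(drop=True)
    test_df = df.iloc[split_idx:].reset_index(drop=True)
    return train_df, test_df

# Made-up closes standing in for the real market data
toy_df = pd.DataFrame({"Close": [100.0 + i * 0.25 for i in range(10)]})
train_df, test_df = chronological_split(toy_df)
print(len(train_df), len(test_df))  # 8 2
```

With frames like these, one could build `TradingEnv(train_df)` for training and a separate `TradingEnv(test_df)` for the test loop, so that the agent is evaluated on bars it has not seen.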

In [13]:
# Load trained agent
model = PPO.load("ppo_trading_agent")

# Test environment
obs, _ = env.reset()
done = False

while not done:
    action, _states = model.predict(obs)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated  # stop on either Gymnasium end-of-episode flag
    env.render()

Step: 1, Balance: 10000.00, Position: 1
Step: 2, Balance: 10000.00, Position: 1
Step: 3, Balance: 10000.00, Position: 1
Step: 4, Balance: 10000.00, Position: 1
Step: 5, Balance: 10050.00, Position: 0
Step: 6, Balance: 10050.00, Position: 1
Step: 7, Balance: 10050.00, Position: 1
Step: 8, Balance: 10050.00, Position: 1
Step: 9, Balance: 10050.00, Position: 1
Step: 10, Balance: 10050.00, Position: 1
Step: 11, Balance: 10050.00, Position: 1
Step: 12, Balance: 10050.00, Position: 1
Step: 13, Balance: 10050.00, Position: 1
Step: 14, Balance: 10050.00, Position: 1
Step: 15, Balance: 10050.00, Position: 1
Step: 16, Balance: 10050.00, Position: 1
Step: 17, Balance: 10050.00, Position: 1
Step: 18, Balance: 10050.00, Position: 1
Step: 19, Balance: 10050.00, Position: 1
Step: 20, Balance: 10050.00, Position: 1
Step: 21, Balance: 10050.00, Position: 1
Step: 22, Balance: 10050.00, Position: 1
Step: 23, Balance: 10087.50, Position: 0
Step: 24, Balance: 10087.50, Position: -1
Step: 25, Balance: 10087

## Displaying Final Trading Results

This cell displays the final results of the trading simulation performed in the previous cell. It prints the agent's final balance and the total number of trades executed during the simulation.

**Steps:**

1. **Print Final Balance:**
   - `print(f"Final Balance: ${env.balance:.2f}")`: Prints the agent's final balance after the trading simulation. The balance is formatted to two decimal places using `:.2f`.

2. **Print Total Trades:**
   - `print(f"Total Trades: {env.total_trades}")`: Prints the number of positions the agent opened during the simulation. The counter is incremented only when a position is opened from flat, not when one is closed.

**Summary:**

This cell provides a basic summary of the trading agent's performance by showing its final balance and the number of trades executed. These metrics are only a starting point for evaluating the profitability and efficiency of the learned strategy; further measures, such as drawdown or per-trade profit and loss, would be needed for a fuller assessment.

In [14]:
print(f"Final Balance: ${env.balance:.2f}")
print(f"Total Trades: {env.total_trades}")

Final Balance: $14550.00
Total Trades: 438
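
One additional metric that could be computed is maximum drawdown, the largest peak-to-trough decline in balance. The sketch below is illustrative only: `TradingEnv` does not record a balance history, so the helper `max_drawdown` and the toy series are assumptions, and the numbers are made up rather than taken from the run above.

```python
import numpy as np

def max_drawdown(balances):
    """Largest fractional drop from a running peak in a balance series."""
    balances = np.asarray(balances, dtype=float)
    running_peak = np.maximum.accumulate(balances)
    drawdowns = (running_peak - balances) / running_peak
    return drawdowns.max()

# Made-up balance history; a real one would be collected inside the test loop
toy_balances = [10000.0, 10050.0, 9900.0, 10087.5, 10000.0]
print(f"{max_drawdown(toy_balances):.4%}")
```

Here the deepest decline is from the 10050.00 peak down to 9900.00, a drop of 150 / 10050. Collecting `env.balance` after each `env.step` call would provide the series needed to apply this to the agent.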
