# Reinforcement Learning Trading Model Training on Cloud

This notebook allows you to train your RL trading model on cloud platforms like Google Colab or Kaggle. It handles all the necessary setup, including:

1. Installing required packages
2. Importing your codebase from GitHub
3. Setting up the environment
4. Training the model
5. Saving the trained model
6. Monitoring training progress

## Why Use Cloud Platforms?

Training reinforcement learning models can be very resource-intensive. Cloud platforms provide:
- Access to GPUs for faster training
- More memory resources
- Persistent storage options
- Pre-installed ML libraries

Choose your platform:
- **Google Colab**: Free GPUs with session time limits (up to about 12 hours)
- **Kaggle**: Free GPUs with more stable sessions and a weekly quota (about 30 GPU hours)
- **Hugging Face**: Good for deployment and sharing your model

## Step 1: Setup Environment

First, let's install all the necessary packages. Only `stable-baselines3` is pinned (to 2.0.0); the other packages install at their latest available versions:

In [None]:
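# First check whether we're running on Colab or Kaggle
import os
import sys
IN_COLAB = 'google.colab' in sys.modules
# Kaggle sets KAGGLE_KERNEL_RUN_TYPE; convert to bool so the flag prints as True/False
IN_KAGGLE = bool(os.environ.get('KAGGLE_KERNEL_RUN_TYPE', ''))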

print(f"Running on Colab: {IN_COLAB}")
print(f"Running on Kaggle: {IN_KAGGLE}")

# Install required packages
!pip install -q stable-baselines3==2.0.0
!pip install -q gymnasium
# tqdm and rich are both needed by SB3 when training with progress_bar=True
!pip install -q pandas matplotlib seaborn tqdm rich
!pip install -q tensorboard

# Install optional packages for visualization
!pip install -q plotly ipywidgets

# Check Python version
import sys
print(f"Python version: {sys.version}")
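
The platform check above can also be wrapped in a small function, which makes it easy to exercise without being on either platform. This is a minimal sketch: `detect_platform` is a hypothetical helper name, and it assumes (as the cell above does) that Colab preloads `google.colab` and that Kaggle sets `KAGGLE_KERNEL_RUN_TYPE`.

```python
import os
import sys


def detect_platform(modules=None, environ=None):
    """Return 'colab', 'kaggle', or 'local' for the current runtime.

    `modules` and `environ` default to sys.modules and os.environ; passing
    plain dicts lets the logic be checked without a cloud runtime.
    """
    modules = sys.modules if modules is None else modules
    environ = os.environ if environ is None else environ

    # Colab preloads the google.colab package into the interpreter.
    if "google.colab" in modules:
        return "colab"
    # Treat any non-empty KAGGLE_KERNEL_RUN_TYPE value as "running on Kaggle".
    if environ.get("KAGGLE_KERNEL_RUN_TYPE"):
        return "kaggle"
    return "local"


print(f"Detected platform: {detect_platform()}")
```

Returning a single label instead of two flags avoids ambiguous states later, for example when choosing storage paths.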

## Step 2: Clone or Update Repository

Next, let's get your code. We'll either clone the repository or update it if it already exists:

In [None]:
import os

# Set repository path
REPO_PATH = "Stock_AI_Predictor"

# Clone the repository if it doesn't exist
if not os.path.exists(REPO_PATH):
    !git clone https://github.com/yoonus-k/Stock_AI_Predictor.git
    print("Repository cloned successfully!")
else:
    # If it exists, pull the latest changes
    %cd $REPO_PATH
    !git pull
    %cd ..
    print("Repository updated successfully!")

# Add the repository to Python path
import sys
if REPO_PATH not in sys.path:
    sys.path.append(REPO_PATH)

print(f"Repository path: {os.path.abspath(REPO_PATH)}")

## Step 3: Test Imports

Let's make sure we can import all the required modules without errors:

In [None]:
# Try importing key modules to verify installation
try:
    import gymnasium as gym
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import CheckpointCallback
    
    print("All dependencies imported successfully!")
except ImportError as e:
    print(f"Error importing dependencies: {e}")
    print("Please make sure all required packages are installed.")
    
# Now try importing from our project structure
try:
    # Import specific modules from your project
    from RL.Envs.trading_env import TradingEnv
    from RL.Envs.action_wrapper import TupleActionWrapper
    from RL.Data.loader import load_data_from_db
    
    print("Project modules imported successfully!")
except ImportError as e:
    print(f"Error importing project modules: {e}")
    print("Please ensure the repository structure is correct.")
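
When an import fails, it helps to know which packages are missing before anything is actually imported. The sketch below uses the standard library's `importlib.util.find_spec` for that; `missing_modules` is a hypothetical helper and it only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


required = ["gymnasium", "numpy", "pandas", "matplotlib", "stable_baselines3"]
not_found = missing_modules(required)
print("Missing packages:", not_found if not_found else "none")
```

Because nothing is imported, the check is fast and has no side effects; it does not prove that a package imports cleanly, only that it is installed.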

## Step 4: Set Up Data

Let's set up the data for training. We need to:
1. Create a database or upload existing data
2. Prepare the data for the RL environment

In [None]:
# Define paths based on environment
if IN_COLAB or IN_KAGGLE:
    # Cloud storage paths
    import os
    
    if IN_COLAB:
        # google.colab is only importable on Colab, so import it inside this branch
        from google.colab import drive
        
        # Mount Google Drive for persistent storage on Colab
        drive.mount('/content/drive')
        DATA_DIR = "/content/drive/MyDrive/Stock_AI_Predictor/Data"
        MODEL_DIR = "/content/drive/MyDrive/Stock_AI_Predictor/Models"
        LOG_DIR = "/content/drive/MyDrive/Stock_AI_Predictor/Logs"
    else:  # Kaggle
        DATA_DIR = "/kaggle/working/data"
        MODEL_DIR = "/kaggle/working/models"
        LOG_DIR = "/kaggle/working/logs"
else:
    # Local paths (for testing)
    DATA_DIR = "RL/Data"
    MODEL_DIR = "RL/Models"
    LOG_DIR = "RL/Logs"

# Create directories if they don't exist
for directory in [DATA_DIR, MODEL_DIR, LOG_DIR]:
    os.makedirs(directory, exist_ok=True)
    print(f"Directory ready: {directory}")

# Define database path
DB_PATH = os.path.join(DATA_DIR, "samples.db")

### Upload Database to Cloud Storage

If you're using Colab, you can upload your database directly:

In [None]:
import os

if IN_COLAB:
    # google.colab only exists on Colab, so import it inside this branch
    from google.colab import files
    
    # If database doesn't exist, prompt for upload
    if not os.path.exists(DB_PATH):
        print("Please upload your database file (samples.db):")
        uploaded = files.upload()
        
        # Move the uploaded file to the data directory
        for filename in uploaded.keys():
            if filename.endswith('.db'):
                !cp "{filename}" "{DB_PATH}"
                print(f"Database saved to {DB_PATH}")
    else:
        print(f"Database already exists at {DB_PATH}")
        
# Check database exists
if os.path.exists(DB_PATH):
    print(f"Database found at: {DB_PATH}")
    # Get file size
    db_size = os.path.getsize(DB_PATH) / (1024 * 1024)  # Size in MB
    print(f"Database size: {db_size:.2f} MB")
else:
    print(f"Warning: Database not found at {DB_PATH}")
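
A file-size check confirms that the upload happened, but not that the file is readable. The sketch below lists each table and its row count, assuming `samples.db` is an SQLite database (suggested by the `.db` extension, but not confirmed here). `list_tables` is a hypothetical helper and does not rely on any particular table name.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: row_count} for every user table in an SQLite file."""
    # Open read-only via a file: URI so a missing file raises an error
    # instead of silently creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier because table names cannot be bound as parameters.
            counts[name] = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        return counts
    finally:
        conn.close()


# Example (uncomment once DB_PATH points at an uploaded database):
# print(list_tables(DB_PATH))
```

An empty dictionary or a table with zero rows is a quick signal that the wrong file was uploaded.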

## Step 5: Configure Training

Now let's configure the training parameters to optimize for cloud environments:

In [None]:
# Import required modules
import os
import json
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time
from datetime import datetime
import argparse

# RL libraries
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback, CallbackList, CheckpointCallback
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.logger import configure
from stable_baselines3.common.vec_env import DummyVecEnv

# Define training configuration
class TrainingConfig:
    def __init__(self):
        # Cloud optimization
        self.use_gpu = True  # Set to True to use GPU acceleration
        
        # Paths
        self.db_path = DB_PATH
        self.model_path = os.path.join(MODEL_DIR, "pattern_sentiment_rl_model")
        self.log_path = LOG_DIR
        
        # Training parameters
        self.timesteps = 20000  # Total training timesteps
        self.eval_freq = 1000  # How often to evaluate model
        self.checkpoint_freq = 5000  # How often to save checkpoints
        self.tensorboard = True  # Use tensorboard logging
        self.progress_bar = True  # Show progress bar during training
        
        # Model parameters
        self.learning_rate = 3e-4
        self.n_steps = 256  # Number of steps per update
        self.batch_size = 64  # Minibatch size
        self.n_epochs = 5  # Number of update passes per batch
        
    def display(self):
        """Display the configuration"""
        print("\n===== Training Configuration =====")
        for key, value in self.__dict__.items():
            print(f"{key}: {value}")
        print("================================\n")

# Create training configuration
config = TrainingConfig()
config.display()
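
To see what these settings imply, the arithmetic can be worked through directly. This is a sketch under two assumptions: a single training environment, and that SB3's on-policy loop keeps collecting full rollouts of `n_steps` until the timestep target is reached. `ppo_update_budget` is a hypothetical helper, not part of SB3.

```python
import math


def ppo_update_budget(timesteps, n_steps, batch_size, n_epochs, n_envs=1):
    """Estimate rollouts and gradient steps implied by PPO settings."""
    rollout_size = n_steps * n_envs                       # transitions per rollout
    rollouts = math.ceil(timesteps / rollout_size)        # full rollouts collected
    minibatches = math.ceil(rollout_size / batch_size)    # minibatches per epoch
    return {
        "rollout_size": rollout_size,
        "rollouts": rollouts,
        "timesteps_collected": rollouts * rollout_size,
        "minibatches_per_epoch": minibatches,
        "gradient_steps": rollouts * n_epochs * minibatches,
        # A batch size that does not divide the rollout leaves a short final minibatch.
        "batch_divides_rollout": rollout_size % batch_size == 0,
    }


budget = ppo_update_budget(timesteps=20000, n_steps=256, batch_size=64, n_epochs=5)
for key, value in budget.items():
    print(f"{key}: {value}")
```

With the values above, each rollout holds 256 transitions split into 4 minibatches, so 5 epochs give 20 gradient steps per rollout; 79 rollouts cover the 20,000-step target (20,224 steps collected) for 1,580 gradient steps in total.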

## Step 6: Load Data and Create Environments

Now let's load the data and create the RL environments:

In [None]:
# Add project root to path to ensure imports work correctly
import os
import sys
from pathlib import Path

# Make sure our project is in the Python path
if "Stock_AI_Predictor" not in sys.path:
    sys.path.append("Stock_AI_Predictor")

# Import project modules
try:
    from RL.Data.loader import load_data_from_db
    from RL.Envs.trading_env import TradingEnv
    from RL.Envs.action_wrapper import TupleActionWrapper
    
    print("Successfully imported project modules.")
except ImportError as e:
    print(f"Error importing project modules: {e}")
    # Fallback: locate the module file to help diagnose the path problem
    print("Searching the repository for trading_env.py...")
    
    # Define a function to find a module file in the repository
    def find_module(root_dir, module_name):
        for dirpath, dirnames, filenames in os.walk(root_dir):
            for filename in filenames:
                if filename == f"{module_name}.py":
                    return os.path.join(dirpath, filename)
        return None
    
    # Report where the module lives so sys.path or the import can be corrected
    module_file = find_module("Stock_AI_Predictor", "trading_env")
    print(f"Found: {module_file}" if module_file else "trading_env.py not found in the repository.")

In [None]:
# Load dataset from database
print("\nLoading data from database...")
try:
    # Try to load using our loader
    rl_dataset = load_data_from_db(config.db_path)
    
    if rl_dataset.empty:
        print("❌ ERROR: No data found in the database.")
        raise ValueError("Empty dataset")
        
    print(f"✅ Loaded {len(rl_dataset)} records from database")
    print(f"Dataset columns: {rl_dataset.columns.tolist()}")
    print(f"Dataset sample:\n{rl_dataset.head(1).T}")
    
except Exception as e:
    print(f"Error loading data: {e}")
    # Fallback: Import raw data from CSV if available
    if os.path.exists(os.path.join(DATA_DIR, "trading_data.csv")):
        print("Attempting to load from CSV backup...")
        rl_dataset = pd.read_csv(os.path.join(DATA_DIR, "trading_data.csv"))
        print(f"Loaded {len(rl_dataset)} records from CSV")
    else:
        print("No data available. Please upload data files.")
        rl_dataset = pd.DataFrame()  # Create empty dataframe

In [None]:
# Split into training and evaluation sets
if not rl_dataset.empty:
    # Use a fixed random seed for reproducibility
    np.random.seed(42)
    
    # Split into training and evaluation sets (80/20)
    split_idx = int(len(rl_dataset) * 0.8)
    training_data = rl_dataset[:split_idx]
    eval_data = rl_dataset[split_idx:]
    
    # For faster testing on cloud, limit the dataset size if needed
    # Comment this out when ready for full training
    if len(training_data) > 1000:
        print("Limiting training data size for faster cloud testing")
        # Keep the last 1000 rows as one contiguous block; random sampling
        # would scramble the row order the environment steps through
        training_data = training_data.iloc[-1000:]
    
    print(f"\nTraining data size: {len(training_data)} records")
    print(f"Evaluation data size: {len(eval_data)} records")
    
    if len(training_data) == 0 or len(eval_data) == 0:
        print("❌ ERROR: Not enough data for training and evaluation.")
else:
    print("No data available for training.")
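
The split above can be expressed as a small function that keeps rows in their original order and, optionally, caps the training set to a contiguous window. This is a sketch that assumes the rows are stored in chronological order; `chronological_split` and `max_train_rows` are hypothetical names. It works on lists and also on pandas DataFrames, since both support positional slicing.

```python
def chronological_split(rows, train_fraction=0.8, max_train_rows=None):
    """Split ordered rows into (train, evaluation) without shuffling.

    If max_train_rows is given, keep only the last max_train_rows training
    rows so the training window stays contiguous and adjacent to the
    evaluation period.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")

    split_idx = int(len(rows) * train_fraction)
    train, evaluation = rows[:split_idx], rows[split_idx:]

    if max_train_rows is not None and len(train) > max_train_rows:
        train = train[-max_train_rows:]
    return train, evaluation


train_rows, eval_rows = chronological_split(list(range(10)), max_train_rows=3)
print(train_rows, eval_rows)
```

Every training row precedes every evaluation row, so the evaluation set never leaks earlier information back into training.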

In [None]:
# Create training environments
try:
    print("\nCreating environments...")
    
    # Create base training env and wrap it with action wrapper
    train_env_base = TradingEnv(
        training_data, 
        normalize_observations=True,
    )
    train_env = TupleActionWrapper(train_env_base)
    
    # Create base eval env and wrap it with action wrapper and Monitor
    eval_env_base = TradingEnv(
        eval_data,
        normalize_observations=True,
    )
    eval_env = Monitor(TupleActionWrapper(eval_env_base))
    
    # Print action space info for debugging
    print(f"Original action space: {train_env_base.action_space}")
    print(f"Wrapped action space: {train_env.action_space}")
    print(f"Observation space: {train_env.observation_space}")
    
except Exception as e:
    print(f"Error creating environments: {e}")
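
Before spending GPU time, it can be worth stepping the environment with random actions to confirm that `reset` and `step` behave as expected. The sketch below assumes only the Gymnasium-style interface: `reset()` returns `(obs, info)` and `step()` returns a five-element tuple. `run_random_episode` and `ToyEnv` are hypothetical; `ToyEnv` is a stand-in so the example runs without the project code.

```python
import random


def run_random_episode(env, max_steps=1000):
    """Step `env` with random actions; return (steps_taken, total_reward)."""
    obs, info = env.reset()
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated) and steps < max_steps:
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
    return steps, total_reward


class _ToySpace:
    """Minimal action space with a sample() method (stand-in only)."""

    def __init__(self, n, seed=0):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        return self._rng.randrange(self.n)


class ToyEnv:
    """Stand-in environment: ends after `length` steps, pays 1.0 for action 1."""

    def __init__(self, length=5):
        self.length = length
        self.action_space = _ToySpace(3)
        self._t = 0

    def reset(self, seed=None, options=None):
        self._t = 0
        return [0.0], {}

    def step(self, action):
        self._t += 1
        reward = 1.0 if action == 1 else 0.0
        terminated = self._t >= self.length
        return [float(self._t)], reward, terminated, False, {}


steps, total = run_random_episode(ToyEnv(length=5))
print(f"Episode finished after {steps} steps with total reward {total}")
```

In the notebook, the same function could be called as `run_random_episode(train_env)`; an exception or an episode that never ends points to a problem in the environment rather than in the training code.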

## Step 7: Define Callbacks and Training Functions

Now let's define the callbacks and functions needed for training:

In [None]:
# Custom callbacks for enhanced monitoring
from stable_baselines3.common.callbacks import BaseCallback

class FeatureImportanceCallback(BaseCallback):
    """Callback to periodically calculate and save feature importance during training"""
    
    def __init__(self, eval_env, log_path, eval_freq=10000):
        """
        Initialize the callback
        
        Parameters:
            eval_env: Evaluation environment
            log_path: Path to save feature importance data
            eval_freq: How often to calculate feature importance (in timesteps)
        """
        # Subclass BaseCallback directly: this callback writes JSON, not model
        # checkpoints, so CheckpointCallback's save options do not apply
        super().__init__(verbose=0)
        self.eval_env = eval_env
        self.log_path = Path(log_path)
        self.eval_freq = eval_freq
        self.last_eval_timestep = 0
        
        # Define feature names based on the trading environment's observation space
        self.feature_names = [
            # Base pattern features (7 features)
            "probability", "action", "reward_risk_ratio", "max_gain",
            "max_drawdown", "mse", "expected_value",
            # Technical indicators (3 features)
            "rsi", "atr", "atr_ratio",
            # Sentiment features (2 features)
            "unified_sentiment", "sentiment_count",
            # COT data (6 features)
            "net_noncommercial", "net_nonreportable",
            "change_nonrept_long", "change_nonrept_short",
            "change_noncommercial_long", "change_noncommercial_short",
            # Time features (7 features)
            "hour_sin", "hour_cos", "day_sin", "day_cos",
            "asian_session", "london_session", "ny_session",
            # Portfolio features (5 features)
            "balance_ratio", "position_ratio", "position", "max_drawdown", "win_rate"
        ]
    
    def _on_step(self) -> bool:
        """Called at each step during training"""
        if self.num_timesteps - self.last_eval_timestep >= self.eval_freq:
            self.last_eval_timestep = self.num_timesteps
            self._calculate_basic_importance(self.model, self.num_timesteps)
        return True
    
    def _calculate_basic_importance(self, model, timestep):
        """Write placeholder feature-importance scores (random values, not a real estimate)"""
        try:
            print(f"\nCalculating feature importance at timestep {timestep}...")
            
            # Placeholder: random scores stand in for a real permutation-importance
            # calculation, so these values should not be interpreted
            importance = np.random.uniform(0, 1, size=len(self.feature_names))
            
            # Save data
            importance_data = {
                'timestep': int(timestep),
                'date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                'permutation_importance': {
                    'feature_names': self.feature_names,
                    'importance': importance.tolist()
                }
            }
            
            # Save to file
            self.log_path.mkdir(exist_ok=True)
            file_path = self.log_path / "feature_importance.json"
            with open(file_path, 'w') as f:
                json.dump(importance_data, f, indent=2)
                
            print(f"Feature importance saved to {file_path}")
            
        except Exception as e:
            print(f"Error calculating feature importance: {e}")

class PortfolioTrackingCallback(BaseCallback):
    """Callback to track portfolio performance during training"""
    
    def __init__(self, eval_env, log_path, eval_freq=5000):
        """Initialize the callback"""
        # Subclass BaseCallback directly; no model checkpoints are written here
        super().__init__(verbose=0)
        self.eval_env = eval_env
        self.log_path = Path(log_path)
        self.eval_freq = eval_freq
        self.last_eval_timestep = 0
        
        # Track portfolio metrics
        self.portfolio_values = []
        self.action_counts = {0: 0, 1: 0, 2: 0}  # Hold, Buy, Sell
        self.wins = 0
        self.losses = 0
    
    def _on_step(self) -> bool:
        """Called at each step during training"""
        if self.num_timesteps - self.last_eval_timestep >= self.eval_freq:
            self.last_eval_timestep = self.num_timesteps
            self._evaluate_portfolio(self.model, self.num_timesteps)
        return True
    
    def _evaluate_portfolio(self, model, timestep):
        """Evaluate portfolio performance using current model"""
        try:
            print(f"\nEvaluating portfolio performance at timestep {timestep}...")
            
            # Reset environment
            obs, _ = self.eval_env.reset()
            done = False
            truncated = False
            portfolio_value = 10000.0  # Initial portfolio value
            
            # Basic evaluation loop
            self.portfolio_values = [portfolio_value]
            self.action_counts = {0: 0, 1: 0, 2: 0}
            self.wins = 0
            self.losses = 0
            
            while not (done or truncated):
                # Get action from model
                action, _states = model.predict(obs, deterministic=False)
                
                # Count action (ravel handles both scalar and array-valued actions)
                action_type = int(np.asarray(action).ravel()[0])
                self.action_counts[action_type] = self.action_counts.get(action_type, 0) + 1
                
                # Step environment
                obs, reward, done, truncated, info = self.eval_env.step(action)
                
                # Simulate portfolio value (very simplified)
                if reward > 0:
                    portfolio_value *= (1 + min(reward / 100, 0.05))  # Limit to reasonable returns
                    self.wins += 1
                elif reward < 0:
                    portfolio_value *= (1 + max(reward / 100, -0.05))  # Limit losses
                    self.losses += 1
                    
                self.portfolio_values.append(portfolio_value)
            
            # Calculate win rate
            total_trades = self.wins + self.losses
            win_rate = (self.wins / total_trades * 100) if total_trades > 0 else 0
            
            # Save performance metrics
            metrics = {
                'timestep': int(timestep),
                'date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                'final_portfolio_value': float(self.portfolio_values[-1]),
                'return_pct': float((self.portfolio_values[-1] / self.portfolio_values[0] - 1) * 100),
                'win_rate': float(win_rate),
                'action_counts': {str(k): v for k, v in self.action_counts.items()},  # Convert keys to str for JSON
                'portfolio_values': [float(val) for val in self.portfolio_values[:100]]  # Save only 100 values
            }
            
            # Save to file
            self.log_path.mkdir(exist_ok=True)
            file_path = self.log_path / "performance_metrics.json"
            with open(file_path, 'w') as f:
                json.dump(metrics, f, indent=2)
                
            print(f"Portfolio performance metrics saved to {file_path}")
            
        except Exception as e:
            print(f"Error evaluating portfolio: {e}")
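
`_calculate_basic_importance` currently writes random scores. One way it could be filled in is permutation importance: shuffle one feature column at a time and measure how often the policy's action changes. The sketch below is an illustration, not the project's implementation. `permutation_importance` and `predict_fn` are hypothetical names, and it assumes observations are flat sequences and actions can be compared with `!=`.

```python
import random


def permutation_importance(predict_fn, observations, n_features, rng=None):
    """Return, per feature, the fraction of actions that change when that
    feature's column is shuffled across observations."""
    if not observations:
        raise ValueError("observations must not be empty")
    rng = random.Random(0) if rng is None else rng

    # Actions on the unmodified observations are the reference point.
    baseline = [predict_fn(obs) for obs in observations]

    scores = []
    for j in range(n_features):
        column = [obs[j] for obs in observations]
        rng.shuffle(column)  # break the link between feature j and the rest
        changed = 0
        for obs, new_value, base_action in zip(observations, column, baseline):
            perturbed = list(obs)
            perturbed[j] = new_value
            if predict_fn(perturbed) != base_action:
                changed += 1
        scores.append(changed / len(observations))
    return scores


# Toy policy that looks only at feature 0, so feature 1 should score 0.0.
toy_policy = lambda obs: 1 if obs[0] > 0.5 else 0
toy_obs = [[i / 10, (i * 7 % 10) / 10] for i in range(10)]
print(permutation_importance(toy_policy, toy_obs, n_features=2))
```

With an SB3 model, `predict_fn` could wrap `model.predict(obs, deterministic=True)` and convert the result to a hashable value; deterministic prediction matters here, since sampling noise would otherwise be counted as importance.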

In [None]:
def train_rl_model(config):
    """
    Train the RL model with data from database
    
    Parameters:
        config: Training configuration
    """
    print("\n========== TRADING AGENT TRAINING ==========\n")
    
    # Set up directories
    log_path = Path(config.log_path)
    model_dir = Path(os.path.dirname(config.model_path))
    tensorboard_path = log_path / "tensorboard"
    checkpoint_path = log_path / "checkpoints"
    
    # Create directories
    for path in [log_path, model_dir, tensorboard_path, checkpoint_path]:
        path.mkdir(exist_ok=True, parents=True)
    
    print(f"Logs will be saved to: {log_path}")
    print(f"Checkpoints will be saved to: {checkpoint_path}")
    print(f"Final model will be saved to: {config.model_path}")
    
    # Setup TensorBoard logging
    if config.tensorboard:
        print("\nSetting up TensorBoard logging...")
        logger = configure(str(tensorboard_path), ["tensorboard", "stdout"])
    else:
        logger = None
    
    # Create evaluation callback
    print("\nSetting up callbacks...")
    eval_callback = EvalCallback(
        eval_env,
        best_model_save_path=str(log_path),
        log_path=str(log_path),
        eval_freq=config.eval_freq,
        deterministic=False,
        render=False,
        n_eval_episodes=5,
        verbose=1
    )
    
    # Create checkpoint callback
    checkpoint_callback = CheckpointCallback(
        save_freq=config.checkpoint_freq,
        save_path=str(checkpoint_path),
        name_prefix="trading_model",
        save_replay_buffer=False,
        save_vecnormalize=False,
        verbose=1
    )
    
    # Create feature importance callback
    feature_callback = FeatureImportanceCallback(
        eval_env=eval_env,
        log_path=log_path,
        eval_freq=config.checkpoint_freq // 2
    )
    
    # Create portfolio tracking callback
    portfolio_callback = PortfolioTrackingCallback(
        eval_env=eval_env,
        log_path=log_path,
        eval_freq=config.eval_freq
    )
    
    # Combine all callbacks
    callbacks = CallbackList([
        eval_callback, 
        checkpoint_callback,
        feature_callback,
        portfolio_callback
    ])
    
    # Initialize the model
    print("\nInitializing PPO model...")
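    model = PPO(
        "MlpPolicy", 
        train_env, 
        verbose=1,
        learning_rate=config.learning_rate,
        ent_coef=0.01,  # Encourage exploration
        n_steps=config.n_steps,
        batch_size=config.batch_size,
        n_epochs=config.n_epochs,
        # Honor config.use_gpu: "auto" picks CUDA when available, otherwise CPU
        device="auto" if config.use_gpu else "cpu",
    )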
    
    # Set custom logger if TensorBoard is enabled
    if logger is not None:
        model.set_logger(logger)
    
    # Train the model
    print("\n" + "="*50)
    print(f"Starting training for {config.timesteps} timesteps...")
    print("="*50 + "\n")
    
    start_time = time.time()
    
    model.learn(
        total_timesteps=config.timesteps, 
        callback=callbacks,
        progress_bar=config.progress_bar
    )
    
    end_time = time.time()
    training_time = (end_time - start_time) / 60  # minutes
    
    # Save the final model
    model.save(config.model_path)
    print(f"\n✅ Final model saved to {config.model_path}")
    
    # Save training metadata
    metadata = {
        "training_data_size": len(training_data),
        "eval_data_size": len(eval_data),
        "timesteps": config.timesteps,
        "eval_freq": config.eval_freq,
        "checkpoint_freq": config.checkpoint_freq,
        "feature_count": train_env.observation_space.shape[0],
        "training_date": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "training_time_minutes": training_time,
        "hyperparameters": {
            "learning_rate": config.learning_rate,
            "n_steps": config.n_steps,
            "batch_size": config.batch_size,
            "n_epochs": config.n_epochs
        }
    }
    
    metadata_path = f"{os.path.splitext(config.model_path)[0]}_metadata.json"
    with open(metadata_path, "w") as f:
        json.dump(metadata, f, indent=2)
    
    print(f"\nTraining metadata saved to {metadata_path}")
    print(f"Training completed in {training_time:.2f} minutes")
    
    return model
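
Cloud sessions can disconnect mid-run, so it is useful to be able to find the most recent checkpoint afterwards. The sketch below assumes the checkpoint files follow the `<name_prefix>_<steps>_steps.zip` pattern that `CheckpointCallback` is expected to produce with the settings above; `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path


def latest_checkpoint(checkpoint_dir, name_prefix="trading_model"):
    """Return the Path of the checkpoint with the highest step count, or None."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None

    pattern = re.compile(rf"^{re.escape(name_prefix)}_(\d+)_steps\.zip$")
    best_path, best_steps = None, -1
    for path in directory.iterdir():
        match = pattern.match(path.name)
        # Compare step counts numerically: sorting names as text would rank
        # "5000" after "10000".
        if match and int(match.group(1)) > best_steps:
            best_path, best_steps = path, int(match.group(1))
    return best_path


# Example (uncomment after a training run has written checkpoints):
# print(latest_checkpoint(Path(LOG_DIR) / "checkpoints"))
```

The returned path could then be passed to `PPO.load(path, env=train_env)` to continue from that point, with the remaining timesteps supplied to `learn`.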

## Step 8: Run Training

With everything set up, now we can run the training:

In [None]:
# Run training if data and environments are ready
if 'train_env' in locals() and 'eval_env' in locals():
    try:
        # Report the available device when GPU use is requested
        # (SB3 selects CUDA automatically when device="auto", its default)
        if config.use_gpu:
            import torch
            if torch.cuda.is_available():
                print(f"Using GPU: {torch.cuda.get_device_name(0)}")
            else:
                print("GPU not available, using CPU instead")
        
        # Start training
        print("\nStarting RL model training...")
        trained_model = train_rl_model(config)
        print("\nTraining completed successfully!")
        
    except Exception as e:
        print(f"Error during training: {e}")
        import traceback
        traceback.print_exc()
else:
    print("\n❌ Training environment not properly set up. Please check previous steps.")

## Step 9: Evaluate Trained Model

Let's evaluate the trained model's performance:

In [None]:
# Test the trained model
if 'trained_model' in locals():
    print("\nEvaluating trained model performance...")
    
    # Reset evaluation environment
    obs, _ = eval_env.reset()
    
    # Track metrics
    rewards = []
    portfolio_value = 10000.0
    portfolio_values = [portfolio_value]
    action_counts = {}
    
    # Run evaluation loop
    done = False
    truncated = False
    
    while not (done or truncated):
        # Get action from model
        action, _states = trained_model.predict(obs, deterministic=False)
        
        # Count action (ravel handles both scalar and array-valued actions)
        action_type = int(np.asarray(action).ravel()[0])
        action_counts[action_type] = action_counts.get(action_type, 0) + 1
        
        # Step environment
        obs, reward, done, truncated, info = eval_env.step(action)
        rewards.append(reward)
        
        # Update portfolio value
        if reward > 0:
            portfolio_value *= (1 + min(reward / 100, 0.05))
        elif reward < 0:
            portfolio_value *= (1 + max(reward / 100, -0.05))
        portfolio_values.append(portfolio_value)
    
    # Calculate performance metrics
    total_reward = sum(rewards)
    avg_reward = np.mean(rewards) if rewards else 0
    final_return = ((portfolio_value / 10000.0) - 1) * 100
    
    print(f"\nEvaluation Results:")
    print(f"Total Reward: {total_reward:.2f}")
    print(f"Average Reward: {avg_reward:.4f}")
    print(f"Final Portfolio Value: ${portfolio_value:.2f}")
    print(f"Return: {final_return:.2f}%")
    print(f"Action Distribution: {action_counts}")
    
    # Plot portfolio value over time
    plt.figure(figsize=(10, 6))
    plt.plot(portfolio_values)
    plt.title('Portfolio Value During Evaluation')
    plt.xlabel('Step')
    plt.ylabel('Portfolio Value ($)')
    plt.grid(True)
    plt.show()
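
The portfolio update in the loop above treats each reward as a percentage return clipped to ±5% per step. The sketch below isolates that arithmetic and adds a maximum-drawdown measure over the resulting value series. `apply_reward` and `max_drawdown` are hypothetical helper names, and the drawdown calculation assumes all portfolio values are positive.

```python
def apply_reward(value, reward, cap=0.05):
    """Compound `value` by reward/100, clipped to [-cap, +cap]."""
    step_return = max(-cap, min(reward / 100.0, cap))
    return value * (1.0 + step_return)


def max_drawdown(values):
    """Largest peak-to-trough decline as a fraction of the peak."""
    if not values:
        raise ValueError("values must not be empty")
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)                     # highest value seen so far
        worst = max(worst, (peak - v) / peak)   # decline from that peak
    return worst


print(apply_reward(10000.0, 2.0))     # reward of 2 is a 2% step
print(apply_reward(10000.0, 10.0))    # reward of 10 is clipped to 5%
print(max_drawdown([100.0, 120.0, 90.0, 130.0]))
```

A reward of 2 grows a 10,000 portfolio to about 10,200, while a reward of 10 is capped at 5% and gives about 10,500. The series 100, 120, 90, 130 falls from 120 to 90, a 25% drawdown, even though it ends higher than it started.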

## Step 10: Save to Cloud Storage

Finally, let's save our model outputs to cloud storage for persistent access:

In [None]:
# Save model and logs to persistent storage
if 'trained_model' in locals():
    print("\nSaving model and logs to persistent storage...")
    
    if IN_COLAB:
        import shutil
        
        # Make sure output directories exist in Google Drive
        drive_model_dir = "/content/drive/MyDrive/Stock_AI_Predictor/Models"
        drive_log_dir = "/content/drive/MyDrive/Stock_AI_Predictor/Logs"
        
        for directory in [drive_model_dir, drive_log_dir]:
            os.makedirs(directory, exist_ok=True)
        
        # On Colab, MODEL_DIR and LOG_DIR already point into Google Drive (Step 4),
        # so copy only when the source and destination actually differ
        model_file = f"{config.model_path}.zip"
        metadata_file = f"{os.path.splitext(config.model_path)[0]}_metadata.json"
        for src in [model_file, metadata_file]:
            dst = os.path.join(drive_model_dir, os.path.basename(src))
            if os.path.abspath(src) != os.path.abspath(dst):
                shutil.copy2(src, dst)
        
        # Copy logs to Google Drive
        if os.path.abspath(config.log_path) != os.path.abspath(drive_log_dir):
            shutil.copytree(config.log_path, drive_log_dir, dirs_exist_ok=True)
        
        drive_model_path = os.path.join(drive_model_dir, os.path.basename(config.model_path))
        print(f"\nModel saved to Google Drive at: {drive_model_path}.zip")
        print(f"Logs saved to Google Drive at: {drive_log_dir}")
        
    elif IN_KAGGLE:
        # In Kaggle we can output to result directory for persistence
        print("Model and logs saved in Kaggle output directory")
    
    print("\n✅ All outputs have been successfully saved!")

# Print instructions for using the model
print("\nNext steps:")
print("1. Download the trained model and use it for live trading")
print("2. Upload the model to Hugging Face for sharing and deployment")
print("3. Fine-tune hyperparameters for better performance")

## Hugging Face Integration

To deploy your model to Hugging Face for sharing and inference:

In [None]:
# Hugging Face integration code
def prepare_for_huggingface(model_path, model_name="trading-rl-model"):
    """
    Prepare model for uploading to Hugging Face
    
    Parameters:
        model_path: Path to the trained model
        model_name: Name to use on Hugging Face
    """
    try:
        # Import check only; HfFolder is deprecated, so import just HfApi
        from huggingface_hub import HfApi
        
        print("Preparing model for Hugging Face deployment...")
        # This would include creating a model card, requirements.txt, etc.
        
        print("\nTo upload to Hugging Face, run the following commands:")
        print(f"1. huggingface-cli login")
        print(f"2. python -m RL.Deployment.deploy_to_huggingface --model {model_path} --repo-name {model_name}")
        
    except ImportError:
        print("Hugging Face libraries not installed.")
        print("To install, run: pip install huggingface_hub")

# Uncomment to prepare model for Hugging Face
# if 'trained_model' in locals():
#     prepare_for_huggingface(config.model_path)

## Conclusion

This notebook provides a complete workflow for training your reinforcement learning trading model on cloud platforms:

1. **Environment Setup**: Installed all necessary packages
2. **Data Preparation**: Loaded and prepared data for training
3. **Model Training**: Set up and trained an RL model with proper monitoring
4. **Evaluation**: Assessed model performance
5. **Persistence**: Saved model and logs to persistent storage

You can now use this trained model for backtesting and further refinement before considering live trading. The saved model can also be shared on platforms like Hugging Face.

To further improve your model:

1. Experiment with different hyperparameters
2. Try different RL algorithms beyond PPO
3. Enhance the trading environment with more features
4. Implement more sophisticated reward functions
5. Train on larger datasets for longer periods