# Stock AI Predictor - Automated Update and Retraining

This Kaggle notebook automates the updating of datasets and retraining of models for the Stock AI Predictor project on Hugging Face. It can be scheduled to run at regular intervals using Kaggle's scheduling feature.

## Overview

The workflow performs the following tasks:

1. Fetches the latest market data
2. Updates the dataset on Hugging Face Datasets
3. Retrains the parameter tester model when retraining is scheduled
4. Retrains the RL trading model when retraining is scheduled
5. Deploys updated models to Hugging Face Hub

## Setup

First, let's install the necessary dependencies:

In [None]:
# Install required packages
!pip install huggingface_hub datasets pandas numpy requests python-dotenv stable-baselines3 gymnasium

## Clone Project Repository

Now, let's clone the project repository from GitHub:

In [None]:
# Clone the repository
!git clone https://github.com/yourusername/Stock_AI_Predictor.git
%cd Stock_AI_Predictor

## Set Up Environment Variables

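The notebook reads its Hugging Face token and repository IDs from environment variables. On Kaggle, store the token as a notebook secret named `HF_TOKEN` rather than pasting it into the code, and replace the `your_username` placeholders with your own repository IDs.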

In [None]:
import os
import sys
from pathlib import Path

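# Read the Hugging Face token from Kaggle secrets instead of hard-coding it.
# Add a secret named HF_TOKEN in the notebook's secrets settings first.
try:
    from kaggle_secrets import UserSecretsClient
    os.environ["HF_TOKEN"] = UserSecretsClient().get_secret("HF_TOKEN")
except ImportError:
    # Not running on Kaggle: HF_TOKEN must already be set in the environment
    pass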
os.environ["PARAM_TESTER_REPO_ID"] = "your_username/stock-ai-parameter-tester"
os.environ["RL_MODEL_REPO_ID"] = "your_username/stock-ai-rl-trader"
os.environ["DATASET_REPO_ID"] = "your_username/stock-market-data"
os.environ["API_SPACE_ID"] = "your_username/stock-ai-predictor-api"

# Add project root to path
project_root = Path.cwd()
sys.path.append(str(project_root))
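The placeholders above (`your_huggingface_token`, `your_username/...`) are easy to leave in by accident, and a scheduled run would then fail partway through. A minimal fail-fast check is sketched below; the helper name `find_unconfigured` is hypothetical, and it simply reports variables that are unset, empty, or still start with the `your_` placeholder prefix.

```python
from typing import Iterable, List, Mapping

# Variables this notebook reads in later cells
REQUIRED_VARS = [
    "HF_TOKEN",
    "PARAM_TESTER_REPO_ID",
    "RL_MODEL_REPO_ID",
    "DATASET_REPO_ID",
    "API_SPACE_ID",
]

def find_unconfigured(env: Mapping[str, str], names: Iterable[str] = REQUIRED_VARS) -> List[str]:
    """Return the names that are missing, empty, or still hold a 'your_' placeholder (hypothetical helper)."""
    problems = []
    for name in names:
        value = env.get(name, "")
        if not value or value.startswith("your_"):
            problems.append(name)
    return problems
```

Calling `find_unconfigured(os.environ)` at the end of this cell and raising if the returned list is non-empty stops a misconfigured run before any network call is made.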

## Data Fetching

Now, let's implement the data fetching part that will update our database with the latest market data:

In [None]:
# Import required modules
from Data.Database.db import Database
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

def fetch_market_data():
    """Fetch latest market data and update database"""
    print("Fetching latest market data...")
    
    # Initialize database
    db = Database()
    
    # Get list of stocks/symbols to update
    stocks_query = "SELECT id, symbol FROM stocks"
    stocks_df = pd.read_sql_query(stocks_query, db.connection)
    
    # Get list of timeframes
    timeframes_query = "SELECT id, name FROM timeframes"
    timeframes_df = pd.read_sql_query(timeframes_query, db.connection)
    
    # For each stock and timeframe, fetch and update data
    for _, stock_row in stocks_df.iterrows():
        stock_id = stock_row['id']
        symbol = stock_row['symbol']
        
        for _, tf_row in timeframes_df.iterrows():
            tf_id = tf_row['id']
            tf_name = tf_row['name']
            
            print(f"Updating {symbol} for timeframe {tf_name}")
            
            # Fetch latest data for this stock and timeframe
            # This is where you would implement your data fetching logic
            # For example, using yfinance, alpha_vantage, or another API
            
            # Example using a dummy function (you'd replace this with actual data fetching)
            new_data = fetch_latest_prices(symbol, tf_name)
            
            # Update database with new data
            update_database(db, stock_id, tf_id, new_data)
    
    print("Market data update completed")

def fetch_latest_prices(symbol, timeframe):
    """Dummy function to fetch latest prices (replace with actual implementation)"""
    # In a real implementation, use yfinance, alpha_vantage, etc. and map `timeframe`
    # to the data source's interval. This dummy ignores `timeframe` and returns one
    # random bar per day for the past week.
    end_date = datetime.now()
    start_date = end_date - timedelta(days=7)
    
    dates = pd.date_range(start=start_date, end=end_date, freq='D')
    open_prices = np.random.normal(100, 5, size=len(dates))
    close_prices = open_prices * (1 + np.random.normal(0, 0.01, size=len(dates)))
    volumes = np.random.randint(1000, 10000, size=len(dates))
    
    df = pd.DataFrame({
        'datetime': dates,
        'open': open_prices,
        # Derive high/low from open and close so every bar satisfies low <= open, close <= high
        'high': np.maximum(open_prices, close_prices) * 1.01,
        'low': np.minimum(open_prices, close_prices) * 0.99,
        'close': close_prices,
        'volume': volumes
    })
    
    return df

def update_database(db, stock_id, timeframe_id, data):
    """Update database with new data (implement the actual logic)"""
    # This is where you would implement your database update logic
    # For example, inserting new rows, updating existing ones, etc.
    print(f"Updated database for stock_id={stock_id}, timeframe_id={timeframe_id} with {len(data)} records")

# Run data fetching
fetch_market_data()
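`update_database` above is only a stub. If `Database.connection` is a SQLite connection (an assumption; `pd.read_sql_query` accepts one), the update can be an idempotent upsert, so a rerun of the scheduled notebook overwrites bars it has already stored instead of duplicating them. The table name `stock_data`, its columns, and the helper `upsert_bars` are hypothetical and should be matched to the project's real schema.

```python
import sqlite3
from typing import Iterable, Mapping

# Hypothetical schema: one row per (stock, timeframe, bar timestamp)
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS stock_data (
    stock_id INTEGER NOT NULL,
    timeframe_id INTEGER NOT NULL,
    datetime TEXT NOT NULL,
    open REAL, high REAL, low REAL, close REAL, volume INTEGER,
    PRIMARY KEY (stock_id, timeframe_id, datetime)
)
"""

# INSERT OR REPLACE overwrites any existing row with the same primary key
UPSERT_SQL = """
INSERT OR REPLACE INTO stock_data
    (stock_id, timeframe_id, datetime, open, high, low, close, volume)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
"""

def upsert_bars(conn: sqlite3.Connection, stock_id: int, timeframe_id: int,
                rows: Iterable[Mapping]) -> int:
    """Insert new bars or overwrite existing ones with the same key; returns the number of rows written."""
    # Cast to plain Python types: sqlite3 cannot bind NumPy scalars such as numpy.int64
    params = [
        (int(stock_id), int(timeframe_id), str(r["datetime"]),
         float(r["open"]), float(r["high"]), float(r["low"]),
         float(r["close"]), int(r["volume"]))
        for r in rows
    ]
    with conn:  # commits on success, rolls back on error
        conn.execute(CREATE_SQL)
        conn.executemany(UPSERT_SQL, params)
    return len(params)
```

Inside `fetch_market_data`, the call would then become `upsert_bars(db.connection, stock_id, tf_id, new_data.to_dict("records"))`.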

## Dataset Deployment

Now, let's prepare and upload the dataset to Hugging Face Datasets:

In [None]:
from Deployment.dataset_uploader import prepare_dataset, upload_dataset
import os

def deploy_dataset():
    """Prepare and upload dataset to Hugging Face Hub"""
    print("Deploying dataset to Hugging Face Datasets...")
    
    # Get repository ID from environment
    dataset_repo_id = os.environ.get("DATASET_REPO_ID")
    hf_token = os.environ.get("HF_TOKEN")
    
    if not dataset_repo_id or not hf_token:
        raise ValueError("Dataset repository ID and HF token must be provided")
    
    # Prepare dataset
    dataset = prepare_dataset()
    
    # Upload dataset
    upload_dataset(dataset, repo_id=dataset_repo_id, token=hf_token)
    
    print(f"Dataset deployed to {dataset_repo_id}")

# Deploy dataset
deploy_dataset()
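`prepare_dataset` and `upload_dataset` live in the project's `Deployment.dataset_uploader` module, which is not shown here. One plausible shape, sketched under the assumption that the data sits in SQLite tables, is to export each table to a CSV file in a local folder and push that folder with `huggingface_hub`'s `HfApi.upload_folder`. The helper names `export_tables` and `push_folder` are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path
from typing import Iterable, List

def export_tables(conn: sqlite3.Connection, tables: Iterable[str], out_dir: str) -> List[Path]:
    """Write each table to <out_dir>/<table>.csv with a header row; returns the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for table in tables:
        # Table names come from this notebook, not from user input, so quoting them inline is acceptable
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        header = [col[0] for col in cursor.description]
        path = out / f"{table}.csv"
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cursor)
        written.append(path)
    return written

def push_folder(folder: str, repo_id: str, token: str) -> None:
    """Upload the exported folder to a dataset repo (network call, not exercised offline)."""
    from huggingface_hub import HfApi  # imported lazily so the export step works without it
    api = HfApi(token=token)
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="dataset",
                      commit_message="Scheduled dataset update")
```

Splitting export from upload keeps the CSV step testable without network access; `push_folder` would be called with `os.environ["DATASET_REPO_ID"]` and `os.environ["HF_TOKEN"]`.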

## Model Retraining and Deployment

Next, let's retrain the models when retraining is scheduled for today and deploy them to the Hugging Face Hub:

In [None]:
from Deployment.deploy_to_huggingface import HuggingFaceDeployer
import os
from datetime import datetime

def should_retrain_models():
    """Determine if models should be retrained"""
    # For example, retrain on the 1st and 15th of each month
    today = datetime.now()
    return today.day in [1, 15]

def retrain_and_deploy_models():
    """Retrain models if needed and deploy to Hugging Face Hub"""
    if not should_retrain_models():
        print("Model retraining not scheduled for today. Skipping.")
        return
    
    print("Retraining and deploying models to Hugging Face Hub...")
    
    # Get repository IDs from environment
    param_tester_repo_id = os.environ.get("PARAM_TESTER_REPO_ID")
    rl_model_repo_id = os.environ.get("RL_MODEL_REPO_ID")
    hf_token = os.environ.get("HF_TOKEN")
    
    if not param_tester_repo_id or not rl_model_repo_id or not hf_token:
        raise ValueError("Repository IDs and HF token must be provided")
    
    # Initialize deployer
    deployer = HuggingFaceDeployer(token=hf_token)
    
    # Retrain parameter tester model
    print("Retraining parameter tester model...")
    # This is where you would implement your parameter tester retraining logic
    # For example: from Colab.parameter_tester import ParameterTester
    # tester = ParameterTester()
    # tester.run_optimization()
    
    # Deploy parameter tester model
    deployer.deploy_parameter_tester(repo_id=param_tester_repo_id)
    
    # Retrain RL model
    print("Retraining RL trading model...")
    # This is where you would implement your RL model retraining logic
    # For example: from RL.Scripts.train_rl_model import train_model
    # train_model(epochs=100)
    
    # Deploy RL model
    deployer.deploy_rl_model(repo_id=rl_model_repo_id)
    
    print("Models retrained and deployed successfully")

# Retrain and deploy models
retrain_and_deploy_models()
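`should_retrain_models` only returns `True` on the 1st and 15th of the month, so a notebook scheduled weekly retrains only when a run happens to land on one of those dates. An alternative, sketched here as the hypothetical `is_retrain_due`, compares the time since the last recorded retrain with a fixed interval. Where that timestamp is stored (a file, a database row, the model repo's `last_modified`) is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def is_retrain_due(last_trained: Optional[datetime], now: datetime,
                   interval: timedelta = timedelta(days=14)) -> bool:
    """True if no retrain has been recorded yet, or at least `interval` has passed since the last one.

    `last_trained` and `now` must both be timezone-aware (or both naive);
    Python raises TypeError when subtracting an aware datetime from a naive one.
    """
    if last_trained is None:
        return True
    return now - last_trained >= interval
```

With this check, any run frequency works: a weekly schedule with a 14-day interval retrains roughly every other run. Using the model repo's `last_modified` as `last_trained` is only reliable if nothing else commits to that repo.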

## Summary and Verification

Finally, let's check that each Hugging Face repository is reachable and report when it was last modified:

In [None]:
from huggingface_hub import HfApi
import os

def verify_deployment():
    """Verify that all components were deployed correctly"""
    print("Verifying deployment...")
    
    # Get repository IDs from environment
    param_tester_repo_id = os.environ.get("PARAM_TESTER_REPO_ID")
    rl_model_repo_id = os.environ.get("RL_MODEL_REPO_ID")
    dataset_repo_id = os.environ.get("DATASET_REPO_ID")
    hf_token = os.environ.get("HF_TOKEN")
    
    if not hf_token or not any([param_tester_repo_id, rl_model_repo_id, dataset_repo_id]):
        print("An HF token and at least one repository ID must be provided for verification")
        return
    
    # Initialize Hugging Face API
    api = HfApi(token=hf_token)
    
    # Check parameter tester model
    if param_tester_repo_id:
        try:
            model_info = api.model_info(param_tester_repo_id)
            print(f"Parameter tester model verified: {param_tester_repo_id}")
            # Use the repo-level timestamp; file entries in `siblings` carry no modification date
            print(f"  Last updated: {model_info.last_modified or 'N/A'}")
        except Exception as e:
            print(f"Parameter tester model verification failed: {e}")
    
    # Check RL model
    if rl_model_repo_id:
        try:
            model_info = api.model_info(rl_model_repo_id)
            print(f"RL trading model verified: {rl_model_repo_id}")
            print(f"  Last updated: {model_info.last_modified or 'N/A'}")
        except Exception as e:
            print(f"RL trading model verification failed: {e}")
    
    # Check dataset
    if dataset_repo_id:
        try:
            dataset_info = api.dataset_info(dataset_repo_id)
            print(f"Dataset verified: {dataset_repo_id}")
            print(f"  Last updated: {dataset_info.last_modified or 'N/A'}")
        except Exception as e:
            print(f"Dataset verification failed: {e}")
    
    print("Verification completed")

# Verify deployment
verify_deployment()
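The verification cell confirms that the repositories exist, but not that this run actually changed them. A minimal extra check, assuming the notebook records a hypothetical `RUN_STARTED_AT = datetime.now(timezone.utc)` in its first cell, compares each repo's `last_modified` datetime against the run start. Naive timestamps are treated as UTC so the comparison does not raise.

```python
from datetime import datetime, timezone
from typing import Optional

def was_updated_since(last_modified: Optional[datetime], run_started_at: datetime) -> bool:
    """True if the repo's latest modification is at or after the start of this run."""
    if last_modified is None:
        return False
    # Treat naive timestamps as UTC so aware and naive values can be compared
    if last_modified.tzinfo is None:
        last_modified = last_modified.replace(tzinfo=timezone.utc)
    if run_started_at.tzinfo is None:
        run_started_at = run_started_at.replace(tzinfo=timezone.utc)
    return last_modified >= run_started_at
```

For example, `was_updated_since(api.dataset_info(dataset_repo_id).last_modified, RUN_STARTED_AT)` should be `True` after every run, while the model repos are expected to return `False` on days when retraining is skipped.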

## Conclusion

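When run, this notebook:

1. Fetches market data and updates the database (with placeholder data until `fetch_latest_prices` and `update_database` are implemented)
2. Updates the dataset on Hugging Face Datasets
3. Retrains and deploys the models to the Hugging Face Hub on scheduled retraining days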

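You can schedule this notebook on Kaggle to keep your dataset and models up to date. Because `should_retrain_models` only returns `True` on the 1st and 15th of the month, the schedule must include runs on those dates (or the check must be changed) for retraining to happen reliably.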

### Next Steps

1. Set up Kaggle scheduling for this notebook
2. Configure the actual data fetching logic for your specific data sources
3. Implement the actual model training logic based on your project requirements