# Stock Price Pattern Mining and Analysis

This notebook focuses on extracting, analyzing, and storing meaningful price patterns from financial time series data. It demonstrates:

1. Setting up the environment and importing required modules
2. Loading stock price data from CSV files
3. Storing data in a SQLite database
4. Mining perceptually important points (PIPs) from price data
5. Clustering similar patterns
6. Analyzing pattern performance

The Pattern_Miner class identifies recurring patterns in price movements and evaluates their predictive power.

## Environment Setup and Imports

In [1]:
# System imports for path handling
import sys
import os
from pathlib import Path

# Configure Python path to allow imports from parent directory
current_dir = Path(os.getcwd())
project_root = current_dir.parent  # Stock_AI_Predictor root directory
sys.path.append(str(project_root))

# Data processing imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Machine learning imports
from sklearn.model_selection import train_test_split

# Custom modules
from Pattern.pip_pattern_miner import Pattern_Miner
from Data.Database.db_cloud import Database

# Initialize database connection
db = Database()

# Print confirmation of successful imports
print("Environment setup complete")

Connected to sqlite cloud database
Environment setup complete


## Stock Data Configuration

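This cell defines the instruments and timeframes to analyze. The code refers to all of them as stocks, although the list also includes gold and Bitcoin.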

In [None]:
# Dictionary mapping stock IDs to names and ticker symbols
companies = {
    1: "GOLD (XAUUSD)",   # Gold spot price
    2: "BTC (BTCUSD)",    # Bitcoin
    3: "Apple (AAPL)",    # Apple Inc.
    4: "Amazon (AMZN)",   # Amazon.com Inc.
    5: "NVIDIA (NVDA)",   # NVIDIA Corporation
}

# Dictionary mapping timeframe IDs to minute values
time_frames = {
    1: 15,  # 15-minute intervals
    2: 60,  # 1-hour intervals
}

print(f"Configured {len(companies)} stocks and {len(time_frames)} timeframes")

## Data Loading and Processing

This section handles:
1. Reading stock data from CSV files
2. Processing and cleaning the data
3. Storing the processed data in the database
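
The loader in this section derives a ticker from each `Name (TICKER)` label and looks for a file named `{ticker}{time_frame}.csv`. The following is a minimal sketch of that naming convention using only the standard library. The helper names `extract_ticker` and `csv_path_for` are hypothetical and do not exist in the project.

```python
import re
from pathlib import Path


def extract_ticker(label: str) -> str:
    """Return the ticker inside the trailing parentheses of 'Name (TICKER)'."""
    match = re.search(r"\(([^()]+)\)\s*$", label)
    if match is None:
        raise ValueError(f"No ticker found in label: {label!r}")
    return match.group(1).strip()


def csv_path_for(label: str, time_frame: int,
                 raw_dir: str = "../Data/Raw/Stocks") -> Path:
    """Build the expected CSV path, e.g. XAUUSD60.csv for the 60-minute file."""
    return Path(raw_dir) / f"{extract_ticker(label)}{time_frame}.csv"


print(extract_ticker("GOLD (XAUUSD)"))         # XAUUSD
print(csv_path_for("BTC (BTCUSD)", 15).name)   # BTCUSD15.csv
```

Unlike a plain `split('(')`, this sketch raises an error when a label has no parenthesised ticker, so a malformed label fails early instead of producing a misleading file name.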

In [None]:
def process_stock_data(stock_id, symbol, time_frame):
    """
    Process stock data from CSV files and store in database.
    
    Parameters:
    -----------
    stock_id : int
        ID of the stock in the database
    symbol : str
        Stock symbol in format 'Name (TICKER)'
    time_frame : int
        Time frame in minutes (e.g., 15, 60)
        
    Returns:
    --------
    bool
        True if processing and storage successful, False otherwise
    """
    try:
        # Extract the ticker symbol from the format "Name (TICKER)"
        ticker = symbol.split('(')[-1].replace(')', '').strip()
        
        # Build the CSV path relative to the notebook directory (one level below the project root)
        file_path = f"../Data/Raw/Stocks/{ticker}{time_frame}.csv"
        print(f"Loading data from {file_path}")
        
        # Read and process the data
        df_original = pd.read_csv(file_path)
        
        # Create datetime index by combining date and time
        df_original['Date'] = pd.to_datetime(df_original['Date'] + ' ' + df_original['Time'])
        df_original['Date'] = df_original['Date'].astype('datetime64[s]')  # Standardize datetime format
        df_original = df_original.set_index('Date')
        
        # Drop the time column as it's now part of the index
        df_original = df_original.drop(columns=['Time'])
        
        # Clean data by removing any NaN values
        df_original = df_original.dropna()
        
        # Filter data to include only from 2019 onwards
        df_original = df_original.loc['2019-01-01':]
        
        # Store the processed data in the database
        rows_stored = db.store_stock_data(df_original, stock_id, ticker, time_frame)
        print(f"Stored {rows_stored} rows for {ticker} ({time_frame}m)")
        
        return True
        
    except FileNotFoundError:
        print(f"Error: File not found for {symbol} (ID: {stock_id}) with timeframe {time_frame}")
        return False
    except Exception as e:
        print(f"Error processing {symbol} (ID: {stock_id}): {str(e)}")
        return False

# Process every company at every configured timeframe
successful_imports = 0
failed_imports = 0

for stock_id, symbol in companies.items():
    for minutes in time_frames.values():
        if process_stock_data(stock_id, symbol, minutes):
            successful_imports += 1
        else:
            failed_imports += 1

print(f"Data processing complete: {successful_imports} successful, {failed_imports} failed imports.")


## Pattern Mining and Clustering

This section identifies important patterns in price data using the Pattern_Miner class, which:
1. Extracts perceptually important points (PIPs) from price series
2. Groups similar patterns using clustering techniques
3. Analyzes historical performance of each pattern cluster
4. Stores patterns and clusters in the database for later use
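
To make the first step concrete, here is a minimal sketch of one common way to extract perceptually important points. It keeps both endpoints, then repeatedly adds the point that lies farthest (by vertical distance) from the straight line joining its neighbouring PIPs. This is an illustration under stated assumptions, not the Pattern_Miner implementation: the function name `find_pips` and the choice of vertical distance are assumptions, and the real class may use a different distance measure.

```python
def find_pips(prices, n_pips):
    """Return (indices, values) of n_pips perceptually important points."""
    if n_pips < 2 or n_pips > len(prices):
        raise ValueError("n_pips must be between 2 and len(prices)")

    pip_idx = [0, len(prices) - 1]  # the endpoints are always PIPs

    while len(pip_idx) < n_pips:
        best_dist, best_i, insert_at = -1.0, None, None
        # Examine every gap between two neighbouring PIPs
        for k in range(len(pip_idx) - 1):
            left, right = pip_idx[k], pip_idx[k + 1]
            slope = (prices[right] - prices[left]) / (right - left)
            for i in range(left + 1, right):
                line_y = prices[left] + slope * (i - left)
                dist = abs(prices[i] - line_y)  # vertical distance to the line
                if dist > best_dist:
                    best_dist, best_i, insert_at = dist, i, k + 1
        # Insert the farthest point, keeping the indices in time order
        pip_idx.insert(insert_at, best_i)

    return pip_idx, [prices[i] for i in pip_idx]


indices, values = find_pips([1, 2, 5, 3, 1, 4, 2], n_pips=4)
print(indices)  # [0, 2, 4, 6]
print(values)   # [1, 5, 1, 2]
```

In the example, the peak at index 2 is selected first because it is farthest from the line joining the endpoints, and the trough at index 4 is selected next.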

In [None]:
def perform_pattern_mining_for_all_stocks(n_pips=5, lookback=24, hold_period=6, returns_hold_period=12, time_frame=60):
    """
    Perform pattern mining and clustering for all stocks in the database.
    
    Parameters:
    -----------
    n_pips : int
        Number of perceptually important points to identify (default: 5)
    lookback : int
        Window size for pattern identification in candles (default: 24)
    hold_period : int
        Period to hold after pattern identification for outcome calculation (default: 6)
    returns_hold_period : int
        Extended period for calculating maximum gain/drawdown (default: 12)
    time_frame : int
        Time frame in minutes to use for analysis (default: 60)
        
    Returns:
    --------
    None
    """
    # Track processing statistics
    successful_processing = 0
    failed_processing = 0
    
    for stock_id, symbol in companies.items():
        try:
            print(f"\nProcessing {symbol} (ID: {stock_id}) with timeframe {time_frame} minutes...")
            
            # Fetch stock data from database
            df = db.get_stock_data(stock_id, time_frame)
            
            if df.empty:
                print(f"No data found for {symbol} (ID: {stock_id})")
                failed_processing += 1
                continue
                
            # Extract close prices for pattern mining
            arr = df['ClosePrice'].to_numpy()
            
            # Split data into train and test sets (80/20 split)
            train, test = train_test_split(arr, test_size=0.2, shuffle=False)
            print(f"Data split - Train: {train.shape}, Test: {test.shape}")
            
            # Create and train the pattern miner
            pip_miner = Pattern_Miner(n_pips, lookback, hold_period, returns_hold_period)
            pip_miner.train(train)
            
            # Print pattern mining results
            num_patterns = len(pip_miner._unique_pip_patterns)
            num_clusters = len(pip_miner._cluster_centers)
            print(f"Found {num_patterns} unique patterns grouped into {num_clusters} clusters")
            
            # Attach the trained miner to the database object before storing its results
            db.pip_pattern_miner = pip_miner
            
            # Store patterns with their properties
            patterns_stored = db.store_pattern_data(stock_id, pip_miner)
            print(f"Stored {patterns_stored} patterns in database")
            
            # Store clusters with their aggregated properties
            clusters_stored = db.store_cluster_data(stock_id, pip_miner)
            print(f"Stored {clusters_stored} clusters in database")
            
            # Associate patterns with their respective clusters
            db.bind_pattern_cluster(stock_id, pip_miner)
            
            # Calculate and update probability scores for all clusters
            db.update_all_cluster_probability_score(stock_id, pip_miner)
            
            print(f"Successfully processed patterns for {symbol} (ID: {stock_id})")
            successful_processing += 1
            
        except Exception as e:
            print(f"Error processing {symbol} (ID: {stock_id}): {str(e)}")
            failed_processing += 1
    
    # Close database connection
    db.close()
    print(f"\nPattern mining completed: {successful_processing} successful, {failed_processing} failed.")

# Run pattern mining with default parameters
print("Starting pattern mining process...")
perform_pattern_mining_for_all_stocks(n_pips=5, lookback=24, hold_period=6, returns_hold_period=12)


Processing GOLD (XAUUSD) (ID: 1) with timeframe 60...
Data split - Train: (29693,), Test: (7424,)
Successfully processed patterns for GOLD (XAUUSD) (ID: 1) with timeframe 60.

Processing BTC (BTCUSD) (ID: 2) with timeframe 60...
Data split - Train: (38004,), Test: (9502,)
Successfully processed patterns for BTC (BTCUSD) (ID: 2) with timeframe 60.

Processing APPL (AAPL) (ID: 3) with timeframe 60...
Data split - Train: (8804,), Test: (2202,)
Successfully processed patterns for APPL (AAPL) (ID: 3) with timeframe 60.

Processing Amazon (AMZN) (ID: 4) with timeframe 60...
Data split - Train: (8804,), Test: (2202,)
Successfully processed patterns for Amazon (AMZN) (ID: 4) with timeframe 60.

Processing NVIDIA (NVDA) (ID: 5) with timeframe 60...
Data split - Train: (8804,), Test: (2202,)
Successfully processed patterns for NVIDIA (NVDA) (ID: 5) with timeframe 60.
Closed connection to database: ../Data/data.db

Pattern mining completed for all stocks.
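
The `hold_period` and `returns_hold_period` parameters control how each pattern's outcome is measured. The sketch below shows one plausible reading of the docstring: the outcome is the simple return after holding for `hold_period` candles, and the maximum gain and drawdown are taken over the next `returns_hold_period` candles. The function names and the exact formulas are assumptions for illustration, and Pattern_Miner may compute these quantities differently.

```python
def hold_return(prices, entry_idx, hold_period):
    """Simple return from entry_idx to entry_idx + hold_period."""
    exit_idx = entry_idx + hold_period
    if exit_idx >= len(prices):
        raise IndexError("not enough candles after the entry to hold that long")
    return prices[exit_idx] / prices[entry_idx] - 1.0


def max_gain_and_drawdown(prices, entry_idx, returns_hold_period):
    """Best and worst return reached within the window after the entry."""
    window = prices[entry_idx + 1 : entry_idx + 1 + returns_hold_period]
    if not window:
        raise ValueError("no candles available after the entry")
    entry = prices[entry_idx]
    return max(window) / entry - 1.0, min(window) / entry - 1.0


closes = [100, 102, 99, 105, 101, 110, 108]
print(f"{hold_return(closes, 0, 3):.2%}")
gain, drawdown = max_gain_and_drawdown(closes, 0, 5)
print(f"{gain:.2%} / {drawdown:.2%}")
```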


## Train/Test Split Date Ranges

This section calculates the appropriate date ranges for training and testing data, which is useful for backtesting and model evaluation.

In [None]:
def get_test_train_dates(stock_id, time_frame):
    """
    Determine the date ranges for training and testing datasets.
    
    Parameters:
    -----------
    stock_id : int
        ID of the stock in the database
    time_frame : int
        Time frame in minutes
        
    Returns:
    --------
    tuple
        ((train_start, train_end), (test_start, test_end)) date ranges
    """
    global db  # required because the module-level connection may be rebound below
    try:
        # Reconnect to database if needed
        if not hasattr(db, '_conn') or db._conn is None:
            db = Database()
            
        # Fetch stock data from database
        df = db.get_stock_data(stock_id, time_frame)
        
        if df.empty:
            print(f"No data found for stock ID: {stock_id}")
            return None, None
        
        # Get the first and last dates of the dataset
        start_date = df.index[0]
        end_date = df.index[-1]
        
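        # Approximate the 80/20 split by calendar days. This can differ slightly from the
        # row-based split made by train_test_split, because market closures leave gaps in the index.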
        train_start_date = start_date
        train_end_date = start_date + pd.DateOffset(days=int((end_date - start_date).days * 0.8))
        test_start_date = train_end_date + pd.DateOffset(days=1)
        test_end_date = end_date
        
        # Format dates for readability
        train_range = (train_start_date.strftime('%Y-%m-%d'), train_end_date.strftime('%Y-%m-%d'))
        test_range = (test_start_date.strftime('%Y-%m-%d'), test_end_date.strftime('%Y-%m-%d'))
        
        return train_range, test_range
        
    except Exception as e:
        print(f"Error fetching dates for stock ID {stock_id}: {str(e)}")
        return None, None
    
# Get train/test date ranges for GOLD (XAUUSD) with 1-hour timeframe
train_dates, test_dates = get_test_train_dates(1, 60)

if train_dates and test_dates:
    print(f"Training data period: {train_dates[0]} to {train_dates[1]}")
    print(f"Testing data period: {test_dates[0]} to {test_dates[1]}")
    
    # Calculate duration in days
    train_start = pd.to_datetime(train_dates[0])
    train_end = pd.to_datetime(train_dates[1])
    test_start = pd.to_datetime(test_dates[0])
    test_end = pd.to_datetime(test_dates[1])
    
    train_days = (train_end - train_start).days
    test_days = (test_end - test_start).days
    
    print(f"Training duration: {train_days} days ({train_days/365:.1f} years)")
    print(f"Testing duration: {test_days} days ({test_days/365:.1f} years)")
else:
    print("Could not retrieve date ranges")

Train Dates: (Timestamp('2019-01-02 01:00:00'), Timestamp('2024-01-08 01:00:00'))
Test Dates: (Timestamp('2024-01-09 01:00:00'), Timestamp('2025-04-10 23:00:00'))
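
The cell above approximates the split by calendar days, while the mining step splits by row count with `train_test_split(..., test_size=0.2, shuffle=False)`. A sketch of reading the boundaries directly from the row positions follows. It assumes the test set takes the last `ceil(test_size * n)` rows, which matches the row counts reported earlier (29693 and 7424 for gold). The helper names `split_sizes` and `split_boundaries` are hypothetical.

```python
import math
from datetime import datetime, timedelta


def split_sizes(n_rows, test_size=0.2):
    """Row counts of an unshuffled split: the test set takes the last rows."""
    n_test = math.ceil(test_size * n_rows)
    return n_rows - n_test, n_test


def split_boundaries(timestamps, test_size=0.2):
    """Return ((train_start, train_end), (test_start, test_end)) by row position."""
    n_train, n_test = split_sizes(len(timestamps), test_size)
    if n_train < 1 or n_test < 1:
        raise ValueError("both sets need at least one row")
    train_range = (timestamps[0], timestamps[n_train - 1])
    test_range = (timestamps[n_train], timestamps[-1])
    return train_range, test_range


# Ten hourly candles starting at the first gold timestamp shown above
start = datetime(2019, 1, 2, 1, 0)
candles = [start + timedelta(hours=h) for h in range(10)]
train_range, test_range = split_boundaries(candles)
print(train_range[1], test_range[0])
print(split_sizes(37117))  # (29693, 7424)
```

With a pandas `DatetimeIndex`, the same positions can be read with `df.index[n_train - 1]` and `df.index[n_train]`.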


## Pattern Visualization and Analysis

This section demonstrates how to visualize patterns and analyze their performance. It retrieves patterns from the database and shows their shapes, distributions, and predictive power.
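
The plots in this section show cluster shapes on a normalized price axis. As a rough illustration of why normalization matters, the sketch below rescales a pattern to the range 0 to 1 and assigns it to the closest cluster center by Euclidean distance. Both the min-max scaling and the nearest-center rule are assumptions chosen for simplicity, the function names are hypothetical, and the project may normalize and cluster differently.

```python
import math


def min_max_normalize(points):
    """Rescale a pattern to [0, 1] so that shape, not price level, is compared."""
    lo, hi = min(points), max(points)
    if hi == lo:
        return [0.0] * len(points)  # a flat pattern has no shape to rescale
    return [(p - lo) / (hi - lo) for p in points]


def nearest_cluster(pattern, centers):
    """Index of the cluster center closest to the pattern (Euclidean distance)."""
    distances = [math.dist(pattern, center) for center in centers]
    return distances.index(min(distances))


centers = [
    [0.0, 1.0, 0.5],  # rises, then falls back halfway
    [1.0, 0.0, 0.5],  # falls, then recovers halfway
]
pattern = min_max_normalize([100, 110, 105])
print(pattern)                            # [0.0, 1.0, 0.5]
print(nearest_cluster(pattern, centers))  # 0
```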

In [None]:
def visualize_pattern_clusters(stock_id=1, time_frame=60, num_clusters_to_show=5):
    """
    Visualize pattern clusters from the database.
    
    Parameters:
    -----------
    stock_id : int
        ID of the stock to visualize patterns for
    time_frame : int
        Time frame in minutes
    num_clusters_to_show : int
        Number of clusters to display
    """
    global db  # required because the module-level connection may be rebound below
    try:
        # Reconnect to database if needed
        if not hasattr(db, '_conn') or db._conn is None:
            db = Database()
            
        # Get cluster data from database
        clusters = db.get_clusters(stock_id, time_frame=time_frame)
        
        if clusters.empty:
            print(f"No clusters found for stock ID {stock_id}")
            return
            
        # Limit the number of clusters to display
        clusters_to_show = min(num_clusters_to_show, len(clusters))
        
        # Set up the figure; squeeze=False keeps axs two-dimensional even when one cluster is shown
        fig, axs = plt.subplots(clusters_to_show, 2, figsize=(14, 3*clusters_to_show), squeeze=False)
        
        # Display each cluster
        for i in range(clusters_to_show):
            # Get cluster data
            cluster = clusters.iloc[i]
            price_points = np.array(cluster['AVGPricePoints'])
            
            # Plot the pattern shape
            axs[i, 0].plot(price_points, marker='o')
            axs[i, 0].set_title(f"Cluster {i+1}: {cluster['Label']}")
            axs[i, 0].grid(True)
            axs[i, 0].set_xlabel('Time Steps')
            axs[i, 0].set_ylabel('Normalized Price')
            
            # Plot expected outcome
            outcome = float(cluster['Outcome'])
            probability = float(cluster['ProbabilityScore']) if 'ProbabilityScore' in cluster else 0.5
            colors = ['red' if outcome < 0 else 'green']
            axs[i, 1].bar(['Return'], [outcome*100], color=colors)
            axs[i, 1].set_title(f"Expected Return: {outcome*100:.2f}% (Probability: {probability:.2f})")
            axs[i, 1].set_ylabel('Percentage Return')
            axs[i, 1].grid(True)
            axs[i, 1].axhline(y=0, color='black', linestyle='-', alpha=0.3)
            
        plt.tight_layout()
        plt.show()
        
    except Exception as e:
        print(f"Error visualizing patterns: {str(e)}")

# Visualize patterns for GOLD (XAUUSD) on 1-hour timeframe
print("Visualizing pattern clusters for GOLD (XAUUSD)...")
visualize_pattern_clusters(stock_id=1, time_frame=60, num_clusters_to_show=5)

## Summary and Next Steps

This notebook has demonstrated the complete workflow for stock pattern mining:

1. **Data Preparation**: Loading and processing stock data from CSV files
2. **Pattern Mining**: Identifying perceptually important points in price series
3. **Pattern Clustering**: Grouping similar patterns to find recurring market behaviors
4. **Performance Analysis**: Analyzing the predictive power of each pattern cluster
5. **Database Integration**: Storing all patterns, clusters, and analyses for later use

### Next Steps

1. Use the patterns for trading strategy development
2. Implement real-time pattern detection for live trading
3. Expand the analysis to include more technical indicators
4. Incorporate sentiment analysis to enhance pattern predictive power
5. Develop an automated backtesting system to evaluate pattern-based strategies

In [None]:
# Clean up and close database connection
try:
    db.close()
    print("Database connection closed successfully")
except Exception:
    print("Database already closed or connection error")

print("Pattern analysis notebook completed")