# Stock Analysis & Trading System - Colab Quickstart

This notebook demonstrates the complete pipeline for stock analysis and backtesting:
1. Mount Google Drive
2. Install dependencies
3. Load stock data (Parquet OHLCV files)
4. Compute technical indicators (SMA, RSI)
5. Run backtests with multiple strategies
6. Scan for trading opportunities
7. Launch interactive Dash UI

**Note:** This system is designed for a Google Colab + Google Drive workflow.

## 1. Setup: Mount Google Drive

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Verify mount
import os
print("Drive mounted successfully!" if os.path.exists('/content/drive/MyDrive') else "Drive mount failed")

## 2. Clone Repository and Install Dependencies

In [None]:
# Clone the repository
!git clone https://github.com/vimala1500/sandt_v1.0.git
%cd sandt_v1.0

# Install dependencies
!pip install -q -r requirements.txt

print("\nInstallation complete!")

## 3. Prepare Sample Data (Optional)

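If you don't have Parquet files in Google Drive yet, this cell creates synthetic OHLCV data for five tickers so you can test the pipeline. Skip it if you already have real data: it writes `AAPL.parquet`, `GOOGL.parquet`, etc. into the same folder and overwrites any existing files with those names.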

In [None]:
import zlib

import pandas as pd
import numpy as np
from pathlib import Path

# Create sample data directory
data_path = Path('/content/drive/MyDrive/stock_data')
data_path.mkdir(parents=True, exist_ok=True)

# Generate synthetic OHLCV data (a random walk, not real prices)
def generate_sample_data(symbol, days=1000):
    """Generate sample OHLCV data with a reproducible per-symbol seed."""
    dates = pd.date_range(end=pd.Timestamp.now().normalize(), periods=days, freq='D')
    
    # str hash() is randomized per Python session; crc32 gives a stable seed per symbol
    rng = np.random.default_rng(zlib.crc32(symbol.encode()))
    returns = rng.standard_normal(days) * 0.02  # ~2% daily volatility
    close = 100 * np.exp(np.cumsum(returns))
    open_ = close * (1 + rng.standard_normal(days) * 0.01)
    
    # High/Low must bracket both Open and Close
    high = np.maximum(open_, close) * (1 + np.abs(rng.standard_normal(days)) * 0.02)
    low = np.minimum(open_, close) * (1 - np.abs(rng.standard_normal(days)) * 0.02)
    
    df = pd.DataFrame({
        'Date': dates,
        'Open': open_,
        'High': high,
        'Low': low,
        'Close': close,
        'Volume': rng.integers(1_000_000, 10_000_000, days)
    })
    
    return df

# Generate sample data for a few symbols
sample_symbols = ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'TSLA']
for symbol in sample_symbols:
    df = generate_sample_data(symbol)
    df.to_parquet(data_path / f'{symbol}.parquet', index=False)
    print(f"Created sample data for {symbol}")

print(f"\nSample data created in: {data_path}")
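Before running the indicator step on real data, it can help to sanity-check each Parquet file. The sketch below assumes the column names used in this notebook (`Date`, `Open`, `High`, `Low`, `Close`, `Volume`); `validate_ohlcv` is a hypothetical helper, not part of the repository.

```python
import pandas as pd

# Column names assumed from the sample-data cell in this notebook
REQUIRED_COLUMNS = ["Date", "Open", "High", "Low", "Close", "Volume"]


def validate_ohlcv(df: pd.DataFrame) -> list:
    """Return a list of problems found in an OHLCV frame (an empty list means OK)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if not df["Date"].is_monotonic_increasing:
        problems.append("Date is not sorted ascending")
    if df["Date"].duplicated().any():
        problems.append("duplicate dates")
    # Each bar's high/low must bracket both its open and its close
    if (df["High"] < df[["Open", "Close"]].max(axis=1)).any():
        problems.append("High below Open/Close")
    if (df["Low"] > df[["Open", "Close"]].min(axis=1)).any():
        problems.append("Low above Open/Close")
    if (df["Volume"] < 0).any():
        problems.append("negative Volume")
    return problems
```

For example, `for f in sorted(data_path.glob('*.parquet')): print(f.stem, validate_ohlcv(pd.read_parquet(f)) or 'OK')` reports any file that would feed inconsistent bars into the indicators.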

## 4. Compute Technical Indicators

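Compute simple moving averages (20, 50 and 200 periods) and the 14-period RSI for every symbol, and store them under `/content/drive/MyDrive/indicators`.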
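`IndicatorEngine`'s exact formulas are not shown in this notebook. For reference, here is a pandas sketch of the standard definitions: SMA is a rolling mean over `period` closes, and RSI is `100 - 100 / (1 + RS)`, where `RS` is the ratio of average gain to average loss. The sketch smooths with an exponential average using `alpha = 1/period` (Wilder's smoothing) seeded from the first observation, so early values differ slightly from the variant seeded with a simple average; the engine may use either, or something else.

```python
import numpy as np
import pandas as pd


def sma(close: pd.Series, period: int) -> pd.Series:
    """Simple moving average; NaN until `period` observations are available."""
    return close.rolling(window=period, min_periods=period).mean()


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI using Wilder-style exponential smoothing (alpha = 1/period)."""
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = (-delta).clip(lower=0)
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    out = 100 - 100 / (1 + rs)
    # No losses in the window means RS is infinite, so RSI is 100 by definition
    return out.mask((avg_loss == 0) & (avg_gain > 0), 100.0)
```

A strictly rising series ends at RSI 100 and a strictly falling one at 0; a flat series stays NaN because both averages are zero.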

In [None]:
from data_loader import DataLoader
from indicator_engine import IndicatorEngine

# Initialize
data_loader = DataLoader('/content/drive/MyDrive/stock_data')
indicator_engine = IndicatorEngine('/content/drive/MyDrive/indicators')

# List available symbols
symbols = data_loader.list_available_symbols()
print(f"Found {len(symbols)} symbols: {symbols}")

# Load data
print("\nLoading stock data...")
data_dict = data_loader.load_multiple_symbols(symbols)
print(f"Loaded {len(data_dict)} symbols")

# Compute indicators
print("\nComputing indicators...")
indicator_engine.process_multiple_symbols(
    data_dict,
    sma_periods=[20, 50, 200],
    rsi_periods=[14],
    show_progress=True
)

print("\nIndicators computed successfully!")

## 5. Run Backtests

Run every strategy in `DEFAULT_STRATEGIES` on every symbol with the Numba-accelerated backtest engine, saving results under `/content/drive/MyDrive/backtests`.
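The engine's internals are not shown here. As a rough mental model, this is a minimal long-only backtest loop under stated assumptions: signals arrive as boolean arrays, a signal on one bar is filled at the next bar's close, and a fixed proportional fee is charged on entry and exit. `backtest_long_only` is hypothetical and is not the `BacktestEngine` implementation. It uses an explicit loop over NumPy arrays, the style Numba's `@njit` decorator targets, but it runs here as plain Python.

```python
import numpy as np


def backtest_long_only(close, entries, exits, fee=0.0):
    """All-in/all-out long-only backtest on close prices.

    A signal on bar i is filled at the close of bar i + 1 (no look-ahead).
    Returns the equity curve (starting at 1.0) and gross per-trade returns.
    A position still open on the last bar is not counted as a trade.
    """
    close = np.asarray(close, dtype=float)
    n = close.shape[0]
    equity = np.ones(n)
    trade_returns = []
    in_position = False
    entry_price = 0.0
    for i in range(1, n):
        # Mark to market: apply this bar's price change only while long
        if in_position:
            equity[i] = equity[i - 1] * close[i] / close[i - 1]
        else:
            equity[i] = equity[i - 1]
        # Act on the previous bar's signal at this bar's close
        if not in_position and entries[i - 1]:
            in_position = True
            entry_price = close[i]
            equity[i] *= 1.0 - fee
        elif in_position and exits[i - 1]:
            in_position = False
            equity[i] *= 1.0 - fee
            trade_returns.append(close[i] / entry_price - 1.0)
    return equity, trade_returns
```

Filling on the next bar rather than the signal bar is a deliberate choice: it avoids trading on a close price that was only known after the signal was computed.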

In [None]:
from backtest_engine import BacktestEngine
from strategy import DEFAULT_STRATEGIES

# Initialize backtest engine
backtest_engine = BacktestEngine('/content/drive/MyDrive/backtests')

# Load indicators for backtesting
data_with_indicators = {}
for symbol in symbols:
    data = indicator_engine.load_indicators(symbol)
    if data is not None:
        data_with_indicators[symbol] = data

print(f"Running backtests for {len(data_with_indicators)} symbols...\n")

# Run backtests with default strategies
strategy_configs = list(DEFAULT_STRATEGIES.values())
results_df = backtest_engine.run_multiple_backtests(
    data_with_indicators,
    strategy_configs,
    show_progress=True
)

# Display results
print("\n" + "="*60)
print("BACKTEST RESULTS SUMMARY")
print("="*60)
print(f"Total backtests: {len(results_df)}")
print("\nAverage metrics:")
print(results_df[['cagr', 'sharpe_ratio', 'win_rate', 'max_drawdown']].mean())

print("\n\nTop 10 by Sharpe Ratio:")
print(results_df.nlargest(10, 'sharpe_ratio')[['symbol', 'strategy', 'sharpe_ratio', 'cagr', 'win_rate']])
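The summary columns `cagr`, `sharpe_ratio`, `win_rate` and `max_drawdown` are not defined in this notebook. The sketch below uses common textbook definitions (annualized with 252 bars per year, zero risk-free rate, drawdown as a negative fraction of the running peak); `performance_metrics` is hypothetical, and `BacktestEngine` may compute these differently, for example by reporting drawdown as a positive number.

```python
import numpy as np


def performance_metrics(equity, trade_returns, periods_per_year=252):
    """Common definitions of backtest summary metrics from an equity curve."""
    equity = np.asarray(equity, dtype=float)
    years = (len(equity) - 1) / periods_per_year
    cagr = (equity[-1] / equity[0]) ** (1 / years) - 1 if years > 0 else np.nan
    # Annualized Sharpe ratio of per-bar returns, assuming a zero risk-free rate
    bar_returns = np.diff(equity) / equity[:-1]
    std = bar_returns.std(ddof=1) if bar_returns.size > 1 else np.nan
    sharpe = np.sqrt(periods_per_year) * bar_returns.mean() / std if std > 0 else np.nan
    # Largest peak-to-trough decline, as a negative fraction of the running peak
    running_peak = np.maximum.accumulate(equity)
    max_drawdown = ((equity - running_peak) / running_peak).min()
    trades = np.asarray(trade_returns, dtype=float)
    win_rate = (trades > 0).mean() if trades.size else np.nan
    return {"cagr": cagr, "sharpe_ratio": sharpe, "win_rate": win_rate,
            "max_drawdown": max_drawdown, "num_trades": int(trades.size)}
```

If your data has calendar-day bars, as the synthetic sample above does, pass `periods_per_year=365` so CAGR and Sharpe are annualized consistently.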

## 6. Run Live Scans

Find stocks that currently match specific conditions (for example, RSI below 30), together with their historical backtest performance.
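A scan like `scan_rsi_oversold` essentially reads the latest indicator value for each symbol and filters on it. Here is a minimal sketch, assuming each symbol maps to a DataFrame with an RSI column; the column name `rsi_14` and the helper `scan_last_value` are hypothetical, since the `Scanner` internals are not shown.

```python
import pandas as pd


def scan_last_value(frames: dict, column: str, below=None, above=None) -> pd.DataFrame:
    """Return symbols whose most recent non-NaN `column` value crosses a threshold."""
    rows = []
    for symbol, df in frames.items():
        if column not in df.columns:
            continue  # symbol has no such indicator; skip it
        series = df[column].dropna()
        if series.empty:
            continue
        value = series.iloc[-1]
        if (below is not None and value < below) or (above is not None and value > above):
            rows.append({"symbol": symbol, column: value})
    return pd.DataFrame(rows, columns=["symbol", column])
```

To attach backtest statistics, the hits could be merged with `results_df` from section 5 on the `symbol` column, e.g. `hits.merge(results_df[results_df['strategy'] == 'rsi_meanrev'], on='symbol')`, assuming the `strategy` column holds names like the `'rsi_meanrev'` used below.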

In [None]:
from scanner import Scanner

# Initialize scanner
scanner = Scanner(indicator_engine, backtest_engine)

# Scan for oversold stocks (RSI < 30)
print("Scanning for oversold stocks (RSI < 30)...")
oversold = scanner.scan_rsi_oversold(symbols, rsi_period=14, threshold=30)
print(f"\nFound {len(oversold)} oversold stocks:")
if len(oversold) > 0:
    print(oversold)

# Scan for overbought stocks (RSI > 70)
print("\n" + "="*60)
print("Scanning for overbought stocks (RSI > 70)...")
overbought = scanner.scan_rsi_overbought(symbols, rsi_period=14, threshold=70)
print(f"\nFound {len(overbought)} overbought stocks:")
if len(overbought) > 0:
    print(overbought)

# Get top performers
print("\n" + "="*60)
print("Top performers by Sharpe ratio (RSI strategy)...")
top = scanner.get_top_performers('rsi_meanrev', metric='sharpe_ratio', min_trades=5, top_n=10)
if len(top) > 0:
    print(top[['symbol', 'sharpe_ratio', 'cagr', 'win_rate', 'num_trades']])

## 7. Launch Interactive Dash UI

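Start the web-based UI for interactive analysis in a background thread. Colab does not expose the server's port to your browser directly, so you need a tunnel such as ngrok to get a URL you can open.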

In [None]:
# Note: To use Dash in Colab, you need ngrok or similar tunneling service
# This is a basic example - for production use, set up proper tunneling

from dash_ui import create_app

# Create Dash app
ui = create_app(
    indicator_path='/content/drive/MyDrive/indicators',
    backtest_path='/content/drive/MyDrive/backtests'
)

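# For Colab, you would typically use (ngrok requires an auth token from a free account):
# !pip install pyngrok
# from pyngrok import ngrok
# ngrok.set_auth_token("YOUR_NGROK_AUTHTOKEN")
# public_url = ngrok.connect(8050).public_url
# print(f"Dash app available at: {public_url}")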

# Run in thread for Colab
import threading

def run_app():
    ui.run(host='0.0.0.0', port=8050, debug=False)

thread = threading.Thread(target=run_app)
thread.daemon = True
thread.start()

print("Dash UI started!")
print("Note: In Colab, use ngrok for public URL. See code comments.")
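Because `thread.start()` returns immediately, the server may not be accepting connections yet when the cell finishes. A small standard-library polling helper can confirm the port is open before you create a tunnel; `wait_for_port` is a hypothetical sketch, not part of the repository.

```python
import socket
import time


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 10.0) -> bool:
    """Poll until a TCP server accepts connections on host:port, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful TCP connect means something is listening on the port
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.2)  # not up yet (e.g. connection refused); retry shortly
    return False
```

For example: `print("Dash is listening" if wait_for_port(8050) else "Dash did not start within 10 s")`. Connecting to `127.0.0.1` works because the app binds to `0.0.0.0`, which includes the loopback interface.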

## Usage Notes

### Pipeline Modes

You can run the pipeline in different modes using the `Pipeline` class in `main.py`:

```python
from main import Pipeline

# Initialize
pipeline = Pipeline(
    data_path='/content/drive/MyDrive/stock_data',
    indicator_path='/content/drive/MyDrive/indicators',
    backtest_path='/content/drive/MyDrive/backtests'
)

# Run full pipeline
pipeline.run_full_pipeline()

# Or run individual stages
pipeline.run_indicators_only()
pipeline.run_backtests_only()
pipeline.run_scan('rsi_oversold', threshold=20)
```

### Data Organization

- **Stock Data**: `/content/drive/MyDrive/stock_data/*.parquet`
- **Indicators**: `/content/drive/MyDrive/indicators/indicators.h5` and `config.json`
- **Backtests**: `/content/drive/MyDrive/backtests/results.zarr`, `summary.parquet`, `metadata.json`
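To confirm the layout above before running later stages, a small check like the following lists what is missing. It is a sketch: `check_layout` is hypothetical, and it only checks that the paths exist, not that their contents are valid.

```python
from pathlib import Path

# Expected files per folder, taken from the Data Organization list
EXPECTED_FILES = {
    "indicators": ["indicators.h5", "config.json"],
    "backtests": ["results.zarr", "summary.parquet", "metadata.json"],
}


def check_layout(root) -> dict:
    """Map each folder under `root` to the expected entries that are missing."""
    root = Path(root)
    report = {}
    # stock_data only needs at least one Parquet file
    stock_files = sorted((root / "stock_data").glob("*.parquet"))
    report["stock_data"] = [] if stock_files else ["*.parquet"]
    for folder, names in EXPECTED_FILES.items():
        # results.zarr is a directory; Path.exists() is true for directories too
        report[folder] = [n for n in names if not (root / folder / n).exists()]
    return report
```

For example, `check_layout('/content/drive/MyDrive')` returns empty lists for every folder once all three stages have run.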

### Adding Custom Strategies

See `strategy.py` for examples. Add your strategy function to the `StrategyRegistry` class.
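The exact signature `StrategyRegistry` expects is defined in `strategy.py` and is not shown here. As an illustration only, a strategy can be reduced to a function that turns indicator columns into boolean entry and exit arrays. The function name and the `sma_20`/`sma_50` column names are assumptions; adapt them to the registry's actual interface.

```python
import pandas as pd


def sma_crossover_signals(df: pd.DataFrame, fast: str = "sma_20", slow: str = "sma_50"):
    """Hypothetical strategy: enter when the fast SMA crosses above the slow SMA,
    exit when it crosses back below. Column names are assumptions."""
    fast_sma, slow_sma = df[fast], df[slow]
    above = fast_sma > slow_sma  # comparisons involving NaN evaluate to False
    valid = fast_sma.notna() & slow_sma.notna()
    prev_above = above.shift(1, fill_value=False)
    prev_valid = valid.shift(1, fill_value=False)
    # Only count a cross when both averages were defined on the previous bar too,
    # so the first bar with data does not register a spurious crossover
    entries = above & ~prev_above & prev_valid
    exits = ~above & prev_above & prev_valid
    return entries.to_numpy(), exits.to_numpy()
```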

### Performance Tips

- Indicators and backtests are persisted to Google Drive, so later sessions can reload them instead of recomputing
- Use Numba-accelerated functions for custom indicators
- Zarr provides efficient chunked storage for large backtest results
- HDF5 stores indicator time series efficiently
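Caching only helps if stale results are detected. One simple approach, sketched here under the assumption that file modification times are reliable on your Drive mount, is to rebuild a cache file whenever any of its source files is newer. `needs_refresh` is hypothetical; the repository may use a different invalidation scheme.

```python
from pathlib import Path


def needs_refresh(sources, cache) -> bool:
    """True if `cache` is missing or older than the newest file in `sources`."""
    cache = Path(cache)
    if not cache.exists():
        return True
    # With no source files, default=0.0 means an existing cache is considered fresh
    newest_source = max((Path(s).stat().st_mtime for s in sources), default=0.0)
    return newest_source > cache.stat().st_mtime
```

For example, `needs_refresh(Path('/content/drive/MyDrive/stock_data').glob('*.parquet'), '/content/drive/MyDrive/indicators/indicators.h5')` tells you whether the indicator step should be rerun.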