# Model Training: Basic Signal Classification

This notebook trains a basic classifier to predict whether next week's close will be higher than this week's (1) or not (0), based on technical indicators.

**Workflow:**
- Load historical data with features (from `data_prep.py`)
- Engineer a binary target: 1 if next week's close > this week's close, else 0
- Split chronologically into train and test sets (no shuffling)
- Train RandomForestClassifier
- Evaluate performance
- Save model for later use


In [None]:
import os
import glob
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import joblib


## Load Data
Load all CSVs generated by `data_prep.py` from the `data/` folder.

In [None]:
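# data/ is expected one level above the current working directory.
data_dir = os.path.join(os.path.dirname(os.path.abspath('')), 'data')
all_files = sorted(glob.glob(os.path.join(data_dir, '*.csv')))
if not all_files:
    # pd.concat raises an unhelpful ValueError on an empty list, so fail early.
    raise FileNotFoundError(f'No CSV files found in {data_dir}')
dfs = [pd.read_csv(f, index_col=0, parse_dates=True) for f in all_files]
df = pd.concat(dfs)
df = df.sort_index()
print(f'Loaded {len(df)} rows from {len(all_files)} files.')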


## Create Target Variable
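Target = 1 if next week's close > this week's close, else 0.
We shift the `close` column by -1 so each row sees the following row's close. This assumes one row per week, in chronological order, for a single instrument; if the loaded CSVs cover several symbols, the shift has to be applied per symbol.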

In [None]:
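# shift(-1) pulls the following row's close onto the current row.
df['next_close'] = df['close'].shift(-1)
# Drop rows with missing features or no next close (the last row) before labeling;
# otherwise NaN > close evaluates to False and the last row is silently labeled 0.
df = df.dropna(subset=['rsi_14', 'atr_21', 'sma_20', 'ema_20', 'next_close'])
df['target'] = (df['next_close'] > df['close']).astype(int)
print(df[['close','next_close','target']].head())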
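
If the combined frame holds more than one instrument, a plain `shift(-1)` pairs the last row of one symbol with a row from another. The following is a minimal sketch of a per-symbol target on toy data. It assumes a hypothetical `symbol` column, which the CSVs from `data_prep.py` may or may not contain, and the prices are made up for illustration.

```python
import pandas as pd

# Toy data; the `symbol` column and all prices are hypothetical.
toy = pd.DataFrame(
    {
        'symbol': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB'],
        'close': [10.0, 11.0, 10.5, 50.0, 49.0, 51.0],
    }
)

# Shift within each symbol so the next close never comes from another instrument.
toy['next_close'] = toy.groupby('symbol')['close'].shift(-1)

# Drop each symbol's last row (it has no next close) before labeling.
toy = toy.dropna(subset=['next_close'])
toy['target'] = (toy['next_close'] > toy['close']).astype(int)
print(toy)
```

Each symbol loses exactly one row (its most recent week), since there is no following close to compare against.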


## Feature Selection
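We'll use RSI, ATR, SMA, and EMA as features. The split is chronological (`shuffle=False`): the first 80% of rows train the model and the last 20% test it.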

In [None]:
features = ['rsi_14', 'atr_21', 'sma_20', 'ema_20']
X = df[features]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
print(f'Train size: {len(X_train)}, Test size: {len(X_test)}')
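
With `shuffle=False`, `train_test_split` slices the rows in their existing order, so the split is only chronological if the frame is sorted by date first (as done after loading). A small sketch on synthetic data to check that property; the names `X_demo`, `y_demo`, and the weekly index are placeholders, not the notebook's data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic weekly index; values are placeholders, not real market data.
idx = pd.date_range('2023-01-06', periods=50, freq='W-FRI')
X_demo = pd.DataFrame({'feature': np.arange(50)}, index=idx)
y_demo = pd.Series(np.arange(50) % 2, index=idx)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

# Every training date precedes every test date, so no future rows leak into training.
assert X_tr.index.max() < X_te.index.min()
print(f'Train ends {X_tr.index.max().date()}, test starts {X_te.index.min().date()}')
```

The same assertion can be run on the notebook's own `X_train` and `X_test` as a sanity check.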


## Train RandomForestClassifier
Fit a 100-tree random forest on the training set; `random_state=42` makes the run reproducible.

In [None]:
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)


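## Evaluate Model
Report accuracy, the confusion matrix, and per-class precision and recall on the held-out test set.
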
In [None]:
y_pred = clf.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
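
Accuracy on an up/down target is easier to interpret next to a naive baseline, because the classes are rarely balanced. A minimal sketch using scikit-learn's `DummyClassifier`, which always predicts the most frequent training class. The labels below are synthetic placeholders (60% "up" weeks), not results from this notebook.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Synthetic labels with 60% "up" weeks; placeholders, not real results.
y_tr_demo = np.array([1] * 120 + [0] * 80)
y_te_demo = np.array([1] * 30 + [0] * 20)
X_tr_demo = np.zeros((200, 1))  # features are ignored by this baseline
X_te_demo = np.zeros((50, 1))

# Always predict the most frequent class seen in training.
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_tr_demo, y_tr_demo)
baseline_acc = accuracy_score(y_te_demo, baseline.predict(X_te_demo))
print('Baseline accuracy:', baseline_acc)
```

In the notebook, the same baseline could be fitted on `X_train, y_train` and scored on `X_test, y_test`; a classifier that does not beat it has learned little beyond the class frequencies.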


## Save Model
Persist the trained classifier with joblib so it can be reloaded later for signal generation.

In [None]:
joblib.dump(clf, 'rf_signal_classifier.joblib')
print('Model saved as rf_signal_classifier.joblib')
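
Reloading the saved model for signal generation could look like the sketch below. To stay runnable on its own, it trains a throwaway model on random numbers and saves it to a temporary directory; in practice you would call `joblib.load('rf_signal_classifier.joblib')` and pass the latest feature row. The names `demo_clf`, `latest`, `signal`, and `prob_up` are illustrative assumptions.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

features = ['rsi_14', 'atr_21', 'sma_20', 'ema_20']

# Throwaway model on random numbers so the sketch runs standalone.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame(rng.random((60, 4)), columns=features)
y_demo = np.arange(60) % 2  # alternating labels so both classes are present
demo_clf = RandomForestClassifier(n_estimators=10, random_state=42)
demo_clf.fit(X_demo, y_demo)

# Round-trip through joblib, as the notebook does with its own file name.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'rf_signal_classifier.joblib')
    joblib.dump(demo_clf, path)
    loaded = joblib.load(path)

# Pass the columns in the same order used for training.
latest = X_demo[features].tail(1)
signal = int(loaded.predict(latest)[0])  # 1 = up, 0 = not up
up_col = list(loaded.classes_).index(1)  # column of class 1 in predict_proba
prob_up = float(loaded.predict_proba(latest)[0, up_col])
print(f'signal={signal}, P(up)={prob_up:.2f}')
```

Only load joblib files you created yourself or trust, since loading unpickles arbitrary Python objects.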
