# **Quantitative Trading Strategy Using Machine Learning**

## **Introduction**
The **Quantitative Trading Strategy** project applies machine learning to trading decisions in the stock market. Using historical price data for S&P 500 stocks, it builds predictive models that forecast the direction of the next price move and turn those forecasts into buy/sell signals. The resulting strategies are backtested and compared against a market benchmark, with the aim of outperforming it.

### **Project Description**:
QuantStock is a stock performance evaluation and benchmarking system that analyzes the historical performance of selected S&P 500 stocks and compares them against a market benchmark. Using machine learning models, QuantStock calculates financial metrics, backtests trading strategies, and generates reports that help investors optimize their stock portfolios based on historical data and risk-adjusted returns.

### **Key Questions**

1. How can stock performance be benchmarked against the S&P 500 using machine learning models?
2. Which technical indicators can improve stock price prediction accuracy?
3. What are the most effective machine learning models for stock market analysis (RandomForest, XGBoost, LSTM)?
4. How can automated reporting and visualization aid decision-making for investors?

### **Technologies**

- Python (Pandas, Scikit-learn, yFinance, TensorFlow)
- Machine Learning (RandomForest, XGBoost, LSTM)
- Financial Indicators (SMA, EMA, RSI)
- Data Visualization (Matplotlib, Seaborn)
- Automated Reporting (FPDF for generating reports)

### **The system includes**:

**Predictive Model for Stock Movements**:
Utilizes machine learning models (Random Forest, XGBoost, LSTM) to predict stock price trends and generate actionable buy/sell signals for S&P 500 stocks.

**Technical Indicators**:
Incorporates essential technical analysis metrics such as Simple Moving Average (SMA), Exponential Moving Average (EMA), and Relative Strength Index (RSI) to support more informed trading decisions and improve prediction accuracy.
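
The indicators named above can be sketched with pandas alone. This is a minimal illustration, not the project's `fetch_stock_data` implementation (which is not shown here and may rely on TA-Lib). The function names, the 20-day window, and the 14-day RSI period are assumptions made for the example, and the RSI seeding differs slightly from TA-Lib's, so values may not match it exactly.

```python
import numpy as np
import pandas as pd


def sma(close: pd.Series, window: int = 20) -> pd.Series:
    """Simple moving average; the first window - 1 values are NaN."""
    return close.rolling(window=window).mean()


def ema(close: pd.Series, span: int = 20) -> pd.Series:
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    return close.ewm(span=span, adjust=False).mean()


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index using Wilder-style smoothing (alpha = 1 / period)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)      # upward moves, 0 otherwise
    loss = (-delta).clip(lower=0.0)   # downward moves as positive numbers
    avg_gain = gain.ewm(alpha=1.0 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


# Toy usage on a made-up, steadily rising price series.
close = pd.Series(np.arange(1.0, 41.0))
features = pd.DataFrame({"SMA_20": sma(close), "EMA_20": ema(close), "RSI": rsi(close)})
```

The leading rows of `SMA_20` and `RSI` are `NaN` because the rolling window needs a full warm-up period before it can produce a value.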

**Backtesting and Portfolio Simulation**:
Simulates the historical performance of the trading strategy using backtesting techniques, allowing users to assess key financial metrics like Sharpe Ratio, Cumulative Returns, and Maximum Drawdown.

**Financial Metrics**:
Analyzes the risk and return of the trading strategy through key financial indicators, including volatility, Sharpe Ratio, and risk-adjusted returns, to help traders evaluate profitability and manage risk effectively.
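
As a rough sketch of how these metrics are commonly computed from a series of daily strategy returns (the project's own report code is not shown here, so the function names, the 252-day annualization constant, and the zero default risk-free rate are assumptions):

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed number of trading days per year


def cumulative_return(returns: pd.Series) -> float:
    """Total compounded return over the whole period."""
    return float((1.0 + returns).prod() - 1.0)


def annualized_volatility(returns: pd.Series) -> float:
    """Sample standard deviation of daily returns, scaled to one year."""
    return float(returns.std(ddof=1) * np.sqrt(TRADING_DAYS))


def sharpe_ratio(returns: pd.Series, risk_free_rate: float = 0.0) -> float:
    """Annualized Sharpe ratio; risk_free_rate is an annual rate."""
    excess = returns - risk_free_rate / TRADING_DAYS
    std = excess.std(ddof=1)
    if std == 0 or np.isnan(std):
        return float("nan")
    return float(excess.mean() / std * np.sqrt(TRADING_DAYS))


def max_drawdown(returns: pd.Series) -> float:
    """Largest peak-to-trough decline of the equity curve (zero or negative)."""
    equity = (1.0 + returns).cumprod()
    # Treat the starting capital (1.0) as the first peak.
    running_peak = equity.cummax().clip(lower=1.0)
    return float((equity / running_peak - 1.0).min())


# Toy usage with made-up daily returns.
example = pd.Series([0.10, -0.50, 0.20])
summary = {
    "cumulative_return": cumulative_return(example),
    "max_drawdown": max_drawdown(example),
}
```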

In [3]:
import os
os.chdir("/home/vic3/github/Quantitative-Trading-Strategy-using-Machine-Learning-Statistical-Modeling-on-Financial-Data/Quantitative_Trading_Strategy")
print(os.getcwd())

/home/vic3/github/Quantitative-Trading-Strategy-using-Machine-Learning-Statistical-Modeling-on-Financial-Data/Quantitative_Trading_Strategy


In [4]:
import sys
sys.path.append('/home/vic3/github/Quantitative-Trading-Strategy-using-Machine-Learning-Statistical-Modeling-on-Financial-Data/Quantitative_Trading_Strategy/scripts')

In [5]:
%load_ext autoreload
%autoreload 2

In [66]:
%%capture
! pip install -r '/home/vic3/github/Quantitative-Trading-Strategy-using-Machine-Learning-Statistical-Modeling-on-Financial-Data/Quantitative_Trading_Strategy/requirements.txt'



### 1. Import necessary libraries and scripts

In [51]:
# Import necessary libraries

import talib
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scripts.fetch_stock_data import fetch_stock_data
from scripts.models import train_random_forest, train_xgboost, train_lstm_model, scale_features
from scripts.backtest_strategy import generate_signals, backtest_strategy
from scripts.generate_report import generate_report
from scripts.sentiment_analysis import get_sentiment





Sentiment: [{'label': 'Positive', 'score': 0.9999997615814209}]


### 2. Set up the stock symbols and parameters

In [52]:
# Stock symbols, start date, and starting capital
symbols = ['AAPL', 'MSFT', 'GOOGL', 'SPY']
start_date = "2018-01-01"
initial_balance = 10000

### 3. Fetch stock data and perform feature engineering

In [53]:
stock_data = fetch_stock_data(symbols, start_date=start_date)

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


In [54]:
# Display stock data to verify
print(stock_data['AAPL'].head())

                 Open       High        Low      Close  Adj Close     Volume  \
Date                                                                           
2018-01-02  42.540001  43.075001  42.314999  43.064999  40.568924  102223600   
2018-01-03  43.132500  43.637501  42.990002  43.057499  40.561871  118071600   
2018-01-04  43.134998  43.367500  43.020000  43.257500  40.750282   89738400   
2018-01-05  43.360001  43.842499  43.262501  43.750000  41.214230   94640000   
2018-01-08  43.587502  43.902500  43.482498  43.587502  41.061142   82271200   

            SMA_20  RSI  
Date                     
2018-01-02     NaN  NaN  
2018-01-03     NaN  NaN  
2018-01-04     NaN  NaN  
2018-01-05     NaN  NaN  
2018-01-08     NaN  NaN  


### 4. Sentiment Analysis (Optional - Using FinBERT)

In [55]:
# Example of analyzing sentiment from a news article
news_article = "The company's stock surged after reporting record earnings."
sentiment_result = get_sentiment(news_article)
print(f"Sentiment Analysis Result: {sentiment_result}")

Sentiment Analysis Result: [{'label': 'Positive', 'score': 0.9999997615814209}]


### 5. Prepare Data for Model Training

In [62]:
from sklearn.model_selection import train_test_split

def prepare_data(ticker_data):
    # Keep only the columns used below and drop the indicator warm-up rows (NaN)
    ticker_data = ticker_data[['Close', 'SMA_20', 'RSI']].dropna()

    # Feature matrix X and binary target y: 1 if the next close is higher, else 0.
    # The final row has no next-day close, so its label defaults to 0.
    X = ticker_data[['SMA_20', 'RSI']]
    y = pd.Series(np.where(ticker_data['Close'].shift(-1) > ticker_data['Close'], 1, 0), index=X.index)

    # Chronological 60/20/20 split. shuffle=False keeps the test set as the most
    # recent rows, which the backtest assumes, and keeps future rows out of training.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, shuffle=False)  # 0.25 of the remaining 80% = 20% of all rows

    return X_train, X_val, X_test, y_train, y_val, y_test
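
To make the target definition and the ordering of the splits concrete, here is a small self-contained sketch on made-up prices. The helper names `label_next_day_up` and `chronological_split` are hypothetical and exist only for this illustration; the split fractions mirror the 60/20/20 proportions used above.

```python
import numpy as np
import pandas as pd


def label_next_day_up(close: pd.Series) -> pd.Series:
    """Return 1 where the next close is higher than the current close, else 0."""
    next_close = close.shift(-1)
    # NaN > x evaluates to False, so the final row is always labeled 0.
    return pd.Series(np.where(next_close > close, 1, 0), index=close.index)


def chronological_split(X, y, val_frac=0.2, test_frac=0.2):
    """Split rows in time order into train, validation, and test blocks."""
    n = len(X)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    n_train = n - n_val - n_test
    train = slice(0, n_train)
    val = slice(n_train, n_train + n_val)
    test = slice(n_train + n_val, n)
    return (X.iloc[train], X.iloc[val], X.iloc[test],
            y.iloc[train], y.iloc[val], y.iloc[test])


# Toy data: ten business days of made-up closing prices.
dates = pd.date_range("2018-01-02", periods=10, freq="B")
close = pd.Series([10.0, 11.0, 10.5, 12.0, 12.0, 11.0, 11.5, 12.5, 12.0, 13.0], index=dates)
y = label_next_day_up(close)
X = close.to_frame("Close")
X_train, X_val, X_test, y_train, y_val, y_test = chronological_split(X, y)
```

Because every training date precedes every validation date, and every validation date precedes every test date, the model is never evaluated on days that come before the data it was fit on.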

### 6. Train Models for Each Stock Symbol

In [1]:
all_backtests = {}

for ticker in symbols:
    print(f"Training models for {ticker}")
    ticker_data = stock_data[ticker]
    
    # Prepare data
    X_train, X_val, X_test, y_train, y_val, y_test = prepare_data(ticker_data)
    
    # Scale features
    X_train_scaled, X_val_scaled, X_test_scaled = scale_features(X_train, X_val, X_test)
    
    # Train models
    rf_model, rf_accuracy, rf_report = train_random_forest(X_train_scaled, y_train, X_val_scaled, y_val)
    xgb_model, xgb_accuracy, xgb_report = train_xgboost(X_train_scaled, y_train, X_val_scaled, y_val)
    lstm_model, lstm_accuracy, lstm_report = train_lstm_model(X_train_scaled, y_train, X_val_scaled, y_val)
    
    # Print model results
    print(f"Random Forest Validation Accuracy for {ticker}: {rf_accuracy}")
    print(f"XGBoost Validation Accuracy for {ticker}: {xgb_accuracy}")
    print(f"LSTM Validation Accuracy for {ticker}: {lstm_accuracy}")
    
    # 7. Backtest the models

    # Generate signals for the test set. The models were trained on scaled
    # features, so predict on X_test_scaled; X_test.index carries the matching dates.
    rf_signals = pd.Series(rf_model.predict(X_test_scaled), index=X_test.index)
    xgb_signals = pd.Series(xgb_model.predict(X_test_scaled), index=X_test.index)

    # Reshape the scaled features into the 3-D array the LSTM takes as input
    X_test_reshaped = np.asarray(X_test_scaled).reshape((X_test.shape[0], X_test.shape[1], 1))
    lstm_predictions = lstm_model.predict(X_test_reshaped).flatten()
    lstm_signals = pd.Series((lstm_predictions > 0.5).astype(int), index=X_test.index)

    # Backtest the models
    rf_backtest = backtest_strategy(ticker_data.copy(), rf_signals)
    xgb_backtest = backtest_strategy(ticker_data.copy(), xgb_signals)
    lstm_backtest = backtest_strategy(ticker_data.copy(), lstm_signals)
    
    # Store the backtest results
    all_backtests[ticker] = {'RandomForest': rf_backtest, 'XGBoost': xgb_backtest, 'LSTM': lstm_backtest}
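
The `backtest_strategy` function lives in `scripts/backtest_strategy.py` and is not reproduced in this notebook. As a rough idea of what a signal-driven backtest can look like, the sketch below uses a hypothetical stand-in, `simple_long_only_backtest`. It assumes a signal of 1 means "hold the stock over the next day" and 0 means "stay in cash", and it ignores transaction costs and slippage. The column names are illustrative only.

```python
import pandas as pd


def simple_long_only_backtest(close: pd.Series, signals: pd.Series,
                              initial_balance: float = 10000.0) -> pd.DataFrame:
    """Hypothetical stand-in, not the project's backtest_strategy."""
    # Align signals with prices; days without a signal are treated as cash.
    signals = signals.reindex(close.index).fillna(0)
    daily_returns = close.pct_change().fillna(0.0)
    # A signal produced at day t's close earns the return from t to t + 1,
    # so shift by one day to avoid trading on information from the future.
    position = signals.shift(1).fillna(0)
    strategy_returns = position * daily_returns
    result = pd.DataFrame({
        "Close": close,
        "Position": position,
        "Strategy_Return": strategy_returns,
    })
    result["Portfolio_Value"] = initial_balance * (1.0 + strategy_returns).cumprod()
    result["Buy_Hold_Value"] = initial_balance * (1.0 + daily_returns).cumprod()
    return result


# Toy usage with made-up prices and signals.
dates = pd.date_range("2018-01-02", periods=4, freq="B")
toy_close = pd.Series([100.0, 110.0, 99.0, 108.9], index=dates)
toy_signals = pd.Series([1, 0, 1, 0], index=dates)
toy_result = simple_long_only_backtest(toy_close, toy_signals)
```

Comparing `Portfolio_Value` with `Buy_Hold_Value` gives a first check on whether the signals add anything over simply holding the stock.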