# Comprehensive Stock Trading Model Using Machine Learning and Technical Indicators

## Introduction

This project presents a machine learning-based trading model for S&P 500 stocks that combines technical indicators with a LightGBM classifier. The model predicts stock price movements and generates buy, sell, and hold signals, and it trades conservatively by limiting itself to one share per day.

The focus of this model is to maximize profitability while minimizing risk over the long term. Using Yahoo Finance data from 2010 to the present, the model analyzes historical prices, volume, and technical indicators to make buy and sell decisions. In a stock market simulation, the model returned an average profit of **12%** over the past 365 market days.


## Data Collection and Feature Engineering

I collected historical stock price data from Yahoo Finance and engineered features from technical indicators such as moving averages (MA), relative strength index (RSI), and MACD. These indicators serve as input features for the machine learning model.
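
As an illustration of how such indicators turn a price series into features, here is a minimal, dependency-free sketch of a simple moving average and an RSI. The function names are hypothetical, and the RSI uses plain averages of gains and losses; the actual `add_columns` implementation is not shown in this notebook and may use a different variant (for example, Wilder smoothing).

```python
def simple_moving_average(values, window):
    """Trailing mean at each position; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def rsi(closes, window):
    """RSI from simple averages of gains and losses over the last `window` changes."""
    out = [None] * len(closes)
    # changes[j] is the move from closes[j] to closes[j + 1]
    changes = [b - a for a, b in zip(closes, closes[1:])]
    for i in range(window, len(closes)):
        recent = changes[i - window:i]
        avg_gain = sum(c for c in recent if c > 0) / window
        avg_loss = sum(-c for c in recent if c < 0) / window
        if avg_loss == 0:
            # Assumption: treat a window with no losses as maximally overbought
            out[i] = 100.0
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out


closes = [10, 11, 12, 11, 13]  # illustrative prices, not real data
print(simple_moving_average(closes, 3))
print(rsi(closes, 3))
```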



In [8]:
from SimulateDay import get_stock_data, preprocess_data, add_columns, stock_market_simulation

In [9]:
symbol = input('Enter the name of the company: ')
stock_data = get_stock_data(symbol)
stock_data.tail()

| | Date | Symbol | Adj Close | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|---|---|---|
| 3728 | 2024-10-18 | NFLX | 763.890015 | 763.890015 | 766.281006 | 736.22998 | 737.640015 | 15827594.0 |
| 3729 | 2024-10-17 | NFLX | 687.650024 | 687.650024 | 704.409973 | 677.880005 | 704.349976 | 8820000.0 |
| 3730 | 2024-10-18 | NFLX | 763.890015 | 763.890015 | 766.281006 | 736.22998 | 737.640015 | 15827594.0 |
| 3731 | 2024-10-17 | NFLX | 687.650024 | 687.650024 | 704.409973 | 677.880005 | 704.349976 | 8820000.0 |
| 3732 | 2024-10-18 | NFLX | 0.0 | 763.890015 | 766.281006 | 736.22998 | 737.640015 | 15827594.0 |


These are the last five rows of the data retrieved from Yahoo Finance. The `get_stock_data` function loads the stored data.

In [10]:
stock_data = add_columns(stock_data)
stock_data.tail()

Adding columns...
Halfway There...


| | Date | Symbol | Adj Close | Close | High | Low | Open | Volume | 1_Day_Return | 5_Day_Return | ... | Support_20_Day | Resistance_50_Day | Support_50_Day | Volume_MA_10 | Volume_MA_20 | Volume_MA_50 | Optimal_Action | Action | Z-score | OBV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3728 | 2024-10-18 | NFLX | 763.890015 | 763.890015 | 766.281006 | 736.22998 | 737.640015 | 15827594.0 | 11.087034 | 11.087034 | ... | 687.650024 | 765.119995 | 665.77002 | 9614245.3 | 6025292.65 | 4059345.06 | Sell | 2 | 2.658764 | 1085440557 |
| 3729 | 2024-10-17 | NFLX | 687.650024 | 687.650024 | 704.409973 | 677.880005 | 704.349976 | 8820000.0 | -9.980493 | -9.980493 | ... | 687.650024 | 765.119995 | 665.77002 | 10216535.3 | 6380797.65 | 4165525.06 | Buy | 1 | 2.28133 | 1076620557 |
| 3730 | 2024-10-18 | NFLX | 763.890015 | 763.890015 | 766.281006 | 736.22998 | 737.640015 | 15827594.0 | 11.087034 | 11.087034 | ... | 687.650024 | 765.119995 | 665.77002 | 11506364.7 | 7031502.35 | 4421752.94 | Sell | 2 | 2.658764 | 1092448151 |
| 3731 | 2024-10-17 | NFLX | 687.650024 | 687.650024 | 704.409973 | 677.880005 | 704.349976 | 8820000.0 | -9.980493 | -9.980493 | ... | 687.650024 | 765.119995 | 665.77002 | 12140804.9 | 7384592.35 | 4501890.94 | Buy | 1 | 2.28133 | 1083628151 |
| 3732 | 2024-10-18 | NFLX | 0.0 | 763.890015 | 766.281006 | 736.22998 | 737.640015 | 15827594.0 | 11.087034 | 11.087034 | ... | 687.650024 | 765.119995 | 665.77002 | 12323797.0 | 8063152.05 | 4773736.82 | Hold | 2 | 2.658764 | 1099455745 |


## Feature Descriptions

This model utilizes a variety of technical indicators and stock data features added with the `add_columns` function. Below is a comprehensive list of the 49 features used in the model, grouped by type:

### 1. Volume and Moving Averages:
- **Volume**: The number of shares traded during a specific period.
- **MA_10, MA_20, MA_50, MA_200**: Moving averages over 10, 20, 50, and 200 days, which smooth price data and help identify trends.
- **Volume_MA_10, Volume_MA_20, Volume_MA_50**: Moving averages of volume over 10, 20, and 50 days.

### 2. Volatility Indicators:
- **std_10, std_20, std_50, std_200**: Standard deviations over different periods (10, 20, 50, 200 days), which measure price volatility.
- **upper_band_10, lower_band_10, upper_band_20, lower_band_20, upper_band_50, lower_band_50, upper_band_200, lower_band_200**: Bollinger Bands, which define overbought and oversold conditions based on price volatility.
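
The relationship between the rolling standard deviation and the Bollinger Bands can be sketched as follows. This is a minimal illustration, assuming the bands sit two sample standard deviations around the moving average (the conventional default); the multiplier and the `bollinger_bands` name are assumptions, not taken from `add_columns`.

```python
from statistics import mean, stdev


def bollinger_bands(closes, window, num_std=2.0):
    """Return (lower, middle, upper) per position; None until the window is full.

    `window` must be at least 2 so a sample standard deviation exists.
    """
    bands = []
    for i in range(len(closes)):
        if i + 1 < window:
            bands.append(None)
            continue
        recent = closes[i + 1 - window:i + 1]
        middle = mean(recent)                # the moving average (MA_<window>)
        spread = num_std * stdev(recent)     # num_std times std_<window>
        bands.append((middle - spread, middle, middle + spread))
    return bands


# Illustrative prices, not real data
print(bollinger_bands([1, 2, 3, 4, 5], window=3))
```

A close above the upper band is commonly read as overbought, and a close below the lower band as oversold.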

### 3. Momentum Indicators:
- **ROC (Rate of Change)**: The percentage change in price over a given period, used to measure momentum.
- **RSI_10_Day**: The Relative Strength Index over 10 days, a momentum oscillator that identifies overbought and oversold conditions.
- **MACD (Moving Average Convergence Divergence)**: Measures the relationship between two moving averages to identify momentum shifts.
- **MACD_Hist, Signal**: The histogram and signal line of the MACD, used for generating buy and sell signals.
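
To make the MACD, signal line, and histogram concrete, here is a small sketch built on an exponential moving average. The 12/26/9 spans are the conventional defaults and are an assumption here, as are the function names; the notebook does not state which spans `add_columns` uses.

```python
def ema(values, span):
    """Exponential moving average seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) as equal-length lists."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# A steadily rising series: the fast EMA stays above the slow EMA,
# so the MACD line ends up positive.
rising = [float(x) for x in range(1, 61)]
macd_line, signal_line, histogram = macd(rising)
print(macd_line[-1] > 0)
```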

### 4. Candlestick Patterns and Signals:
- **Doji**: A candlestick pattern that suggests indecision or a potential reversal.
- **Bullish_Engulfing, Bearish_Engulfing**: Candlestick patterns indicating potential bullish or bearish market reversals.
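
The candlestick flags above can be expressed as simple comparisons on open, high, low, and close. The sketch below uses common textbook definitions; the 10% body threshold for a Doji and the function names are assumptions, and the rules inside `add_columns` may be stricter or looser.

```python
def is_doji(open_, high, low, close, max_body_ratio=0.1):
    """A candle whose body is small relative to its full high-low range."""
    candle_range = high - low
    if candle_range == 0:
        return True  # open == close == high == low: no body at all
    return abs(close - open_) <= max_body_ratio * candle_range


def is_bullish_engulfing(prev_open, prev_close, open_, close):
    """A down candle followed by an up candle whose body covers the previous body."""
    return (prev_close < prev_open and close > open_
            and open_ <= prev_close and close >= prev_open)


def is_bearish_engulfing(prev_open, prev_close, open_, close):
    """An up candle followed by a down candle whose body covers the previous body."""
    return (prev_close > prev_open and close < open_
            and open_ >= prev_close and close <= prev_open)


# Illustrative candles, not real data
print(is_doji(100, 105, 95, 100.5))
print(is_bullish_engulfing(prev_open=10, prev_close=9, open_=8.5, close=10.5))
```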

### 5. Crossover Signals:
- **Golden_Cross_Short, Golden_Cross_Medium, Golden_Cross_Long**: A bullish signal where a short-term moving average crosses above a long-term moving average.
- **Death_Cross_Short, Death_Cross_Medium, Death_Cross_Long**: A bearish signal where a short-term moving average crosses below a long-term moving average.
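
A crossover is detected by comparing two moving averages on consecutive days. The sketch below shows the idea for any pair of series; which window pairs correspond to the Short, Medium, and Long variants is not stated in this notebook, so the function takes the two series as arguments. The name `crossover_signals` is hypothetical.

```python
def crossover_signals(short_ma, long_ma):
    """Return (golden, death) boolean lists marking the day a cross happens."""
    golden = [False] * len(short_ma)
    death = [False] * len(short_ma)
    for i in range(1, len(short_ma)):
        prev_s, prev_l = short_ma[i - 1], long_ma[i - 1]
        cur_s, cur_l = short_ma[i], long_ma[i]
        if None in (prev_s, prev_l, cur_s, cur_l):
            continue  # a moving average is not defined yet
        # Golden cross: short MA moves from at-or-below to above the long MA
        golden[i] = prev_s <= prev_l and cur_s > cur_l
        # Death cross: short MA moves from at-or-above to below the long MA
        death[i] = prev_s >= prev_l and cur_s < cur_l
    return golden, death


# Illustrative series, not real data
golden, death = crossover_signals([1, 2, 3, 2, 1], [2, 2, 2, 2, 2])
print(golden)
print(death)
```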

### 6. Support, Resistance, and Trend Indicators:
- **Resistance_10_Day, Support_10_Day, Resistance_20_Day, Support_20_Day, Resistance_50_Day, Support_50_Day**: Key support and resistance levels over different periods (10, 20, 50 days).
- **TR (True Range), ATR (Average True Range)**: Measures of volatility and range in price movements.
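
True Range extends the daily high-low range to account for gaps from the previous close, and ATR averages it over a window. The following is a minimal sketch; the simple (unsmoothed) average and the function names are assumptions, since the notebook does not show how `add_columns` computes ATR.

```python
def true_range(highs, lows, closes):
    """TR per day; the first day has no previous close, so it uses high - low."""
    tr = [highs[0] - lows[0]]
    for i in range(1, len(highs)):
        prev_close = closes[i - 1]
        tr.append(max(
            highs[i] - lows[i],
            abs(highs[i] - prev_close),
            abs(lows[i] - prev_close),
        ))
    return tr


def average_true_range(tr, window):
    """Simple trailing mean of TR; None until the window is full."""
    return [
        None if i + 1 < window else sum(tr[i + 1 - window:i + 1]) / window
        for i in range(len(tr))
    ]


# Illustrative prices, not real data
tr = true_range(highs=[10, 12, 11], lows=[8, 9, 10], closes=[9, 11, 10.5])
print(tr)
print(average_true_range(tr, window=2))
```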

### 7. Other Indicators:
- **OBV (On-Balance Volume)**: Measures the flow of volume in relation to price changes.
- **Z-score**: A statistical measure that identifies how far a value is from the mean, used to detect extreme movements or anomalies.
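
The OBV values in the `add_columns` output above follow the standard rule: add the day's volume when the close rises and subtract it when the close falls. The sketch below reproduces the five OBV values shown in that table from the close and volume columns. The function names are hypothetical, and the Z-score helper assumes a sample standard deviation over whatever window is passed in, which the notebook does not specify.

```python
from statistics import mean, stdev


def on_balance_volume(closes, volumes, start=0):
    """Running OBV; `start` is the OBV value on the first row."""
    obv = [start]
    for i in range(1, len(closes)):
        if closes[i] > closes[i - 1]:
            obv.append(obv[-1] + volumes[i])
        elif closes[i] < closes[i - 1]:
            obv.append(obv[-1] - volumes[i])
        else:
            obv.append(obv[-1])
    return obv


def z_score(values):
    """How many standard deviations the last value is from the mean of `values`."""
    sigma = stdev(values)
    return 0.0 if sigma == 0 else (values[-1] - mean(values)) / sigma


# Close and Volume from the last five rows of the table above
closes = [763.890015, 687.650024, 763.890015, 687.650024, 763.890015]
volumes = [15827594, 8820000, 15827594, 8820000, 15827594]
print(on_balance_volume(closes, volumes, start=1085440557))
# [1085440557, 1076620557, 1092448151, 1083628151, 1099455745]
```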


## Data Preprocessing

The `preprocess_data` function preprocesses the data by removing missing values, handling outliers, and splitting the dataset into training and testing sets.
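
The internals of `preprocess_data` are not shown in this notebook. As a rough sketch of two of the steps it describes, the example below drops rows with missing values and then splits the remainder chronologically, so that the test rows come after the training rows. The helper names, the 80/20 split, and the choice of a chronological split are all assumptions for illustration; outlier handling is left out.

```python
def drop_missing(rows):
    """Keep only rows (dicts) in which every value is present."""
    return [row for row in rows if all(v is not None for v in row.values())]


def chronological_split(rows, test_fraction=0.2):
    """Split time-ordered rows so the most recent ones form the test set."""
    n_test = round(len(rows) * test_fraction)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


# Ten illustrative rows in date order; one has a missing indicator value
rows = [{"day": d, "MA_10": float(d), "Action": d % 3} for d in range(10)]
rows[4]["MA_10"] = None

clean = drop_missing(rows)
train, test = chronological_split(clean)
print(len(clean), len(train), len(test))
```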


In [11]:
X_train, X_test, y_train, y_test = preprocess_data(stock_data)

Splitting data...


## Model Training and Hyperparameter Tuning

We use a LightGBM classifier and tune its hyperparameters with GridSearchCV to find the best parameters for predicting stock movements.
This was done individually for every stock in the S&P 500 to maximize model performance and minimize risk.


```python
from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV

# Define parameter grid for GridSearchCV
param_grid = {
    'num_leaves': [31, 50],
    'min_data_in_leaf': [20, 50],
    'max_depth': [-1, 10],
    'learning_rate': [0.01, 0.1],
    'n_estimators': [100, 200]
}

# Set up the LGBM classifier and search the grid with 3-fold cross-validation
model = LGBMClassifier(random_state=42, verbose=-1)
grid_search = GridSearchCV(
    model, param_grid, cv=3, scoring='accuracy', n_jobs=-1, verbose=0
)
grid_search.fit(X_train, y_train)
```

```python
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Get the best parameters found by the grid search
best_params = grid_search.best_params_

# Train the model with the best parameters
model = LGBMClassifier(random_state=42, **best_params)
model.fit(X_train, y_train)

# Cross-validation for better evaluation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=skf, scoring='accuracy')
print(f"Cross-validation accuracy for {stock_data['Symbol'].iloc[0]}: {cv_scores.mean():.4f}")
```
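
To give a sense of the cost of this search, the grid above can be enumerated directly. The sketch below only counts candidate settings and training runs; it does not train anything. With two values for each of five parameters there are 32 candidates, and 3-fold cross-validation trains each one three times.

```python
from itertools import product

# Same grid as in the tuning code above
param_grid = {
    'num_leaves': [31, 50],
    'min_data_in_leaf': [20, 50],
    'max_depth': [-1, 10],
    'learning_rate': [0.01, 0.1],
    'n_estimators': [100, 200],
}

# Every combination of one value per parameter
combinations = [
    dict(zip(param_grid.keys(), values))
    for values in product(*param_grid.values())
]

cv_folds = 3
total_fits = len(combinations) * cv_folds

print(len(combinations))  # 32
print(total_fits)         # 96
```

Since this is repeated for each stock, the total training cost scales with the number of symbols.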

## Backtesting and Simulation

We perform backtesting by simulating buy/sell decisions based on the model's predictions and evaluating the overall performance of the trading strategy. The `stock_market_simulation` function acts as the market for a given number of days: each day it asks the model for a decision and then acts on it. The function does not factor in taxes or fees.


In [20]:
import joblib

# Load the model trained for this symbol and simulate the last 365 market days
model = joblib.load(f'models/LGBMmodels/{symbol}_model.pkl')
results, _ = stock_market_simulation(
    model, initial_cash=10000, days=365, stock=stock_data.tail(365)
)

Day 0: Bought 1 share at 363.010009765625, Cash left: 9636.989990234375
Day 1: Bought 1 share at 355.989990234375, Cash left: 9281.0
Day 2: Holding, Cash: 9281.0, Shares held: 2
Day 3: Bought 1 share at 359.0, Cash left: 8922.0
Day 4: Holding, Cash: 8922.0, Shares held: 3
Day 5: Holding, Cash: 8922.0, Shares held: 3
Day 6: Bought 1 share at 395.2300109863281, Cash left: 8526.769989013672
Day 7: Bought 1 share at 403.1300048828125, Cash left: 8123.639984130859
Day 8: Bought 1 share at 400.4700012207031, Cash left: 7723.169982910156
Day 9: Bought 1 share at 403.5400085449219, Cash left: 7319.629974365234
Day 10: Bought 1 share at 399.2900085449219, Cash left: 6920.3399658203125
Day 11: Sold shares at 399.7699890136719, Cash: 7320.109954833984
Day 12: Holding, Cash: 7320.109954833984, Shares held: 7
Day 13: Holding, Cash: 7320.109954833984, Shares held: 7
Day 14: Sold shares at 423.9700012207031, Cash: 7744.0799560546875
Day 15: Holding, Cash: 7744.0799560546875, Shares held: 6
Day 16: Bo

In [21]:
results

| | Stock Name | Day | Action | Cash | Shares Held | Portfolio Value | Stock Price | Date |
|---|---|---|---|---|---|---|---|---|
| 0 | NFLX | 0 | Buy | 9636.989990 | 1 | 10000.000000 | 363.010010 | 2023-05-22 |
| 1 | NFLX | 1 | Buy | 9281.000000 | 2 | 9992.979980 | 355.989990 | 2023-05-23 |
| 2 | NFLX | 2 | Hold | 9281.000000 | 2 | 10010.700012 | 364.850006 | 2023-05-24 |
| 3 | NFLX | 3 | Buy | 8922.000000 | 3 | 9999.000000 | 359.000000 | 2023-05-25 |
| 4 | NFLX | 4 | Hold | 8922.000000 | 3 | 10058.640015 | 378.880005 | 2023-05-26 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 360 | NFLX | 360 | Hold | 2764.950073 | 12.38335 | 12224.467432 | 763.890015 | 2024-10-18 |
| 361 | NFLX | 361 | Sell | 3452.600098 | 11.38335 | 11280.360955 | 687.650024 | 2024-10-17 |
| 362 | NFLX | 362 | Hold | 3452.600098 | 11.38335 | 12148.227442 | 763.890015 | 2024-10-18 |
| 363 | NFLX | 363 | Sell | 4140.250122 | 10.38335 | 11280.360955 | 687.650024 | 2024-10-17 |


These results show how the model would have performed on the given stock over the past year if it had started with $10,000 in cash and the ability to trade.
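
The first rows of the table are consistent with a simple bookkeeping rule: portfolio value equals cash plus shares held times the day's price. The sketch below is a simplified stand-in for the simulation loop, not the actual `stock_market_simulation` code. It trades one share per signal, ignores fees, and turns a signal it cannot execute into a hold; the `simulate` name and those rules are assumptions. Replaying the first five NFLX prices and actions reproduces the Cash and Portfolio Value columns.

```python
def simulate(prices, actions, initial_cash=10000.0):
    """Replay one signal per day, trading a single share at that day's price."""
    cash, shares, history = initial_cash, 0, []
    for day, (price, action) in enumerate(zip(prices, actions)):
        if action == "Buy" and cash >= price:
            cash -= price
            shares += 1
        elif action == "Sell" and shares > 0:
            cash += price
            shares -= 1
        else:
            action = "Hold"  # assumption: unexecutable signals become holds
        history.append({
            "Day": day,
            "Action": action,
            "Cash": cash,
            "Shares Held": shares,
            "Portfolio Value": cash + shares * price,
        })
    return history


# Stock Price and Action for days 0-4 of the NFLX results table
prices = [363.010010, 355.989990, 364.850006, 359.000000, 378.880005]
actions = ["Buy", "Buy", "Hold", "Buy", "Hold"]

for row in simulate(prices, actions):
    print(row["Day"], row["Action"], round(row["Cash"], 6), round(row["Portfolio Value"], 6))
```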

## Results and Performance

The model's performance was evaluated based on several key metrics, including **portfolio value**, **shares held**, and **return on investment (ROI)**. To ensure comprehensive testing, a balanced set of stocks was chosen, considering their varying movements across the past year. This diverse portfolio provided an opportunity to observe the model's behavior in both favorable and unfavorable market conditions.

### Stock Selection:
The model was tested on the following **S&P 500** stocks, representing a mix of winners, losers, and flat performers over the past year:
- **Winners**: AAPL, MSFT, NFLX, TSLA, META, MMM, CCL
- **Losers**: INTC, T, DIS, VZ, PFE
- **Flat Performers**: XOM, KO, JNJ, PG, WMT, MCD

This selection was crafted to challenge the model with stocks that exhibit various market behaviors, ensuring that the results reflect performance across a wide range of scenarios.

In [23]:
import pandas as pd

results = pd.read_csv('simResults/sim_results.csv')
results.describe()

| | Day | Stock Price | Cash | Shares Held | Portfolio Value |
|---|---|---|---|---|---|
| count | 6570.0 | 6570.0 | 6570.0 | 6570.0 | 6570.0 |
| mean | 182.0 | 161.945223 | 7371.956712 | 25.594662 | 10306.417619 |
| std | 105.374048 | 154.299995 | 3385.701015 | 49.133886 | 955.582588 |
| min | 0.0 | 10.67 | 0.0 | -0.748866 | 8111.480103 |
| 25% | 91.0 | 41.0275 | 6262.939945 | 2.0 | 9982.938947 |
| 50% | 182.0 | 106.555 | 8964.539993 | 11.0 | 10015.690002 |
| 75% | 273.0 | 230.259995 | 9847.009169 | 29.78373 | 10207.194979 |
| max | 364.0 | 765.119995 | 14990.519348 | 387.0 | 16342.439837 |


### Model Performance Summary

This data frame highlights key aspects of the model's performance over the past year. It provides a snapshot of how the model manages cash, stock holdings, and portfolio value, as well as how it responds to market conditions.

- **Cash Management**:  
  The model holds an average cash balance of about **$7,372**, so it seldom invests all available funds at once. This conservative approach preserves liquidity and follows from the model's restriction of buying or selling only one share at a time, with one exception: if there are five consecutive buy signals, the model purchases five shares. This exception was tested extensively and was found to **increase profits** without increasing potential losses.

- **Shares Held**:  
  On average, the model holds about **25 shares** at any given time. This is a positive outcome, because the goal is to grow the portfolio's value rather than accumulate cash. The model stays active in the market while still preserving liquidity.

- **Portfolio Value**:  
  The mean portfolio value across all simulated days is **$10,306**, roughly **3%** above the $10,000 starting balance. This is an average over the whole year rather than a final return, and the minimum of $8,111 shows that individual portfolios did fall below breakeven at times. Even so, both the mean and the median sit above $10,000, which suggests that the model can yield positive returns under varying market conditions.

- The **standard deviation** of the portfolio value (**$955.58**) reflects modest fluctuations, which is expected in a dynamic trading strategy.
- The **median portfolio value** of **$10,015.69** shows that, for most of the year, the portfolio hovered slightly above the breakeven point.
- The **maximum portfolio value** reached **$16,342**, which shows the potential upside of the strategy, particularly in favorable market conditions.


### Final Portfolio Value by Stock

In [25]:
import altair as alt

def get_final_portfolio_values(df):
    # Take the last recorded row for each stock (rows are assumed to be in day order)
    final_values = df.groupby('Stock Name').tail(1)

    # Keep the stock name, final portfolio value, and final share count
    result = final_values[['Stock Name', 'Portfolio Value', 'Shares Held']].reset_index(drop=True)

    return result

final_portfolio_values = get_final_portfolio_values(results)
final_portfolio_values['Profit %'] = (final_portfolio_values['Portfolio Value'] - 10000) / 10000 * 100
alt.Chart(final_portfolio_values).mark_bar().encode(
    x='Stock Name',
    y='Profit %',
    color=alt.condition(
        alt.datum['Profit %'] > 0,
        alt.value('green'),
        alt.value('red')
    )
).properties(
    title='Final Portfolio Value by Stock',
    width=800,
    height=400
).configure_axis(
    labelAngle=45
).display()

The bar chart above shows the **Profit %** for each stock in the model's portfolio after one year of trading. The stocks are listed on the x-axis, while the y-axis represents the percentage of profit (or loss) realized by the model for each stock.

- **Top Performers**: Stocks such as **Meta (META)** and **Netflix (NFLX)** delivered the highest returns, with **Meta** showing a significant profit above 45%, while **Netflix** generated around 30%.
- **Consistent Gainers**: Stocks like **Apple (AAPL)** and **AT&T (T)** also performed well, showing gains of approximately 25% and 20%, respectively.
- **Small Gains**: Companies such as **Coca-Cola (KO)**, **Johnson & Johnson (JNJ)**, and **McDonald's (MCD)** had more modest gains, falling between 5% and 10%.
- **Losers**: A few stocks, such as **Pfizer (PFE)** and **Intel (INTC)**, recorded losses, which are represented by the red bars dipping below 0%. These losses are very small and are to be expected from a dynamic trading model.

This chart provides a quick, clear overview of the model's performance across a diverse set of S&P 500 stocks, showcasing the overall effectiveness of the trading strategy while highlighting potential areas of improvement in stock selection or risk management.


## Conclusion

The stock trading model has shown promising results with a conservative trading approach. The next steps involve optimizing the model further by incorporating transaction costs, taxes, and potentially adding more advanced risk management strategies.
