# Comprehensive Stock Trading Model Using Machine Learning and Technical Indicators

## Introduction

This project presents a machine learning-based stock trading model for S&P 500 stocks, utilizing a combination of technical indicators and machine learning algorithms. The model is designed to predict stock price movements and generate actionable trading signals, adopting a conservative trading approach by limiting its trades to one share per day.

The model aims to maximize long-term profitability while minimizing risk. It uses Yahoo Finance data from 2010 to the present, analyzing historical prices, volume, and technical indicators to make buy and sell decisions. In a stock market simulation, it shows an average year-to-date (YTD) return of **12%**.


## Data Collection and Feature Engineering

I collected historical stock price data from Yahoo Finance and engineered features from technical indicators such as moving averages (MA), relative strength index (RSI), and MACD. These indicators serve as input features for the machine learning model.



In [1]:
from SimulateDay import get_stock_data, preprocess_data, add_columns

In [3]:

stock_data = get_stock_data(input('Enter the name of the company: '))
stock_data.tail()

Unnamed: 0,Date,Symbol,Adj Close,Close,High,Low,Open,Volume
3715,2024-10-08,NVDA,132.889999,132.889999,133.479996,129.419998,130.259995,285722500.0
3716,2024-10-09,NVDA,132.649994,132.649994,134.520004,131.380005,134.110001,246191600.0
3717,2024-10-10,NVDA,134.809998,134.809998,135.0,131.0,131.910004,242311300.0
3718,2024-10-11,NVDA,134.800003,134.800003,135.779999,133.660004,134.009995,169732000.0
3719,2024-10-17,NVDA,0.0,136.929993,140.889999,136.869995,139.350006,302722767.0


These are the last five rows of the data retrieved from Yahoo Finance (`tail()` returns the end of the frame). The `get_stock_data` function loads the stored data.

In [4]:
stock_data = add_columns(stock_data)
stock_data.head()

Adding columns...




Halfway There...


Unnamed: 0,Date,Symbol,Adj Close,Close,High,Low,Open,Volume,1_Day_Return,5_Day_Return,10_Day_Return,20_Day_Return,50_Day_Return,200_Day_Return,Best_Return_Window,Best_Return,close_lag1,close_lag2,close_lag3,close_lag4,close_lag5,volume_lag1,volume_lag2,volume_lag3,volume_lag4,volume_lag5,MA_10,MA_20,MA_50,MA_200,std_10,std_20,std_50,std_200,upper_band_10,lower_band_10,upper_band_20,lower_band_20,upper_band_50,lower_band_50,...,lower_band_200,Golden_Cross_Short,Golden_Cross_Medium,Golden_Cross_Long,Death_Cross_Short,Death_Cross_Medium,Death_Cross_Long,ROC,AVG_Volume_10,AVG_Volume_20,AVG_Volume_50,AVG_Volume_200,Doji,Bullish_Engulfing,Bearish_Engulfing,EMA_short,EMA_long,MACD,Signal,MACD_Hist,Previous_Close,TR,ATR,RSI_10_Day,10_Day_ROC,20_Day_ROC,50_Day_ROC,Resistance_10_Day,Support_10_Day,Resistance_20_Day,Support_20_Day,Resistance_50_Day,Support_50_Day,Volume_MA_10,Volume_MA_20,Volume_MA_50,Optimal_Action,Action,Z-score,OBV
200,2010-10-19,NVDA,0.11922,0.28225,0.28425,0.2755,0.27775,866136000.0,-0.616205,2.450086,-0.26502,0.0,17.116179,-38.93997,50,17.116179,0.284,0.28225,0.279,0.2755,0.283,448156000.0,639940000.0,599172000.0,605208000.0,753588000.0,0.276525,0.28415,0.26201,0.338471,0.006435,0.010789,0.022472,0.079083,0.289395,0.263655,0.305728,0.262572,0.306955,0.217065,...,0.180306,0,0,0,0,0,0,-0.616205,709604800.0,820566600.0,836189040.0,770317820.0,False,False,False,0.28,0.276136,0.003864,0.004891,-0.001027,0.284,0.00875,0.009975,49.171264,-0.26502,0.0,17.116179,0.284,0.2675,0.3065,0.2675,0.3065,0.222,709604800.0,820566600.0,836189040.0,Hold,1,-0.499564,0
201,2010-10-20,NVDA,0.11922,0.28225,0.29,0.2775,0.2825,763532000.0,0.0,-0.440917,4.730983,-0.877973,22.186138,-39.818768,50,22.186138,0.28225,0.284,0.28225,0.2835,0.2695,866136000.0,448156000.0,639940000.0,968732000.0,1057604000.0,0.2778,0.284025,0.263035,0.337537,0.006145,0.010796,0.022196,0.078635,0.29009,0.26551,0.305617,0.262433,0.307427,0.218643,...,0.180268,0,0,0,0,0,0,0.0,680197600.0,808743400.0,836008480.0,770492240.0,True,False,False,0.280346,0.276589,0.003757,0.004664,-0.000907,0.28225,0.0125,0.009475,70.078782,4.730983,-0.877973,22.186138,0.284,0.2675,0.3065,0.2675,0.3065,0.222,680197600.0,808743400.0,836008480.0,Hold,1,-0.499564,0
202,2010-10-21,NVDA,0.117108,0.27725,0.283,0.273,0.2825,1014428000.0,-1.771478,-0.627248,3.644852,-4.561099,24.887382,-41.260595,50,24.887382,0.28225,0.28225,0.284,0.279,0.2675,763532000.0,866136000.0,448156000.0,599172000.0,761488000.0,0.278775,0.283362,0.26414,0.336564,0.004995,0.010785,0.021475,0.078166,0.288765,0.268785,0.304932,0.261793,0.30709,0.22119,...,0.180232,0,0,0,0,0,0,-1.771478,705491600.0,797421200.0,840374080.0,772318540.0,False,False,False,0.27987,0.276638,0.003232,0.004378,-0.001146,0.28225,0.01,0.009325,64.028765,3.644852,-4.561099,24.887382,0.284,0.27025,0.3065,0.2675,0.3065,0.224,705491600.0,797421200.0,840374080.0,Hold,0,-0.499774,-1014428000
203,2010-10-22,NVDA,0.124606,0.295,0.29725,0.276,0.279,1322676000.0,6.402162,4.517272,8.655616,-3.75204,31.696419,-36.250676,50,31.696419,0.27725,0.28225,0.28225,0.28225,0.2715,1014428000.0,763532000.0,866136000.0,639940000.0,706932000.0,0.281125,0.282787,0.26556,0.335725,0.006495,0.009742,0.021111,0.077704,0.294115,0.268135,0.302272,0.263303,0.307782,0.223338,...,0.180318,0,0,0,0,0,0,6.402162,767066000.0,809206200.0,839955920.0,776192960.0,False,False,False,0.282197,0.277998,0.004199,0.004342,-0.000143,0.27725,0.02125,0.01035,74.226814,8.655616,-3.75204,31.696419,0.295,0.27025,0.3,0.2675,0.3065,0.22875,767066000.0,809206200.0,839955920.0,Hold,2,-0.499029,308248000
204,2010-10-25,NVDA,0.125767,0.29775,0.3,0.2945,0.29725,673136000.0,0.932207,4.841545,10.175765,-0.750005,26.837058,-35.79515,50,26.837058,0.295,0.27725,0.28225,0.284,0.27025,1322676000.0,1014428000.0,763532000.0,448156000.0,442680000.0,0.283875,0.282675,0.26682,0.334895,0.007166,0.009544,0.021115,0.077214,0.298207,0.269543,0.301763,0.263587,0.30905,0.22459,...,0.180466,1,0,0,0,0,0,0.932207,790111600.0,792887800.0,817470800.0,777167800.0,True,False,False,0.28459,0.279461,0.005129,0.004499,0.00063,0.295,0.0055,0.0103,77.500013,10.175765,-0.750005,26.837058,0.29775,0.2755,0.29925,0.2675,0.3065,0.22875,790111600.0,792887800.0,817470800.0,Hold,0,-0.498914,981384000


## Feature Descriptions

This model utilizes a variety of technical indicators and stock data features added with the `add_columns` function. Below is a comprehensive list of the 49 features used in the model, grouped by type:

### 1. Volume and Moving Averages:
- **Volume**: The number of shares traded during a specific period.
- **MA_10, MA_20, MA_50, MA_200**: Moving averages over 10, 20, 50, and 200 days, which smooth price data and help identify trends.
- **Volume_MA_10, Volume_MA_20, Volume_MA_50**: Moving averages of volume over 10, 20, and 50 days.

### 2. Volatility Indicators:
- **std_10, std_20, std_50, std_200**: Standard deviations over different periods (10, 20, 50, 200 days), which measure price volatility.
- **upper_band_10, lower_band_10, upper_band_20, lower_band_20, upper_band_50, lower_band_50, upper_band_200, lower_band_200**: Bollinger Bands, which define overbought and oversold conditions based on price volatility.
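As a rough illustration of how the standard deviation and band columns relate, the sketch below builds them from a rolling mean. The helper name `add_bollinger_bands` and the toy prices are hypothetical, and the use of the sample standard deviation is an assumption. The multiplier `k = 2` is at least consistent with the first row shown earlier, where 0.276525 + 2 × 0.006435 = 0.289395.

```python
import pandas as pd

def add_bollinger_bands(df: pd.DataFrame, window: int, k: float = 2.0) -> pd.DataFrame:
    """Return a copy of df with MA, std and band columns for one window."""
    out = df.copy()
    rolling = out["Close"].rolling(window)
    out[f"MA_{window}"] = rolling.mean()
    out[f"std_{window}"] = rolling.std()  # sample standard deviation (assumed)
    out[f"upper_band_{window}"] = out[f"MA_{window}"] + k * out[f"std_{window}"]
    out[f"lower_band_{window}"] = out[f"MA_{window}"] - k * out[f"std_{window}"]
    return out

# Toy prices, not real market data
prices = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 11.0, 10.0, 9.0]})
bands = add_bollinger_bands(prices, window=3)
print(bands.round(3))
```

The first `window - 1` rows are NaN because the rolling window is not yet full, which is one reason the engineered table starts at row 200 when a 200-day window is involved.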

### 3. Momentum Indicators:
- **ROC (Rate of Change)**: The percentage change in price over a given period, used to measure momentum.
- **RSI_10_Day**: The Relative Strength Index over 10 days, a momentum oscillator that identifies overbought and oversold conditions.
- **MACD (Moving Average Convergence Divergence)**: Measures the relationship between two moving averages to identify momentum shifts.
- **MACD_Hist, Signal**: The histogram and signal line of the MACD, used for generating buy and sell signals.
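The momentum columns can be sketched with pandas as follows. The function names and the EMA spans (12, 26, 9) are assumptions, since the internals of `add_columns` are not shown, and the RSI here uses simple rolling averages, which is only one of several common variants. The relationships `MACD = EMA_short - EMA_long` and `MACD_Hist = MACD - Signal` do match the sample rows above.

```python
import pandas as pd

def add_macd(df: pd.DataFrame, short: int = 12, long: int = 26, signal: int = 9) -> pd.DataFrame:
    """Add EMA, MACD, signal-line and histogram columns (spans are assumed)."""
    out = df.copy()
    out["EMA_short"] = out["Close"].ewm(span=short, adjust=False).mean()
    out["EMA_long"] = out["Close"].ewm(span=long, adjust=False).mean()
    out["MACD"] = out["EMA_short"] - out["EMA_long"]
    out["Signal"] = out["MACD"].ewm(span=signal, adjust=False).mean()
    out["MACD_Hist"] = out["MACD"] - out["Signal"]
    return out

def rsi(close: pd.Series, window: int = 10) -> pd.Series:
    """RSI from simple rolling averages of gains and losses."""
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(window).mean()
    avg_loss = delta.clip(upper=0).abs().rolling(window).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Steadily rising toy series, not real market data
prices = pd.DataFrame({"Close": [float(p) for p in range(1, 41)]})
with_macd = add_macd(prices)
rsi_10 = rsi(prices["Close"])
print(with_macd[["MACD", "Signal", "MACD_Hist"]].tail(3))
print(rsi_10.tail(3))
```

On a series that only rises, the short EMA stays above the long EMA, so MACD is positive, and the RSI sits at the top of its range because the window contains no losses.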

### 4. Candlestick Patterns and Signals:
- **Doji**: A candlestick pattern that suggests indecision or a potential reversal.
- **Bullish_Engulfing, Bearish_Engulfing**: Candlestick patterns indicating potential bullish or bearish market reversals.
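One plausible way to flag these patterns is sketched below. The function names and the 10% body-to-range threshold for a Doji are assumptions rather than the definitions used in `add_columns`. The input values are the Open/High/Low/Close figures from the five NVDA rows shown earlier.

```python
import pandas as pd

def is_doji(df: pd.DataFrame, body_ratio: float = 0.1) -> pd.Series:
    """Flag candles whose body is small relative to the high-low range."""
    body = (df["Close"] - df["Open"]).abs()
    candle_range = df["High"] - df["Low"]
    return body <= body_ratio * candle_range

def is_bullish_engulfing(df: pd.DataFrame) -> pd.Series:
    """Flag a bullish candle whose body covers the previous bearish candle's body."""
    prev_open, prev_close = df["Open"].shift(1), df["Close"].shift(1)
    prev_bearish = prev_close < prev_open
    curr_bullish = df["Close"] > df["Open"]
    engulfs = (df["Open"] <= prev_close) & (df["Close"] >= prev_open)
    return prev_bearish & curr_bullish & engulfs

# Values copied from the five engineered rows shown earlier (2010-10-19 to 2010-10-25)
candles = pd.DataFrame({
    "Open":  [0.27775, 0.2825, 0.2825, 0.279, 0.29725],
    "High":  [0.28425, 0.29, 0.283, 0.29725, 0.3],
    "Low":   [0.2755, 0.2775, 0.273, 0.276, 0.2945],
    "Close": [0.28225, 0.28225, 0.27725, 0.295, 0.29775],
})
doji = is_doji(candles)
engulfing = is_bullish_engulfing(candles)
print(pd.DataFrame({"Doji": doji, "Bullish_Engulfing": engulfing}))
```

With these assumed definitions, the flags agree with the `Doji` and `Bullish_Engulfing` columns for those five rows, though that is not proof that the original code uses the same thresholds.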

### 5. Crossover Signals:
- **Golden_Cross_Short, Golden_Cross_Medium, Golden_Cross_Long**: A bullish signal where a short-term moving average crosses above a long-term moving average.
- **Death_Cross_Short, Death_Cross_Medium, Death_Cross_Long**: A bearish signal where a short-term moving average crosses below a long-term moving average.
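A crossover can be detected as a sign change in the difference between two moving averages. The sketch below is a minimal version under that assumption; `crossover_signals` is a hypothetical helper, and which pairs of averages back the Short, Medium, and Long columns is not stated in this document. The inputs are the `MA_10` and `MA_20` values from the five rows shown earlier.

```python
import pandas as pd

def crossover_signals(short_ma: pd.Series, long_ma: pd.Series) -> pd.DataFrame:
    """Mark the day on which the short average crosses the long one."""
    diff = short_ma - long_ma
    prev = diff.shift(1)  # NaN on the first row, so no signal is raised there
    golden = (diff > 0) & (prev <= 0)  # crossed from below to above
    death = (diff < 0) & (prev >= 0)   # crossed from above to below
    return pd.DataFrame({"golden_cross": golden.astype(int),
                         "death_cross": death.astype(int)})

# MA_10 and MA_20 copied from the five engineered rows shown earlier
ma_10 = pd.Series([0.276525, 0.2778, 0.278775, 0.281125, 0.283875])
ma_20 = pd.Series([0.28415, 0.284025, 0.283362, 0.282787, 0.282675])
signals = crossover_signals(ma_10, ma_20)
print(signals)
```

The only golden cross lands on the last row, which matches `Golden_Cross_Short` in the table. That is consistent with the column tracking `MA_10` against `MA_20`, but it does not confirm it.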

### 6. Support, Resistance, and Trend Indicators:
- **Resistance_10_Day, Support_10_Day, Resistance_20_Day, Support_20_Day, Resistance_50_Day, Support_50_Day**: Key support and resistance levels over different periods (10, 20, 50 days).
- **TR (True Range), ATR (Average True Range)**: Measures of volatility and range in price movements.
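True Range is conventionally the largest of three distances: high minus low, high minus previous close, and low minus previous close (the last two in absolute value). The sketch below follows that textbook definition. The helper name and the ATR window are assumptions, and the ATR is a simple rolling mean rather than a smoothed variant.

```python
import pandas as pd

def add_true_range(df: pd.DataFrame, window: int = 14) -> pd.DataFrame:
    """Add TR and a simple rolling-mean ATR (window length is assumed)."""
    out = df.copy()
    prev_close = out["Close"].shift(1)
    ranges = pd.concat(
        [
            out["High"] - out["Low"],
            (out["High"] - prev_close).abs(),
            (out["Low"] - prev_close).abs(),
        ],
        axis=1,
    )
    out["TR"] = ranges.max(axis=1)  # NaN terms on the first row are skipped
    out["ATR"] = out["TR"].rolling(window).mean()
    return out

# Toy candles, not real market data
candles = pd.DataFrame({
    "High":  [10.0, 12.0, 11.0],
    "Low":   [9.0, 10.0, 8.0],
    "Close": [9.5, 11.0, 9.0],
})
ranged = add_true_range(candles, window=2)
print(ranged)
```

In the second toy row the gap from the previous close (12.0 - 9.5) exceeds the day's own range, which is the case TR is designed to capture.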

### 7. Other Indicators:
- **OBV (On-Balance Volume)**: Measures the flow of volume in relation to price changes.
- **Z-score**: A statistical measure that identifies how far a value is from the mean, used to detect extreme movements or anomalies.
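On-Balance Volume adds the day's volume when the close rises, subtracts it when the close falls, and leaves the total unchanged otherwise. The sketch below implements that standard rule. The function name is hypothetical, and the inputs are the Close and Volume values from the five rows shown earlier.

```python
import numpy as np
import pandas as pd

def on_balance_volume(close: pd.Series, volume: pd.Series) -> pd.Series:
    """Cumulative volume signed by the direction of the daily close change."""
    direction = np.sign(close.diff()).fillna(0)  # +1 up, -1 down, 0 flat or first row
    return (direction * volume).cumsum()

# Close and Volume copied from the five engineered rows shown earlier
close = pd.Series([0.28225, 0.28225, 0.27725, 0.295, 0.29775])
volume = pd.Series([866136000.0, 763532000.0, 1014428000.0, 1322676000.0, 673136000.0])
obv = on_balance_volume(close, volume)
print(obv)
```

Starting the running total at the first of these rows reproduces the `OBV` column in the table.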


## Data Preprocessing

The `preprocess_data` function preprocesses the data by removing missing values, handling outliers, and splitting the dataset into training and test sets.
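The internals of `preprocess_data` are not shown here. As a rough sketch of the missing-value and splitting steps, the hypothetical helper below drops incomplete rows and splits chronologically, so the test set always lies after the training set. The target column name `Action` and the 80/20 ratio are assumptions, and outlier handling is left out.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, feature_cols: list, target_col: str = "Action",
                        test_fraction: float = 0.2):
    """Drop rows with missing values, then split without shuffling."""
    clean = df.dropna(subset=feature_cols + [target_col])
    split_at = int(len(clean) * (1 - test_fraction))
    train, test = clean.iloc[:split_at], clean.iloc[split_at:]
    return train[feature_cols], test[feature_cols], train[target_col], test[target_col]

# Toy frame with one missing feature value, not real market data
toy = pd.DataFrame({
    "MA_10": [None, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    "Action": [0, 1] * 5,
})
X_tr, X_te, y_tr, y_te = chronological_split(toy, ["MA_10"])
print(len(X_tr), len(X_te))
```

Splitting by position rather than at random avoids training on days that come after the ones being predicted.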


In [7]:
X_train, X_test, y_train, y_test = preprocess_data(stock_data)

Splitting data...


## Model Training and Hyperparameter Tuning

We use a LightGBM classifier and tune its hyperparameters with GridSearchCV to find the best parameters for predicting stock movements.
This is done individually for every stock in the S&P 500 to maximize model performance and minimize risk.


In [8]:
from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV

# Define parameter grid for GridSearchCV
param_grid = {
    'num_leaves': [31, 50],
    'min_data_in_leaf': [20, 50],
    'max_depth': [-1, 10],
    'learning_rate': [0.01, 0.1],
    'n_estimators': [100, 200]
}

# Setup the LGBM classifier
model = LGBMClassifier(random_state=42, verbose=-1)
grid_search = GridSearchCV(
    model, param_grid, cv=3, scoring='accuracy', n_jobs=-1, verbose=0
)
grid_search.fit(X_train, y_train)

# Get the best parameters
best_params = grid_search.best_params_
best_params


In [None]:
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Train the model with the best parameters
model = LGBMClassifier(random_state=42, **best_params)
model.fit(X_train, y_train)

# Cross-validation for better evaluation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=skf, scoring='accuracy')
# Use .iloc[0] so the lookup works even if the index no longer starts at 0
print(f"Cross-validation accuracy for {stock_data['Symbol'].iloc[0]}: {cv_scores.mean():.4f}")
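
Because stock data is ordered in time, shuffled folds can let the model train on days that come after the ones it is scored on. One alternative, shown here as a swapped-in technique rather than the method used above, is scikit-learn's `TimeSeriesSplit`, which only ever tests on samples that follow the training samples. The toy array stands in for the real feature matrix.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_toy = np.arange(12).reshape(-1, 1)  # 12 time-ordered toy samples
tscv = TimeSeriesSplit(n_splits=3)
folds = [(train_idx, test_idx) for train_idx, test_idx in tscv.split(X_toy)]

for train_idx, test_idx in folds:
    # Every training index precedes every test index, so no future data leaks in
    print(f"train up to {train_idx.max()}, test {test_idx.min()}-{test_idx.max()}")
```

A splitter like `tscv` can be passed as the `cv` argument of `GridSearchCV` or `cross_val_score` in place of an integer or `StratifiedKFold`.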

## Backtesting and Simulation

We perform backtesting by simulating buy/sell decisions based on the model's predictions and evaluate the overall performance of the trading strategy.


In [None]:
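# Simulate portfolio performance
# NOTE: `test_data` is not defined above; it is assumed to be the held-out slice of
# `stock_data`. `feature_cols` must match the columns the model was trained on.
feature_cols = ['MA_10', 'MA_20', 'Volume', 'RSI_10_Day', 'MACD']
cash = 10000.0  # Starting cash
shares_held = 0
portfolio_values = []
for day in test_data.index:
    close = test_data.loc[day, 'Close']
    signal = model.predict(test_data.loc[[day], feature_cols])[0]
    # Assumes labels 1 = buy and -1 = sell; adjust if `Action` is encoded
    # differently (the table above shows the values 0, 1, 2).
    if signal == 1 and cash >= close:  # Buy signal: one share per day
        shares_held += 1
        cash -= close
    elif signal == -1 and shares_held > 0:  # Sell signal
        cash += close
        shares_held -= 1
    # Record the mark-to-market value for this day, not just the final state
    portfolio_values.append(cash + shares_held * close)

test_data['Portfolio Value'] = portfolio_values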

# Plotting portfolio value over time
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
plt.plot(test_data.index, test_data['Portfolio Value'])
plt.title('Portfolio Value Over Time')
plt.ylabel('Portfolio Value')
plt.xlabel('Date')
plt.show()

## Results and Performance

We evaluate the performance of the model by analyzing the portfolio value, profit percentage, and key financial metrics such as return on investment (ROI).


In [None]:
# Performance metrics summary
performance_summary = test_data[['Portfolio Value']].describe()
performance_summary
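
`describe()` gives descriptive statistics only. Return on investment and maximum drawdown can be derived from a series of portfolio values, as in the sketch below. The function name is hypothetical, and the numbers are illustrative rather than results from this model.

```python
import pandas as pd

def performance_metrics(values: pd.Series) -> dict:
    """ROI relative to the first value, and the worst peak-to-trough decline."""
    roi_pct = (values.iloc[-1] / values.iloc[0] - 1) * 100
    running_peak = values.cummax()
    max_drawdown_pct = ((values - running_peak) / running_peak).min() * 100
    return {"roi_pct": roi_pct, "max_drawdown_pct": max_drawdown_pct}

# Illustrative portfolio values, not output of the simulation above
values = pd.Series([10000.0, 10500.0, 9450.0, 11000.0])
metrics = performance_metrics(values)
print(metrics)
```

For a real run, the same function could be applied to `test_data['Portfolio Value']`.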

## Conclusion

The stock trading model has shown promising results with a conservative trading approach. The next steps involve optimizing the model further by incorporating transaction costs, taxes, and potentially adding more advanced risk management strategies.
