# Stock Market Prediction
***
## Table of Contents

***

In [None]:
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import date

## 1. Introduction


## 2. Device Agnostic Code
Apple-silicon GPU acceleration (the `mps` backend) can deliver a significant speedup over the CPU for deep learning workloads, especially with large models and batch sizes. On Windows or Linux machines with an NVIDIA GPU, the `cuda` backend fills the same role.

However, when training LSTM networks on the MPS backend I ran into several issues; for example, error metrics were substantially higher than for the same model trained on the CPU. The CPU is therefore used throughout this project to keep results reliable.

**Reference**:
- [Training results from using MPS backend are poor compared to CPU and CUDA](https://github.com/pytorch/pytorch/issues/109457)
- [MPS backend produces bad training results in comparison to other backends](https://github.com/pytorch/pytorch/issues/92615)
- [Memory Leak in MPS Backend During LSTM Iterations (Out of Memory Error)](https://github.com/pytorch/pytorch/issues/145374)

In [115]:
# Set device
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # NVIDIA GPU (e.g., Windows/Linux)
# device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")  # Apple silicon (Mac)
device = torch.device("cpu")  # CPU: avoids the MPS LSTM issues listed above
device

device(type='cpu')
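The backend choice above can be wrapped in a small helper so that moving off the CPU later is a one-argument change. This is a minimal sketch, not the notebook's own code: the function name `select_device` and its `force_cpu` flag are hypothetical, it assumes PyTorch 1.12 or newer (where `torch.backends.mps` exists), and the LSTM sizes are arbitrary illustration values (an input size of 5 would match, e.g., the five price/volume columns).

```python
import torch
import torch.nn as nn


def select_device(force_cpu: bool = True) -> torch.device:
    """Return CUDA if available, then Apple MPS, else CPU.

    force_cpu=True (hypothetical flag) mirrors this notebook's decision
    to stay on the CPU because of the MPS LSTM issues.
    """
    if force_cpu:
        return torch.device("cpu")
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = select_device()

# Model and inputs must live on the same device; sizes are illustrative only.
lstm = nn.LSTM(input_size=5, hidden_size=32, batch_first=True).to(device)
x = torch.randn(4, 10, 5, device=device)  # (batch, sequence length, features)
out, (h_n, c_n) = lstm(x)
print(device, tuple(out.shape))  # cpu (4, 10, 32)
```

Keeping the fallback order in one place means that if the MPS issues are fixed upstream, only the `force_cpu` argument needs to change.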

## 3. Loading Dataset

`yf.download` returns a DataFrame whose columns form a two-level index, `(Price, Ticker)`, e.g. `("Close", "AAPL")`. Since only one ticker is downloaded, the second (ticker) level is redundant and is dropped.

In [122]:
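TICKERS = ["AAPL"]  # Apple Inc. stock data
START_DATE = "2015-01-01"
END_DATE = "2025-12-31"  # Exclusive; data is returned only up to the latest available date
# auto_adjust=True adjusts the OHLC prices for splits and dividends
df = yf.download(TICKERS, start=START_DATE, end=END_DATE, auto_adjust=True)
df.columns = df.columns.droplevel(1)  # Drop the ticker level, keeping field names (Close, High, ...)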

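The column handling described above can be checked offline. The sketch below builds a hypothetical stand-in for the `yf.download` result (dummy values, with the `(Price, Ticker)` column level names assumed from the printed output) and compares two ways of flattening it: `droplevel`, which suits the single-ticker case, and `DataFrame.xs`, which selects one ticker when several are downloaded.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the yf.download(...) result: dummy values,
# same (Price, Ticker) column layout for a single ticker.
fields = ["Close", "High", "Low", "Open", "Volume"]
columns = pd.MultiIndex.from_product([fields, ["AAPL"]], names=["Price", "Ticker"])
index = pd.date_range("2015-01-02", periods=3, freq="B", name="Date")
raw = pd.DataFrame(np.arange(15, dtype=float).reshape(3, 5), index=index, columns=columns)

# Single ticker: drop the "Ticker" level so columns become plain field names.
single = raw.copy()
single.columns = single.columns.droplevel("Ticker")
print(list(single.columns))  # ['Close', 'High', 'Low', 'Open', 'Volume']

# Several tickers: take one ticker's block with a cross-section instead.
aapl = raw.xs("AAPL", axis=1, level="Ticker")
print(aapl.shape)  # (3, 5)
```

With several tickers, `droplevel` alone would leave duplicate column names (one `Close` per ticker), which is why a cross-section is the safer choice there.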


In [124]:
print("="*50)
print(df.head())
print("="*50)
print(f'Shape of the dataset: {df.shape}')
print("="*50)
df.info()  # info() prints its own summary and returns None, so it is not wrapped in print()
print("="*50)
print(f"Count of null values: {df.isnull().sum().sum()}")
print("="*50)
print(df.describe())

Price           Close       High        Low       Open     Volume
Date                                                             
2015-01-02  24.288574  24.757328  23.848700  24.746220  212818400
2015-01-05  23.604328  24.137509  23.417716  24.057531  257142000
2015-01-06  23.606554  23.866479  23.244435  23.668758  263188400
2015-01-07  23.937571  24.037541  23.704304  23.815383  160423600
2015-01-08  24.857309  24.915071  24.148623  24.266369  237458000
Shape of the dataset: (2661, 5)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2661 entries, 2015-01-02 to 2025-08-01
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Close   2661 non-null   float64
 1   High    2661 non-null   float64
 2   Low     2661 non-null   float64
 3   Open    2661 non-null   float64
 4   Volume  2661 non-null   int64  
dtypes: float64(4), int64(1)
memory usage: 124.7 KB
Count of null values: 0
Price        Close         High          L