An end-to-end machine learning workflow for stock price forecasting. The project covers both the ML development process and the ML production lifecycle.
- Forecasting stock prices using machine learning
- Real-time dashboard showing forecasted values
- Observability dashboard for ML models and their performance
- Open APIs for data retrieval
- BinanceAPI - for data retrieval from Binance
- MageAI - for data processing pipeline
- DVC - for data version control
- PyCaret - for automated ML model training
- Weights & Biases - for model experiment tracking
- Aporia's MLNotify - for model training monitoring (Aporia official website)
- Docker - for system containerization and deployment
- BentoML - for model serving (BentoML official website)
- Yatai - for model serving at scale on Kubernetes (BentoML official website)
- Caddy - for reverse proxy of services
- Grafana and Prometheus - for system, model, and hardware observability
- Coming soon
System Architecture (coming soon)
Functional Architecture (coming soon)
# clone this repository
git clone https://github.com/thakorneyp11/stock-price-prediction.git
# change directory to project
cd stock-price-prediction
# create virtual environment (`pip3 install virtualenv` if not installed)
virtualenv env
# activate virtual environment
source env/bin/activate
# install dependencies
pip3 install -r requirements.txt
Data Sources:
- Binance Data Dumper: includes data from 2017 to the present
- Kaggle - prasoonkottarathil/btcinusd: only includes data from 2017 to 2021
- Binance API: only supports a few candles per request, so it is not suitable for historical data retrieval
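To make the limitation of the Binance API concrete, here is a rough back-of-the-envelope sketch of how many paginated requests a full 15-minute history would take. The cap of 1000 candles per request is an assumption based on Binance's public klines documentation, and the function name is hypothetical; the date range simply mirrors the raw CSV filename used in this project.

```python
import math
from datetime import datetime, timezone

# Assumption: the klines endpoint returns at most 1000 candles per call.
MAX_CANDLES_PER_REQUEST = 1000


def requests_needed(start: datetime, end: datetime, interval_minutes: int,
                    per_request: int = MAX_CANDLES_PER_REQUEST) -> int:
    """Estimate how many paginated calls cover [start, end) at a given interval."""
    total_minutes = (end - start).total_seconds() / 60
    candles = math.ceil(total_minutes / interval_minutes)
    return math.ceil(candles / per_request)


if __name__ == "__main__":
    start = datetime(2017, 8, 1, tzinfo=timezone.utc)
    end = datetime(2023, 11, 1, tzinfo=timezone.utc)
    print(requests_needed(start, end, interval_minutes=15))
```

Under these assumptions the full range needs a few hundred sequential calls, which is why a bulk dump is the more practical source for backfilling.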
Download Historical Data:
- Download historical data from Binance Data Dumper:
python3 data_download.py
- Raw CSV dataset:
dataset/BTCUSDT_15m_Aug2017-Oct2023.csv
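Before using a downloaded dump, it can help to confirm that the candles are contiguous. The sketch below is not part of `data_download.py`. It assumes the CSV has a header row with an `open_time` column holding millisecond timestamps, which may not match the actual file layout, and both helper names are hypothetical.

```python
import csv
from typing import Iterable, List, Tuple

INTERVAL_MS = 15 * 60 * 1000  # one 15-minute candle in milliseconds


def find_gaps(open_times_ms: Iterable[int],
              interval_ms: int = INTERVAL_MS) -> List[Tuple[int, int]]:
    """Return (previous, current) open-time pairs that are not exactly one interval apart."""
    gaps = []
    previous = None
    for current in open_times_ms:
        if previous is not None and current - previous != interval_ms:
            gaps.append((previous, current))
        previous = current
    return gaps


def read_open_times(path: str, column: str = "open_time") -> List[int]:
    """Read one timestamp column from a CSV file (the column name is an assumption)."""
    with open(path, newline="") as handle:
        return [int(row[column]) for row in csv.DictReader(handle)]
```

A typical use would be `find_gaps(read_open_times("dataset/BTCUSDT_15m_Aug2017-Oct2023.csv"))`. An empty list means every candle follows the previous one by exactly 15 minutes; duplicates and out-of-order rows are reported as well.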
Retrieve Real-time Data:
- A sample script can be found in `data_retrieval.py` (it will later be scheduled and executed using MageAI)
  - Note: update the `.env` file with your Binance API key and secret
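For orientation, a raw candle from Binance's REST klines endpoint arrives as a positional array rather than a labelled record. The sketch below builds a request URL and converts one such array into named fields. It is not taken from `data_retrieval.py`: the helper names are hypothetical, and the endpoint path and field order are assumptions based on Binance's public documentation.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumption: public spot klines endpoint as described in Binance's REST docs.
BASE_URL = "https://api.binance.com/api/v3/klines"


def build_klines_url(symbol: str, interval: str, limit: int = 5) -> str:
    """Build a klines request URL for the most recent `limit` candles."""
    query = urlencode({"symbol": symbol, "interval": interval, "limit": limit})
    return f"{BASE_URL}?{query}"


def parse_kline(row: list) -> dict:
    """Map the leading fields of one kline array to named values."""
    return {
        # Open time is given in milliseconds since the Unix epoch.
        "open_time": datetime.fromtimestamp(row[0] / 1000, tz=timezone.utc),
        # Prices and volume arrive as strings and are converted to floats here.
        "open": float(row[1]),
        "high": float(row[2]),
        "low": float(row[3]),
        "close": float(row[4]),
        "volume": float(row[5]),
    }
```

This sketch sends no credentials and performs no network call itself. Whether the project's script needs the key and secret from `.env` depends on the client library it uses.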
![Raw CSV dataset](https://private-user-images.githubusercontent.com/58812639/284072561-4961dd43-c409-4416-9552-773de503d6fd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAzODA3NTIsIm5iZiI6MTcyMDM4MDQ1MiwicGF0aCI6Ii81ODgxMjYzOS8yODQwNzI1NjEtNDk2MWRkNDMtYzQwOS00NDE2LTk1NTItNzczZGU1MDNkNmZkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzA3VDE5MjczMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTU4MGJhMGNiMzlkMzk1NWE2ZTVjMDE2YWNhNmRmMmI3ODU0MDAwYWU2Nzg2YmM4NjlkMzY2OWU0ZTY2ODA5N2ImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.k4fsSjxLn88RY0EDistaZhZDsEp9GZmgdThGfsdkIYg)
- Exploratory Data Analysis (EDA): `eda.ipynb`
- Feature Engineering: `feature_engineering.ipynb` (reference)
- Processed CSV datasets: 1) `dataset/feature_extracted_data.csv` and 2) `dataset/feature_selected_data.csv`
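The features actually extracted in `feature_engineering.ipynb` are not listed here. Purely as an illustration of the kind of transformation such a step might apply to candle closes, the sketch below computes log returns and a trailing simple moving average. Both functions are hypothetical examples and are not taken from the notebook.

```python
import math
from typing import List, Optional


def log_returns(closes: List[float]) -> List[Optional[float]]:
    """Log return of each close versus the previous one; the first entry has no predecessor."""
    if not closes:
        return []
    out: List[Optional[float]] = [None]
    for prev, cur in zip(closes, closes[1:]):
        out.append(math.log(cur / prev))
    return out


def simple_moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing mean over `window` values; None until enough history exists."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out
```

Both helpers return a list the same length as the input, so the results can be attached to the candles row by row. Leading `None` values mark rows without enough history.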
- Coming soon