This repository contains the complete workflow for handling stock data, from initial acquisition to final visualization. This README will guide you through setting up the project, understanding the workflow, and using the provided scripts.
- Introduction
- Workflow Overview
- Setup Instructions
- Data Acquisition
- Data Loading into MySQL
- Data Transformation
- Visualization with Flask
- Designing the Web Interface
- Example Use Case
- Challenges and Solutions
- Future Enhancements
- Contributing
The goal of this project is to provide a robust and efficient workflow for processing stock data. This involves acquiring the data, loading it into a database, transforming it, and visualizing it through a web interface.
- Data Acquisition: Setting up a landing area and using a Python script to automate data ingestion.
- Data Loading into MySQL: Parsing files, ensuring data integrity, and storing them in MySQL.
- Data Transformation: Using ETL processes to transform raw data into an intermediate layer.
- Visualization with Flask: Creating a web application to visualize and interact with the transformed data.
- Designing the Web Interface: Building a responsive and interactive web page for data exploration.
- Clone the Repository:

  ```bash
  git clone git@github.com:yogeshmapari/Stock_Data_Processing.git
  cd Stock_Data_Processing
  ```

- Create a Virtual Environment and Install Dependencies:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```

- Set Up the MySQL Database: install MySQL and create a database for the project, update the database configuration in config.py, and run the project's SQL setup in MySQL to create the basic database objects (a hedged sketch of this step follows these instructions).

- Run the Data Acquisition Script:

  ```bash
  python data_acquisition.py
  ```

- Run the Data Loading Script:

  ```bash
  python file_based_trigger.py
  ```

- Run the Data Transformation Script:

  ```bash
  python modify_tables_tranformatin.py
  ```

- Run the Flask Application:

  ```bash
  flask run
  ```

- Open the app in a local browser (Flask's development server defaults to http://127.0.0.1:5000).
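The setup SQL itself isn't reproduced here; as a minimal sketch of what the basic database creation step might look like (the database name `stock_db`, the table and column names, and the credentials below are all assumptions, not the project's actual values):

```python
# db_setup_sketch.py -- illustrative only; the real setup SQL ships with the repository.
import mysql.connector  # pip install mysql-connector-python

# Connection details are placeholders; take the real values from config.py.
conn = mysql.connector.connect(host="localhost", user="root", password="your_password")
cursor = conn.cursor()

# Create the database and a raw-layer table for incoming stock files.
cursor.execute("CREATE DATABASE IF NOT EXISTS stock_db")
cursor.execute("""
    CREATE TABLE IF NOT EXISTS stock_db.raw_stock_data (
        trade_date  DATE,
        symbol      VARCHAR(20),
        open_price  DECIMAL(12, 4),
        high_price  DECIMAL(12, 4),
        low_price   DECIMAL(12, 4),
        close_price DECIMAL(12, 4),
        volume      BIGINT
    )
""")
conn.commit()
cursor.close()
conn.close()
```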
Establish a dedicated area for incoming stock data files, called the landing area (folder name: `landing_area`).
A Python script, `file_based_trigger.py`, monitors this folder for new files and triggers the data loading process: whenever new files appear, it loads them into the MySQL raw layer as tables.
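As a rough illustration of how such a file-based trigger can work, here is a minimal polling sketch (the polling interval, the `*.csv` pattern, and the `load_to_raw_layer` hook are assumptions; the repository's `file_based_trigger.py` is the authoritative version):

```python
# trigger_sketch.py -- illustrative polling loop; not the repository's actual implementation.
import time
from pathlib import Path

LANDING_AREA = Path("landing_area")
seen = set()

def load_to_raw_layer(path: Path) -> None:
    """Hypothetical hook: hand the file to the raw-loading step (see raw_load.py)."""
    print(f"loading {path.name} into the MySQL raw layer...")

while True:
    # Any file we have not processed yet counts as "new".
    for file in LANDING_AREA.glob("*.csv"):
        if file.name not in seen:
            load_to_raw_layer(file)
            seen.add(file.name)
    time.sleep(30)  # poll the landing area every 30 seconds
```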
Incoming files are processed and loaded into MySQL by `raw_load.py`.
The loader ensures data is correctly parsed and validated before insertion.
Once loading finishes, files are moved to an archival area (folder name: `archive_area`). If data corruption or another issue is discovered later, the affected file can be retrieved from the archive and reloaded.
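A hedged sketch of this load-then-archive flow (the connection string, the table name `raw_stock_data`, and the expected CSV columns are assumptions; see `raw_load.py` for the project's actual logic):

```python
# raw_load_sketch.py -- illustrative; raw_load.py in the repository is the real implementation.
import shutil
from pathlib import Path

import pandas as pd
from sqlalchemy import create_engine

# Connection string is a placeholder; real values belong in config.py.
engine = create_engine("mysql+pymysql://user:password@localhost/stock_db")
ARCHIVE_AREA = Path("archive_area")

def load_file(path: Path) -> None:
    df = pd.read_csv(path)

    # Basic validation before insertion: required columns present, frame non-empty.
    required = {"Date", "Open", "High", "Low", "Close", "Volume"}
    if df.empty or not required.issubset(df.columns):
        raise ValueError(f"{path.name} failed validation; leaving it in the landing area")

    # Append into the raw layer, then archive the file so it can be
    # replayed later if corruption is discovered downstream.
    df.to_sql("raw_stock_data", engine, if_exists="append", index=False)
    shutil.move(str(path), ARCHIVE_AREA / path.name)
```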
Transforms raw data into an intermediate layer by performing calculations on the base data we received.
The new columns added are: Daily_Return, MA_5_Day, MA_10_Day, Volatility, RSI, MACD, Signal_9, Upper_Band, Lower_Band, Gain, Loss, AvgGain, AvgLoss, EMA_12, EMA_26, MA_20, SD_20.
Extracts, transforms, and loads data to prepare it for analysis.
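To make the calculations concrete, here is an illustrative pandas sketch for the listed columns (the 14-day RSI window, the 20-day volatility window, and the Bollinger Band multiplier are conventional defaults assumed here, not values confirmed by the project; `modify_tables_tranformatin.py` holds the real implementation):

```python
# transform_sketch.py -- illustrative indicator calculations; not the project's exact code.
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    close = df["Close"]

    # Returns, simple moving averages, and rolling volatility of returns.
    df["Daily_Return"] = close.pct_change()
    df["MA_5_Day"] = close.rolling(5).mean()
    df["MA_10_Day"] = close.rolling(10).mean()
    df["Volatility"] = df["Daily_Return"].rolling(20).std()  # window assumed

    # Bollinger Bands from a 20-day mean and standard deviation.
    df["MA_20"] = close.rolling(20).mean()
    df["SD_20"] = close.rolling(20).std()
    df["Upper_Band"] = df["MA_20"] + 2 * df["SD_20"]
    df["Lower_Band"] = df["MA_20"] - 2 * df["SD_20"]

    # RSI from average gains and losses (14-day window assumed).
    delta = close.diff()
    df["Gain"] = delta.clip(lower=0)
    df["Loss"] = -delta.clip(upper=0)
    df["AvgGain"] = df["Gain"].rolling(14).mean()
    df["AvgLoss"] = df["Loss"].rolling(14).mean()
    df["RSI"] = 100 - 100 / (1 + df["AvgGain"] / df["AvgLoss"])

    # MACD from 12- and 26-day EMAs, with a 9-day signal line.
    df["EMA_12"] = close.ewm(span=12, adjust=False).mean()
    df["EMA_26"] = close.ewm(span=26, adjust=False).mean()
    df["MACD"] = df["EMA_12"] - df["EMA_26"]
    df["Signal_9"] = df["MACD"].ewm(span=9, adjust=False).mean()
    return df
```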
Flask is a lightweight framework for building web applications.
The application retrieves the transformed data and renders it dynamically using Jinja templates; data for every stock is made available on the web page for analysis.
Features interactive charts and tables for data exploration.
Responsive design ensures accessibility across devices.
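A minimal sketch of how such a Flask view could serve the transformed data (the route, the table name `transformed_stock_data`, and the `index.html` template are assumptions about the app's structure, not its confirmed layout):

```python
# app_sketch.py -- illustrative Flask view; the repository's app may be organized differently.
import pandas as pd
from flask import Flask, render_template
from sqlalchemy import create_engine

app = Flask(__name__)
engine = create_engine("mysql+pymysql://user:password@localhost/stock_db")  # placeholder

@app.route("/")
def index():
    # Pull the transformed layer and hand the rows to a Jinja template.
    df = pd.read_sql("SELECT * FROM transformed_stock_data", engine)
    return render_template("index.html", rows=df.to_dict(orient="records"))

if __name__ == "__main__":
    app.run(debug=True)
```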
Challenges:

- Data variability
- Performance
- Integration

Solutions:

- Data quality checks
- Optimization techniques
- Continuous improvement
- Implement live batch processing on a local device.
- Try scraping stock data from the internet.
- Add Airflow-based batch processing.
- Incorporate machine learning for predictive analysis.
- Enable real-time data streaming and analysis.
- Iterate based on user interaction and requirements.
For contributions, feedback, or issues, please contact Yogesh Mapari at patilmapari@gmail.com.
https://medium.com/@patilmapari/stock-data-processing-workflow-5426d1df9a33