This project demonstrates an ETL (Extract, Transform, Load) process using Python, integrated with SAP GUI. The goal is to extract data from SAP, transform it according to business needs, and load it into a specified destination.
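As a rough illustration of the extract, transform, load flow described above, the sketch below wires three small functions into one pipeline. It is not the project's actual code: the sample rows, the field names (`material`, `quantity`), and the function names are hypothetical, and an in-memory list stands in for both the SAP GUI export and the destination.

```python
# Hypothetical stand-in for rows exported from SAP; the real project
# obtains its data through SAP GUI.
SAMPLE_ROWS = [
    {"material": " 1001 ", "quantity": "5"},
    {"material": "1002", "quantity": "12"},
]


def extract() -> list[dict[str, str]]:
    """Return raw rows; a fixed sample replaces the SAP GUI export here."""
    return [dict(row) for row in SAMPLE_ROWS]


def transform(rows: list[dict[str, str]]) -> list[dict[str, object]]:
    """Strip surrounding whitespace and cast quantities to integers."""
    return [
        {"material": row["material"].strip(), "quantity": int(row["quantity"])}
        for row in rows
    ]


def load(rows: list[dict[str, object]], destination: list) -> int:
    """Append rows to the destination and return how many were written."""
    destination.extend(rows)
    return len(rows)


def run_pipeline(destination: list) -> int:
    """Run extract -> transform -> load and return the loaded row count."""
    return load(transform(extract()), destination)


if __name__ == "__main__":
    target: list = []
    print(run_pipeline(target), "rows loaded")
```

Keeping each stage as a separate function makes it possible to test the transformation logic without a running SAP session.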
Ensure that you have the following installed:
- Python 3.12
- SAP GUI
- Git
- Virtual environment setup tools
To get a copy of the project locally, run:
git clone git@github.com:jasonssdev/etl-sap-gui.git
- On Windows (Git Bash):
python -m venv .venv
- On Windows (Git Bash):
source .venv/Scripts/activate
- On Windows (CMD):
.venv\Scripts\activate.bat
- On Windows (PowerShell):
.venv\Scripts\activate.ps1
- On Windows (Git Bash):
pip install -r requirements.txt
So that Python can resolve the project's modules when you run the scripts, add the repository path to PYTHONPATH:
- CMD (Windows)
set PYTHONPATH=%PYTHONPATH%;C:\Users\youruser\repository
- Git Bash (Windows)
export PYTHONPATH="$PYTHONPATH:/c/Users/youruser/repository"
- PowerShell (Windows)
$env:PYTHONPATH = "$env:PYTHONPATH;C:\Users\youruser\repository"
Check that PYTHONPATH is set (CMD syntax):
echo %PYTHONPATH%
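The `echo %PYTHONPATH%` form only works in CMD. A shell-independent check can be done from Python itself. The sketch below is one possible way to do it; the helper names `pythonpath_entries` and `is_on_pythonpath` are illustrative and not part of the project.

```python
import os


def pythonpath_entries(value: str, sep: str = os.pathsep) -> list[str]:
    """Split a PYTHONPATH string into its non-empty entries."""
    return [entry for entry in value.split(sep) if entry]


def is_on_pythonpath(path: str, value: str, sep: str = os.pathsep) -> bool:
    """Return True if `path` matches one of the PYTHONPATH entries.

    Paths are normalized first, so trailing slashes (and letter case on
    Windows) do not cause false negatives.
    """
    wanted = os.path.normcase(os.path.normpath(path))
    return any(
        os.path.normcase(os.path.normpath(entry)) == wanted
        for entry in pythonpath_entries(value, sep)
    )


if __name__ == "__main__":
    current = os.environ.get("PYTHONPATH", "")
    # Print one entry per line, or a hint if the variable is empty.
    for entry in pythonpath_entries(current) or ["PYTHONPATH is not set"]:
        print(entry)
```

Run it in the same terminal session in which you set the variable, since `set`, `export`, and `$env:` only affect the current session.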
You can configure the project in VS Code by modifying the settings.json file. This ensures that the IDE is using the correct Python interpreter and paths.
- To open the settings file:
notepad %APPDATA%\Code\User\settings.json
Add the following configuration:
{
    "python.defaultInterpreterPath": "${workspaceFolder}/.venv/Scripts/python.exe",
    "terminal.integrated.env.windows": {
        "PYTHONPATH": "${workspaceFolder}/src"
    }
}
Once everything is set up, you can start running your ETL scripts within the activated virtual environment. Be sure to activate the virtual environment every time before running your Python scripts.
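Because a forgotten activation is easy to miss, a script can check for it at startup. The following is a minimal sketch, assuming the environment was created with `python -m venv` as shown earlier; the function name `in_virtualenv` is illustrative and the project's own entry points may not include such a guard.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if not in_virtualenv():
        # Exit with a non-zero status and a hint for the user.
        sys.exit("Activate .venv before running the ETL scripts.")
    print(f"Using interpreter: {sys.executable}")
```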
├── LICENSE
│
├── README.md              <- The top-level README for developers using this project
│
├── data
│   ├── preprocessed       <- Data prepared for upload to the remote server
│   │
│   └── raw                <- The original, immutable data dump
│
├── notebooks              <- Jupyter notebooks where code was tested
│
├── references             <- Scripts for reference
│
├── requirements.txt       <- The requirements file for reproducing the environment
│
├── .gitignore             <- Directories and files to ignore in Git
│
├── src                    <- Source code directory for the project
│   │
│   ├── sap_extract        <- Python scripts to extract data
│   │
│   ├── script_transform   <- Python scripts to transform data
│   │
│   ├── sql_server_load    <- Python scripts to load data
│   │
│   ├── main_2.py          <- Main script, run every 24 hours
│   │
│   └── main.py            <- Main script, run every 2 hours
│
├── run_mat_main_2.bat     <- Script to run the app automatically every 24 hours
│
├── run_mat_main.bat       <- Script to run the app automatically every 2 hours
│
└── .env                   <- File that holds environment variables
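The `.env` file listed above holds environment variables. How the project reads it is not documented here (it may rely on a package from `requirements.txt`), so the following is only a standard-library sketch of the idea. The helper names `parse_env` and `load_env` and the variable names in the usage note are hypothetical.

```python
import os


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: str = ".env") -> None:
    """Copy values from the file into os.environ without overriding existing ones."""
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env(handle.read()).items():
            os.environ.setdefault(key, value)
```

For example, a file containing `SAP_USER=demo` and `SQL_SERVER="localhost"` (made-up names) would yield `{"SAP_USER": "demo", "SQL_SERVER": "localhost"}` from `parse_env`. Keep `.env` out of version control, since it typically contains credentials.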
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
If PowerShell refuses to run the virtual environment activation script, the execution policy is likely blocking it. Allow locally created scripts for the current user with:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Feel free to submit pull requests or open issues to suggest improvements or report bugs.