ETL SAP GUI with Python

This project demonstrates an ETL (Extract, Transform, Load) process using Python, integrated with SAP GUI. The goal is to extract data from SAP, transform it according to business needs, and load it into a specified destination.
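
As a rough illustration of the extract-transform-load flow described above, the sketch below chains three stages using only the Python standard library. Everything in it is hypothetical: the function names, the sample rows, the `stock` table, and the in-memory SQLite database standing in for the real destination are assumptions, not the project's actual code. A real extraction step would drive SAP GUI rather than return hard-coded rows.

```python
import sqlite3


def extract():
    """Stand-in for the SAP extraction step (hypothetical sample rows)."""
    return [
        {"material": " M-001 ", "quantity": "10"},
        {"material": "M-002", "quantity": "5"},
    ]


def transform(rows):
    """Trim text fields and convert quantities to integers."""
    return [(row["material"].strip(), int(row["quantity"])) for row in rows]


def load(records, connection):
    """Insert the cleaned records into a destination table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS stock (material TEXT, quantity INTEGER)"
    )
    connection.executemany("INSERT INTO stock VALUES (?, ?)", records)
    connection.commit()


def run_etl(connection):
    """Run the three stages in order and return the number of rows loaded."""
    records = transform(extract())
    load(records, connection)
    return len(records)


if __name__ == "__main__":
    # Assumption: SQLite stands in for the real destination database.
    conn = sqlite3.connect(":memory:")
    print(run_etl(conn))
```

Keeping each stage in its own function mirrors the split into extract, transform, and load, and lets each stage be tested on its own.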

Tech Stack

Python, Pandas, Jupyter, SQL Server, Power BI, SAP, Terminal, Windows, Git, GitHub, VS Code

Getting Started

Prerequisites

Ensure that you have the following installed:

Installation

1. Clone the Repository

To get a copy of the project locally, run:

git clone git@github.com:jasonssdev/etl-sap-gui.git

2. Set Up a Virtual Environment

  • On Windows (Git Bash):
python -m venv .venv

3. Activate the Virtual Environment

  • On Windows (Git Bash):
source .venv/Scripts/activate
  • On Windows (CMD):
.venv\Scripts\activate.bat
  • On Windows (PowerShell):
.venv\Scripts\activate.ps1

4. Install Dependencies

  • On Windows (Git Bash):
pip install -r requirements.txt
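
To confirm that the install step succeeded, a small check along the following lines can compare the names in requirements.txt against what is installed in the active environment. This is a minimal sketch, not part of the project: the helper names are hypothetical, and the parser only handles simple requirement lines (name plus optional version specifier, extras, or marker).

```python
import re
from importlib import metadata


def parse_requirement_names(lines):
    """Return the package names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name ends at the first version specifier, extra, or marker.
        names.append(re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0])
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        print("Missing:", missing_packages(parse_requirement_names(handle)))
```

An empty list means every listed package was found in the active environment.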

Environment Setup

So that Python can import the project's modules, add the repository path to PYTHONPATH:

  • CMD (Windows)
set PYTHONPATH=%PYTHONPATH%;C:\Users\youruser\repository
  • BASH (Windows)
export PYTHONPATH="$PYTHONPATH:/c/Users/youruser/repository"
  • PowerShell (Windows)
$env:PYTHONPATH = "$env:PYTHONPATH;C:\Users\youruser\repository"

Check that PYTHONPATH is set:

  • CMD (Windows)
echo %PYTHONPATH%
  • BASH (Windows)
echo "$PYTHONPATH"
  • PowerShell (Windows)
echo $env:PYTHONPATH
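
The same check can be made from inside Python, which shows what the interpreter actually searches when importing. The sketch below is illustrative only: `is_on_sys_path` is a hypothetical helper, and the script assumes it is run from the repository root.

```python
import os
import sys
from pathlib import Path


def is_on_sys_path(directory, search_path=None):
    """Return True if `directory` is one of the entries searched for imports."""
    search_path = sys.path if search_path is None else search_path
    target = Path(directory).resolve()
    # An empty entry in sys.path stands for the current working directory.
    return any(Path(entry or ".").resolve() == target for entry in search_path)


if __name__ == "__main__":
    print("PYTHONPATH =", os.environ.get("PYTHONPATH", "<not set>"))
    # Assumption: the script is run from the repository root.
    print("Repository on sys.path:", is_on_sys_path(Path.cwd()))
```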

Configuration (VS Code)

You can configure the project in VS Code by modifying the settings.json file. This ensures that the IDE is using the correct Python interpreter and paths.

  • To open the settings file:
notepad %APPDATA%\Code\User\settings.json

Add the following configuration:

{
    "python.defaultInterpreterPath": "${workspaceFolder}/.venv/Scripts/python.exe",
    "terminal.integrated.env.windows": {
        "PYTHONPATH": "${workspaceFolder}/src"
    }
}

Usage

Once everything is set up, run the ETL scripts from within the activated virtual environment. Activate the virtual environment in every new terminal session before running any Python script.
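
A script can guard against being run outside the virtual environment by comparing `sys.prefix` with `sys.base_prefix`, which differ only inside an environment created by `venv`. The following is a minimal sketch; the function name and the error message are illustrative and not taken from the project.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-created environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if not in_virtual_environment():
        sys.exit("Virtual environment is not active; activate .venv first.")
    print("Running with", sys.executable)
```

Placing a check like this at the top of an entry-point script makes a missing activation fail fast instead of surfacing later as an import error.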

Project Structure

    ├── LICENSE
    │
    ├── README.md  <- The top-level README for developers using this project
    │
    ├── data
    │   ├── preprocessed  <- Data prepared for upload to the remote server
    │   │
    │   └── raw  <- The original, immutable data dump
    │
    ├── notebooks  <- Jupyter notebooks where code was tested
    │
    ├── references  <- Scripts for reference
    │
    ├── requirements.txt  <- The requirements file for reproducing the environment
    │
    ├── .gitignore  <- Directories and files to ignore in Git
    │
    ├── src  <- Source code directory for the project
    │   ├── sap_extract  <- Python scripts to extract data
    │   │
    │   ├── script_transform  <- Python scripts to transform data
    │   │
    │   ├── sql_server_load  <- Python scripts to load data
    │   │
    │   ├── main_2.py  <- Main script, run every 24 hours
    │   │
    │   └── main.py  <- Main script, run every 2 hours
    │
    ├── run_mat_main_2.bat  <- Script to run the app automatically every 24 hours
    │
    ├── run_mat_main.bat  <- Script to run the app automatically every 2 hours
    │
    └── .env  <- File to handle environment variables
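
The .env file listed above typically holds KEY=VALUE pairs such as connection settings. How the project reads it is not stated here; it may well rely on a library such as python-dotenv. As an assumption-laden sketch, a standard-library-only loader could look like this (the function names and the variable name in the usage note are hypothetical):

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks, # comments, and malformed lines."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env_file(path=".env"):
    """Copy values from the file into os.environ without overwriting existing ones."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env_lines(handle)
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

After calling `load_env_file()`, a script would read a setting with, for example, `os.environ["SQL_SERVER_HOST"]`, where `SQL_SERVER_HOST` is a made-up name. Keeping the .env file out of version control avoids committing credentials.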

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Troubleshooting

If PowerShell blocks the virtual environment activation script, the execution policy is the likely cause. Allow locally created scripts for the current user:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Contributing

Feel free to submit pull requests or open issues to suggest improvements or report bugs.
