GDP Data Pipeline

Description

This project extracts GDP data for South American countries from the World Bank API and loads it into a PostgreSQL database. The data is then queried to produce a pivoted report for the last 5 years.

Setup

Prerequisites

Docker
Docker Compose

Steps to Run

Clone the repository:

```
git clone https://github.com/vitorjpc10/ETL-GDP-of-South-American-countries-using-the-World-Bank-API.git
```

Move to the newly cloned repository:

```
cd ETL-GDP-of-South-American-countries-using-the-World-Bank-API
```

Build and run the Docker containers:
```
docker-compose up --build
```
main.py then extracts the GDP data from the World Bank API and loads it into the PostgreSQL database.

To generate the pivoted report, run query.sql against the PostgreSQL database:

```
docker exec -it etl-gdp-of-south-american-countries-using-the-world-bank-api-db-1 psql -U postgres -c "\i query.sql"
```

Assumptions and Design Decisions

The project uses Docker and Docker Compose for containerization and orchestration to ensure consistent development and deployment environments.
Docker volumes are utilized to persist PostgreSQL data, ensuring that the data remains intact even if the containers are stopped or removed.
The PostgreSQL database is selected for data storage due to its reliability, scalability, and support for SQL queries.
Data manipulation uses pure Python and SQL rather than dataframe libraries, which keeps the pipeline lightweight and its dependencies minimal.
The World Bank API is assumed to return consistent and accurate data. To reduce the number of requests and speed up extraction, the per_page parameter in the API endpoint is increased so that each response carries more records.
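
As a rough illustration of the per_page idea, the sketch below builds a World Bank API request URL, works out how many requests a given page size implies, and flattens one JSON response into plain tuples without any dataframe library. The country list, the indicator code (NY.GDP.MKTP.CD), and the helper names are assumptions made for this example and are not taken from main.py.

```python
import math
from urllib.parse import urlencode

# Assumed values for illustration only; main.py may use different ones.
BASE_URL = "https://api.worldbank.org/v2"
COUNTRIES = ["ARG", "BOL", "BRA", "CHL", "COL", "ECU",
             "GUY", "PRY", "PER", "SUR", "URY", "VEN"]
INDICATOR = "NY.GDP.MKTP.CD"  # GDP in current US$


def build_url(page: int, per_page: int) -> str:
    """Build the request URL for one page of results."""
    query = urlencode({"format": "json", "page": page, "per_page": per_page})
    return f"{BASE_URL}/country/{';'.join(COUNTRIES)}/indicator/{INDICATOR}?{query}"


def pages_needed(total_records: int, per_page: int) -> int:
    """Number of requests required to read total_records rows."""
    return math.ceil(total_records / per_page)


def flatten(payload: list) -> list:
    """Turn one JSON response ([metadata, records]) into plain row tuples."""
    _metadata, records = payload
    return [
        (r["country"]["id"], r["country"]["value"], int(r["date"]), r["value"])
        for r in records or []  # records can be null when nothing matches
    ]
```

With a hypothetical total of 780 records, `pages_needed(780, 50)` gives 16 requests, while `pages_needed(780, 1000)` gives a single request, which is the saving the larger per_page value is after.
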
The SQL query for the pivoted report is stored in a separate file (query.sql), which makes it easy to modify the query and preview its results.
The query is executed inside the PostgreSQL database container, so it can be rerun whenever it is modified. For convenience, the query results are also previewed in the terminal after the data has been loaded successfully.
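
One common way to pivot one row per country and year into one column per year is conditional aggregation. The sketch below is not the contents of query.sql. It uses Python's built-in sqlite3 module with an in-memory database instead of PostgreSQL so that it runs standalone, and the table name, column names, and figures are invented for the example.

```python
import sqlite3

# Hypothetical schema and made-up values, for illustration only.
rows = [
    ("Brazil", 2021, 100.0), ("Brazil", 2022, 110.0),
    ("Chile", 2021, 30.0), ("Chile", 2022, 33.0),
]
years = [2021, 2022]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gdp (country TEXT, year INTEGER, value REAL)")
conn.executemany("INSERT INTO gdp VALUES (?, ?, ?)", rows)

# One MAX(CASE ...) column per year turns long rows into a wide report.
# The years are ints from our own list, so formatting them into SQL is safe here.
columns = ", ".join(
    f'MAX(CASE WHEN year = {y} THEN value END) AS "{y}"' for y in years
)
query = f"SELECT country, {columns} FROM gdp GROUP BY country ORDER BY country"

report = conn.execute(query).fetchall()
for line in report:
    print(line)
conn.close()
```

The same `MAX(CASE WHEN ... END)` pattern is valid in PostgreSQL, so a report covering the last 5 years would follow this shape with five year columns.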
