This repository presents a set of research-driven projects by Iuliia Vitiugova. The focus is on designing efficient data pipelines, experimenting with data structures, and applying advanced transformations in a reproducible, professional manner.
The work includes both Jupyter Notebooks for transparent exploration and Python modules for reusable code.
```
data-engineering-portfolio/
├── notebooks/        # Research notebooks (cleaned: outputs cleared, uniform headers)
├── src/              # Reusable Python code auto-extracted from notebooks
├── requirements.txt  # Minimal environment (auto-detected from imports)
├── README.md
└── LICENSE
```
- TP1_Data_Engineering.ipynb — Prototype data pipeline: loading, cleaning, validation (see the sketch after this list).
- TP2_Data_Engineering.ipynb — Transformations at scale and performance profiling.
- TP3_Data_Engineering.ipynb — Storage strategies, efficient joins, memory profiling.
- Project_Data_Engineering.ipynb — End-to-end pipeline from ingestion to reporting.
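As a rough illustration of the loading → cleaning → validation pattern prototyped in TP1, here is a minimal sketch. The file path, column names, and validation rules are illustrative assumptions, not code taken from the notebooks.

```python
import pandas as pd

def load(path: str) -> pd.DataFrame:
    """Load raw CSV data."""
    return pd.read_csv(path)

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates and normalize column names."""
    df = df.drop_duplicates()
    df.columns = [c.strip().lower() for c in df.columns]
    return df

def validate(df: pd.DataFrame, required: list[str]) -> pd.DataFrame:
    """Fail fast if required columns are missing or the frame is empty."""
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    if df.empty:
        raise ValueError("No rows left after cleaning")
    return df

# Hypothetical usage; the path and column names are placeholders.
# df = validate(clean(load("data/raw.csv")), required=["id", "timestamp"])
```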
All notebooks are standardized with cover pages, reproducibility notes, and cleared outputs.
Top-level functions and classes are automatically extracted from the notebooks and consolidated in src/common.py for reuse.
```
pip install -r requirements.txt
```

Run any notebook from the notebooks/ folder via Jupyter Lab/Notebook.
You can also import utilities:
```
from src.common import *
```

- Modular data pipelines: ingestion → validation → preprocessing → transformation → analysis (see the sketch after this list).
- Efficient data structures and memory-aware operations.
- Reproducibility: clean outputs, fixed kernel metadata, minimal dependencies.
- Clear documentation: cover pages, section templates, and conclusions.
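The sketch below shows one way a modular, memory-aware step could be composed with the others; the function names, dtype choices, and pipeline steps are illustrative assumptions, not the repository's actual API.

```python
import pandas as pd

def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Reduce memory use by downcasting numeric columns and using categoricals."""
    for col in df.select_dtypes(include="integer").columns:
        df[col] = pd.to_numeric(df[col], downcast="integer")
    for col in df.select_dtypes(include="float").columns:
        df[col] = pd.to_numeric(df[col], downcast="float")
    for col in df.select_dtypes(include="object").columns:
        # Categoricals pay off when a column has relatively few distinct values.
        if df[col].nunique() < 0.5 * len(df):
            df[col] = df[col].astype("category")
    return df

# Chain small, single-purpose steps instead of one monolithic script
# (hypothetical step names shown for illustration):
# pipeline = [ingest, validate, preprocess, downcast_numeric, analyze]
# for step in pipeline:
#     df = step(df)
```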
MIT License. See LICENSE for details.