The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Automates the collection, transformation, and presentation of web analytics data from Google Analytics 4 and Google Search Console into Google Sheets for streamlined reporting and analysis.
Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialized data.
Pythonic tool for orchestrating machine-learning / high-performance / quantum-computing workflows in heterogeneous compute environments.
Code for "Efficient Data Processing in Spark" Course
Aqueduct Core is responsible for the core functionality of Aqueduct, an experiment management system.
An end-to-end, offline pipeline for audio categorization, transcription, and segmentation.
ETL pipeline using Pulumi, AWS services, and Snowflake for automated data flow.
ingestr is a CLI tool that seamlessly copies data between any databases with a single command.
A data engineering project: repository for the backend infrastructure and Streamlit app files of a Premier League dashboard.
A convenience tool for small-scale data pipelines in Python
A repository for the Methods of Advanced Data Engineering course at FAU
This repository hosts materials for the Docker for Data Engineers workshop, offering hands-on exercises and resources tailored for data engineering professionals.
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
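The ingest → process → store flow that such a stack orchestrates can be illustrated with a dependency-free toy sketch; each function here is a hypothetical stand-in for the real component (Kafka consumer, Spark job, Cassandra writer) that an Airflow task would invoke.

```python
def ingest() -> list[dict]:
    # Stand-in for consuming event records from a Kafka topic.
    return [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}]


def process(events: list[dict]) -> dict:
    # Stand-in for a Spark aggregation over the ingested events.
    return {"total_clicks": sum(e["clicks"] for e in events)}


def store(result: dict, sink: dict) -> dict:
    # Stand-in for writing the aggregate to a Cassandra table.
    sink.update(result)
    return sink


sink: dict = {}
store(process(ingest()), sink)
```

In the actual pipeline each stage would be a separate Airflow task with retries and scheduling, and Docker Compose would bring up Kafka, Zookeeper, Spark, and Cassandra alongside the scheduler.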
Wikidata and Wikipedia language data extraction
Streamlined Data Pipelining with Dagster: Advanced Solutions for Robust and Observable Data Workflows.
A dbt package to support modelling event data via split tables for use in downstream tools and systems.
Automated data pipeline for extracting and storing weather forecasts for the tourism sector.
This project developed a data pipeline using Airflow and Python. The pipeline ingested trending movie and distributor data from IMDb and box office sources, then cleansed, formatted, combined, and indexed the data in Elasticsearch. A dashboard was also built on the data using Kibana analytics. The tools and libraries used in this p…