Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
An orchestration platform for the development, production, and observation of data assets.
Always know what to expect from your data.
Turns Data and AI algorithms into production-ready web applications in no time.
🐚 Python-powered, cross-platform, Unix-gazing shell.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
The Open Source Feature Store for Machine Learning
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Compare tables within or across databases
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Implementing best practices for PySpark ETL jobs and applications.
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
Clean APIs for data cleaning. Python implementation of the R package Janitor.
Python Stream Processing