Compare tables within or across databases
Updated May 17, 2024 - Python
Efficient data transformation and modeling framework that is backwards compatible with dbt.
Code and data for the Modern Polars book
End-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck & Evidence
A Data Platform built for AWS, powered by Kubernetes.
Simple stream processing pipeline
Found a data engineering challenge or participated in a selection process? Share with us!
Data engineering interview Q&A for the data community, by the data community
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
Build & Learn Data Engineering and Machine Learning on Kubernetes. No-shortcut approach.
Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market
Data Engineering/Scraping Project. Creating a detailed Sports Relational Database for the Top European Soccer Leagues.
Project for "Data pipeline design patterns" blog.
An open-source project dedicated to constructing robust data pipelines and scalable software infrastructure. We leverage industry-standard tools favored by developers to enhance efficiency and reliability. Uniquely, these pipelines are field-tested on farms across Sumatra, Indonesia, ensuring real-world applicability and resilience.
This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP sockets, Apache Spark, an OpenAI LLM, Kafka and Elasticsearch. It covers each stage: data acquisition, processing, sentiment analysis with ChatGPT, producing to a Kafka topic, and connecting to Elasticsearch.
Kedro CLI plugin for generating a static Kedro-Viz site (HTML, CSS, JS) that can be deployed on many serverless tools.
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can put to use quickly. Just focus on writing PySpark code.
An end-to-end ETL data pipeline that uses PySpark parallel processing to process about 25 million rows of data from a SaaS application, with Apache Airflow as the orchestration tool and various data warehouse technologies. Apache Superset connects to the data warehouse to generate BI dashboards for weekly reports.