data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Updated Jun 28, 2024 - Python
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing, and Data Lake development.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Enterprise-grade, production-hardened, serverless data lake on AWS
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Apache Spark 3 - Structured Streaming Course Material
The DBT of ML: Aligned describes data dependencies in ML systems and reduces technical data debt
Learn how to use Kinesis Firehose, AWS Glue, S3, and Amazon Athena by streaming and analyzing Reddit comments in real time. 100-200 level tutorial.
A Fast, Declarative, and Extensible ETL Framework for Graph Databases.
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousing, containerization, and a dashboard to monitor data pipeline KPIs
A simple lakeFS webhook for pre-commit and pre-merge validation of data objects
Prominent data platform design with the AWS Well-Architected Framework
Data Lake on the Edge
Create a Data Lake on AWS S3 to store dimensional tables after processing data using Spark on an AWS EMR cluster
Wrapper for multiple linkml storage engines (alpha software)
An end-to-end data pipeline for building a Data Lake and supporting reports using Apache Spark.
Serverless streaming ETL with a Glue job and querying with Athena