Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Amazon SageMaker Local Mode Examples
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, that serves as a scalable, distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.
Free High-Quality Financial Data in Azure
Schema mappings in SQL and PySpark for ELT pipelines to normalize data to OCSF
Example of a local PySpark setup, including Delta Lake, for unit testing
Data Streaming with Debezium, Kafka, Spark Streaming, Delta Lake, and MinIO
Spark data pipeline that processes movie ratings data.
Formula 1 load, ingestion, transformation, and analysis pipeline
Spark Structured Streaming data pipeline that processes movie ratings data in real-time.
A Delta Lake reader for Dask
Sample project to demonstrate data engineering best practices
Building a poor man's data lake: Exploring the Power of Polars and Delta Lake
Completed the SQL Basics for Data Science Specialization from the University of California, Davis, gaining proficiency in Data Analysis, SQL, Apache Spark, and Delta Lake.
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
A quick example for Delta Lake running on AWS EMR Serverless Spark
Delta Lake tutorial with Spark, Hive, and Hadoop