Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Amazon SageMaker Local Mode Examples
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, that serves as a scalable, distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.
Free High-Quality Financial Data in Azure
Schema mappings in SQL and PySpark for ELT pipelines to normalize data to OCSF
Example of a local PySpark setup, including Delta Lake, for unit testing
Data Streaming with Debezium, Kafka, Spark Streaming, Delta Lake, and MinIO
Spark data pipeline that processes movie ratings data.
Formula 1 load, ingestion, transformation, and analysis pipeline
Spark Structured Streaming data pipeline that processes movie ratings data in real-time.
A Delta Lake reader for Dask
Sample project to demonstrate data engineering best practices
Building a poor man's data lake: Exploring the Power of Polars and Delta Lake
Completed the SQL Basics for Data Science Specialization from the University of California, Davis, gaining proficiency in Data Analysis, SQL, Apache Spark, and Delta Lake.
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
A quick example for Delta Lake running on AWS EMR Serverless Spark
Delta Lake tutorial with Spark, Hive, and Hadoop