Amazon SageMaker Local Mode Examples
Updated Jun 19, 2024 - Python
Module 1 exercises - Bootcamp EDC - IGTI 2021
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
Sample project to demonstrate data engineering best practices
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
A Delta Lake reader for Dask
Spark data pipeline that processes movie ratings data.
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
This repository demonstrates a simple ELT process that uses Delta Lake to perform upserts and to remove data files that are no longer in the latest state of the table's transaction log.
A one-hour workshop that walks through the data lakehouse and takes a deep dive into Delta Lake
Free High-Quality Financial Data in Azure
Implementation of an ETL process for real-time sentiment analysis of tweets with Docker, Apache Kafka, Spark Streaming, MongoDB and Delta Lake
Spark Structured Streaming data pipeline that processes movie ratings data in real-time.
UI to run SQL on Delta Lake tables and visualize how the results vary across table versions
This project involves building a real-time data pipeline using Apache Kafka and Apache Spark Streaming. The pipeline ingests data, processes it in real time, and writes the processed data to a data lake for storage and further analysis.
Automated provisioning of an industry Lakehouse with enterprise data model
Schema mappings in SQL and PySpark for ELT pipelines to normalize data to OCSF
Running Spark ETL Jobs with Airflow