🌀 𝗧𝗵𝗲 𝗙𝘂𝗹𝗹 𝗦𝘁𝗮𝗰𝗸 𝟳-𝗦𝘁𝗲𝗽𝘀 𝗠𝗟𝗢𝗽𝘀 𝗙𝗿𝗮𝗺𝗲𝘄𝗼𝗿𝗸 | 𝗟𝗲𝗮𝗿𝗻 𝗠𝗟𝗘 & 𝗠𝗟𝗢𝗽𝘀 for free by designing, building and deploying an end-to-end ML batch system ~ 𝘴𝘰𝘶𝘳𝘤𝘦 𝘤𝘰𝘥𝘦 + 2.5 𝘩𝘰𝘶𝘳𝘴 𝘰𝘧 𝘳𝘦𝘢𝘥𝘪𝘯𝘨 & 𝘷𝘪𝘥𝘦𝘰 𝘮𝘢𝘵𝘦𝘳𝘪𝘢𝘭𝘴
Updated Apr 3, 2024 - Python
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, that serves as a scalable, distributed engine for lakehouse algorithms, data flows, and utilities for Data Products.
Sample project to demonstrate data engineering best practices
Data Quality Gate based on AWS
Prefect integrations for interacting with Great Expectations
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in Airflow.
This repository serves as a comprehensive guide to effective data modeling and robust data quality assurance using popular open-source tools
Code to demonstrate data engineering metadata & logging best practices
Run greatexpectations.io on any SQL engine via a REST API, built with FastAPI, Pydantic, and SQLAlchemy.
This library is inspired by Great Expectations: it makes the various expectations found in Great Expectations available through Python's built-in unittest assertions.
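A minimal sketch of that idea, assuming nothing about the library's actual API: hypothetical expectation helpers (names modeled on Great Expectations) implemented on top of unittest assertions, checked against a small list of row dicts.

```python
import unittest


class ExpectationsMixin(unittest.TestCase):
    """Hypothetical helpers in the style of Great Expectations,
    expressed as built-in unittest assertions over rows (list of dicts)."""

    def expect_column_values_to_not_be_null(self, rows, column):
        # Fail if any row is missing a value for the column.
        for i, row in enumerate(rows):
            self.assertIsNotNone(row.get(column), f"row {i}: {column!r} is null")

    def expect_column_values_to_be_between(self, rows, column, min_value, max_value):
        # Fail if any value falls outside the inclusive range.
        for i, row in enumerate(rows):
            self.assertGreaterEqual(row[column], min_value, f"row {i}: {column!r} too small")
            self.assertLessEqual(row[column], max_value, f"row {i}: {column!r} too large")


class TestOrders(ExpectationsMixin):
    ROWS = [
        {"order_id": 1, "amount": 10.0},
        {"order_id": 2, "amount": 25.5},
    ]

    def test_amounts_are_valid(self):
        self.expect_column_values_to_not_be_null(self.ROWS, "amount")
        self.expect_column_values_to_be_between(self.ROWS, "amount", 0, 100)


if __name__ == "__main__":
    unittest.main()
```

Because the expectations are plain assertions, a failed check surfaces as an ordinary failing unit test with a row-level message.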
Using Great Expectations and Notion's API, this repo aims to provide data quality for our databases in Notion.
An ML pipeline to flip NFTs that makes use of the cloud and containers.
Create data pipeline using Lambda architecture with Spark, Kafka, Airflow and Snowflake
A pipeline to forecast the direction of stock prices using data from eodhistoricaldata.com
Personal data engineering project whose objective is to create a data lakehouse for a B2B e-commerce business that must store its transactional and analytical data. The final system delivers structured, clean data for generating reports and finding opportunities.
Kafka-Spark jobs orchestrated with Airflow
Tutorials for DevOps tools such as Google Codelabs, Apache Airflow, Streamlit, FastAPI, Great Expectations, etc.