Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Updated May 7, 2024 - Java
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Egeria core
All development now happens here: https://github.com/cwensel/cascading. Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on various cluster computing platforms.
The DataHelix generator allows you to quickly create data, based on a JSON profile that defines fields and the relationships between them, for testing and validation.
Examples for using Apache Flink® with DataStream API, Table API, Flink SQL and connectors such as MySQL, JDBC, CDC, Kafka.
A Spring Boot Camel boilerplate that consumes events from Apache Kafka, processes them, and sends them to a PostgreSQL database.
Clusterless is a tool for scheduling decentralized, scalable, and secure data pipelines for continuously arriving data, across clouds.
ETL scripts for Hedera Hashgraph
A store abstraction and analytics system for real-time event data.
System Design, Solution Architecture, Data Systems Practice
This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project demonstrates how to ingest, process, and analyze sales data, showcasing the capabilities of Apache Flink for big data processing.
This repository contains code for running Dataflow pipelines that process public Band Protocol data on Google Cloud Platform.
This is Quantumics.AI's public repository, inviting people from around the world to contribute to and take advantage of its free no-code DataOps platform.
Dataflow pipeline for detecting anomalous transactions on the Ethereum and Bitcoin blockchains
A data engineering CLI for reading and writing data to and from multiple locations in multiple formats.
A semantic monitoring framework for aggregating data from heterogeneous sources.
LinkedIn's previous generation Kafka to HDFS pipeline.
First academic big data project to implement analysis using the MapReduce and Hive platforms.
Generates fake data for big data projects, including medical and industry datasets. The file size, number of files, and number of records can be configured.