data-engineering

Here are 41 public repositories matching this topic...

kestra-io / kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

workflow data pipeline etl workflow-engine scheduler orchestration data-engineering data-integration elt data-pipeline data-quality low-code data-orchestration data-orchestrator reverse-etl

Updated May 7, 2024
Java

opendatadiscovery / odd-platform

The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Updated May 6, 2024
Java

odpi / egeria

Egeria core

java data-engineering hacktoberfest governance metadata-management odpi linux-foundation data-governance odpi-egeria egeria

Updated May 2, 2024
Java

Cascading / cascading

All development now happens over here: https://github.com/cwensel/cascading. Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on various cluster computing platforms.

machine-learning hadoop etl cascading data-engineering scalding flink tez

Updated Nov 29, 2018
Java

finos / datahelix

The DataHelix generator allows you to quickly create data, based on a JSON profile that defines fields and the relationships between them, for the purpose of testing and validation

java data-engineering data-generation data-generator test-data-generator

Updated Apr 14, 2023
Java

twalthr / flink-api-examples

Examples for using Apache Flink® with DataStream API, Table API, Flink SQL and connectors such as MySQL, JDBC, CDC, Kafka.

stream-processing data-engineering apache-flink flink flink-examples flink-sql

Updated Sep 26, 2023
Java

obrunodelgado / camelboilerplate

A Spring Boot Camel boilerplate that aims to consume events from Apache Kafka, process them, and send them to a PostgreSQL database.

java docker unit-testing kafka spring-boot migrations postgresql camel data-engineering flyway junit5 java-11 camel-boilerplate

Updated Apr 13, 2021
Java

ClusterlessHQ / clusterless

Clusterless is a tool for scheduling decentralized, scalable, and secure data pipelines for continuously arriving data, across clouds.

python java cli aws data-science cloud data-engineering mlops

Updated Dec 20, 2023
Java

blockchain-etl / hedera-etl

ETL scripts for Hedera Hashgraph

crypto etl gcp google-cloud cryptocurrency data-engineering data-analytics apache-beam web3 google-cloud-platform google-bigquery google-dataflow google-pubsub blockchain-analytics hedera hedera-hashgraph on-chain-analysis

Updated Feb 14, 2023
Java

Flipkart / foxtrot

A store abstraction and analytics system for real-time event data.

java elasticsearch data-science monitoring analytics alerting hbase data-visualization data-engineering

Updated Dec 16, 2023
Java

sonhmai / data-system-design

System Design, Solution Architecture, Data Systems Practice

architecture data-engineering streaming-data system-design data-governance data-system-design

Updated Apr 29, 2024
Java

airscholar / ApacheFlink-SalesAnalytics

This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project demonstrates how to ingest, process, and analyze sales data, showcasing the capabilities of Apache Flink for big data processing.

data-engineering apache-flink sales-analytics end-to-end-data-engineering

Updated Nov 18, 2023
Java

blockchain-etl / band-dataflow-sample-applications

This repository contains code for running Dataflow pipelines for processing public Band Protocol data in Google Cloud Platform

data-science crypto gcp google-cloud cryptocurrency data-engineering data-analytics web3 google-cloud-platform anomaly-detection blockchain-analytics bandchain on-chain-analysis

Updated Oct 5, 2020
Java

Quantumics-AI / quantumics-opensource

This is Quantumics.AI's public repository, inviting people from around the world to contribute and take advantage of a free no-code DataOps platform.

data-science data-visualization data-structures data-engineering data-analysis

Updated Feb 6, 2024
Java

blockchain-etl / anomalous-transactions-detector-dataflow

Dataflow pipeline for detecting anomalous transactions on the Ethereum and Bitcoin blockchains

Updated Oct 13, 2020
Java

ClusterlessHQ / tessellate

A data engineering CLI for reading and writing data to/from multiple locations across multiple formats.

java cli aws data-science s3 cascading data-engineering parquet mlops

Updated Dec 6, 2023
Java

giros-dit / semantic-data-aggregator

A semantic monitoring framework for aggregating data from heterogeneous sources.

semantics yang data-engineering data-integration nifi flink data-modeling gnmi ngsi-ld

Updated Apr 30, 2024
Java

mbrtargeting / camus

LinkedIn's previous generation Kafka to HDFS pipeline.

kafka hadoop data-engineering hdfs data-pipeline

Updated Mar 12, 2019
Java

Michu-dev / big-data-first-project

A first academic big data project that implements analysis using MapReduce and Hive.

airflow hive data-engineering mapreduce-java orcfile

Updated Jan 3, 2023
Java

vikramsinghchandel / dataGenerator

Generates fake data for big data projects. Capable of generating medical and industry datasets. File size, number of files, and number of records are configurable.

data-science data big-data bigdata data-engineering data-generator

Updated Sep 24, 2018
Java
