YTsaurus is a scalable and fault-tolerant open-source big data platform.
Updated May 27, 2024 - C++
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.
World's most powerful data catalog service, providing a high-performance, geo-distributed and federated metadata lake.
LakeSoul is an end-to-end, real-time and cloud-native Lakehouse framework with fast data ingestion, concurrent updates and incremental data analytics on cloud storage for both BI and AI applications.
Examples of using Terraform to deploy Databricks resources
Helm chart for deploying ParadeDB on Kubernetes
FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
The goal of this project is to provide documentation for the Lakehouse Engine framework.
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
Build Your First End-to-End Lakehouse Solution (aka.ms/fabconlake)
Analytical table access method for Postgres
Supercharge Your Compute for Analytics & AI
A curated list of open source tools used in analytical stacks and the data engineering ecosystem