Apache Spark
![spark logo](https://raw.githubusercontent.com/github/explore/6f5025830918df26b37d23b3ffffbc35725fe15f/topics/spark/spark.png)
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 1,153 public repositories matching this topic...
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Updated Aug 15, 2024 - Java
The next generation of cloud-native big data management expert, aiming to help users rapidly build stable, efficient, and scalable cloud-native platforms for big data.
Updated Aug 15, 2024 - Java
Official code repository for GATK versions 4 and up
Updated Aug 14, 2024 - Java
Apache Wayang (incubating) is the first cross-platform data processing system.
Updated Aug 14, 2024 - Java
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
Updated Aug 14, 2024 - Java
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Updated Aug 14, 2024 - Java
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
Updated Aug 15, 2024 - Java
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Updated Aug 14, 2024 - Java
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learn...
Updated Aug 13, 2024 - Java
A large-scale entity and relation database supporting aggregation of properties
Updated Aug 13, 2024 - Java
Big data computing platform based on Spark <Zhiqingyun: building a big data computing platform / data middle platform>
Updated Aug 14, 2024 - Java
Splittable Gzip codec for Hadoop
Updated Aug 12, 2024 - Java
Created by Matei Zaharia
Released May 26, 2014
- Followers: 420
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia