An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, plus APIs for several languages
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
State of the Art Natural Language Processing
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Smart Automation Tool for building modern Data Lakes and Data Pipelines
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Simple and Distributed Machine Learning
An open protocol for secure data sharing
A Spark accelerator framework that enables secondary indexes over remote data stores.
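The core idea behind a secondary index can be sketched in plain Scala: build an inverted mapping from a column value to row identifiers, so a lookup avoids a full scan of the remote store. This is a minimal illustration of the concept only; the names (`Row`, `buildIndex`, `lookup`) are hypothetical and not the framework's API.

```scala
object SecondaryIndexSketch {
  // Hypothetical row model: a primary key plus one indexed column.
  case class Row(id: Long, city: String)

  // Build an inverted index: column value -> ids of matching rows.
  def buildIndex(rows: Seq[Row]): Map[String, Seq[Long]] =
    rows.groupBy(_.city).view.mapValues(_.map(_.id)).toMap

  // A lookup consults the index instead of scanning every row.
  def lookup(index: Map[String, Seq[Long]], city: String): Seq[Long] =
    index.getOrElse(city, Seq.empty)
}
```

In a real accelerator the index itself would live alongside (or apart from) the remote data and be consulted by the query planner; the in-memory `Map` here only stands in for that structure.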
A Spark plugin for reading and writing Excel files
Resilient data pipeline framework running on Apache Spark
A distributed graph computing platform that enables simple visual analysis of large-scale relational data.
A library to transform Scala product types and schemas from different systems into other schemas. Any implemented type automatically gets methods to convert it into the rest of the types, and vice versa; e.g., a Spark schema can be transformed into a BigQuery table.
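A common way such cross-system conversions are wired up in Scala is the typeclass pattern: an implicit instance witnesses that one schema type converts into another, and an extension method exposes it uniformly. The sketch below assumes hypothetical field models and names (`SparkField`, `BigQueryField`, `SchemaConverter`); it illustrates the pattern, not the library's actual API.

```scala
object SchemaSketch {
  // Hypothetical single-field models for two systems.
  case class SparkField(name: String, dataType: String)
  case class BigQueryField(name: String, bqType: String)

  // Typeclass: evidence that A can be converted into B.
  trait SchemaConverter[A, B] { def convert(a: A): B }

  object SchemaConverter {
    def apply[A, B](implicit c: SchemaConverter[A, B]): SchemaConverter[A, B] = c

    // One instance: map Spark type names onto BigQuery type names.
    implicit val sparkToBigQuery: SchemaConverter[SparkField, BigQueryField] =
      (f: SparkField) =>
        BigQueryField(f.name, f.dataType match {
          case "StringType"  => "STRING"
          case "IntegerType" => "INT64"
          case other         => other.toUpperCase
        })
  }

  // Any type with an instance automatically gets a `to[B]` method.
  implicit class ConvertOps[A](private val a: A) extends AnyVal {
    def to[B](implicit c: SchemaConverter[A, B]): B = c.convert(a)
  }
}
```

Adding a new system then means writing one `SchemaConverter` instance per direction; every existing type picks up the conversion method for free, which matches the "any implemented type automatically gets methods" behavior described above.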
Created by Matei Zaharia
Released May 26, 2014