
apache-spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
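To make "implicit data parallelism and fault tolerance" concrete without a cluster, here is a toy model in plain Python. It is not Spark's API, only a sketch of the ideas behind Spark's RDDs. The dataset is split into partitions. Transformations such as `map` are only recorded, as lineage. An action such as `reduce` evaluates each partition independently and then combines the partial results. A lost partition is rebuilt by replaying its lineage from the source slice rather than copied from a replica. `ToyRDD`, `partition`, and `compute_partition` are hypothetical names used only for illustration. In real PySpark the same computation would be roughly `sc.parallelize(range(1, 101), 4).map(lambda x: x * x).reduce(lambda a, b: a + b)`.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Any, Callable, List, Optional


def partition(data: List[Any], num_partitions: int) -> List[List[Any]]:
    """Split data into num_partitions contiguous slices of near-equal size."""
    size, extra = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        # The first `extra` slices get one additional element each.
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


class ToyRDD:
    """Hypothetical single-machine model of an RDD (not Spark's API).

    Holds the source partitions plus the functions applied so far
    (its lineage), so any partition can be recomputed on demand.
    """

    def __init__(self, source: List[List[Any]],
                 funcs: Optional[List[Callable[[Any], Any]]] = None):
        self.source = source        # lineage root: the original slices
        self.funcs = funcs or []    # recorded transformations, not yet run

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # Lazy transformation: extend the lineage; nothing is computed here.
        return ToyRDD(self.source, self.funcs + [f])

    def compute_partition(self, i: int) -> List[Any]:
        # Rebuild partition i from its source slice by replaying the lineage.
        # This is how a lost partition is recovered without a replica.
        out = self.source[i]
        for f in self.funcs:
            out = [f(x) for x in out]
        return out

    def reduce(self, op: Callable[[Any, Any], Any]) -> Any:
        # Action: evaluate every partition concurrently on a thread pool,
        # reduce each one locally, then combine the per-partition results.
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(self.compute_partition, range(len(self.source))))
        partials = [reduce(op, p) for p in parts if p]
        return reduce(op, partials)


if __name__ == "__main__":
    squares = ToyRDD(partition(list(range(1, 101)), 4)).map(lambda x: x * x)
    print(squares.reduce(lambda a, b: a + b))  # 338350
    # Simulate losing partition 2 and recomputing it from its lineage.
    print(squares.compute_partition(2)[:3])  # [2601, 2704, 2809]
```

Real Spark goes further than this model. It ships tasks to executors on other machines and shuffles data between partitions for wide transformations. When outputs are lost, its scheduler re-runs only the affected tasks.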

Here are 383 public repositories matching this topic; one is listed below.

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache ZooKeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

  • Updated Oct 5, 2023
  • Python

Created by Matei Zaharia

Released May 26, 2014

Followers: 417
Repository: apache/spark
Website: spark.apache.org
Wikipedia

Related Topics: hadoop, scala