Apache Spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 8,630 public repositories matching this topic...

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

python java r scala sql big-data spark jdbc

Updated Nov 10, 2024
Scala

donnemartin / data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

python aws data-science machine-learning caffe theano big-data spark deep-learning hadoop tensorflow numpy scikit-learn keras pandas kaggle scipy matplotlib mapreduce

Updated Mar 20, 2024
Python

getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

visualization javascript mysql python bigquery bi spark dashboard athena analytics postgresql business-intelligence redash redshift databricks hacktoberfest spark-sql

Updated Nov 6, 2024
Python

DataTalksClub / data-engineering-zoomcamp

Free Data Engineering course!

docker kafka spark data-engineering dbt prefect

Updated Nov 4, 2024
Jupyter Notebook

yeasy / docker_practice

Learn and understand Docker & container technologies, with real DevOps practice!

linux docker kubernetes devops spark book container mesos swarm cloud-computing

Updated Sep 26, 2024
Go

heibaiying / BigData-Notes

A beginner's guide to big data ⭐

phoenix scala kafka big-data spark yarn hive hadoop storm bigdata hbase zookeeper hdfs mapreduce flume azkaban sqoop

Updated Jan 5, 2024
Java

GaiZhenbiao / ChuanhuChatGPT

GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.

spark chatbot gemini llama minimax moss gemma claude ernie midjourney chatgpt-api chatglm stablelm ollama qwen dalle3 inspurai

Updated Oct 22, 2024
Python

FavioVazquez / ds-cheatsheets

List of Data Science Cheatsheets to rule the world

python r programming spark jupyter datascience cheatsheet

Updated Jul 18, 2024

zhisheng17 / flink-learning

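Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, and source-code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, along with case studies of large-scale Flink production projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Your support for my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization" is welcome.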

mysql redis elasticsearch streaming kafka spark influxdb rabbitmq clickhouse hbase stream-processing opentsdb loki flink rocketmq

Updated May 25, 2024
Java

horovod / horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

machine-learning spark deep-learning uber mxnet tensorflow mpi keras pytorch machinelearning baidu deeplearning ray

Updated Aug 31, 2024
Python

aalansehaiyang / technology-talk

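[Big Tech Interview Column] A technical guide for Java programmers, covering interview questions, system architecture, career tips, mainstream middleware, and more, to help you become a better version of yourself!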

git java kafka spark spring es6 hbase springboot dubbo mycat

Updated Oct 28, 2023

deeplearning4j / deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...

python java clojure scala spark hadoop gpu intellij linear-algebra artificial-intelligence deeplearning neural-nets dl4j matrix-library deeplearning4j

Updated Nov 9, 2024
Java

apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery real-time sql database spark hive hadoop etl snowflake olap query-engine redshift dbt elt iceberg hudi delta-lake lakehouse

Updated Nov 10, 2024
Java

wangzhiwubigdata / God-Of-BigData

Focused on big data learning and interviews; start your journey to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...

kafka spark hive hadoop bigdata hbase zookeeper hdfs flume flink azkaban

Updated Aug 7, 2023

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

python data-science data machine-learning sql spark pipeline etl pipelines orchestration artificial-intelligence data-engineering data-integration dbt elt transformation data-pipelines reverse-etl

Updated Nov 8, 2024
Python

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

big-data spark analytics acid delta-lake

Updated Nov 9, 2024
Scala

h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Updated Nov 9, 2024
Jupyter Notebook

Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

spark presto hadoop tensorflow data-analysis alluxio memory-speed data-orchestration virtual-distributed-filesystem

Updated Nov 7, 2024
Java

Angel-ML / angel

A Flexible and Powerful Parameter Server for large-scale machine learning

machine-learning scala spark model spark-streaming online-learning parameter-server high-dimensional

Updated Jan 16, 2024
Java

tobymao / sqlglot

Python SQL Parser and Transpiler

Updated Nov 9, 2024
Python

Created by Matei Zaharia

Released May 26, 2014

Followers: 422
Repository: apache/spark
Website: spark.apache.org

Related Topics