Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Updated Mar 9, 2024 · HTML
Apache Spark is an open-source, general-purpose distributed cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
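The core of that interface is applying the same operation to every partition of a dataset and combining the partial results. Spark itself is not importable here, so the sketch below is only a single-machine, thread-pool analogue in plain Python of that map-over-partitions-then-combine shape; the function name and partitioning scheme are illustrative, not Spark's API.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def parallel_sum_of_squares(data, n_partitions=4):
    # Spark's model in miniature: split the data into partitions, apply
    # the same function to each partition in parallel, then combine the
    # partial results. Spark does this across machines, with fault
    # tolerance; this thread-pool analogue shows only the data-parallel shape.
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda part: sum(x * x for x in part),
                                 partitions))
    return reduce(lambda a, b: a + b, partials, 0)
```

In Spark the equivalent would be a `map` followed by a `reduce` on a distributed dataset; the parallelism is implicit because the user never schedules the per-partition work by hand.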
Apache Spark™ and Scala Workshops
Teaching materials for distributed statistical computing (in Chinese: "big data distributed computing teaching materials")
Big Data workshop (in Spanish)
Spark library for generalized K-means clustering. Supports general Bregman divergences; suitable for clustering probabilistic data, time series, high-dimensional data, and very large datasets.
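"Generalized" here means Lloyd's iteration with a Bregman divergence (such as KL divergence) replacing squared Euclidean distance; a useful property of Bregman divergences is that the divergence-minimizing center of a cluster is still its arithmetic mean. The repo's actual API is not shown here; this is a minimal single-machine sketch of the idea, with function names of my own choosing, using KL divergence on probability vectors.

```python
import math

def kl_divergence(p, q):
    # Kullback-Leibler divergence: the Bregman divergence generated by
    # negative entropy, defined on strictly positive probability vectors.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bregman_kmeans(points, k, iters=20):
    # Lloyd-style iteration with a Bregman divergence in place of squared
    # Euclidean distance. The cluster update is unchanged: for any Bregman
    # divergence, the optimal center is the arithmetic mean of the members.
    centers = [list(p) for p in points[:k]]  # simple deterministic init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: kl_divergence(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:
                centers[i] = [sum(xs) / len(members)
                              for xs in zip(*members)]
    return centers
```

For example, clustering the probability vectors `[0.9, 0.1]`, `[0.1, 0.9]`, `[0.85, 0.15]`, `[0.15, 0.85]` with `k=2` recovers one center near each of the two obvious groups.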
Toolkit for Apache Spark ML covering feature clean-up, feature-importance calculation, information-gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
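Of the techniques listed, SMOTE is the most self-contained: each synthetic minority sample lies on the line segment between a real minority sample and one of its k nearest minority-class neighbours. The distributed version in the toolkit is not reproduced here; the following is a plain-Python sketch of that core interpolation step, with a hypothetical function name.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    # SMOTE's core idea: a synthetic minority sample is a random point on
    # the segment between a real minority sample and one of its k nearest
    # minority-class neighbours. Distributed SMOTE parallelizes the
    # neighbour search; this sketch is brute force on one machine.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        neighbours = sorted(
            (q for q in minority if q != p),
            key=lambda q: math.dist(p, q),
        )[:k]
        q = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(pi + t * (qi - pi)
                               for pi, qi in zip(p, q)))
    return synthetic
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the convex hull of the minority class.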
A concise resource repository for machine learning
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra
SparkR workshop for the Jornadas de Usuarios de R (Spanish R users conference)
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Spark algorithms for building k-NN graphs
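A k-NN graph connects each point to its k closest other points; the value of a Spark implementation is avoiding the O(n²) all-pairs comparison across a cluster. The repo's algorithms are not shown here; this is a brute-force single-machine sketch (function name mine) of the structure being computed.

```python
import math

def knn_graph(points, k):
    # Brute-force k-nearest-neighbour graph: node i gets a directed edge
    # to each of its k closest other points by Euclidean distance.
    # Distributed builders approximate or partition this all-pairs search;
    # the output structure is the same.
    graph = {}
    for i, p in enumerate(points):
        by_distance = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph[i] = by_distance[:k]
    return graph
```

For two well-separated pairs of points, each node's single nearest neighbour is its partner in the pair.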
Kaggle's Predict Future Sales competition project (TOP 15 solution as of March 2020)
Lecture: Big Data
Ansible roles to deploy Kubernetes, JupyterHub, Jupyter Enterprise Gateway, and Spark on a Kubernetes cluster
Adds a notification panel to your Laravel Spark Kiosk, allowing you to send notifications to users.
A Spark-based movie recommendation system
Created by Matei Zaharia
Released May 26, 2014