Spark library for generalized K-Means clustering. Supports general Bregman divergences. Suitable for clustering probabilistic data, time series data, high dimensional data, and very large data.
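The library above generalizes K-Means from squared Euclidean distance to arbitrary Bregman divergences, which is what makes it suitable for probabilistic data. A useful property of Bregman divergences is that the arithmetic mean remains the optimal cluster representative, so Lloyd's algorithm needs only a different assignment step. The following is a minimal single-machine sketch of that idea, not the library's API; the names `kl_divergence` and `bregman_kmeans` are hypothetical.

```python
import math
import random

def kl_divergence(p, q):
    # Generalized KL divergence, the Bregman divergence generated by
    # negative entropy; p and q are non-negative vectors.
    return sum(pi * math.log(pi / qi) - pi + qi
               for pi, qi in zip(p, q) if pi > 0)

def bregman_kmeans(points, k, divergence, iters=20, seed=0):
    # Lloyd's algorithm for any Bregman divergence: the assignment
    # step uses the supplied divergence, while the update step is the
    # plain arithmetic mean, which is provably optimal for this class.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda j: divergence(x, centers[j]))
            clusters[j].append(x)
        centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers
```

With `kl_divergence` this clusters probability-like vectors; passing a squared-Euclidean divergence instead recovers classic K-Means. A Spark version would distribute the assignment step over partitions and aggregate the per-cluster sums.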
Apache Spark is an open-source, general-purpose distributed cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Apache Spark™ and Scala Workshops
Toolkit for Apache Spark ML covering feature clean-up, a feature-importance calculation suite, information-gain feature selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
A concise resource repository for machine learning
Teaching materials for distributed statistical computing (distributed computing for big data)
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Spark algorithms for building k-nn graphs
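A k-nn graph connects every point to its k nearest neighbours, and the expensive part is the pairwise-distance computation that a Spark implementation distributes. As an illustration only (not that repository's code), here is a brute-force single-machine reference; the name `knn_graph` is hypothetical.

```python
import heapq
import math

def knn_graph(points, k):
    # Brute-force k-nearest-neighbour graph over Euclidean distance:
    # for each point, keep the k other points closest to it.
    # A distributed version would shard the pairwise-distance step
    # across partitions; this O(n^2) loop is the reference behaviour.
    graph = {}
    for i, p in enumerate(points):
        graph[i] = heapq.nsmallest(
            k,
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
    return graph
```

The result maps each point index to the indices of its k nearest neighbours, i.e. the adjacency list of a directed k-nn graph.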
Ansible roles to deploy Kubernetes, JupyterHub, Jupyter Enterprise Gateway and Spark on Kubernetes cluster
Kaggle's Predict Future Sales competition project (TOP 15 solution as of March 2020)
Adds a notification panel to your Laravel Spark Kiosk, allowing you to send notifications to users.
Big Data workshop (in Spanish)
Lecture: Big Data
Get Twitter trends with twitter4j, stream them to a Kafka topic, save them to MongoDB, and visualize them in Google Maps
A tool to help you test and develop PySpark code with sampled, local data
SparkR workshop for the Jornadas de Usuarios de R (Spanish R users conference)
Created by Matei Zaharia
Released May 26, 2014