Pyspark Notebook With Docker
Updated Aug 18, 2015 - Python
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
A simple demonstration of how to build a complex real-time machine learning visualization tool.
geneSpark is a bioinformatics software program written in Python and Apache Spark for big data epigenetic histone modification ChIP-seq analysis.
An introductory lab on the PySpark MapReduce framework, made for CSSE434
Install Spark, Kafka, Cassandra, and ZooKeeper
Distributed Machine Learning for Bio-marker Prediction from Big Data Stream collected from Multi-modal Wearable Sensor Data
A forecasting project built on Apache Spark and implemented with a Naive Bayes classifier.
Big Data project for the Mackenzie master's program
Apache Spark - From installation to performing awesome operations in Apache Spark Stack
A movie recommender implemented in Python
ADMM based Scalable Machine Learning on Spark
A basic spam filter built using Apache Spark.
Python scripts that use the PySpark API to convert a large data set (about 3.5 GB) of flight data into various storage formats such as CSV, JSON, and sequence files
Spark Structured Streaming with Apache Kafka and Twitter
Real-time bidding with Apache Spark
Naive Bayes/Decision Tree/Logistic Regression in Apache Spark and Python
Created by Matei Zaharia
Released May 26, 2014