Serene Data Integration Platform
Updated Aug 20, 2017 - HTML
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
My solutions to the Introduction to Big Data with Apache Spark MOOC on edX
Projects from the Data Science and Analytics MSc program, 2016-2017 | Hira Fatima
NYC Yellow Taxi Analysis
Taxi versus Uber in NYC
Exploratory analysis of an Amazon product reviews dataset comprising various categories and spanning 14 years
Notes on Apache Spark (pyspark)
Demo created for the "Life is but a Stream" presentation at Spark AI Summit 2019 (San Francisco, CA)
A course project for MIE1512; all project details are covered in the notebook itself.
A repository for backing up IPython notebooks
Distributed ML: Predicting Churn from Click Data with Apache Spark
NiFi, Data Engineering, Data Ingest, REST, ETL, Mapping, ELT, SQL, Spark, Kafka for Good
A big data engineering study using a dataset of all reported crimes in Chicago, IL, from 2001 to the present day.
apache_spark_UCBerkeleyX
Infant Mortality Data Prediction and Analysis
Toolkit for Apache Spark ML covering feature clean-up, a feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Rails application for the Archives Unleashed Cloud.
UC Davis Distributed Computing with Spark SQL (with Databricks) and Databricks Apache Spark SQL for Data Analysts
Created by Matei Zaharia
Released May 26, 2014