You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Toolkit for Apache Spark ML for Feature clean-up, feature Importance calculation suite, Information Gain selection, Distributed SMOTE, Model selection and training, Hyper parameter optimization and selection, Model interprability.
This project was a joint effort by Lucas De Oliveira, Chandrish Ambati, and Anish Mukherjee to create a song and playlist embeddings for recommendations in a distributed fashion using a 1M playlist dataset by Spotify.
This project provides an end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The spark clusters are set up within a Docker container on Azure.
The goal was to perform predictive maintenance on commercial turbofan engine. The approach used here is a data-driven approach, meaning that data collected from the operational jet engine is used to perform predictive maintenance modeling. To be specific, to build a predictive model to estimate the Remaining Useful Life ( RUL) of a jet engine ba…