Distributed processing challenge
Updated Feb 18, 2023 - HTML
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
The objective is to predict the probability that a flight will be delayed; a flight counts as delayed if its arrival was delayed. Tools: Spark, RDDs, Spark ML
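As a rough illustration of the labeling step behind this task (the project itself uses Spark RDDs and Spark ML; the field names and the 15-minute delay threshold below are assumptions, shown in plain Python on a toy sample):

```python
# Toy flight records with hypothetical field names; times are minutes
# since midnight. A real pipeline would read these into a Spark RDD
# or DataFrame instead of a Python list.
flights = [
    {"flight_id": "A1", "scheduled_arrival": 600, "actual_arrival": 615},
    {"flight_id": "B2", "scheduled_arrival": 700, "actual_arrival": 700},
    {"flight_id": "C3", "scheduled_arrival": 800, "actual_arrival": 830},
]

def is_delayed(flight, threshold_minutes=15):
    """A flight is labeled delayed if it arrived at least
    `threshold_minutes` after its scheduled arrival (15 minutes is a
    common airline convention, assumed here)."""
    return (flight["actual_arrival"] - flight["scheduled_arrival"]) >= threshold_minutes

# Binary labels for a classifier, plus the empirical delay rate.
labels = [int(is_delayed(f)) for f in flights]
delay_rate = sum(labels) / len(labels)
```

With labels like these, the prediction task becomes binary classification, and the predicted class probability gives the "percentage chance" of a delay.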
Functional Programming in Scala certification path (EPFL)
Kaggle-Facebook-Recruiting-Challenge
Final-year research project to recommend movies from user behavioral data, combining the Big Five personality model with user rating data. The model uses K-Means clustering on Big Five scores and three ALS models to recommend movies.
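The clustering step described above can be sketched in plain Python (the project runs K-Means through Spark; the user vectors and initial centroids here are hypothetical, with scores normalized to [0, 1]):

```python
import math

def kmeans(points, centroids, iters=10):
    """Minimal K-Means sketch: repeatedly assign each point to its
    nearest centroid, then recompute each centroid as the mean of its
    assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical Big Five score vectors (O, C, E, A, N) for four users,
# forming two obvious personality groups.
users = [
    (0.9, 0.1, 0.8, 0.2, 0.7),
    (0.8, 0.2, 0.9, 0.1, 0.6),
    (0.1, 0.9, 0.2, 0.8, 0.1),
    (0.2, 0.8, 0.1, 0.9, 0.2),
]
centroids, clusters = kmeans(users, centroids=[users[0], users[2]])
```

Each resulting cluster groups users with similar personality profiles, so a separate ALS model can then be trained on the ratings within each cluster.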
(Interview) Mixing Data Engineering & Data Science with PySpark
Distributed hot spot analysis over big spatio-temporal data on ship trips.
Software Engineering Final Project @ MSc. Computer Science & Engineering - Politecnico di Milano.
Distributed systems; SpringBoot; SpringCloud; SpringMVC; MQ; Redis; ELK
Predicting churn rates for a music company using Spark MLlib
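A common way to define the churn label in this kind of project is to flag users who reach a cancellation event in the usage log. A minimal plain-Python sketch, assuming a hypothetical event schema and page name (the project itself does this at scale with Spark MLlib):

```python
# Hypothetical clickstream events for a music-streaming service.
events = [
    {"user": "u1", "page": "NextSong"},
    {"user": "u1", "page": "Cancellation Confirmation"},
    {"user": "u2", "page": "NextSong"},
]

# A user is considered churned if they ever hit the cancellation page.
churned = {e["user"] for e in events
           if e["page"] == "Cancellation Confirmation"}

# Per-user binary churn labels, ready for a classifier.
labels = {u: int(u in churned) for u in {e["user"] for e in events}}
```

In Spark this would typically be a groupBy over user IDs followed by feature aggregation, with the labels feeding an MLlib classifier.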
Created by Matei Zaharia
Released May 26, 2014