Distributed processing challenge
Updated Feb 18, 2023 · HTML
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
The objective is to predict the probability that a flight will be delayed; a flight counts as delayed if its arrival was late. Tools: Spark, RDDs, Spark ML
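As a rough illustration of the labeling rule above (late arrival = delayed) and of estimating a delay probability from historical data, here is a minimal pure-Python sketch; the actual project would express this over a full flight dataset with Spark RDDs and Spark ML, and the field names here are illustrative only.

```python
# Sketch: label flights as delayed and compute a per-route delay rate.
# In the project this logic would run distributed via Spark; here it is
# plain Python over a tiny made-up sample.
from collections import defaultdict

def is_delayed(arrival_delay_minutes):
    """A flight counts as delayed if its arrival was late at all."""
    return arrival_delay_minutes > 0

def delay_rate_by_route(flights):
    """flights: iterable of (origin, dest, arrival_delay_minutes).
    Returns {(origin, dest): fraction of flights that were delayed}."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for origin, dest, delay in flights:
        route = (origin, dest)
        totals[route] += 1
        if is_delayed(delay):
            delayed[route] += 1
    return {route: delayed[route] / totals[route] for route in totals}

# Illustrative sample, not real flight data.
flights = [
    ("JFK", "LAX", 12),   # arrived 12 min late -> delayed
    ("JFK", "LAX", -5),   # arrived early -> not delayed
    ("JFK", "LAX", 0),    # on time -> not delayed
    ("SFO", "ORD", 30),   # delayed
]
rates = delay_rate_by_route(flights)
```

The per-route delay rate is the simplest possible estimator of "percentage chance of delay"; Spark ML would replace it with a trained classifier over many more features.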
Using PySpark for handling big data and machine learning
This project shows an auto-updating map of human interaction in the US during COVID-19, using big data technologies to analyze a real-time stream of Twitter data.
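The kind of aggregation such a pipeline performs can be sketched as a sliding-window count per US state. The sketch below is plain Python under made-up data; in the project this would be Spark Streaming over the live Twitter feed.

```python
# Sketch: count tweets per US state over a sliding window of recent events.
# A Spark Streaming job would do this with windowed operations on a DStream;
# this stand-in keeps the last `window` events in a deque.
from collections import Counter, deque

class SlidingWindowCounter:
    """Counts events per key (e.g. US state) over the last `window` events."""
    def __init__(self, window):
        self.window = window
        self.events = deque()
        self.counts = Counter()

    def add(self, state):
        self.events.append(state)
        self.counts[state] += 1
        if len(self.events) > self.window:
            oldest = self.events.popleft()
            self.counts[oldest] -= 1

w = SlidingWindowCounter(window=3)
for state in ["NY", "CA", "NY", "TX"]:   # illustrative tweet locations
    w.add(state)
# the window now holds only the last 3 events: CA, NY, TX
```

A map layer would then be refreshed from these rolling counts on each window tick.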
Coursework for the Big Data module, report (pages) - https://nutellaweera.github.io/BD_Groupwork/
Predicting the house price using Apache Spark
Functional programming in Scala Certification path (EPFL)
Kaggle-Facebook-Recruiting-Challenge
Final-year research project to recommend movies based on user behavioral data, combining the Big 5 personality model with user rating data. The model applies K-Means clustering to Big 5 scores and uses 3 ALS models to recommend movies.
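The two-stage design described above can be sketched as: a user's Big 5 scores select a personality cluster, and that cluster's recommender ranks movies. In this pure-Python sketch the per-cluster recommenders are stub rating tables standing in for the 3 ALS models; the centroids, scores, and ratings are illustrative, not project data.

```python
# Sketch: route a user to the nearest Big 5 centroid (the K-Means step),
# then rank movies with that cluster's recommender (stand-in for ALS).
import math

def nearest_cluster(scores, centroids):
    """Index of the centroid closest (Euclidean) to the Big 5 score vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(centroids)), key=lambda i: dist(scores, centroids[i]))

# Three illustrative cluster centroids over the five personality traits.
centroids = [
    (0.8, 0.2, 0.5, 0.4, 0.3),
    (0.2, 0.9, 0.4, 0.6, 0.5),
    (0.5, 0.5, 0.9, 0.2, 0.7),
]
# One stub predictor per cluster (movie -> predicted rating), mirroring
# the "3 ALS models" setup; a real ALS model would predict from factors.
cluster_models = [
    {"Inception": 4.5, "Titanic": 3.0},
    {"Inception": 3.2, "Titanic": 4.8},
    {"Inception": 4.0, "Titanic": 4.0},
]

def recommend(big5_scores, top_n=1):
    """Top-n movies from the recommender of the user's personality cluster."""
    model = cluster_models[nearest_cluster(big5_scores, centroids)]
    return sorted(model, key=model.get, reverse=True)[:top_n]

picks = recommend((0.75, 0.25, 0.45, 0.4, 0.35))  # close to cluster 0
```

Splitting users by personality cluster lets each ALS model specialize on a more homogeneous rating matrix, which is the rationale for training 3 of them.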
Created by Matei Zaharia · Released May 26, 2014