This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Updated May 8, 2024 - Scala
CTR prediction model based on Spark (LR, GBDT, DNN)
Qubole Sparklens tool for performance tuning Apache Spark
🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib
Random Forests in Apache Spark
Natural Korean Processor for Apache Spark
✨ Spark ML implementation of SOM algorithm (Kohonen self-organizing map)
Examples of all machine learning algorithms in Apache Spark
This package contains the code for executing clustering validity indices in Spark. The package includes BD-Silhouette, BD-Dunn, Davies-Bouldin and WSSSE indices.
This package contains the code for calculating external clustering validity indices in Spark. The package includes Chi Index among others.
Dataset deduplication using Spark MLlib and Scala
Implementation of SMOTE (Synthetic Minority Over-sampling Technique) in Spark ML / MLlib
Streaming Twitter Sentiment Analysis with Apache Spark
A project that analyzes authorship graph data from the Microsoft Academic Graph. We calculate different graph features using temporal parameters of the authors and try different classifiers. The final aim is to predict the possibility of a link or coauthorship between two authors based on topological graph features and also …
Spark ML dashboard for plugging in and tweaking model parameters to verify classification results on sample test data in real time
Detects fraudulent credit card transactions in real time
Financial Forecasting and its correlation with Human Sentiments using Distributed Computing on Spark Framework
😅 A topic model of reddit.com/r/EmojiPasta trained with Spark and an LDA model (NSFW) - Trigger Warning: The r/emojipasta subreddit posts controversial content, and anything I have crawled is there to provide visibility into a topic model of some of this controversial content. Unfortunately, there is also discriminatory speech, which must be called out!
This is a web-based movie recommendation application written in Scala using Apache Spark and Livy.
Iterative filter-based feature selection on large datasets with Apache Spark