An investigative analysis of restaurant sales data using Apache Spark, aiming to offer insights into how to boost sales of less frequently sold items. The dataset is real-world data from an actual restaurant.
Exploring World Development Indicators: identifying relationships between health indicators using linear regression, and classifying income groups based on health indicators using logistic regression.
Created a SparkML RandomForest model to predict total employee compensation. Queried data with SparkSQL and used PySpark scripts to perform EDA, pre-process the data, and train the model, achieving an R2 score of 0.98.
This project will show an auto-updating map of people's interactions during COVID-19 in the US, using big data technologies to analyze a real-time stream of Twitter data.
This project extracts lists, information, and statistics from Wikipedia articles on current and past NBA players. I used Spark SQL to extract information from HTML documents and save it to a CSV file. In the near future, I will post a version that achieves the same objective using Pig.