# Santander Product Recommendation


## Introduction

Course: CSYE7200 Big Data Engineering with Scala

Professor: Robin Hillyard

Semester: Spring 2018

Team Members:

- Arpit Rawat - [rawat.a@husky.neu.edu](mailto:rawat.a@husky.neu.edu)
- Nishant Gandhi - [gandhi.n@husky.neu.edu](mailto:gandhi.n@husky.neu.edu)
- Vaishali Lambe - [lambe.v@husky.neu.edu](mailto:lambe.v@husky.neu.edu)

Programming Language: Scala

## Tools / Frameworks

- Apache Spark
- Zeppelin
- Play Framework
- IntelliJ IDEA
- CircleCI
- GitLab CI

## Data Source

https://www.kaggle.com/c/santander-product-recommendation/data

Data Size: ~2.3 GB (Rows: ~1.3M)

Backup Repository: https://gitlab.com/nishantgandhi99/Team_7_Santander_Product_Recommendation

## Synopsis

- Problem Statement:

  In this project, we built a recommendation system that predicts which products a customer will use in the next month, based on their past behavior and that of similar customers. With a more effective recommendation system in place, Santander Bank can better meet the individual needs of all customers and ensure their satisfaction no matter where they are in life.

- Approach:

  We followed the CRISP-DM methodology to build the recommendation system. The project pipeline is:

  Exploratory Data Analysis (Zeppelin) -> Data Cleaning (Spark Dataset/DataFrame) -> Data Modelling (Spark MLlib) -> Predictions -> Play Framework (to show predictions)

  A sketch of the modelling and evaluation stages appears after this list.

- Model Evaluation Metric:

  The precision achieved with this predictive model is 0.63.
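
As a concrete illustration of the modelling and evaluation stages, here is a minimal, hypothetical sketch using Spark MLlib's ALS recommender. The column names (`customerId`, `productId`, `owned`), the input path, and all hyperparameters are assumptions for illustration only; the project's actual schema, model, and evaluation code may differ.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.ml.recommendation.ALS

// Hypothetical sketch of the modelling step: train an ALS recommender on the
// cleaned data and measure precision on a held-out split. Not the project's
// actual implementation.
object RecModelSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SantanderRecSketch")
      .master("local[*]")
      .getOrCreate()

    // Cleaned output of the data-cleaning app; path and schema are assumptions.
    // customerId and productId are assumed to be integer indices.
    val interactions = spark.read.parquet("/path/to/outputFolder")
      .selectExpr("customerId", "productId", "cast(owned as float) as owned")

    val Array(train, test) = interactions.randomSplit(Array(0.8, 0.2), seed = 42L)

    // Collaborative filtering on implicit product-ownership signals
    val als = new ALS()
      .setUserCol("customerId")
      .setItemCol("productId")
      .setRatingCol("owned")
      .setImplicitPrefs(true)
      .setRank(10)
      .setMaxIter(10)
      .setColdStartStrategy("drop")

    val model = als.fit(train)

    // Recommend 7 products per customer, then compute precision as the fraction
    // of recommended (customer, product) pairs that appear in the test split.
    val recommended = model.recommendForAllUsers(7)
      .select(col("customerId"), explode(col("recommendations.productId")).as("productId"))

    val hits = recommended
      .join(test.select("customerId", "productId"), Seq("customerId", "productId"))
      .count()
      .toDouble

    println(f"Precision: ${hits / recommended.count()}%.3f")

    spark.stop()
  }
}
```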

## Project Setup

### Test Project

$ sbt test

### Build Project

$ sbt package

### Build Fat (Uber) Jar

$ sbt assembly

### Generate Coverage Report

$ sbt clean coverage test
$ sbt coverage test
$ sbt coverageReport
$ sbt coverageAggregate

The HTML coverage report is written to target/scala-2.11/scoverage-report/index.html.
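
The `assembly` and coverage tasks above rely on sbt plugins. A minimal `project/plugins.sbt` along the following lines would enable them; the plugin versions shown are assumptions and may not match the ones pinned in this repository.

```scala
// project/plugins.sbt -- plugins assumed by the build commands above
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")   // provides `sbt assembly`
addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.5.1")  // provides `sbt coverage` / `coverageReport`
```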

## Submit Fat Jar to Spark in Local Mode

1. Data Cleaning App

   $ /path/to/spark-2.2.0-bin-hadoop2.6/bin/spark-submit --class edu.neu.coe.csye7200.prodrec.dataclean.main.AppRunner --master local[*] /path/to/Team_7_Santander_Product_Recommendation/data-cleaning-app/target/scala-2.11/DataCleaningApp-assembly-1.0.jar -i /path/to/train_ver2.csv -o /path/to/outputFolder

   A sketch of the `-i`/`-o` argument handling this entry point implies appears after this list.

2. UI App
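
The spark-submit command above passes `-i` (input CSV) and `-o` (output folder) to the `AppRunner` class. The following is a hypothetical skeleton of such an entry point, not the repository's actual `AppRunner`; it only illustrates how those two flags could be parsed and handed to a Spark cleaning job.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical skeleton of a data-cleaning entry point that accepts -i/-o,
// as implied by the spark-submit command above. Not the repository's AppRunner.
object AppRunnerSketch {
  def main(args: Array[String]): Unit = {
    // Minimal hand-rolled parser for "-i <input csv> -o <output folder>"
    val opts = args.sliding(2, 2).collect {
      case Array("-i", in)  => "input"  -> in
      case Array("-o", out) => "output" -> out
    }.toMap

    val input  = opts.getOrElse("input",  sys.error("missing -i <input csv>"))
    val output = opts.getOrElse("output", sys.error("missing -o <output folder>"))

    // --master is supplied by spark-submit, so no master is set here.
    val spark = SparkSession.builder()
      .appName("DataCleaningSketch")
      .getOrCreate()

    // Read the raw Kaggle CSV, apply a placeholder cleaning rule (drop rows
    // with missing values), and write the cleaned result to the output folder.
    val raw     = spark.read.option("header", "true").csv(input)
    val cleaned = raw.na.drop()
    cleaned.write.mode("overwrite").parquet(output)

    spark.stop()
  }
}
```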

## Final Project Presentation

https://prezi.com/view/L9AIqnlsLZrmKhNYkX50/

PDF Version