
Building a scalable model to predict customer churn using Spark.


thiagolimaop/sparkify_churn


Table of Contents

This project shows how to build a scalable model with Spark to predict customer churn. The data, provided by Udacity, is a user activity log for a fictional music streaming app called Sparkify.

  1. Installation
  2. Project Motivation
  3. File Descriptions
  4. Results
  5. Acknowledgments
  6. Licensing and Author

Installation

The code runs on Python 3.* and needs the following libraries:

  • Anaconda (pandas, NumPy, Matplotlib, datetime) and PySpark.

Note: the data file must be unzipped before running the project.
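The unzip step can also be done from Python instead of by hand. The following is a minimal sketch using only the standard library; the helper name `unzip_data` is hypothetical and not part of the repository, and the sketch assumes the archive sits in the current working directory.

```python
import zipfile
from pathlib import Path


def unzip_data(zip_path, dest_dir="."):
    """Extract every member of the archive into dest_dir and return the extracted paths.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return [dest / name for name in archive.namelist()]


# Assumption: the archive is in the current working directory.
archive_path = Path("mini_sparkify_event_data.json.zip")
if archive_path.exists():
    for extracted in unzip_data(archive_path):
        print(extracted)
```

Once extracted, the JSON file can presumably be loaded in the notebook with PySpark's `spark.read.json(...)`, given an active `SparkSession`.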

Project Motivation

In business, churn occurs when a customer cancels or abandons a service. Predicting which customers are likely to churn can be very profitable for companies, because they can offer discounts and incentives to those customers and so improve the retention rate.

File Descriptions

  • Sparkify.ipynb: a notebook containing the full process of building a scalable model with Spark to predict customer churn for a fictional music streaming app called Sparkify.
  • mini_sparkify_event_data.json.zip: the zipped dataset
  • Sparkify.html: an HTML version of the notebook

Results

The full process and results of this project are described in the post available here.

Acknowledgments

Thanks to Udacity for providing such an amazing project.

Licensing and Author

Credit must be given to Udacity for the data. Otherwise, feel free to use the code here as you like!
