# Case Study Problem Solving Framework

## Data Science and Machine Learning

by Tobias Reaper

---
---

## Outline

1. Frame the problem
2. Prepare and explore the data
3. Evaluate several different models
4. Decide on and deploy a model to production

---
---

## 1. Frame the problem

> The first step is to keep asking "Why?" until the important constraints and tradeoffs are defined well enough that the process will solve the right problem.

* Why does this problem matter?
  * Don't ask directly; instead, work toward the answer to "What is the business model?"
    * Or: which perspective should be taken on the tradeoffs and costs?
  * How will this model be used?
* What's the measure of success?

---
---

## 2. Prepare and explore the data

* Data types
* Data ranges
* Distributions
  * Discrete or continuous?
  * Can I transform the data such that it becomes linear?
  * Mean vs. median: how much of an effect do the outliers have?
  * How can outliers be dealt with, and which approach would be best and why?
* Feature relationships
  * Correlations
  * Multicollinearity
  * Chi-squared
  * Plots
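
A few of the checks above can be sketched with the standard library alone. This is a minimal illustration, not a full exploratory workflow: the sample values, the `iqr_outliers` and `pearson` helpers, and the choice of Tukey's 1.5 × IQR rule are all assumptions made for the example. In practice a library such as pandas would likely do most of this.

```python
import math
import statistics

# Hypothetical feature values: mostly clustered, with one extreme observation.
incomes = [32, 35, 38, 40, 41, 43, 45, 47, 50, 400]

mean = statistics.mean(incomes)      # pulled upward by the extreme value
median = statistics.median(incomes)  # robust to the extreme value


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


outliers = iqr_outliers(incomes)

# Two hypothetical features that carry nearly the same information
# (the same measurement in different units), a simple multicollinearity case.
height_cm = [150, 160, 170, 180, 190]
height_in = [59.1, 63.0, 66.9, 70.9, 74.8]
r = pearson(height_cm, height_in)

print(f"mean={mean:.1f} median={median:.1f} outliers={outliers} r={r:.3f}")
```

A large gap between mean and median is a quick signal that outliers or skew deserve a closer look, and a pairwise correlation close to ±1 between two features suggests that one of them may be redundant.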

---
---

## 3. Evaluate several different models

* How should the train/validation split be created?
* Explain the model/algorithm choices in plain language (ELI5)
* What are the tradeoffs of each type of model?
  * What are the most important considerations (tradeoffs) given the business case?
  * Or, justify the choice of models to validate
* How will the models be evaluated?
  * What evaluation metrics can be used?
  * What are the tradeoffs and differences between them?
  * Which one should be used and why?
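
The split and metric questions above can be made concrete with a small sketch. It assumes a binary classification task and a plain random split; `train_val_split`, `classification_metrics`, and the sample labels are hypothetical names and data invented for the example. A random split is not appropriate for every dataset (time-ordered or grouped data usually needs a different strategy), and scikit-learn provides equivalents of both helpers.

```python
import random


def train_val_split(rows, val_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold out val_fraction of them for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


train, val = train_val_split(range(100))

# Hypothetical imbalanced validation labels: 10 positives, 90 negatives.
y_val = [1] * 10 + [0] * 90
# A trivial baseline "model" that always predicts the majority class.
always_negative = [0] * 100

metrics = classification_metrics(y_val, always_negative)
print(metrics)
```

The baseline scores high on accuracy while finding none of the positives, which is why the choice of metric has to follow from the business case rather than from habit.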

---
---

## 4. Decide on and deploy a model to production

* Decide on a model for production
* Justify the choice of final model
  * What are the tradeoffs in complexity and efficiency (memory and computation)?
* Explain how the model will be used
  * What can be done to make it more useful to, and usable by, the end user?
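
One way to put numbers on the complexity and efficiency tradeoff is to measure each candidate's serialized size and per-prediction latency. The sketch below is only an illustration under stated assumptions: `ThresholdModel`, `NearestNeighborModel`, and `profile` are hypothetical stand-ins for real candidates, and the pickled size of a model's attributes is used as a rough proxy for its memory footprint.

```python
import pickle
import time


class ThresholdModel:
    """Hypothetical simple model: a single learned cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, x):
        return int(x > self.cutoff)


class NearestNeighborModel:
    """Hypothetical heavier model: keeps every training point in memory."""

    def __init__(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)

    def predict(self, x):
        # Linear scan for the closest stored point: O(n) work per prediction.
        nearest = min(range(len(self.xs)), key=lambda i: abs(self.xs[i] - x))
        return self.ys[nearest]


def profile(model, inputs):
    """Report the pickled size of the model's attributes and mean latency."""
    size_bytes = len(pickle.dumps(vars(model)))
    start = time.perf_counter()
    for x in inputs:
        model.predict(x)
    elapsed = time.perf_counter() - start
    return {
        "size_bytes": size_bytes,
        "seconds_per_prediction": elapsed / len(inputs),
    }


simple = ThresholdModel(cutoff=0.5)
heavy = NearestNeighborModel(range(1000), [i % 2 for i in range(1000)])

sample_inputs = [0.1 * i for i in range(200)]
for name, model in [("threshold", simple), ("nearest-neighbor", heavy)]:
    print(name, profile(model, sample_inputs))
```

Timing numbers vary by machine, so they are most useful as a relative comparison between candidates on the same hardware, weighed against whatever accuracy gain the heavier model provides.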