# SparkCognition - Response to Marketing Campaign

## Table of Contents
* [Assignment](#Assignment)
* [Data Description](#Data-Description)
* [Data Exploration](#Data-Exploration)

## Assignment

You are working for SparkCognition as a Data Scientist. SparkCognition has been commissioned by an insurance company to develop a tool to optimize their marketing efforts. They have given us a data set from an email marketing campaign. The data set includes customer information, described below, as well as whether the customer responded to the marketing campaign or not.

Design a model that predicts whether a customer will respond to the marketing campaign based on their information. In other words, predict the `responded` target variable, described below, from all the input variables provided.

Briefly answer the following questions:

1. Describe your model. Why did you choose it over other types of models?
2. Describe any other models you tried. Why do you think this model performs better?
3. How did you handle missing data?
4. How did you handle categorical (string) data?
5. How did you handle unbalanced data?
6. How did you test your model?

## Data Description
**Files:**

`marketing_training.csv` = contains the training set that you will use to build the model. The target variable is `responded`.


`marketing_test.csv` = contains testing data where the input variables are provided but not the `responded` target column.

**Descriptions of each column:** 


`custAge` = The age of the customer (in years)


`profession` = Type of job


`marital` = Marital status


`schooling` = Education level


`default` = Has a previous defaulted account?


`housing` = Has a housing loan?


`contact` = Preferred contact type


`month` = Last contact month


`day_of_week` = Last contact day of week


`campaign` = Number of times the customer was contacted


`pdays` = Number of days since the client was last contacted in a previous campaign (numeric; 999 means the client was not previously contacted)


`previous` = Number of contacts performed before this campaign for this client


`poutcome` = Outcome of the previous marketing campaign


`emp.var.rate` = Employment variation rate - quarterly indicator


`cons.price.idx` = Consumer price index - monthly indicator


`cons.conf.idx` = Consumer confidence index - monthly indicator


`euribor3m` = Euribor 3 months rate - daily indicator


`nr.employed` = Number of employees - quarterly indicator


`pmonths` = Number of months since the client was last contacted in a previous campaign (numeric; 999 means the client was not previously contacted)


`pastEmail` = Number of previous emails sent to this user


`responded` = Did the customer respond to the marketing campaign and purchase a policy?
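
The descriptions above note that `pdays` and `pmonths` use 999 to mean "not previously contacted" rather than an actual elapsed time. The sketch below shows one possible way to keep that sentinel from being read as a real number: it adds a boolean flag and replaces 999 with NaN. The helper name `split_sentinel`, the `_never_contacted` suffix, and the toy rows are assumptions made for illustration, not part of the assignment.

```python
import pandas as pd

SENTINEL = 999  # per the column descriptions: client was not previously contacted


def split_sentinel(df: pd.DataFrame, columns=("pdays", "pmonths")) -> pd.DataFrame:
    """Return a copy where 999 becomes NaN and a boolean flag column records it.

    Hypothetical helper for illustration; the flag column naming is an assumption.
    """
    out = df.copy()
    for col in columns:
        never = out[col] == SENTINEL
        out[f"{col}_never_contacted"] = never
        # Cast to float so the column can hold NaN, then blank out the sentinel
        out[col] = out[col].astype(float).mask(never)
    return out


# Toy rows (made up, not taken from the real data)
toy = pd.DataFrame({"pdays": [999, 3, 999], "pmonths": [999.0, 0.1, 999.0]})
print(split_sentinel(toy))
```

Whether to keep the flag, the NaN, or both depends on the model chosen later; tree-based models may cope with the raw 999, while linear models usually will not.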


## Data Exploration

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
marketing_training = pd.read_csv('marketing_training.csv')
marketing_training.head(3)

| | custAge | profession | marital | schooling | default | housing | loan | contact | month | day_of_week | ... | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | pmonths | pastEmail | responded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 55.0 | admin. | single | university.degree | unknown | no | no | cellular | nov | mon | ... | 0 | nonexistent | -0.1 | 93.2 | -42.0 | 4.191 | 5195.8 | 999.0 | 0 | no |
| 1 | NaN | blue-collar | married | NaN | no | no | no | cellular | jul | mon | ... | 0 | nonexistent | 1.4 | 93.918 | -42.7 | 4.96 | 5228.1 | 999.0 | 0 | no |
| 2 | 42.0 | technician | married | high.school | no | no | no | telephone | may | mon | ... | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 999.0 | 0 | no |


In [3]:
marketing_test = pd.read_csv('marketing_test.csv')
marketing_test.head(3)

| | Unnamed: 0 | custAge | profession | marital | schooling | default | housing | loan | contact | month | ... | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | pmonths | pastEmail |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | NaN | entrepreneur | married | university.degree | no | yes | no | cellular | jul | ... | 999 | 0 | nonexistent | 1.4 | 93.918 | -42.7 | 4.963 | 5228.1 | 999.0 | 0 |
| 1 | 1 | 58.0 | entrepreneur | married | university.degree | unknown | no | no | telephone | jun | ... | 999 | 0 | nonexistent | 1.4 | 94.465 | -41.8 | 4.959 | 5228.1 | 999.0 | 0 |
| 2 | 2 | 48.0 | entrepreneur | married | NaN | no | no | no | cellular | jul | ... | 999 | 0 | nonexistent | 1.4 | 93.918 | -42.7 | 4.96 | 5228.1 | 999.0 | 0 |
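
The test preview above carries an extra `Unnamed: 0` column, and per the data description it has no `responded` column. A small check like the following sketch makes such differences explicit before any modeling. The helper name `column_diff` and the toy frames are assumptions for illustration; with the real data one would pass `marketing_training` and `marketing_test` instead.

```python
import pandas as pd


def column_diff(train: pd.DataFrame, test: pd.DataFrame) -> dict:
    """List columns present in only one of the two frames, preserving order."""
    return {
        "only_in_train": [c for c in train.columns if c not in test.columns],
        "only_in_test": [c for c in test.columns if c not in train.columns],
    }


# Toy frames with a handful of columns (made-up rows, not the real data)
toy_train = pd.DataFrame(
    {"custAge": [55.0], "profession": ["admin."], "responded": ["no"]}
)
toy_test = pd.DataFrame(
    {"Unnamed: 0": [0], "custAge": [58.0], "profession": ["entrepreneur"]}
)

print(column_diff(toy_train, toy_test))
```

If `Unnamed: 0` turns out to be a leftover row index, dropping it with `marketing_test.drop(columns="Unnamed: 0")` is one option, so that both frames share the same input columns.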


In [4]:
marketing_training.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7414 entries, 0 to 7413
Data columns (total 22 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   custAge         5610 non-null   float64
 1   profession      7414 non-null   object 
 2   marital         7414 non-null   object 
 3   schooling       5259 non-null   object 
 4   default         7414 non-null   object 
 5   housing         7414 non-null   object 
 6   loan            7414 non-null   object 
 7   contact         7414 non-null   object 
 8   month           7414 non-null   object 
 9   day_of_week     6703 non-null   object 
 10  campaign        7414 non-null   int64  
 11  pdays           7414 non-null   int64  
 12  previous        7414 non-null   int64  
 13  poutcome        7414 non-null   object 
 14  emp.var.rate    7414 non-null   float64
 15  cons.price.idx  7414 non-null   float64
 16  cons.conf.idx   7414 non-null   float64
 17  euribor3m       7414 non-null   f

In [5]:
marketing_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 824 entries, 0 to 823
Data columns (total 22 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Unnamed: 0      824 non-null    int64  
 1   custAge         614 non-null    float64
 2   profession      824 non-null    object 
 3   marital         824 non-null    object 
 4   schooling       573 non-null    object 
 5   default         824 non-null    object 
 6   housing         824 non-null    object 
 7   loan            824 non-null    object 
 8   contact         824 non-null    object 
 9   month           824 non-null    object 
 10  day_of_week     748 non-null    object 
 11  campaign        824 non-null    int64  
 12  pdays           824 non-null    int64  
 13  previous        824 non-null    int64  
 14  poutcome        824 non-null    object 
 15  emp.var.rate    824 non-null    float64
 16  cons.price.idx  824 non-null    float64
 17  cons.conf.idx   824 non-null    flo

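Several columns contain many NaN values. Among the columns shown above, `custAge` (5610 of 7414 non-null), `schooling` (5259 of 7414), and `day_of_week` (6703 of 7414) are incomplete in the training set. The test set shows the same pattern, with 614, 573, and 748 non-null values out of 824 for the same three columns.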
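
The `info()` output reports non-null counts, so the missing counts have to be worked out by subtraction. A more direct view is to count NaN values per column. The sketch below does this with `isna().sum()`; the helper name `missing_summary` and the toy frame are assumptions for illustration, and on the real data the call would be `missing_summary(marketing_training)`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, largest first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only the columns that actually have missing values
    summary = summary[summary["missing"] > 0]
    return summary.sort_values("missing", ascending=False)


# Tiny illustrative frame (made-up rows, not the real data)
toy = pd.DataFrame({
    "custAge": [55.0, np.nan, 42.0, np.nan],
    "schooling": ["university.degree", None, "high.school", "basic.4y"],
    "profession": ["admin.", "blue-collar", "technician", "services"],
})
print(missing_summary(toy))
```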