### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver, but what factors determine whether a driver accepts the coupon once it is delivered? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon and those who did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks the respondent whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled ‘Y = 1’, and answers of ‘no, I do not want the coupon’ are labeled ‘Y = 0’. There are five different types of coupons: less expensive restaurants (under \\$20), coffee houses, carry out & take away, bar, and more expensive restaurants (\\$20 - \\$50).

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data you will use your knowledge of plotting, statistical summaries, and visualization in Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - Time before it expires: 2 hours or one day

In [2]:
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import pandas as pd
import numpy as np

In [3]:
# Display the result of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [4]:
data = pd.read_csv('data/coupons.csv')

In [3]:
data.head(2)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar                   12577 non-null

2. Investigate the dataset for missing or problematic data.

In [5]:
data.describe()
data.head(5)
# For each visit-frequency column and `car`, print the value counts and the null/non-null counts
for col in ["Bar", "CoffeeHouse", "CarryAway", "Restaurant20To50", "RestaurantLessThan20", "car"]:
    print(data[col].value_counts())
    print(data[col].isnull().value_counts())
data.head(5)

Unnamed: 0,temperature,has_children,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
count,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0
mean,63.301798,0.414144,1.0,0.561495,0.119126,0.214759,0.785241,0.568433
std,19.154486,0.492593,0.0,0.496224,0.32395,0.410671,0.410671,0.495314
min,30.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,55.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
50%,80.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0
75%,80.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0
max,80.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


Bar
never    5197
less1    3482
1~3      2473
4~8      1076
gt8       349
Name: count, dtype: int64
Bar
False    12577
True       107
Name: count, dtype: int64
CoffeeHouse
less1    3385
1~3      3225
never    2962
4~8      1784
gt8      1111
Name: count, dtype: int64
CoffeeHouse
False    12467
True       217
Name: count, dtype: int64
CarryAway
1~3      4672
4~8      4258
less1    1856
gt8      1594
never     153
Name: count, dtype: int64
CarryAway
False    12533
True       151
Name: count, dtype: int64
Restaurant20To50
less1    6077
1~3      3290
never    2136
4~8       728
gt8       264
Name: count, dtype: int64
Restaurant20To50
False    12495
True       189
Name: count, dtype: int64
RestaurantLessThan20
1~3      5376
4~8      3580
less1    2093
gt8      1285
never     220
Name: count, dtype: int64
RestaurantLessThan20
False    12554
True       130
Name: count, dtype: int64
car
Scooter and motorcycle                      22
Mazda5                                      22
do not drive  

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
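
The per-column checks above can also be condensed into a single summary table. The following is a minimal sketch on a made-up `toy` frame (its values are illustrative, not taken from `coupons.csv`); the same two lines that build `missing` should work unchanged on `data`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the survey data (assumption: not the real file).
toy = pd.DataFrame({
    "Bar": ["never", np.nan, "1~3", "less1"],
    "CoffeeHouse": ["gt8", "4~8", np.nan, np.nan],
    "car": [np.nan, np.nan, np.nan, "Mazda5"],
    "Y": [1, 0, 1, 1],
})

# One row per column: number and share of missing entries.
missing = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "share_missing": toy.isna().mean(),
})

# Keep only the columns that actually have gaps, worst first.
missing = missing[missing["n_missing"] > 0].sort_values("n_missing", ascending=False)
print(missing)
```

Sorting by the number of gaps makes it easy to see that a column like `car` is almost entirely empty, while the visit-frequency columns are missing only a small share of values.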


3. Decide what to do about your missing data -- drop, replace, other...

In [6]:
# Replace NaNs with 'never'. Note that this fills every column that has gaps, including `car`.
data.fillna('never', inplace=True)
# Fix the typo in the column name; leave the inconsistent capitalization for now
data = data.rename(columns={'passanger': 'passenger'})
print(data.head(6))
data.describe()
data.head(5)
# The car column has little data and will be largely ignored, so it gets no further cleanup
for col in ["Bar", "CoffeeHouse", "CarryAway", "Restaurant20To50", "RestaurantLessThan20", "car"]:
    print(data[col].value_counts())
    print(data[col].isnull().value_counts())

       destination  passenger weather  temperature  time  \
0  No Urgent Place      Alone   Sunny           55   2PM   
1  No Urgent Place  Friend(s)   Sunny           80  10AM   
2  No Urgent Place  Friend(s)   Sunny           80  10AM   
3  No Urgent Place  Friend(s)   Sunny           80   2PM   
4  No Urgent Place  Friend(s)   Sunny           80   2PM   
5  No Urgent Place  Friend(s)   Sunny           80   6PM   

                  coupon expiration  gender age      maritalStatus  ...  \
0        Restaurant(<20)         1d  Female  21  Unmarried partner  ...   
1           Coffee House         2h  Female  21  Unmarried partner  ...   
2  Carry out & Take away         2h  Female  21  Unmarried partner  ...   
3           Coffee House         2h  Female  21  Unmarried partner  ...   
4           Coffee House         1d  Female  21  Unmarried partner  ...   
5        Restaurant(<20)         2h  Female  21  Unmarried partner  ...   

   CoffeeHouse CarryAway RestaurantLessThan20 Restaur

Unnamed: 0,temperature,has_children,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
count,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0
mean,63.301798,0.414144,1.0,0.561495,0.119126,0.214759,0.785241,0.568433
std,19.154486,0.492593,0.0,0.496224,0.32395,0.410671,0.410671,0.495314
min,30.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,55.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
50%,80.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0
75%,80.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0
max,80.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


Unnamed: 0,destination,passenger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,never,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,never,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,never,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,never,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,never,4~8,1~3,1,1,0,0,1,0


Bar
never    5304
less1    3482
1~3      2473
4~8      1076
gt8       349
Name: count, dtype: int64
Bar
False    12684
Name: count, dtype: int64
CoffeeHouse
less1    3385
1~3      3225
never    3179
4~8      1784
gt8      1111
Name: count, dtype: int64
CoffeeHouse
False    12684
Name: count, dtype: int64
CarryAway
1~3      4672
4~8      4258
less1    1856
gt8      1594
never     304
Name: count, dtype: int64
CarryAway
False    12684
Name: count, dtype: int64
Restaurant20To50
less1    6077
1~3      3290
never    2325
4~8       728
gt8       264
Name: count, dtype: int64
Restaurant20To50
False    12684
Name: count, dtype: int64
RestaurantLessThan20
1~3      5376
4~8      3580
less1    2093
gt8      1285
never     350
Name: count, dtype: int64
RestaurantLessThan20
False    12684
Name: count, dtype: int64
car
never                                       12576
Scooter and motorcycle                         22
Mazda5                                         22
do not drive                     
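
Filling every gap with `'never'` is one option; it also writes `'never'` into `car`, where that label has no meaning. As a hedged alternative (not what this notebook does), one could drop the mostly empty column and fill only the visit-frequency columns with their most common value. The `toy` frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the survey data (assumption: not the real file).
toy = pd.DataFrame({
    "Bar": ["never", np.nan, "never", "1~3"],
    "CarryAway": ["4~8", "4~8", np.nan, "1~3"],
    "car": [np.nan, np.nan, np.nan, "Mazda5"],
    "Y": [1, 0, 1, 1],
})

# Drop the mostly empty column rather than filling it.
cleaned = toy.drop(columns=["car"])

# Fill only the visit-frequency columns, each with its own most common value.
freq_cols = ["Bar", "CarryAway"]
for col in freq_cols:
    most_common = cleaned[col].mode().iloc[0]
    cleaned[col] = cleaned[col].fillna(most_common)

print(cleaned)
```

Whether the mode, `'never'`, or dropping the affected rows is the better choice depends on why respondents skipped the question, which the data alone does not reveal.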

4. What proportion of the total observations chose to accept the coupon? 



In [10]:
print('Observations with 1 in the Y column accepted the coupon, so the proportion is: ') 
print(data["Y"].value_counts(normalize=True))

Observations with 1 in the Y column accepted the coupon, so the proportion is: 
Y
1    0.568433
0    0.431567
Name: proportion, dtype: float64
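
Because `Y` is coded as 0 or 1, the acceptance proportion is also just the mean of the column. A small sketch with made-up values shows that the two approaches agree:

```python
import pandas as pd

# Made-up 0/1 answers (assumption: illustrative only).
y = pd.Series([1, 0, 1, 1, 0], name="Y")

# The mean of a 0/1 column is the share of ones.
acceptance_rate = y.mean()

# value_counts(normalize=True) reports the same share under the label 1.
shares = y.value_counts(normalize=True)

print(acceptance_rate)
print(shares.loc[1])
```

Using `.mean()` returns a single number, which is convenient when the rate has to be compared across many subgroups.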


5. Use a bar plot to visualize the `coupon` column.

In [11]:
fig = px.bar(data, x = "coupon", color = "coupon", title = "Coupon Values")
fig.show()
#sns.barplot(x = "coupon", data = data)
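
Passing the raw frame to `px.bar` draws one mark per survey row. A lighter approach, sketched here under the assumption that only the counts per coupon type matter, is to aggregate first and plot the small table. The `coupons` series is made up, and the column name `n` is a choice made for this example.

```python
import pandas as pd

# Made-up coupon column (assumption: illustrative only).
coupons = pd.Series(
    ["Bar", "Coffee House", "Coffee House", "Bar", "Coffee House"],
    name="coupon",
)

# Count rows per coupon type and turn the result into a two-column table.
counts = coupons.value_counts().rename_axis("coupon").reset_index(name="n")
print(counts)

# The table can then be plotted directly, for example:
# px.bar(counts, x="coupon", y="n", color="coupon", title="Coupon Values")
```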

6. Use a histogram to visualize the temperature column.

In [12]:
fig = px.histogram(data, x="temperature", title = "Number of Observations by Temperature")
fig.show()
#sns.histplot(data,x = "temperature")

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar-related coupons.

1. Create a new `DataFrame` that contains just the bar coupons.


In [13]:
DataBar = data[data['coupon'] == 'Bar']

2. What proportion of bar coupons were accepted?


In [14]:
DataBar['Y'].value_counts(normalize = True)

Y
0    0.589985
1    0.410015
Name: proportion, dtype: float64

3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [18]:
# Note: these queries run on `data` (all coupon types), not on the bar-only `DataBar` frame
zeroto3 = data.query("Bar == ['less1','never','1~3']")["Y"].value_counts(normalize=True)
print('The acceptance rate for those who went to a bar 3 or fewer times a month:')
print(zeroto3)
print('The acceptance rate for those who went more than 3 times a month:')
gt3 = data.query("Bar == ['4~8', 'gt8']")["Y"].value_counts(normalize=True)
print(gt3)

The acceptance rate for those who went to a bar 3 or fewer times a month:
Y
1    0.561595
0    0.438405
Name: proportion, dtype: float64
The acceptance rate for those who went more than 3 times a month:
Y
1    0.622456
0    0.377544
Name: proportion, dtype: float64
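
The cell above runs its queries on `data`, which holds every coupon type, so the rates it prints are not specific to bar coupons. A minimal sketch of the same comparison restricted to bar coupons first might look like the following. The `toy` frame and its values are made up for illustration; on the real data the filter would start from `DataBar`.

```python
import pandas as pd

# Made-up rows (assumption: illustrative only, not survey results).
toy = pd.DataFrame({
    "coupon": ["Bar", "Bar", "Bar", "Bar", "Coffee House", "Coffee House"],
    "Bar":    ["never", "1~3", "4~8", "gt8", "never", "gt8"],
    "Y":      [0, 1, 1, 1, 1, 0],
})

# Step 1: keep only the rows where a bar coupon was offered.
bar_only = toy[toy["coupon"] == "Bar"]

# Step 2: split by how often the driver goes to a bar.
frequent = bar_only["Bar"].isin(["4~8", "gt8"])

# Step 3: the mean of the 0/1 column Y is the acceptance rate.
rate_frequent = bar_only.loc[frequent, "Y"].mean()
rate_infrequent = bar_only.loc[~frequent, "Y"].mean()

print(rate_frequent, rate_infrequent)
```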


4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to all others. Is there a difference?


In [19]:
#data["age"].value_counts()
print("The acceptance rate for drivers who go to a bar more than once a month and are over the age of 25")
data.query("Bar == ['1~3', '4~8', 'gt8'] and age == ['26', '31', '36', '41', '46', '50plus']")["Y"].value_counts(normalize=True)
print("The acceptance rate for all the others")
data.query("Bar != ['1~3', '4~8', 'gt8'] or age != ['26', '31', '36', '41', '46', '50plus']")["Y"].value_counts(normalize=True)

The acceptance rate for drivers who go to a bar more than once a month and are over the age of 25


Y
1    0.621534
0    0.378466
Name: proportion, dtype: float64

The acceptance rate for all the others


Y
1    0.553548
0    0.446452
Name: proportion, dtype: float64
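
Writing the "all others" query by hand means negating the first condition correctly. One way to avoid that step, sketched here on made-up data, is to build a single boolean mask and let `groupby` split the rows into the target group and its complement. The names `toy`, `target`, and `group` are chosen for this example only.

```python
import pandas as pd

# Made-up rows (assumption: illustrative only, not survey results).
toy = pd.DataFrame({
    "Bar": ["1~3", "never", "gt8", "4~8", "less1"],
    "age": ["26", "21", "below21", "50plus", "31"],
    "Y":   [1, 0, 1, 1, 0],
})

# Condition 1: goes to a bar at least once a month.
goes_monthly = toy["Bar"].isin(["1~3", "4~8", "gt8"])
# Condition 2: over 25, i.e. not in the two youngest age brackets.
over_25 = ~toy["age"].isin(["below21", "21"])

# Rows meeting both conditions form the target group; every other row is "others".
target = goes_monthly & over_25
group = target.map({True: "target", False: "others"})

# Acceptance rate per group (mean of the 0/1 column Y).
rates = toy.groupby(group)["Y"].mean()
print(rates)
```

Since every row receives exactly one label, the two groups always partition the data.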

5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


In [22]:
print("The acceptance rate for drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry:")
data.query("Bar == ['1~3', '4~8', 'gt8'] and passenger != 'Kid(s)' and occupation != 'Farming Fishing & Forestry'")["Y"].value_counts(normalize=True)
# The third clause about occupation has no effect on this query, because none of the selected subgroup has that occupation

The acceptance rate for drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry:


Y
1    0.623106
0    0.376894
Name: proportion, dtype: float64

6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 



In [23]:
print("The acceptance rates between those drivers who go to bars more than once a month, had passengers that were not a kid, and were not widowed:")
data.query("Bar == ['1~3', '4~8', 'gt8'] and passenger != 'Kid(s)' and maritalStatus != 'Widowed'")["Y"].value_counts(normalize=True)
print("The acceptance rates between those drivers who go to bars more than once a month and are under the age of 30:")
data.query("Bar == ['1~3', '4~8', 'gt8'] and age == ['below21', '21', '26']")["Y"].value_counts(normalize = True)
print("The acceptance rates between those drivers who go to cheap restaurants more than 4 times a month and income is less than 50K:")
data.query("RestaurantLessThan20 == ['4~8', 'gt8'] and income == ['Less than $12500', '$12500 - $24999', '$25000 - $37499', '$37500 - $49999']")["Y"].value_counts(normalize=True)

The acceptance rates between those drivers who go to bars more than once a month, had passengers that were not a kid, and were not widowed:


Y
1    0.623106
0    0.376894
Name: proportion, dtype: float64

The acceptance rates between those drivers who go to bars more than once a month and are under the age of 30:


Y
1    0.628081
0    0.371919
Name: proportion, dtype: float64

The acceptance rates between those drivers who go to cheap restaurants more than 4 times a month and income is less than 50K:


Y
1    0.600702
0    0.399298
Name: proportion, dtype: float64
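
The prompt joins its three conditions with *OR*, while the cell above reports each condition on its own. If a single rate for drivers who meet at least one condition is wanted, the three masks can be combined with `|`. This is a sketch on made-up rows; the `toy` frame, the mask names, and the chosen category labels are assumptions for illustration.

```python
import pandas as pd

# Made-up rows (assumption: illustrative only, not survey results).
toy = pd.DataFrame({
    "Bar": ["1~3", "never", "gt8", "never"],
    "passenger": ["Alone", "Kid(s)", "Kid(s)", "Alone"],
    "maritalStatus": ["Single", "Single", "Widowed", "Single"],
    "age": ["41", "21", "26", "36"],
    "RestaurantLessThan20": ["1~3", "gt8", "less1", "never"],
    "income": ["$50000 - $62499", "$25000 - $37499", "$50000 - $62499", "$50000 - $62499"],
    "Y": [1, 1, 0, 0],
})

bar_monthly = toy["Bar"].isin(["1~3", "4~8", "gt8"])
low_income = toy["income"].isin(
    ["Less than $12500", "$12500 - $24999", "$25000 - $37499", "$37500 - $49999"]
)

# The three conditions from the prompt, one mask each.
cond_a = bar_monthly & (toy["passenger"] != "Kid(s)") & (toy["maritalStatus"] != "Widowed")
cond_b = bar_monthly & toy["age"].isin(["below21", "21", "26"])
cond_c = toy["RestaurantLessThan20"].isin(["4~8", "gt8"]) & low_income

# A driver belongs to the combined group if any one condition holds.
any_cond = cond_a | cond_b | cond_c

rate_any = toy.loc[any_cond, "Y"].mean()
rate_rest = toy.loc[~any_cond, "Y"].mean()
print(rate_any, rate_rest)
```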

7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

In [28]:
print("Young people (under 30 years old) who we know drink (i.e., have been to a bar at least once a month) are the most likely group to accept the bar coupon among the groups we have considered.")
print("It is also worth noting that the groups considered do not cover every possible scenario, so there are potentially other groups with even higher acceptance rates.")

Young people (under 30 years old) who we know drink (i.e., have been to a bar at least once a month) are the most likely group to accept the bar coupon among the groups we have considered.
It is also worth noting that the groups considered do not cover every possible scenario, so there are potentially other groups with even higher acceptance rates.


### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of drivers who accept the coupons.

In [44]:
data.head(2)
data["CoffeeHouse"].value_counts()
data["age"].value_counts()
data["passenger"].value_counts()
print("Acceptance rates of those that go to coffee houses 4 or more times a month and are under 30 years old:")
data.query("CoffeeHouse == ['4~8','gt8'] and age == ['below21', '21', '26']")["Y"].value_counts(normalize = True)
print("Acceptance rates of all others:")
data.query("CoffeeHouse != ['4~8','gt8'] or age != ['below21', '21', '26']")["Y"].value_counts(normalize = True)
print("Acceptance rates of those that go to coffee houses 4 or more times per month and are over 30 years old:")
data.query("CoffeeHouse == ['4~8','gt8'] and age != ['below21', '21', '26']")["Y"].value_counts(normalize = True)
# The last query filters on the passenger column, i.e. drivers with kid(s) in the car
print("Acceptance rates of those that go to coffee houses 4 or more times per month, are under 30 years old, and have kid(s) as passengers:")
data.query("CoffeeHouse == ['4~8','gt8'] and age == ['below21', '21', '26'] and passenger == 'Kid(s)'")["Y"].value_counts(normalize = True)


Unnamed: 0,destination,passenger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,never,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,never,4~8,1~3,1,0,0,0,1,0


CoffeeHouse
less1    3385
1~3      3225
never    3179
4~8      1784
gt8      1111
Name: count, dtype: int64

age
21         2653
26         2559
31         2039
50plus     1788
36         1319
41         1093
46          686
below21     547
Name: count, dtype: int64

passenger
Alone        7305
Friend(s)    3298
Partner      1075
Kid(s)       1006
Name: count, dtype: int64

Acceptance rates of those that go to coffee houses 4 or more times a month and are under 30 years old:


Y
1    0.646751
0    0.353249
Name: proportion, dtype: float64

Acceptance rates of all others:


Y
1    0.559189
0    0.440811
Name: proportion, dtype: float64

Acceptance rates of those that go to coffee houses 4 or more times per month and are over 30 years old:


Y
1    0.586118
0    0.413882
Name: proportion, dtype: float64

Acceptance rates of those that go to coffee houses 4 or more times per month, are under 30 years old, and have kid(s) as passengers:


Y
1    0.659574
0    0.340426
Name: proportion, dtype: float64

In [72]:
data["coupon"].value_counts()
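coffee = data.query("coupon == 'Coffee House'")
age_order = ['below21', '21', '26', '31', '36', '41', '46', '50plus']
# With y="Y", px.histogram sums Y within each age group, so each bar counts accepted coupons
fig = px.histogram(coffee, x="age", y="Y", color="age", title="Coffee House Coupon Acceptance by Age", category_orders=dict(age=age_order))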
fig.update_layout(yaxis_title="Coffee House Coupons Accepted", showlegend=False)

coupon
Coffee House             3996
Restaurant(<20)          2786
Carry out & Take away    2393
Bar                      2017
Restaurant(20-50)        1492
Name: count, dtype: int64
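
A count of accepted coupons per age group depends on how many drivers fall in each group, so larger groups can look more receptive than they are. A hedged complement to the plot above is a table holding the count, the number of offers, and the rate side by side. The `toy` frame is made up, and the column names `accepted`, `offered`, and `rate` are chosen for this sketch.

```python
import pandas as pd

# Made-up rows (assumption: illustrative only, not survey results).
toy = pd.DataFrame({
    "coupon": ["Coffee House"] * 6 + ["Bar"] * 2,
    "age": ["21", "21", "21", "21", "50plus", "50plus", "21", "50plus"],
    "Y":   [1, 1, 0, 0, 1, 1, 0, 0],
})

# Keep only the coffee house offers.
coffee = toy[toy["coupon"] == "Coffee House"]

# Per age group: coupons accepted, coupons offered, and acceptance rate.
summary = coffee.groupby("age")["Y"].agg(accepted="sum", offered="count", rate="mean")
print(summary)
```

In this toy example both age groups accept two coupons, yet their rates differ because the number of offers differs.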