### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver, but what are the factors that determine whether a driver accepts the coupon once it is delivered to them? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.

**Data**

This data comes from the UCI Machine Learning Repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks the respondent whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled ‘Y = 1’, and the answer ‘no, I do not want the coupon’ is labeled ‘Y = 0’. There are five different types of coupons: less expensive restaurants (under \\$20), coffee houses, carry out & take away, bars, and more expensive restaurants (\\$20 - \\$50).

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data you will use your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater
    than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or
    greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per
    person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense of \\$20 to \\$50 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

### Import relevant libraries

In [337]:
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from sklearn.preprocessing import LabelEncoder

# Show every column when displaying wide dataframes.
pd.set_option('display.max_columns', None)
# Silence library warnings to keep the notebook output readable.
warnings.filterwarnings('ignore')

<b>Helper Function(s)</b> to filter the dataframe by combinations of features and their values. <b>Age</b> is numeric, so it is passed as an additional parameter.

In [278]:
#Function to return bar chart for acceptance rate
def barchart(df):
    fig = px.bar(df, x='Feature Value', y='Acceptance Rate', color='Feature Value', text='Feature Value Count')
    fig.update_xaxes(categoryorder="total descending")
    fig.update_layout(yaxis_ticksuffix="%")
    fig.for_each_trace(lambda t: t.update(hovertemplate=t.hovertemplate.replace("sum of", "")))
    return fig

In [255]:
#Function to return top n correlating features:
def top_correlation (df,n,ascending):
    corr_matrix = df.corr()
    correlation = (corr_matrix.abs().where(np.triu(np.ones(corr_matrix.shape), k=1).astype(np.bool_))
                 .stack()
                 .sort_values(ascending=ascending))
    corr_df = pd.DataFrame(correlation).reset_index()
    corr_df.columns=["Feature 1","Feature 2","Correlation"]
    corr_df = corr_df.reindex(corr_df.Correlation.abs().sort_values(ascending=ascending).index).reset_index().drop(["index"],axis=1)
    return corr_df.head(n)

In [256]:
# Function to enclose each list element in its own list. Assigning a list directly to a DataFrame column spreads 
#the values across multiple rows, so each element is wrapped in a list to keep it in a single row.
def extractSubList(lst):
    return [[elementl] for elementl in lst]

In [257]:
# Function to unwrap elements of list of items within another list
def returnSubList(lst):
    if(isinstance(lst[0],list)):
        return lst[0]
    else:
        return lst    

In [313]:
# Filter the input dataframe on categorical features and their values, returning the filtered dataframe 
# and, optionally, a second dataframe with the acceptance rate. Filter conditions use the untransformed input values.

def Filter_Dataframe(df,filter_values, age='',rate_df = False, target_var=''):
  
    if age is None or age == '':
        age = '> 0'
        
    categories_keys = [] 
    category_vals = []
    multipleKeys = True
    
    if(len(filter_values.keys()) == 1):
        item = filter_values[list(filter_values.keys())[0]]
        if (item == '' or item == []):
            multipleKeys = False
    
    for key in filter_values.keys():
        item = filter_values[key]
        if (item == '' or item == []):
            category_vals.extend(list(orig_col_vals[key].keys()))
            item = list(orig_col_vals[key].values())
            categories_keys.extend([key] * len(item))
        else:
            category_vals.append(item) # add it as list of list so dataframe won't break it into multiple rows
            item = [orig_col_vals[key][i] for i in item if i in orig_col_vals[key]]
            categories_keys.append(key)
            
        filter_values[key] = item
    
    category_vals = extractSubList(category_vals)
    
    #Form the query based on all the input conditions. Age is numeric, so its condition is passed as a separate parameter.
    query = [f"({col} in {val})" for col, val in filter_values.items()]
    query = ' & '.join(query)
    query +=' & age ' + age
    
    #Include Target Variable in Filter? If Y, then include the target variable. 
    if(target_var == 'Y'):
        query += ' & Y == 1 '
    
    # Work on a copy so that adding a column does not touch the caller's dataframe.
    df_filtered = df.query(query).copy()
    df_filtered['Age_Category'] = age
    
    # if True, send another dataframe with acceptance rate for each category value.
    if (rate_df == True):
        subcatcount = []
        df_acc_rate = pd.DataFrame(columns=['Category','Value(s)','Transformed vals','Acceptance Rate','Age','Feature Value Count','Feature Value','Value Counts'])
        df_acc_rate['Category'] = categories_keys
        df_acc_rate['Value(s)'] = category_vals
        df_acc_rate['Value(s)'] = df_acc_rate['Value(s)'].apply(returnSubList)
        df_acc_rate['Age'] = age
        
        
        for index, row in df_acc_rate.iterrows():
            if(multipleKeys == False):
                row['Feature Value'] = ','.join(row['Value(s)'])
            else:
                row['Feature Value'] = row['Category'] + ' in ' + ','.join(row['Value(s)'])
            category = row['Category']
            category_vals = row['Value(s)']
            category_Transformed_vals = [orig_col_vals[category][i] for i in category_vals]
            row['Transformed vals'] = category_Transformed_vals
            row['Value Counts'] = data_copy[category].unique()
            #rowcount = df_filtered.query('{0} == {1}'.format(category,int(orig_col_vals[category][sub_category]))).shape[0]
            rowcount = df_filtered.query('{0} in {1}'.format(category,category_Transformed_vals)).shape[0]
            row['Feature Value Count'] = rowcount
            # Matching rows as a percentage of ALL rows in df (not of the rows in this category).
            row['Acceptance Rate'] = round(rowcount / df.shape[0] * 100,2)
            
            #if(accepted == 'Y'):
            #    row['Acceptance Rate'] = round(rowcount / df.loc[df['Y'] == 1].shape[0] * 100,2)
            #else:
            #    
                
                                         
        return df_filtered,df_acc_rate
    else:
        return df_filtered

In [332]:
# This function returns a combined acceptance rate for multiple features.
#Eg: Acceptance rate for Occupation in [list] and Bar in [list] and income in [list]

def dfbyfeature(df):
    df1 = pd.DataFrame(columns=['Acceptance Rate','Feature Value Count', 'Age', 'Feature Value'])
    row = df.iloc[:1]
    df1['Acceptance Rate'] = row['Acceptance Rate']
    df1['Feature Value Count'] = row['Feature Value Count']
    df1['Age'] = ' ' + row['Age']
    df1['Feature Value'] = ' (&) '.join(df['Feature Value'])
    return df1    

### Problems

Use the prompts below to get started with your data analysis.  

##### 1. Read in the `coupons.csv` file.




In [338]:
data = pd.read_csv('data/coupons.csv')

#####  2. Investigate the dataset for missing or problematic data.

Feature <b>"Age"</b> has two non-numeric values, "50plus" and "below21". We can substitute the numeric values <b>51</b> for "50plus" and <b>20</b> for "below21" so that this column can be converted to a numeric datatype.

In [339]:
data['age']  = data['age'].replace({'50plus':'51','below21':'20'}).apply(pd.to_numeric)
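
As a quick illustration of the conversion above, here is a minimal sketch on a made-up series (the values in `age_raw` are illustrative, not rows from `coupons.csv`). The stand-ins 51 and 20 are placeholders rather than real ages, but they keep comparisons such as `age > 25` meaningful.

```python
import pandas as pd

# Toy values that mimic the raw `age` column (illustrative, not the real data).
age_raw = pd.Series(['21', '50plus', 'below21', '36'])

# Replace the two text labels with stand-in numbers, then convert the column.
age_num = pd.to_numeric(age_raw.replace({'50plus': '51', 'below21': '20'}))

print(age_num.tolist())
```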

###### Get information about missing values of each column

Calculate the percentage of <b>missing</b> values and the most recurring value <b>(mode) </b>
for each of those features that are missing values.

In [340]:
percent_missing = round(data.isnull().sum() * 100 / len(data),2)
# The Series index already holds the column names, so it becomes the dataframe index.
missing_value_df = pd.DataFrame({'percent_missing': percent_missing})
missing_value_df = missing_value_df[missing_value_df['percent_missing'] > 0.00]


cols = missing_value_df.index
most_recurring_vals = []
for c in cols:
    most_recurring_vals.append(data[c].value_counts().index[0])
missing_value_df['Most Recurring Value'] = most_recurring_vals
missing_value_df

|  | percent_missing | Most Recurring Value |
|---|---|---|
| car | 99.15 | Scooter and motorcycle |
| Bar | 0.84 | never |
| CoffeeHouse | 1.71 | less1 |
| CarryAway | 1.19 | `1~3` |
| RestaurantLessThan20 | 1.02 | `1~3` |
| Restaurant20To50 | 1.49 | less1 |


#### 3. Decide what to do about your missing data -- drop, replace, other...

Since <b>99%</b> of the values in feature <b>"car"</b> are missing, let's drop the column from the dataset.

In [341]:
data.drop(columns=['car'], inplace=True)
# Remove 'car' from the summary by label rather than by position.
missing_value_df = missing_value_df.drop(index='car')
missing_value_df

|  | percent_missing | Most Recurring Value |
|---|---|---|
| Bar | 0.84 | never |
| CoffeeHouse | 1.71 | less1 |
| CarryAway | 1.19 | `1~3` |
| RestaurantLessThan20 | 1.02 | `1~3` |
| Restaurant20To50 | 1.49 | less1 |


Feature <b>"toCoupon_GEQ5min" </b> contains only one value which is <b>1</b>. So we can drop this column as well.

In [342]:
data.drop(columns=['toCoupon_GEQ5min'], axis=1, inplace=True)

Fill in the most recurring value <b>(mode)</b> for each feature that is <b>still missing</b> values.

In [343]:
for rec_val in missing_value_df.index:
    data[rec_val] = data[rec_val].fillna(missing_value_df['Most Recurring Value'][rec_val])
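
The same mode imputation can be sketched without the intermediate summary table. The example below is a minimal illustration on a made-up frame (`toy` and its values are assumptions for demonstration only); it looks up each column's mode directly with `Series.mode()`.

```python
import pandas as pd

# Toy frame with gaps (illustrative values only).
toy = pd.DataFrame({
    'Bar': ['never', 'less1', None, 'never'],
    'CoffeeHouse': ['less1', None, '1~3', 'less1'],
})

# Fill each column's gaps with that column's most frequent value.
for col in toy.columns[toy.isna().any()]:
    toy[col] = toy[col].fillna(toy[col].mode().iloc[0])

print(toy)
```

Mode imputation keeps every row but slightly inflates the share of the most common category, which is a small effect here because each of these features is missing in fewer than 2% of rows.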

Make a copy of the original Dataframe before transforming the <b>categorical</b> features.

In [344]:
data_copy = data.copy()

<b>Transform</b> the <b>categorical</b> variables to <b>numeric</b> to be used for further analysis

Store the <b>original values</b> of each feature before transformation.

In [345]:
orig_col_vals = {}

In [346]:
from sklearn import preprocessing

Categorical_Colums = data.select_dtypes(include=['object']).columns.tolist()
le = preprocessing.LabelEncoder()
for column in Categorical_Colums:
    orig_col_vals[column] = {}
    data[column] = le.fit_transform(data[column])
    orig_col_vals[column] = dict(zip(le.classes_, le.transform(le.classes_)))    
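
To make the label-to-code lookup stored in `orig_col_vals` concrete, here is a minimal sketch on a made-up column. It uses `pd.factorize(sort=True)` as a stand-in for `LabelEncoder` (a swapped-in technique, chosen so the example needs only pandas); both assign integer codes in sorted order of the labels.

```python
import pandas as pd

# Toy categorical column (illustrative values only).
weather = pd.Series(['Sunny', 'Rainy', 'Snowy', 'Sunny'])

# sort=True assigns codes in sorted order of the labels, which is also how
# scikit-learn's LabelEncoder orders its classes_.
codes, labels = pd.factorize(weather, sort=True)

# Label -> code lookup, analogous to one entry of `orig_col_vals`.
mapping = {label: code for code, label in enumerate(labels)}
print(mapping)
print(codes.tolist())
```

Because the codes follow alphabetical order, they carry no natural ranking for nominal features such as weather or occupation, which is worth keeping in mind when reading model coefficients fitted on them.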

Let's fit a <b>simple logistic regression</b> to gauge the significance <b>(p-values)</b> of the features and their <b>coefficients</b>, so we can explore those features further in this task.

In [347]:
import statsmodels.api as sm

In [348]:
#declare dependent (y) variable and independent (x1) variables
y = data['Y']
x1 = data.drop(columns=['Y','coupon'],axis=1)

In [349]:
x = sm.add_constant(x1)  #add constant (intercept)
reg_log = sm.Logit(y,x) #regression
results_log = reg_log.fit()

summary_df = pd.DataFrame(data=x1.columns.tolist(), columns=['Features'])
summary_df['p-values'] = np.array(results_log.pvalues.iloc[1:].tolist(), dtype=float)
summary_df['coeff'] = np.array(results_log.params.iloc[1:].tolist(), dtype = float)
summary_df['p-values'] = round(summary_df['p-values'],3)
summary_df['coeff'] = round(summary_df['coeff'],3)

Optimization terminated successfully.
         Current function value: 0.646121
         Iterations 5


In [350]:
# Get top 6 Features which we can explore later
summary_df = summary_df[summary_df['p-values'] <= 0.05].sort_values(by='p-values', ascending=True).head(6)
summary_df.set_index('Features',inplace=True)
summary_df

| Features | p-values | coeff |
|---|---|---|
| destination | 0.0 | 0.124 |
| passanger | 0.0 | 0.098 |
| weather | 0.0 | 0.248 |
| time | 0.0 | -0.073 |
| expiration | 0.0 | -0.613 |
| gender | 0.0 | 0.19 |
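
The coefficients in the summary table are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The sketch below copies two coefficients from the table; the interpretation of `expiration` assumes the encoder mapped the one-day expiry to 0 and the two-hour expiry to 1, which is an assumption about the encoded values rather than something shown in the notebook.

```python
import math

# Coefficients copied from the summary table above.
coeff = {'expiration': -0.613, 'weather': 0.248}

# exp(coefficient) is the multiplicative change in the odds of acceptance
# for a one-unit increase in the encoded feature, holding the others fixed.
odds_ratio = {name: math.exp(value) for name, value in coeff.items()}

for name, ratio in odds_ratio.items():
    print(f'{name}: odds ratio = {ratio:.2f}')
```

Under that assumption, a two-hour coupon has roughly half the odds of acceptance of a one-day coupon. For features with more than two label-encoded categories, such as `weather`, a "one-unit increase" follows the arbitrary code order, so the odds ratio is only a rough signal of importance.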


#### 4. What proportion of the total observations chose to accept the coupon? 



In [351]:
coupon_acceptance_percentage = round((data.query('Y == 1').shape[0] / data.shape[0]) * 100, 2)
print('Proportion of total observations that chose to accept coupon is {0} %'.format(coupon_acceptance_percentage))

Proportion of total observations that chose to accept coupon is 56.84 %


In [352]:
fig = px.pie(data['Y'],names=data['Y'].map({0:'Not Accepted',1:'Accepted'}),title='Coupon Acceptance Percentage')
fig.show()

F - 56.8% of drivers accepted coupons
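
Because `Y` is coded as 0 or 1, the acceptance proportion is simply the mean of the column. A minimal sketch on made-up responses (the series `y` is illustrative, not the real data):

```python
import pandas as pd

# Toy responses (illustrative): 1 = accepted, 0 = not accepted.
y = pd.Series([1, 0, 1, 1, 0])

# The mean of a 0/1 column is the share of ones.
acceptance_pct = round(y.mean() * 100, 2)
print(acceptance_pct)
```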

#### 5. Use a bar plot to visualize the `coupon` column.

In [353]:
px.histogram(data_frame=data_copy, x='coupon', color=data_copy['Y'].map({0:'Not Accepted',1:'Accepted'}))

#### 6. Use a histogram to visualize the temperature column.

In [354]:
px.histogram(data_frame=data_copy, x='temperature', color=data_copy['Y'].map({0:'Not Accepted',1:'Accepted'}))

Acceptance rate by feature <b>Weather</b>, since weather also encompasses temperature.

In [355]:
df_weather = Filter_Dataframe(data,{'weather':[]},'',True,'Y')[1]
fig = barchart(df_weather)
fig.show()

F - The coupon acceptance rate is highest when the weather is sunny.
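
The `Acceptance Rate` returned by `Filter_Dataframe` divides the matching rows by all rows in the input dataframe, so a frequent category can score high partly because it occurs often. A complementary view, sketched here on made-up data (the `toy` frame is an assumption for illustration), divides accepted offers by the offers made within each category.

```python
import pandas as pd

# Toy data (illustrative): one row per coupon offer.
toy = pd.DataFrame({
    'weather': ['Sunny', 'Sunny', 'Sunny', 'Sunny', 'Rainy', 'Rainy', 'Snowy', 'Snowy'],
    'Y':       [1,       1,       1,       0,       1,       0,       0,       0],
})

# Within-group rate: accepted offers divided by offers made in that weather.
rate_by_weather = (toy.groupby('weather')['Y']
                      .agg(offers='count', accepted='sum', rate='mean'))
rate_by_weather['rate'] = (rate_by_weather['rate'] * 100).round(2)
print(rate_by_weather)
```

Reporting the offer count next to the rate helps flag categories where the rate rests on very few observations.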

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

#### 1. Create a new `DataFrame` that contains just the bar coupons.


In [356]:
bar_df = data[data['coupon'] == orig_col_vals['coupon']['Bar']]

#### 2. What proportion of bar coupons were accepted?


In [357]:
bar_coupon_acceptance_percentage = round((bar_df.query('Y == 1').shape[0] / bar_df.shape[0]) * 100, 2)
print('Proportion of bar coupons that were accepted is {0} %'.format(bar_coupon_acceptance_percentage))

Proportion of bar coupons that were accepted is 41.0 %


#### 3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


Feature <b>"Bar"</b> has a value called <b>"never"</b>, meaning that the driver <b>never</b> goes to a bar. To decide whether to include this value in the analysis that follows, check whether there are rows where <b>Bar == 'never'</b> and coupon accepted = <b>'Y'</b>. If such rows exist, we will keep this value.

In [358]:
bar_never_accepted_coupon = Filter_Dataframe(bar_df,{'Bar':['never']},'',False,'Y').shape[0]

Since there are <b>164</b> such records, we will keep this value and group it with drivers who go to a bar less than once a month.

In [359]:
bar_3_or_less = ['never','less1','1~3']
bar_more_than_3 = ['4~8','gt8']

In [360]:
df_bar_3_or_less = dfbyfeature(Filter_Dataframe(bar_df,{'Bar':bar_3_or_less},'',True,'Y')[1])
df_bar_more_than_3 = dfbyfeature(Filter_Dataframe(bar_df,{'Bar':bar_more_than_3},'',True,'Y')[1])
df_bar_less3_and_greater3 = pd.concat([df_bar_3_or_less,df_bar_more_than_3])
fig = barchart(df_bar_less3_and_greater3)
fig.show()

F - Drivers who went to a bar more than 3 times a month accepted <b>fewer</b> coupons than drivers who went to a bar 3 or fewer times a month.
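
Counts of accepted coupons depend on how many drivers fall into each group, so it can help to compare rates within each group as well. The following is a minimal sketch on made-up data (the `toy` frame and the `bar_group` column name are assumptions for illustration) that bins the `Bar` categories into two groups and computes each group's own acceptance rate.

```python
import pandas as pd

# Toy bar-coupon offers (illustrative values only).
toy = pd.DataFrame({
    'Bar': ['never', 'less1', '1~3', '4~8', 'gt8', '4~8', 'never', '1~3'],
    'Y':   [0,       0,       1,     1,     1,     0,     0,       1],
})

# Collapse the five frequency categories into two groups.
bar_more_than_3 = ['4~8', 'gt8']
toy['bar_group'] = toy['Bar'].isin(bar_more_than_3).map(
    {True: 'more than 3', False: '3 or fewer'})

# Acceptance rate within each group, as a percentage.
rate = toy.groupby('bar_group')['Y'].mean().mul(100).round(2)
print(rate)
```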

#### 4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to all others. Is there a difference?


In [361]:
df_bar_more_than_1_over_25yrs = dfbyfeature(Filter_Dataframe(bar_df,{'Bar':['1~3','4~8','gt8']},'>25',True,'Y')[1])
df_bar_less_than_1_over_25yrs = dfbyfeature(Filter_Dataframe(bar_df,{'Bar':['less1','never']},'>25',True,'Y')[1])
df_bar_more_than_1_under_25yrs = dfbyfeature(Filter_Dataframe(bar_df,{'Bar':['1~3','4~8','gt8']},'<=25',True,'Y')[1])
df_bar_less_than_1_under_25yrs = dfbyfeature(Filter_Dataframe(bar_df,{'Bar':['less1','never']},'<=25',True,'Y')[1])
df_combined = pd.concat([df_bar_more_than_1_over_25yrs,df_bar_less_than_1_over_25yrs,df_bar_more_than_1_under_25yrs,df_bar_less_than_1_under_25yrs])
fig = barchart(df_combined)
fig.show()

In [362]:
# Bar coupon acceptance rate based on age interval

F - For drivers over 25, the bar coupon acceptance rate is almost the same whether they go to a bar more or less than once a month. The same holds for drivers aged 25 or under.
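
The question compares one target group against "all others", which maps naturally onto a boolean mask and its negation. Here is a minimal sketch on made-up data (the `toy` frame is an assumption for illustration; column names follow the notebook).

```python
import pandas as pd

# Toy bar-coupon offers (illustrative values only).
toy = pd.DataFrame({
    'Bar': ['1~3', 'gt8', 'never', '4~8', 'less1', '1~3'],
    'age': [31,    26,    41,      21,    20,      51],
    'Y':   [1,     1,     0,       1,     0,       0],
})

# Target group: goes to a bar more than once a month AND is over 25.
target = toy['Bar'].isin(['1~3', '4~8', 'gt8']) & (toy['age'] > 25)

# Acceptance rate inside the target group and among everyone else.
rate_target = round(toy.loc[target, 'Y'].mean() * 100, 2)
rate_others = round(toy.loc[~target, 'Y'].mean() * 100, 2)
print(rate_target, rate_others)
```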

#### 5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


<b>Visualize the acceptance rate by occupation</b>

In [363]:
df = Filter_Dataframe(bar_df,{'occupation':[]},'',True,'Y')[1]
fig = px.bar(df, x='Feature Value', y='Acceptance Rate', color='Feature Value', text='Feature Value Count')
fig.update_xaxes(categoryorder="total descending")
fig.update_layout(yaxis_ticksuffix="%")
fig.show()

F - Students and unemployed drivers have the highest acceptance rate for the coupon.

<b>Visualize the acceptance rate by passenger</b>

In [364]:
df = Filter_Dataframe(bar_df,{'passanger':[]},'',True,'Y')[1]
fig = px.bar(df, x='Feature Value', y='Acceptance Rate', color='Feature Value', text='Feature Value Count')
fig.update_xaxes(categoryorder="total descending")
fig.update_layout(yaxis_ticksuffix="%")
fig.show()

F - Drivers travelling <b>alone</b> had a much higher acceptance rate than those travelling with others.

In [365]:
#Acceptance rate of drivers going to a bar more than once a month with and without kids as a passenger.

In [366]:
bar_unique_vals = list(data_copy['Bar'].unique())
bar_less_than_a_month = ['never', 'less1']
no_kids = list(data_copy['passanger'].unique())
no_kids.remove('Kid(s)')

bar_more_than_a_month = list(set(bar_unique_vals) - set(bar_less_than_a_month))
df1 = dfbyfeature(Filter_Dataframe(bar_df,{'Bar':bar_less_than_a_month,'passanger':no_kids},'',True,'Y')[1])
df2 = dfbyfeature(Filter_Dataframe(bar_df,{'Bar':bar_more_than_a_month,'passanger':no_kids},'',True,'Y')[1])
df_combined = pd.concat([df1,df2])
df_combined

|  | Acceptance Rate | Feature Value Count | Age | Feature Value |
|---|---|---|---|---|
| 0 | 19.39 | 391 | > 0 | `Bar in never,less1 (&) passanger in Alone,Frie...` |
| 0 | 19.48 | 393 | > 0 | `Bar in 1~3,gt8,4~8 (&) passanger in Alone,Frie...` |


F - Among drivers whose passengers were not kids, there is no significant difference in acceptance rate between those who go to a bar more than once a month and those who go less often.

Compare the coupon acceptance between <b>working</b> and <b>non-working</b> drivers. Note: Student, Unemployed and Retired are considered non-working.

In [367]:
lst_notworking = ['Student','Unemployed','Retired']
lst_all = list(data_copy['occupation'].unique())
lst_working = list(set(lst_all) - set(lst_notworking))

In [368]:
df_nonworking = dfbyfeature(Filter_Dataframe(bar_df,{'occupation':lst_notworking},'',True,'Y')[1])
df_working = dfbyfeature(Filter_Dataframe(bar_df,{'occupation':lst_working},'',True,'Y')[1])
df_working['Feature Value'] = 'All other occupation'
df_combined = pd.concat([df_nonworking,df_working])
df_combined

|  | Acceptance Rate | Feature Value Count | Age | Feature Value |
|---|---|---|---|---|
| 0 | 11.35 | 229 | > 0 | occupation in Student,Unemployed,Retired |
| 0 | 29.65 | 598 | > 0 | All other occupation |


In [369]:
fig = px.bar(df_combined, x='Feature Value', y='Acceptance Rate', color='Feature Value',text='Feature Value Count')
fig.update_xaxes(categoryorder="total descending")
fig.update_layout(yaxis_ticksuffix="%")
fig.show()

#### 6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 



Visualize the acceptance rate based on age > 25 and <= 25

In [377]:
df_age_less_25 = round(bar_df.query('age <= 25 and Y==1').shape[0] / bar_df.shape[0] * 100,2)
df_age_grt_25 = round(bar_df.query('age > 25 and Y==1').shape[0] / bar_df.shape[0] * 100,2)
lst_acc_rate = [df_age_grt_25,df_age_less_25]

colors = ['green','red']
fig = go.Figure(data=[go.Pie(labels=['Age > 25','Age <= 25'],
                             values=lst_acc_rate)])
fig.update_traces(hoverinfo='label+percent',textfont_size=20, textinfo='label+percent', # show label and percent on slices and on hover
                 pull=[0.1,0],  # pull the first slice (Age > 25) out slightly
                 marker=dict(colors=colors, line=dict(color='#FFFFFF', width=2)),
                 ) 
fig.update_layout(title="Acceptance Rate by drivers over and under the age 25")
fig.show()

F - Drivers over 25 accepted more coupons than drivers aged 25 or under.

Visualize the acceptance rate based on marital status

In [241]:
fig = barchart(Filter_Dataframe(bar_df,{'maritalStatus':[]},'',True,'Y')[1])
fig.show()

F - Single drivers and drivers with a married partner accepted the most coupons.

In [250]:
#maritalStatus
lst_notwidow = list(data_copy['maritalStatus'].unique())
lst_notwidow.remove('Widowed')

#RestaurantLessThan20
cheap_rest_more_than_4_times = ['4~8','gt8']
cheap_rest_less_than_4_times = list(data_copy['RestaurantLessThan20'].unique())
cheap_rest_less_than_4_times = [elm for elm in cheap_rest_less_than_4_times if elm not in cheap_rest_more_than_4_times]

#income
income_less_than_50k = ['Less than $12500','$12500 - $24999','$25000 - $37499','$37500 - $49999']
income_more_than_50k = list(data_copy['income'].unique())
income_more_than_50k = [elm for elm in income_more_than_50k if elm not in income_less_than_50k]

In [334]:
#go to bars more than once a month, had passengers that were not a kid, and were not widowed
df1 = dfbyfeature(Filter_Dataframe(bar_df,{'Bar':bar_more_than_a_month,'passanger':no_kids,'maritalStatus':lst_notwidow},'',True,'Y')[1])
df1['Feature Value'] = 'Bar > once a month, no kid passenger, not widowed'
#go to bars more than once a month and are under the age of 30
df2 = dfbyfeature(Filter_Dataframe(bar_df,{'Bar':bar_more_than_a_month},'<30',True,'Y')[1])
df2['Feature Value'] = 'Bar > once a month and age < 30'
#go to cheap restaurants more than 4 times a month and income is less than 50K
df3 = dfbyfeature(Filter_Dataframe(bar_df,{'RestaurantLessThan20':cheap_rest_more_than_4_times,'income':income_less_than_50k},'',True,'Y')[1])
df3['Feature Value'] = 'Cheap restaurant > 4 times and income < 50k'
fig = barchart(pd.concat([df1,df2,df3]))
fig.show()
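
The prompt joins its three conditions with *OR*, so another way to read it is as a single combined group compared against everyone else. The sketch below shows that reading on made-up data (the `toy` frame and its rows are assumptions for illustration; column names and category labels follow the notebook).

```python
import pandas as pd

# Toy bar-coupon offers (illustrative values only).
toy = pd.DataFrame({
    'Bar': ['1~3', 'gt8', 'never', 'never', 'less1'],
    'passanger': ['Alone', 'Kid(s)', 'Alone', 'Friend(s)', 'Partner'],
    'maritalStatus': ['Single', 'Married partner', 'Widowed', 'Single', 'Married partner'],
    'age': [26, 21, 51, 41, 36],
    'RestaurantLessThan20': ['1~3', 'less1', 'gt8', 'less1', '1~3'],
    'income': ['$50000 - $62499', '$100000 or More', '$12500 - $24999',
               '$100000 or More', '$62500 - $74999'],
    'Y': [1, 1, 0, 0, 1],
})

low_income = ['Less than $12500', '$12500 - $24999',
              '$25000 - $37499', '$37500 - $49999']
frequent_bar = toy['Bar'].isin(['1~3', '4~8', 'gt8'])

# One boolean mask per bullet in the prompt.
cond_a = frequent_bar & (toy['passanger'] != 'Kid(s)') & (toy['maritalStatus'] != 'Widowed')
cond_b = frequent_bar & (toy['age'] < 30)
cond_c = toy['RestaurantLessThan20'].isin(['4~8', 'gt8']) & toy['income'].isin(low_income)

# A driver belongs to the combined group if ANY condition holds.
any_cond = cond_a | cond_b | cond_c
rate_any = round(toy.loc[any_cond, 'Y'].mean() * 100, 2)
rate_rest = round(toy.loc[~any_cond, 'Y'].mean() * 100, 2)
print(rate_any, rate_rest)
```

Keeping each condition as its own mask also makes it easy to report the three groups separately, as the bar chart above does.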

#### 7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

1) Drivers who are over 25 years old generally accepted more coupons than drivers aged 25 or under.
2) Drivers who travelled alone accepted more coupons than those travelling with others.
3) Drivers who are students or unemployed accepted more coupons than drivers in other occupations.
4) Single drivers and drivers with a married partner accepted the most coupons.
5) Among drivers whose passengers were not kids, there was no significant difference in acceptance rate between those who go to a bar more than once a month and those who go less often.

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.  