# Starbucks Capstone Challenge

### Introduction

This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks. 

Not all users receive the same offer, and that is the challenge to solve with this data set.

Your task is to combine transaction, demographic and offer data to determine which demographic groups respond best to which offer type. This data set is a simplified version of the real Starbucks app because the underlying simulator only has one product whereas Starbucks actually sells dozens of products.

Every offer has a validity period before the offer expires. As an example, a BOGO offer might be valid for only 5 days. You'll see in the data set that informational offers have a validity period even though these ads are merely providing information about a product; for example, if an informational offer has 7 days of validity, you can assume the customer is feeling the influence of the offer for 7 days after receiving the advertisement.

You'll be given transactional data showing user purchases made on the app including the timestamp of purchase and the amount of money spent on a purchase. This transactional data also has a record for each offer that a user receives as well as a record for when a user actually views the offer. There are also records for when a user completes an offer. 

Keep in mind as well that someone using the app might make a purchase through the app without having received an offer or seen an offer.

### Example

To give an example, a user could receive a discount offer, "buy 10 dollars, get 2 dollars off", on Monday. The offer is valid for 10 days from receipt. If the customer accumulates at least 10 dollars in purchases during the validity period, the customer completes the offer.

However, there are a few things to watch out for in this data set. Customers do not opt into the offers that they receive; in other words, a user can receive an offer, never actually view the offer, and still complete the offer. For example, a user might receive the "buy 10 dollars get 2 dollars off offer", but the user never opens the offer during the 10 day validity period. The customer spends 15 dollars during those ten days. There will be an offer completion record in the data set; however, the customer was not influenced by the offer because the customer never viewed the offer.
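The caveat above can be checked mechanically. The following is a minimal sketch on made-up data (the `events` frame and the `influenced` column are hypothetical names, not part of the data set): a completion only counts as influenced when a view happened at or before it.

```python
import pandas as pd

# Toy event log for one offer sent to two customers (hypothetical data).
events = pd.DataFrame({
    "person": ["a", "a", "a", "b", "b"],
    "event": ["offer received", "offer viewed", "offer completed",
              "offer received", "offer completed"],
    "time": [0, 6, 30, 0, 48],  # hours since start of test
})

# Earliest time of each event type per person; NaN if it never happened.
first_times = events.pivot_table(index="person", columns="event",
                                 values="time", aggfunc="min")

# A completion counts as influenced only if a view happened at or before it.
# Comparisons against NaN are False, so customer "b" (never viewed) is excluded.
first_times["influenced"] = (
    first_times["offer viewed"] <= first_times["offer completed"]
)
print(first_times)
```

Customer `b` completes the offer without ever viewing it, so the completion record exists but is not attributed to the offer.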

### Cleaning

This makes data cleaning especially important and tricky.

You'll also want to take into account that some demographic groups will make purchases even if they don't receive an offer. From a business perspective, if a customer is going to make a 10 dollar purchase without an offer anyway, you wouldn't want to send a buy 10 dollars get 2 dollars off offer. You'll want to try to assess what a certain demographic group will buy when not receiving any offers.
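One way to approximate this baseline is to sum only the transactions that fall outside every active offer window. The sketch below uses made-up data; the `windows` frame and the column names `start`, `end`, and `under_offer` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical transactions and offer validity windows (times in hours).
transactions = pd.DataFrame({
    "person": ["a", "a", "b"],
    "time": [12, 300, 50],
    "amount": [10.0, 4.0, 7.5],
})
windows = pd.DataFrame({"person": ["a"], "start": [0], "end": [168]})

# Pair every transaction with every window belonging to the same person.
paired = transactions.reset_index().merge(windows, on="person", how="left")
paired["in_window"] = paired["time"].between(paired["start"], paired["end"])

# A transaction is "under offer" if it falls inside at least one window.
transactions["under_offer"] = paired.groupby("index")["in_window"].any()

# Baseline spend: money spent while no offer was active.
baseline_spend = (
    transactions[~transactions["under_offer"]].groupby("person")["amount"].sum()
)
print(baseline_spend)
```

Customers who never received an offer (here `b`) have no window, so all of their spend lands in the baseline.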

### Final Advice

Because this is a capstone project, you are free to analyze the data any way you see fit. For example, you could build a machine learning model that predicts how much someone will spend based on demographics and offer type. Or you could build a model that predicts whether or not someone will respond to an offer. Or, you don't need to build a machine learning model at all. You could develop a set of heuristics that determine what offer you should send to each customer (i.e., 75 percent of women customers who were 35 years old responded to offer A vs 40 percent from the same demographic to offer B, so send offer A).
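The heuristic approach could look roughly like the sketch below. The `outcomes` table is invented for illustration, with one row per offer sent and a `responded` flag that you would have to derive from the transcript yourself.

```python
import pandas as pd

# Hypothetical outcomes: one row per (customer, offer) sent.
outcomes = pd.DataFrame({
    "gender": ["F", "F", "F", "F", "M", "M"],
    "offer": ["A", "A", "B", "B", "A", "B"],
    "responded": [1, 1, 0, 1, 0, 1],
})

# Response rate per demographic group and offer.
rates = (
    outcomes.groupby(["gender", "offer"])["responded"].mean().unstack("offer")
)

# Heuristic: for each group, send the offer with the highest observed rate.
best_offer = rates.idxmax(axis=1)
print(rates)
print(best_offer)
```

With real data you would also want to check group sizes before trusting a rate, since small groups produce noisy percentages.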

# Data Sets

The data is contained in three files:

* portfolio.json - containing offer ids and metadata about each offer (duration, type, etc.)
* profile.json - demographic data for each customer
* transcript.json - records for transactions, offers received, offers viewed, and offers completed

Here is the schema and explanation of each variable in the files:

**portfolio.json**
* id (string) - offer id
* offer_type (string) - type of offer, i.e. BOGO, discount, informational
* difficulty (int) - minimum required spend to complete an offer
* reward (int) - reward given for completing an offer
* duration (int) - time for offer to be open, in days
* channels (list of strings)

**profile.json**
* age (int) - age of the customer 
* became_member_on (int) - date when customer created an app account
* gender (str) - gender of the customer (note some entries contain 'O' for other rather than M or F)
* id (str) - customer id
* income (float) - customer's income
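Because `became_member_on` is stored as an integer such as 20170212, it usually helps to parse it into a real date. The sketch below also treats an age of 118 as missing; that is an assumption based on the notebook's own previews, where rows with age 118 have no gender or income. `profile_sample` is a made-up stand-in for `profile`.

```python
import numpy as np
import pandas as pd

# Two rows shaped like profile.json (values copied from the preview output).
profile_sample = pd.DataFrame({
    "age": [118, 55],
    "became_member_on": [20170212, 20170715],
    "gender": [None, "F"],
    "income": [np.nan, 112000.0],
})

# Parse the YYYYMMDD integer into a datetime.
profile_sample["became_member_on"] = pd.to_datetime(
    profile_sample["became_member_on"].astype(str), format="%Y%m%d"
)

# ASSUMPTION: age 118 is a placeholder for a missing age, so replace it with NaN.
profile_sample["age"] = profile_sample["age"].where(profile_sample["age"] != 118)
print(profile_sample)
```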

**transcript.json**
* event (str) - record description (i.e. transaction, offer received, offer viewed, etc.)
* person (str) - customer id
* time (int) - time in hours since start of test. The data begins at time t=0
* value (dict with string keys) - either an offer id or a transaction amount, depending on the record

**Note:** If you are using the workspace, you will need to go to the terminal and run the command `conda update pandas` before reading in the files. This is because the version of pandas in the workspace cannot read in the transcript.json file correctly, but the newest version of pandas can. You can access the terminal from the orange icon in the top left of this notebook.

The two images below show how to access the terminal and how the install works. First you need to access the terminal:

<img src="pic1.png"/>

Then you will want to run the above command:

<img src="pic2.png"/>

Finally, when you enter back into the notebook (use the jupyter icon again), you should be able to run the below cell without any errors.

In [3]:
import pandas as pd
import numpy as np
import os
import math
import json
import plotly.graph_objects as go
# % matplotlib inline

# read in the json files
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
profile = pd.read_json('data/profile.json', orient='records', lines=True)
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

# Cleaning Data

## Pairing Event Data to User Data
* For each user, we'd like to be able to see whether viewing an offer led to transactions or completions
    * To do this, we need to track offer views, completions, and transactions back to the time that the offer was received

#### Checking Event Types in Transcript DF

In [465]:
event_types = transcript['event'].unique().tolist()
event_types

['offer received', 'offer viewed', 'transaction', 'offer completed']

#### Check all offer types in Transcript

In [466]:
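def get_offer_types(d, offer_types):
    # collect the keys of one 'value' dict into the shared offer_types list
    try:
        offer_types.extend(d.keys())
    except AttributeError:
        # d is not a dict (e.g. NaN)
        print(f'failed - received {type(d)} item: {d}')
    return offer_types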

offer_types = []
transcript['value'].apply(get_offer_types, args=[offer_types])
set(offer_types)

{'amount', 'offer id', 'offer_id', 'reward'}

### Split Transcript by Event
* There are 4 types of 'event': 'offer received', 'offer viewed', 'transaction', 'offer completed'

#### In the event 'offer_received', the 'value' column contains a dict with key 'offer id'.

In [467]:
offer_types = []
offer_received = transcript.copy()[transcript['event']=='offer received']
offer_received['value'].apply(get_offer_types, args=[offer_types])
offer_type = list(set(offer_types))[0]   # 'offer id'
offer_received[offer_type.replace(' ','_')] = offer_received['value'].apply(lambda d: d[offer_type])
offer_received.head()

Unnamed: 0,person,event,value,time,offer_id
0,78afa995795e4d85b5d9ceeca43f5fef,offer received,{'offer id': '9b98b8c7a33c4b65b9aebfe6a799e6d9'},0,9b98b8c7a33c4b65b9aebfe6a799e6d9
1,a03223e636434f42ac4c3df47e8bac43,offer received,{'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'},0,0b1e1539f2cc45b7b9fa7c272da2e1d7
2,e2127556f4f64592b11af22de27a7932,offer received,{'offer id': '2906b810c7d4411798c6938adc9daaa5'},0,2906b810c7d4411798c6938adc9daaa5
3,8ec6ce2a7e7949b1bf142def7d0e0586,offer received,{'offer id': 'fafdcd668e3743c1bb461111dcafc2a4'},0,fafdcd668e3743c1bb461111dcafc2a4
4,68617ca6246f4fbc85e91a2a49552598,offer received,{'offer id': '4d5c57ea9a6940dd891ad53e9dbe8da0'},0,4d5c57ea9a6940dd891ad53e9dbe8da0


#### In the event 'offer_viewed', the 'value' column contains a dict with key 'offer id'.

In [468]:
offer_viewed = transcript.copy()[transcript['event']=='offer viewed']
offer_types = []
offer_viewed['value'].apply(get_offer_types, args=[offer_types])
offer_type = list(set(offer_types))[0]     # offer id
offer_viewed[offer_type.replace(' ','_')] = offer_viewed['value'].apply(lambda d: d[offer_type])
offer_viewed.head()

Unnamed: 0,person,event,value,time,offer_id
12650,389bc3fa690240e798340f5a15918d5c,offer viewed,{'offer id': 'f19421c1d4aa40978ebb69ca19b0e20d'},0,f19421c1d4aa40978ebb69ca19b0e20d
12651,d1ede868e29245ea91818a903fec04c6,offer viewed,{'offer id': '5a8bc65990b245e5a138643cd4eb9837'},0,5a8bc65990b245e5a138643cd4eb9837
12652,102e9454054946fda62242d2e176fdce,offer viewed,{'offer id': '4d5c57ea9a6940dd891ad53e9dbe8da0'},0,4d5c57ea9a6940dd891ad53e9dbe8da0
12653,02c083884c7d45b39cc68e1314fec56c,offer viewed,{'offer id': 'ae264e3637204a6fb9bb56bc8210ddfd'},0,ae264e3637204a6fb9bb56bc8210ddfd
12655,be8a5d1981a2458d90b255ddc7e0d174,offer viewed,{'offer id': '5a8bc65990b245e5a138643cd4eb9837'},0,5a8bc65990b245e5a138643cd4eb9837


#### In the event 'transaction', the 'value' column contains a dict with key 'amount'.

In [469]:
transactions = transcript.copy()[transcript['event']=='transaction']
offer_types = []
transactions['value'].apply(get_offer_types, args=[offer_types])
offer_type = list(set(offer_types))[0]   # 'amount'
transactions[offer_type] = transactions['value'].apply(lambda d: d[offer_type])
transactions.head()

Unnamed: 0,person,event,value,time,amount
12654,02c083884c7d45b39cc68e1314fec56c,transaction,{'amount': 0.8300000000000001},0,0.83
12657,9fa9ae8f57894cc9a3b8a9bbe0fc1b2f,transaction,{'amount': 34.56},0,34.56
12659,54890f68699049c2a04d415abc25e717,transaction,{'amount': 13.23},0,13.23
12670,b2f1cd155b864803ad8334cdf13c4bd2,transaction,{'amount': 19.51},0,19.51
12671,fe97aa22dd3e48c8b143116a8403dd52,transaction,{'amount': 18.97},0,18.97


#### In the event 'offer_completed', the 'value' column contains a dict with keys 'offer_id' and 'reward'


In [470]:
offer_completed = transcript.copy()[transcript['event']=='offer completed']
offer_types = []
offer_completed['value'].apply(get_offer_types, args=[offer_types])
offer_type = 'offer_id'    # pick the key explicitly: set order is arbitrary and these dicts also hold 'reward'
offer_completed[offer_type] = offer_completed['value'].apply(lambda d: d[offer_type])
offer_completed.head()

Unnamed: 0,person,event,value,time,offer_id
12658,9fa9ae8f57894cc9a3b8a9bbe0fc1b2f,offer completed,{'offer_id': '2906b810c7d4411798c6938adc9daaa5...,0,2906b810c7d4411798c6938adc9daaa5
12672,fe97aa22dd3e48c8b143116a8403dd52,offer completed,{'offer_id': 'fafdcd668e3743c1bb461111dcafc2a4...,0,fafdcd668e3743c1bb461111dcafc2a4
12679,629fc02d56414d91bca360decdfa9288,offer completed,{'offer_id': '9b98b8c7a33c4b65b9aebfe6a799e6d9...,0,9b98b8c7a33c4b65b9aebfe6a799e6d9
12692,676506bad68e4161b9bbaffeb039626b,offer completed,{'offer_id': 'ae264e3637204a6fb9bb56bc8210ddfd...,0,ae264e3637204a6fb9bb56bc8210ddfd
12697,8f7dd3b2afe14c078eb4f6e6fe4ba97d,offer completed,{'offer_id': '4d5c57ea9a6940dd891ad53e9dbe8da0...,0,4d5c57ea9a6940dd891ad53e9dbe8da0


In [471]:
event_type_df_lengths = [len(offer_received), len(offer_viewed), len(transactions), len(offer_completed)]
data = [
    go.Bar(x=event_types, y=event_type_df_lengths)
]
layout={
    'title':'Number of Rows for each Event Type'
}
fig = go.Figure(data, layout)
fig.show()

### Cleaning Transcript DF -> into 'transcript_df'
* rename all 'offer id' value keys as 'offer_id'
* Separating 'value' col of dicts into other columns by adding the 'value' dict keys as columns and 'value' dict values as values:
    * 'offer_id', 'amount', 'reward'
    * NaN if that event does not have this key
* removing the 'value' column afterward
* drop duplicates
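As an alternative to key-by-key extraction, the same flattening can be sketched with `pd.json_normalize`, which expands a list of dicts into columns and fills absent keys with NaN. The `sample` frame below is made up and only mirrors the shape of the transcript.

```python
import pandas as pd

# Three rows shaped like transcript.json (hypothetical values).
sample = pd.DataFrame({
    "event": ["offer received", "transaction", "offer completed"],
    "value": [
        {"offer id": "abc"},
        {"amount": 3.5},
        {"offer_id": "abc", "reward": 2},
    ],
})

# Normalize key spelling first so 'offer id' and 'offer_id' share one column.
cleaned = sample["value"].apply(
    lambda d: {k.replace(" ", "_"): v for k, v in d.items()}
)

# Expand each dict into columns; keys absent from a row become NaN.
expanded = pd.json_normalize(cleaned.tolist())
expanded.index = sample.index  # keep the original row labels

flat = pd.concat([sample.drop(columns="value"), expanded], axis=1)
print(flat)
```

Resetting `expanded.index` matters on the real data, because `json_normalize` returns a fresh 0..n-1 index that would otherwise misalign with a filtered frame.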


In [472]:
transcript_df = transcript.copy()
transcript_df['event'] = transcript_df['event'].apply(lambda event: event.replace(' ',  '_'))

def rename_key_w_underscore(value_dict):
    renamed_dict = {}
    for k, v in value_dict.items():
        renamed_dict[k.replace(' ', '_')] = v
    return renamed_dict
        
    
transcript_df['value'] = transcript_df['value'].apply(rename_key_w_underscore)

def get_value_val(value_dict, value_key):
    # return NaN when this event's dict has no such key
    return value_dict.get(value_key, np.nan)
for value_key in ['offer_id', 'amount', 'reward']:
    transcript_df[value_key] = transcript_df['value'].apply(get_value_val, args=[value_key])

transcript_df.drop(columns='value', inplace=True)
transcript_df.drop_duplicates(inplace=True)

In [473]:
# Re-pull event types now that text has been cleaned to replace spaces with underscores
event_types = transcript_df['event'].unique().tolist()
event_types

['offer_received', 'offer_viewed', 'transaction', 'offer_completed']

### Cleaning Portfolio Data
* Adding offer_num column to map offer ID to a number for visualizations below

In [474]:
portfolio['offer_num'] = portfolio.index+1
portfolio

Unnamed: 0,reward,channels,difficulty,duration,offer_type,id,offer_num
0,10,"[email, mobile, social]",10,7,bogo,ae264e3637204a6fb9bb56bc8210ddfd,1
1,10,"[web, email, mobile, social]",10,5,bogo,4d5c57ea9a6940dd891ad53e9dbe8da0,2
2,0,"[web, email, mobile]",0,4,informational,3f207df678b143eea3cee63160fa8bed,3
3,5,"[web, email, mobile]",5,7,bogo,9b98b8c7a33c4b65b9aebfe6a799e6d9,4
4,5,"[web, email]",20,10,discount,0b1e1539f2cc45b7b9fa7c272da2e1d7,5
5,3,"[web, email, mobile, social]",7,7,discount,2298d6c36e964ae4a3e7e9706d1fb8c2,6
6,2,"[web, email, mobile, social]",10,10,discount,fafdcd668e3743c1bb461111dcafc2a4,7
7,0,"[email, mobile, social]",0,3,informational,5a8bc65990b245e5a138643cd4eb9837,8
8,5,"[web, email, mobile, social]",5,5,bogo,f19421c1d4aa40978ebb69ca19b0e20d,9
9,2,"[web, email, mobile]",10,7,discount,2906b810c7d4411798c6938adc9daaa5,10


## Distribution of Offers in Sample Data

#### How offer events are distributed
* Offers 6 and 7 saw the most completions
* Offers 4 and 5 had more completion events than view events

In [475]:
received_count = pd.DataFrame(transcript_df[transcript_df['event']=='offer_received'].groupby('offer_id')['event'].count())
received_count = received_count.rename(columns={'event':'count'})
received_count = received_count.merge(portfolio[['id', 'offer_num']], how='left', left_on='offer_id', right_on='id')

viewed_count = pd.DataFrame(transcript_df[transcript_df['event']=='offer_viewed'].groupby('offer_id')['event'].count())
viewed_count = viewed_count.rename(columns={'event':'count'})
viewed_count = viewed_count.merge(portfolio[['id', 'offer_num']], how='left', left_on='offer_id', right_on='id')

completed_count = pd.DataFrame(transcript_df[transcript_df['event']=='offer_completed'].groupby('offer_id')['event'].count())
completed_count = completed_count.rename(columns={'event':'count'})
completed_count = completed_count.merge(portfolio[['id', 'offer_num']], how='left', left_on='offer_id', right_on='id')


data =  [
    go.Bar(x=received_count['offer_num'], y=received_count['count'], name='offers received'),
    go.Bar(x=viewed_count['offer_num'], y=viewed_count['count'], name='offers viewed'),
    go.Bar(x=completed_count['offer_num'], y=completed_count['count'], name='offers completed'),
]
layout = dict(
    title='Number of Total Offers Received, Viewed, and Completed by Offer #',
    xaxis=dict(
        title='Offer #',
        tickmode='linear')    
)
    
fig = go.Figure(data, layout)
fig.show()

### Map offer views, completions, and transactions to each offer
* for each offer received, check if it was viewed and completed before expiration
* also check if transactions were made before expiration
    * for informational offers: if at least one transaction was made before expiration, mark this offer as completed
* We'd like to consider offers successful if transactions or offer completions were made for each offer received AND viewed (before they expired)
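The window check described above can also be sketched with a merge instead of a row-by-row lookup. This is an illustrative variant on made-up data, not the notebook's implementation; unlike a lookup on person and time alone, it also matches views on `offer_id`. The frame and column names (`view_time`, `viewed_in_window`) are assumptions.

```python
import pandas as pd

# Hypothetical received offers and view events (times in hours).
received = pd.DataFrame({
    "person": ["a", "b"],
    "offer_id": ["x", "x"],
    "time": [0, 0],
    "duration_days": [7, 7],
})
viewed = pd.DataFrame({
    "person": ["a", "b"],
    "offer_id": ["x", "x"],
    "view_time": [24, 200],
})

# Offer duration is given in days, event times in hours.
received["end_time"] = received["time"] + received["duration_days"] * 24

# Pair each received offer with every view of the same offer by the same person.
candidates = received.merge(viewed, on=["person", "offer_id"], how="left")
candidates["viewed_in_window"] = candidates["view_time"].between(
    candidates["time"], candidates["end_time"]
)

# Collapse back to one row per received offer.
result = candidates.groupby(
    ["person", "offer_id", "time"], as_index=False
)["viewed_in_window"].any()
print(result)
```

Customer `b` views the offer at hour 200, after the 168-hour window has closed, so that view is not counted.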

In [None]:
def check_for_event(row, event_type, transcript_df=transcript_df):
    # if looking for completed offers, first check if offer has been viewed
    if (event_type == 'offer_completed') and (np.isnan(row['offer_viewed'])):
        return np.nan
    # look for relevant events
    events = transcript_df[(transcript_df['person']==row['person']) & (transcript_df['time']>=row['time'])
                         & (transcript_df['time']<=row['end_time']) & (transcript_df['event']==event_type)]
    if len(events)>0:
        if event_type == 'transaction':
            # multiple transactions are possible in the duration of the offer - add event indexes to tuple
            return tuple(events.index.tolist())
        else:
            # return the event index of the relevant event
            return events.index[0]
    else:
        return np.nan

def create_offer_map(transcript_df=transcript_df, portfolio=portfolio):
    # get all offers received in transcript_df
    df = transcript_df.copy()
    df = df[df['event']=='offer_received']
    # add on offer data from portfolio df
    df = df.merge(portfolio, how='left', left_on='offer_id', right_on='id').rename(columns={'duration':'duration_days'})
    df.drop(columns=['id'], inplace=True)
    df['duration_hours'] = df['duration_days']*24
    df['end_time']  = df['time'] + df['duration_hours']

    for event in ['offer_viewed', 'transaction', 'offer_completed']:    # offer_viewed must be before offer_completed
        print(f'Looking for event: {event}')
        df[event] = df.apply(check_for_event, args=[event], axis=1)
    
    return df

In [None]:
offer_map

In [None]:
# add transaction amounts to offer_map
fp = os.path.abspath('') + '/Data/offer_map.pkl'

def transactions_total(transaction_tuple):
    # offers with no related transactions hold NaN instead of a tuple of indexes
    if not isinstance(transaction_tuple, tuple):
        return np.nan
    total = 0
    for transaction_ix in transaction_tuple:
        try:
            total += transcript_df.loc[transaction_ix, 'amount']
        except KeyError:
            print(f'Amount for {transaction_ix} could not be added')
            continue
    return total

# get total spend for all offers that had related transactions
offer_map['transaction_total'] = offer_map['transaction'].apply(transactions_total)
offer_map

# save offer_map
offer_map.to_pickle(fp)

In [478]:
# if offer_map already exists, load it
fp = os.path.abspath('') + '/Data/offer_map.pkl'

try:
    offer_map = pd.read_pickle(fp)

# otherwise, create it
except FileNotFoundError:
    offer_map = create_offer_map()
    offer_map.to_pickle(fp)


offer_map

Unnamed: 0,person,event,time,offer_id,amount,reward_x,reward_y,channels,difficulty,duration_days,offer_type,offer_num,duration_hours,end_time,offer_viewed,transaction,offer_completed
0,78afa995795e4d85b5d9ceeca43f5fef,offer_received,0,9b98b8c7a33c4b65b9aebfe6a799e6d9,,,5,"[web, email, mobile]",5,7,bogo,4,168,168,15561.0,"(47582, 49502)",47583.0
1,a03223e636434f42ac4c3df47e8bac43,offer_received,0,0b1e1539f2cc45b7b9fa7c272da2e1d7,,,5,"[web, email]",20,10,discount,5,240,240,15562.0,"(90553,)",
2,e2127556f4f64592b11af22de27a7932,offer_received,0,2906b810c7d4411798c6938adc9daaa5,,,2,"[web, email, mobile]",10,7,discount,10,168,168,20283.0,,
3,8ec6ce2a7e7949b1bf142def7d0e0586,offer_received,0,fafdcd668e3743c1bb461111dcafc2a4,,,2,"[web, email, mobile, social]",10,10,discount,7,240,240,18067.0,,
4,68617ca6246f4fbc85e91a2a49552598,offer_received,0,4d5c57ea9a6940dd891ad53e9dbe8da0,,,10,"[web, email, mobile, social]",10,5,bogo,2,120,120,38221.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,3915674b75774fd8a5dccf50bc871d60,offer_received,0,2298d6c36e964ae4a3e7e9706d1fb8c2,,,3,"[web, email, mobile, social]",7,7,discount,6,168,168,15653.0,"(50491, 65974)",50492.0
496,614061b6bc56480f99271dd3b3402a69,offer_received,0,9b98b8c7a33c4b65b9aebfe6a799e6d9,,,5,"[web, email, mobile]",5,7,bogo,4,168,168,20356.0,"(24293, 65975)",24294.0
497,ddd92ad1b92f4a79be14e1067731a87d,offer_received,0,3f207df678b143eea3cee63160fa8bed,,,0,"[web, email, mobile]",0,4,informational,3,96,96,40826.0,,
498,2ad9a1463cca40ed994e2a0dde0ffb08,offer_received,0,ae264e3637204a6fb9bb56bc8210ddfd,,,10,"[email, mobile, social]",10,7,bogo,1,168,168,12758.0,"(35587, 42065)",


### Pairing Event Data to Profile Data
* Loop through each offer
    * Add columns to show events for each offer - one column each for:
        * \# offers received
        * \# of offers viewed during offer period
        * \# of offers completed during offer period
        * \# transactions made after receiving offer before expiration
            * total amount spent during offer periods

Keep this data in dataframe **profile_df**
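The per-customer, per-offer counts listed above can be sketched with a single `groupby` followed by `unstack`, rather than one `apply` per column. The `offer_map_sample` frame is made up; it mimics the shape of `offer_map`, where `offer_viewed` and `offer_completed` hold an event index or NaN.

```python
import pandas as pd

# Hypothetical slice of offer_map: one row per offer received.
offer_map_sample = pd.DataFrame({
    "person": ["a", "a", "b"],
    "offer_num": [1, 1, 2],
    "offer_viewed": [10.0, None, 55.0],     # NaN when the offer was never viewed
    "offer_completed": [None, None, 60.0],  # NaN when the offer was never completed
})

# 'size' counts rows (offers received); 'count' counts non-NaN entries.
counts = offer_map_sample.groupby(["person", "offer_num"]).agg(
    offer_received=("offer_viewed", "size"),
    offer_viewed=("offer_viewed", "count"),
    offer_completed=("offer_completed", "count"),
)

# One row per customer, one column per (event, offer); missing pairs become 0.
wide = counts.unstack("offer_num", fill_value=0)
wide.columns = [f"{num}_{name}" for name, num in wide.columns]
print(wide)
```

The resulting column names follow the same `{offer_num}_{event}` pattern used elsewhere in this notebook, so `wide` could be joined onto the profile data by customer id.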

In [None]:
def count_events(user_id, offer_id, col_type, offer_map=offer_map, portfolio=portfolio):
    user_offers = offer_map[(offer_map['person']==user_id) & (offer_map['offer_id']==offer_id)]
    if len(user_offers) == 0:
        return 0
    if col_type == 'transaction_total':
        return user_offers[col_type].sum()
    if col_type == 'offer_received':
        return len(user_offers)
    else:
        return user_offers[col_type].count()

In [None]:
dff = profile[:10].copy()

# add one column per offer and per event type, plus the total spend during each offer
# build a new list so the shared event_types list is not mutated on every iteration
columns_to_add = event_types + ['transaction_total']
for i in range(len(portfolio)):
    offer_id = portfolio.loc[i, 'id']
    offer_num = portfolio.loc[i, 'offer_num']
    for col_type in columns_to_add:
        col_name = f'{offer_num}_{col_type}'
        # add column with data for this offer and event for each customer
        dff[col_name] = dff['id'].apply(count_events, args=[offer_id, col_type])

dff

In [15]:
def get_event_times(person_id, event_time_df):
    try:
        return tuple(event_time_df[event_time_df['person']==person_id]['time'].tolist())
    except:
        return None
    
def add_event(event_type, offer_id = None, offer_num=None, profile=profile, transcript=transcript_df):
    
    event_time = transcript[(transcript['event']==event_type)&(transcript['offer_id']==offer_id)][['person', 'time']]
    col_name = f'{offer_num}_{event_type}'
    profile[col_name] = profile['id'].apply(get_event_times, args=[event_time])
    
    return profile

In [39]:
def add_events_to_profile_df(profile=profile, portfolio=portfolio, transcript_df=transcript_df, event_types=event_types):
    profile_df = profile.copy()
    
    # add offer events (offer received/viewed/completed), non-transaction 
    for i in range(len(portfolio)):
        print('\nAdding offer number:', portfolio.loc[i, 'offer_num'])
        for event_type in event_types:

            if event_type == 'transaction':
                continue
            
            profile_df = add_event(event_type, portfolio.loc[i, 'id'], portfolio.loc[i, 'offer_num'], profile_df, transcript_df)

    # add transactions
    print('Adding transactions')
    transactions = transcript_df[transcript_df['event']=='transaction']
    profile_df['transaction_time'] = profile_df['id'].apply(get_event_times, args=([transactions]))
    
    return profile_df

In [340]:
# if profile_df already exists, load it
fp = os.path.abspath('') + '/Data/profile_df.pkl'
try:
    profile_df = pd.read_pickle(fp)
# otherwise, create it
except FileNotFoundError:
    profile_df = add_events_to_profile_df()
    profile_df.to_pickle(fp)

profile_df

Unnamed: 0,gender,age,id,became_member_on,income,1_offer_received,1_offer_viewed,1_offer_completed,2_offer_received,2_offer_viewed,...,8_offer_received,8_offer_viewed,8_offer_completed,9_offer_received,9_offer_viewed,9_offer_completed,10_offer_received,10_offer_viewed,10_offer_completed,transaction_time
0,,118,68be06ca386d4c31939f3a4f0e3dd783,20170212,,(),(),(),(),(),...,(),(),(),(),(),(),"(168,)","(216,)",(),"(360, 414, 444, 510, 534, 552, 606, 630, 696)"
1,F,55,0610b486422d4921ae7d2bf64640c50b,20170715,112000.0,(),(),(),(),(),...,(),(),(),(),(),(),(),(),(),"(18, 144, 528)"
2,,118,38fe809add3b4fcf9315a9694bb96ff5,20180712,,(),(),(),(),(),...,"(576,)","(666,)",(),(),(),(),(),(),(),"(132, 348, 450, 474, 636, 696)"
3,F,75,78afa995795e4d85b5d9ceeca43f5fef,20170509,100000.0,"(408,)","(408,)","(510,)",(),(),...,"(168,)","(216,)",(),"(504,)","(582,)","(510,)",(),(),(),"(132, 144, 222, 240, 378, 510, 534)"
4,,118,a03223e636434f42ac4c3df47e8bac43,20170804,,(),(),(),(),(),...,"(408,)",(),(),(),(),(),(),(),(),"(234, 264, 612)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16995,F,45,6d5f3a774f3d4714ab0c092238f3a1d7,20180604,54000.0,(),(),(),"(336,)","(402,)",...,"(408,)","(462,)",(),(),(),(),(),(),(),"(60, 84, 192, 246, 324, 642, 690)"
16996,M,61,2cb4f97358b841b9a9773a7aa05a9d77,20180713,72000.0,(),(),(),(),(),...,"(0,)","(42,)",(),(),(),(),(),(),(),"(126, 270, 276, 354, 390, 420, 654)"
16997,M,49,01d26f638c274aa0b965d24cefe3183f,20170126,73000.0,(),(),(),(),(),...,"(336,)","(396,)",(),(),(),(),(),(),(),"(204, 372, 378, 396, 402, 480, 576, 672)"
16998,F,83,9dc1421481194dcd9400aec7c9ae6366,20160307,50000.0,"(576,)","(624,)","(594,)","(336,)","(342,)",...,(),(),(),(),(),(),(),(),(),"(24, 150, 228, 348, 360, 372, 414, 426, 504, 5..."


In [336]:
offer_map.copy()

Unnamed: 0,person,event,time,offer_id,amount,reward_x,reward_y,channels,difficulty,duration,offer_type,offer_num,end_time,offer_viewed,transaction,offer_completed
0,78afa995795e4d85b5d9ceeca43f5fef,offer_received,0,9b98b8c7a33c4b65b9aebfe6a799e6d9,,,5,"[web, email, mobile]",5,7,bogo,4,7,15561.0,,
1,a03223e636434f42ac4c3df47e8bac43,offer_received,0,0b1e1539f2cc45b7b9fa7c272da2e1d7,,,5,"[web, email]",20,10,discount,5,10,15562.0,,
2,e2127556f4f64592b11af22de27a7932,offer_received,0,2906b810c7d4411798c6938adc9daaa5,,,2,"[web, email, mobile]",10,7,discount,10,7,,,
3,8ec6ce2a7e7949b1bf142def7d0e0586,offer_received,0,fafdcd668e3743c1bb461111dcafc2a4,,,2,"[web, email, mobile, social]",10,10,discount,7,10,,,
4,68617ca6246f4fbc85e91a2a49552598,offer_received,0,4d5c57ea9a6940dd891ad53e9dbe8da0,,,10,"[web, email, mobile, social]",10,5,bogo,2,5,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
76272,d087c473b4d247ccb0abfef59ba12b0e,offer_received,576,ae264e3637204a6fb9bb56bc8210ddfd,,,10,"[email, mobile, social]",10,7,bogo,1,583,,,
76273,cb23b66c56f64b109d673d5e56574529,offer_received,576,2906b810c7d4411798c6938adc9daaa5,,,2,"[web, email, mobile]",10,7,discount,10,583,,,
76274,6d5f3a774f3d4714ab0c092238f3a1d7,offer_received,576,2298d6c36e964ae4a3e7e9706d1fb8c2,,,3,"[web, email, mobile, social]",7,7,discount,6,583,,,
76275,9dc1421481194dcd9400aec7c9ae6366,offer_received,576,ae264e3637204a6fb9bb56bc8210ddfd,,,10,"[email, mobile, social]",10,7,bogo,1,583,,,


In [438]:
def count_events(user_id, offer_id, event_type, offer_map=offer_map, portfolio=portfolio):
    user_offers = offer_map[(offer_map['person']==user_id) & (offer_map['offer_id']==offer_id)]
    if len(user_offers) == 0:
        return 0
    if event_type == 'offer_received':
        return len(user_offers)
    else:
        return user_offers[event_type].count()

In [440]:
dff = profile[:10].copy()

# iterate through each customer
# response = dff.apply(count_events, axis=1)
    


for i in range(len(portfolio)):
    offer_id = portfolio.loc[i, 'id']
    for event_type in event_types:
        offer_num = portfolio.loc[i, 'offer_num']
        col_name = f'{offer_num}_{event_type}'
        # add column name
        dff[col_name] = dff['id'].apply(count_events, args=[offer_id, event_type,])

# for event in event_types:

#     dff.apply(count_events, axis=1)



dff

Unnamed: 0,gender,age,id,became_member_on,income,1_offer_received,1_offer_viewed,1_transaction,1_offer_completed,2_offer_received,...,8_transaction,8_offer_completed,9_offer_received,9_offer_viewed,9_transaction,9_offer_completed,10_offer_received,10_offer_viewed,10_transaction,10_offer_completed
0,,118,68be06ca386d4c31939f3a4f0e3dd783,20170212,,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
1,F,55,0610b486422d4921ae7d2bf64640c50b,20170715,112000.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,,118,38fe809add3b4fcf9315a9694bb96ff5,20180712,,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,F,75,78afa995795e4d85b5d9ceeca43f5fef,20170509,100000.0,1,1,0,0,0,...,0,0,1,0,0,0,0,0,0,0
4,,118,a03223e636434f42ac4c3df47e8bac43,20170804,,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,M,68,e2127556f4f64592b11af22de27a7932,20180426,70000.0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
6,,118,8ec6ce2a7e7949b1bf142def7d0e0586,20170925,,0,0,0,0,1,...,0,0,0,0,0,0,1,0,0,0
7,,118,68617ca6246f4fbc85e91a2a49552598,20171002,,1,0,0,0,1,...,0,0,0,0,0,0,1,0,0,0
8,M,65,389bc3fa690240e798340f5a15918d5c,20180209,53000.0,0,0,0,0,0,...,0,0,2,2,0,0,1,0,0,0
9,,118,8974fc5686fe429db53ddde067b88302,20161122,,0,0,0,0,1,...,0,0,1,0,0,0,0,0,0,0


In [442]:
dff.sum()

age                                                                 971
id                    68be06ca386d4c31939f3a4f0e3dd7830610b486422d49...
became_member_on                                              201726636
income                                                           335000
1_offer_received                                                      2
1_offer_viewed                                                        1
1_transaction                                                         0
1_offer_completed                                                     0
2_offer_received                                                      3
2_offer_viewed                                                        0
2_transaction                                                         0
2_offer_completed                                                     0
3_offer_received                                                      4
3_offer_viewed                                                  

In [452]:
completed_customers = set(transcript_df[transcript_df['event']=='offer_completed']['person'].tolist())
profile[profile['id'].isin(completed_customers)]


Unnamed: 0,gender,age,id,became_member_on,income
0,,118,68be06ca386d4c31939f3a4f0e3dd783,20170212,
1,F,55,0610b486422d4921ae7d2bf64640c50b,20170715,112000.0
3,F,75,78afa995795e4d85b5d9ceeca43f5fef,20170509,100000.0
5,M,68,e2127556f4f64592b11af22de27a7932,20180426,70000.0
8,M,65,389bc3fa690240e798340f5a15918d5c,20180209,53000.0
...,...,...,...,...,...
16990,F,70,79edb810789c447e8d212a324b44cc16,20160310,39000.0
16993,M,60,cb23b66c56f64b109d673d5e56574529,20180505,113000.0
16996,M,61,2cb4f97358b841b9a9773a7aa05a9d77,20180713,72000.0
16998,F,83,9dc1421481194dcd9400aec7c9ae6366,20160307,50000.0


In [462]:
transcript[(transcript['event']=='offer completed') & (transcript['person']=='68be06ca386d4c31939f3a4f0e3dd783')]

Unnamed: 0,person,event,value,time
237365,68be06ca386d4c31939f3a4f0e3dd783,offer completed,{'offer_id': 'fafdcd668e3743c1bb461111dcafc2a4...,552
237366,68be06ca386d4c31939f3a4f0e3dd783,offer completed,{'offer_id': '2298d6c36e964ae4a3e7e9706d1fb8c2...,552


In [454]:
offer_map[offer_map['person']=='68be06ca386d4c31939f3a4f0e3dd783']

Unnamed: 0,person,event,time,offer_id,amount,reward_x,reward_y,channels,difficulty,duration,offer_type,offer_num,end_time,offer_viewed,transaction,offer_completed
12650,68be06ca386d4c31939f3a4f0e3dd783,offer_received,168,2906b810c7d4411798c6938adc9daaa5,,,2,"[web, email, mobile]",10,7,discount,10,175,,,
25319,68be06ca386d4c31939f3a4f0e3dd783,offer_received,336,0b1e1539f2cc45b7b9fa7c272da2e1d7,,,5,"[web, email]",20,10,discount,5,346,,,
38030,68be06ca386d4c31939f3a4f0e3dd783,offer_received,408,fafdcd668e3743c1bb461111dcafc2a4,,,2,"[web, email, mobile, social]",10,10,discount,7,418,163374.0,"(167626,)",
50808,68be06ca386d4c31939f3a4f0e3dd783,offer_received,504,2298d6c36e964ae4a3e7e9706d1fb8c2,,,3,"[web, email, mobile, social]",7,7,discount,6,511,214274.0,"(218392,)",
63512,68be06ca386d4c31939f3a4f0e3dd783,offer_received,576,fafdcd668e3743c1bb461111dcafc2a4,,,2,"[web, email, mobile, social]",10,10,discount,7,586,262137.0,,


In [458]:
portfolio

Unnamed: 0,reward,channels,difficulty,duration,offer_type,id,offer_num
0,10,"[email, mobile, social]",10,7,bogo,ae264e3637204a6fb9bb56bc8210ddfd,1
1,10,"[web, email, mobile, social]",10,5,bogo,4d5c57ea9a6940dd891ad53e9dbe8da0,2
2,0,"[web, email, mobile]",0,4,informational,3f207df678b143eea3cee63160fa8bed,3
3,5,"[web, email, mobile]",5,7,bogo,9b98b8c7a33c4b65b9aebfe6a799e6d9,4
4,5,"[web, email]",20,10,discount,0b1e1539f2cc45b7b9fa7c272da2e1d7,5
5,3,"[web, email, mobile, social]",7,7,discount,2298d6c36e964ae4a3e7e9706d1fb8c2,6
6,2,"[web, email, mobile, social]",10,10,discount,fafdcd668e3743c1bb461111dcafc2a4,7
7,0,"[email, mobile, social]",0,3,informational,5a8bc65990b245e5a138643cd4eb9837,8
8,5,"[web, email, mobile, social]",5,5,bogo,f19421c1d4aa40978ebb69ca19b0e20d,9
9,2,"[web, email, mobile]",10,7,discount,2906b810c7d4411798c6938adc9daaa5,10


In [157]:
def add_events_to_profile_df_tmp(profile=profile, portfolio=portfolio, transcript_df=transcript_df, event_types=event_types):
    profile_df = profile.copy()
    
    # loop through each user (limited to the first user while prototyping)
    for i in range(len(profile[:1])):
        user_id = profile.loc[i, 'id']
        
        # get all offers for this user:
        received = transcript_df[(transcript_df['person']==user_id)  & (transcript_df['event']=='offer_received')]
        # inspect the offers received by this user
        print(received)



#             if event_type == 'transaction':
#                 continue
            
#             profile_df = add_event(event_type, portfolio.loc[i, 'id'], portfolio.loc[i, 'offer_num'], profile_df, transcript_df)

#     # add transactions
#     print('Adding transactions')
#     transactions = transcript_df[transcript_df['event']=='transaction']
#     profile_df['transaction_time'] = profile_df['id'].apply(get_event_times, args=([transactions]))
    
#     return profile_df
add_events_to_profile_df_tmp()

                                  person           event  \
53174   68be06ca386d4c31939f3a4f0e3dd783  offer_received   
110828  68be06ca386d4c31939f3a4f0e3dd783  offer_received   
150596  68be06ca386d4c31939f3a4f0e3dd783  offer_received   
201570  68be06ca386d4c31939f3a4f0e3dd783  offer_received   
245122  68be06ca386d4c31939f3a4f0e3dd783  offer_received   

                                                   value  time  \
53174   {'offer_id': '2906b810c7d4411798c6938adc9daaa5'}   168   
110828  {'offer_id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'}   336   
150596  {'offer_id': 'fafdcd668e3743c1bb461111dcafc2a4'}   408   
201570  {'offer_id': '2298d6c36e964ae4a3e7e9706d1fb8c2'}   504   
245122  {'offer_id': 'fafdcd668e3743c1bb461111dcafc2a4'}   576   

                                offer_id  amount  reward  
53174   2906b810c7d4411798c6938adc9daaa5     NaN     NaN  
110828  0b1e1539f2cc45b7b9fa7c272da2e1d7     NaN     NaN  
150596  fafdcd668e3743c1bb461111dcafc2a4     NaN     NaN  
201570

In [144]:
unique_viewed = transcript_df[(transcript_df['offer_id']=='0b1e1539f2cc45b7b9fa7c272da2e1d7')&(transcript_df['event']=='offer_viewed')]['person'].unique().tolist()
unique_completed = transcript_df[(transcript_df['offer_id']=='0b1e1539f2cc45b7b9fa7c272da2e1d7')&(transcript_df['event']=='offer_completed')]['person'].unique().tolist()
print(len(np.setdiff1d(unique_viewed, unique_completed)), 'people have viewed offer 5 but did not complete it')
print(len(np.setdiff1d(unique_completed, unique_viewed)), 'people have completed offer 5 but did not view it')

928 people have viewed offer 5 but did not complete it
1506 people have completed offer 5 but did not view it


Unnamed: 0,person,event,time,offer_id,amount,reward_x,reward_y,channels,difficulty,duration,offer_type,offer_num,end_time,offer_viewed,transaction,offer_completed
0,78afa995795e4d85b5d9ceeca43f5fef,offer_received,0,9b98b8c7a33c4b65b9aebfe6a799e6d9,,,5,"[web, email, mobile]",5,7,bogo,4,7,15561.0,,
1,a03223e636434f42ac4c3df47e8bac43,offer_received,0,0b1e1539f2cc45b7b9fa7c272da2e1d7,,,5,"[web, email]",20,10,discount,5,10,15562.0,,
2,e2127556f4f64592b11af22de27a7932,offer_received,0,2906b810c7d4411798c6938adc9daaa5,,,2,"[web, email, mobile]",10,7,discount,10,7,,,
3,8ec6ce2a7e7949b1bf142def7d0e0586,offer_received,0,fafdcd668e3743c1bb461111dcafc2a4,,,2,"[web, email, mobile, social]",10,10,discount,7,10,,,
4,68617ca6246f4fbc85e91a2a49552598,offer_received,0,4d5c57ea9a6940dd891ad53e9dbe8da0,,,10,"[web, email, mobile, social]",10,5,bogo,2,5,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
76272,d087c473b4d247ccb0abfef59ba12b0e,offer_received,576,ae264e3637204a6fb9bb56bc8210ddfd,,,10,"[email, mobile, social]",10,7,bogo,1,583,,,
76273,cb23b66c56f64b109d673d5e56574529,offer_received,576,2906b810c7d4411798c6938adc9daaa5,,,2,"[web, email, mobile]",10,7,discount,10,583,,,
76274,6d5f3a774f3d4714ab0c092238f3a1d7,offer_received,576,2298d6c36e964ae4a3e7e9706d1fb8c2,,,3,"[web, email, mobile, social]",7,7,discount,6,583,,,
76275,9dc1421481194dcd9400aec7c9ae6366,offer_received,576,ae264e3637204a6fb9bb56bc8210ddfd,,,10,"[email, mobile, social]",10,7,bogo,1,583,,,


In [296]:
offer_map[offer_map['offer_completed']>1]

Unnamed: 0,person,event,time,offer_id,amount,reward_x,reward_y,channels,difficulty,duration,offer_type,offer_num,end_time,offer_viewed,transaction,offer_completed
35,676506bad68e4161b9bbaffeb039626b,offer_received,0,ae264e3637204a6fb9bb56bc8210ddfd,,,10,"[email, mobile, social]",10,7,bogo,1,7,12690.0,"(12691,)",12692.0
36,9fa9ae8f57894cc9a3b8a9bbe0fc1b2f,offer_received,0,2906b810c7d4411798c6938adc9daaa5,,,2,"[web, email, mobile]",10,7,discount,10,7,12656.0,"(12657,)",12658.0
107,fe97aa22dd3e48c8b143116a8403dd52,offer_received,0,fafdcd668e3743c1bb461111dcafc2a4,,,2,"[web, email, mobile, social]",10,10,discount,7,10,15578.0,"(12671,)",12672.0
125,629fc02d56414d91bca360decdfa9288,offer_received,0,9b98b8c7a33c4b65b9aebfe6a799e6d9,,,5,"[web, email, mobile]",5,7,bogo,4,7,12677.0,"(12678,)",12679.0
207,62fb072537a647a89f99fb9ea66a7c00,offer_received,0,fafdcd668e3743c1bb461111dcafc2a4,,,2,"[web, email, mobile, social]",10,10,discount,7,10,12688.0,"(15600,)",15601.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
76062,4d116a23b9af4885834a3e717a5fa397,offer_received,576,ae264e3637204a6fb9bb56bc8210ddfd,,,10,"[email, mobile, social]",10,7,bogo,1,583,262071.0,"(265559,)",265560.0
76063,570f7cc3a63249d9b295d5fb8a7c1d73,offer_received,576,2298d6c36e964ae4a3e7e9706d1fb8c2,,,3,"[web, email, mobile, social]",7,7,discount,6,583,262072.0,"(265561,)",265562.0
76087,9a30ad4712a341e5be025cf0f96d7ee3,offer_received,576,2906b810c7d4411798c6938adc9daaa5,,,2,"[web, email, mobile]",10,7,discount,10,583,262079.0,"(265566,)",265567.0
76234,986ed6f6ef6a4a4eb34b41f06507e88a,offer_received,576,2298d6c36e964ae4a3e7e9706d1fb8c2,,,3,"[web, email, mobile, social]",7,7,discount,6,583,260148.0,"(263920,)",263921.0


In [494]:
(offer_map[['difficulty', 'reward_y']]>0) * 1
# offer_map[['difficulty', 'reward_y']]

Unnamed: 0,difficulty,reward_y
0,1,1
1,1,1
2,1,1
3,1,1
4,1,1
...,...,...
495,1,1
496,1,1
497,0,0
498,1,1


# TODO - For each user in 'profile', add columns:
# * Add # times that they received each offer
# * Add the number of times that they had viewed and completed each offer
# * Add the number of times that they made a transaction within the offer period

Unnamed: 0,gender,age,id,became_member_on,income


# Scenario #1 - Make offers to users expected to complete at least one
* Order data as users (row) and offers (columns)
* Identify all users that 'viewed+completed' at least one offer during the test period as good results
    * count users that had 'viewed + made transaction' 
* All users with good results for an offer should have a 1; else 0
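
The layout described above can be sketched as follows. This is a minimal illustration on made-up rows, not the notebook's final implementation: `toy_offer_map` and `build_response_matrix` are hypothetical names, and the sketch assumes (as in the `offer_map` output shown earlier) that `offer_viewed` and `offer_completed` are empty when the event never happened. It does not check that the view came before the completion.

```python
import pandas as pd

# Toy stand-in for offer_map (made-up rows): one row per offer received.
# The viewed/completed columns hold a transcript row index, or NaN if absent.
toy_offer_map = pd.DataFrame({
    'person': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'offer_num': [1, 5, 1, 5, 5],
    'offer_viewed': [10.0, None, 12.0, 14.0, None],
    'offer_completed': [11.0, 20.0, None, 15.0, None],
})


def build_response_matrix(offer_map_df):
    """Return a users x offers matrix: 1 if viewed and completed, else 0."""
    good = (offer_map_df['offer_viewed'].notna()
            & offer_map_df['offer_completed'].notna()).astype(int)
    labelled = offer_map_df.assign(good=good)
    # max() marks a user/offer pair as 1 if any receipt of that offer was good;
    # pairs that were never received are filled with 0.
    return labelled.pivot_table(index='person', columns='offer_num',
                                values='good', aggfunc='max', fill_value=0)


response_matrix = build_response_matrix(toy_offer_map)
print(response_matrix)
```

Filling never-received pairs with 0 mixes "did not respond" with "was never offered", so a real analysis may prefer to keep those cells empty.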

In [500]:
transcript_df.index.dtype

dtype('int64')

In [504]:
# Slice first, then build the mask from the slice so the two indexes align
subset = transcript_df[200000:300000]
subset[subset['event']=='offer_received']



Unnamed: 0,person,event,time,offer_id,amount,reward
201570,68be06ca386d4c31939f3a4f0e3dd783,offer_received,504,2298d6c36e964ae4a3e7e9706d1fb8c2,,
201571,0610b486422d4921ae7d2bf64640c50b,offer_received,504,3f207df678b143eea3cee63160fa8bed,,
201572,78afa995795e4d85b5d9ceeca43f5fef,offer_received,504,f19421c1d4aa40978ebb69ca19b0e20d,,
201573,a03223e636434f42ac4c3df47e8bac43,offer_received,504,0b1e1539f2cc45b7b9fa7c272da2e1d7,,
201574,e2127556f4f64592b11af22de27a7932,offer_received,504,fafdcd668e3743c1bb461111dcafc2a4,,
...,...,...,...,...,...,...
257882,d087c473b4d247ccb0abfef59ba12b0e,offer_received,576,ae264e3637204a6fb9bb56bc8210ddfd,,
257883,cb23b66c56f64b109d673d5e56574529,offer_received,576,2906b810c7d4411798c6938adc9daaa5,,
257884,6d5f3a774f3d4714ab0c092238f3a1d7,offer_received,576,2298d6c36e964ae4a3e7e9706d1fb8c2,,
257885,9dc1421481194dcd9400aec7c9ae6366,offer_received,576,ae264e3637204a6fb9bb56bc8210ddfd,,


# Scenario #2 - Make offers to users expected to make 1+ transaction during offer period
* Order data as users (row) and offers (columns)
* Identify all users that have 'viewed + made at least one transaction' for an offer during the offer period as good results
* Reasoning: Viewing may incentivize users to make purchases, even if they do not complete the offer. We want to identify and encourage this behavior as well
* All users with good results for an offer should have a 1; else 0

In [None]:
# scenario 1 - if offer completed in allotted time after viewed, 1. else 0
#     cleanup:  informational offers need to check if transactions come within the allotted time afterward.
# if time: scenario 2 -  if offer viewed and transaction, 1. else 0.
#     need to check whether the transaction came after

#### save cleaned profile_df dataframe

In [None]:
# 1 - if offer received and offer viewed and offer completed within time
# 0 - if beforehand transaction and offer received and offer viewed
# 0 - if offer received and not offer viewed and *afterwards* transaction

In [19]:
offer_map.to_pickle(os.path.abspath('')+'/Data/offer_map_noAccidentalCompletions')

# Project Definition

##   Project Overview

Background information such as the problem domain, the project origin, and related data sets or input data is provided.

### Project Origin
Starbucks Capstone Challenge - combining transaction, demographic, and offer data to determine which demographic group responds best to which offer type.

### Data Sets
* portfolio.json - offer ids and metadata about each offer (duration, type, etc.)
* profile.json - demographic data for each customer
* transcript.json - records for transactions, offers received, offers viewed, and offers completed

## Problem Statement
The problem which needs to be solved is clearly defined. A strategy for solving the problem, including discussion of the expected solution, has been made.

#### Demographics are not clearly defined
The data does not define demographic groups, so I used K-Means Clustering on the profile data to group customers into a number of clusters.

* For more information on k-means clustering: https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1
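
As a rough illustration of the clustering step, the sketch below runs a small k-means loop on made-up profile rows. The notebook does not say which implementation or features were used, so everything here is an assumption: the choice of `age` and `income`, dropping rows with missing income, standardizing the features, and the hypothetical names `toy_profile` and `kmeans`. A library routine such as scikit-learn's `KMeans` would normally replace the hand-written loop.

```python
import numpy as np
import pandas as pd

# Made-up profile rows; the last row mimics a profile with missing income.
toy_profile = pd.DataFrame({
    'age': [25, 30, 28, 65, 70, 68, 118],
    'income': [40000.0, 45000.0, 42000.0, 100000.0, 110000.0, 105000.0, None],
})


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: assign to nearest center, then recompute centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, n_clusters)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster ends up empty
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Assumption: cluster on age and income only, after dropping incomplete rows
features = toy_profile[['age', 'income']].dropna()
# Standardize so income (tens of thousands) does not dominate age (tens)
scaled = ((features - features.mean()) / features.std()).to_numpy()
labels, centers = kmeans(scaled, n_clusters=2)
clustered = features.assign(cluster=labels)
print(clustered)
```

Standardizing matters here because k-means relies on Euclidean distance, and unscaled income would otherwise decide the clusters almost on its own.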

#### Offer data is not uniform for each user
Users do not receive the same offers or the same number of total offers, and not all users will receive all offers.
Additionally, users can receive the same offer more than once and may respond differently each time. Finally, not all completed offers were viewed first. When an offer is completed without being viewed, the reward did not influence the purchase, so the incentive was unnecessary.

To check whether each offer was viewed before it was completed, I searched the transcript for 'offer viewed' and 'offer completed' events that occurred while that offer was active. The result is a dataset that maps each offer received to the user's later interactions with that offer before it expired.
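
A minimal sketch of this mapping on made-up rows is shown below. It is not the notebook's code: `toy_transcript`, `toy_portfolio`, `first_event_time` and `map_offers` are hypothetical names, the sketch stores event times rather than transcript row indices, and it assumes the offer window runs from receipt until `duration` days later, converted to hours because transcript `time` is in hours.

```python
import pandas as pd

HOURS_PER_DAY = 24  # transcript time is in hours, offer duration is in days

# Made-up events: u1 views and completes in time, u2 completes after expiry
toy_transcript = pd.DataFrame({
    'person': ['u1', 'u1', 'u1', 'u2', 'u2'],
    'event': ['offer_received', 'offer_viewed', 'offer_completed',
              'offer_received', 'offer_completed'],
    'offer_id': ['A', 'A', 'A', 'A', 'A'],
    'time': [0, 6, 30, 0, 200],
})
toy_portfolio = pd.DataFrame({'id': ['A'], 'duration': [7]})


def first_event_time(row, events, event_name):
    """Earliest time of event_name for this person and offer inside the window."""
    in_window = events[
        (events['person'] == row['person'])
        & (events['offer_id'] == row['offer_id'])
        & (events['event'] == event_name)
        & (events['time'] >= row['time'])
        & (events['time'] <= row['end_time'])
    ]
    return in_window['time'].min() if len(in_window) else float('nan')


def map_offers(transcript_df, portfolio_df):
    """One row per offer received, with the first view/completion before expiry."""
    received = transcript_df[transcript_df['event'] == 'offer_received']
    mapped = received.merge(portfolio_df[['id', 'duration']],
                            left_on='offer_id', right_on='id').drop(columns='id')
    mapped['end_time'] = mapped['time'] + mapped['duration'] * HOURS_PER_DAY
    for event_name in ['offer_viewed', 'offer_completed']:
        mapped[event_name + '_time'] = mapped.apply(
            first_event_time, axis=1, args=(transcript_df, event_name))
    return mapped


toy_offer_map = map_offers(toy_transcript, toy_portfolio)
print(toy_offer_map)
```

When the same offer is sent twice with overlapping windows, this simple lookup can credit one view or completion to both receipts, so a fuller version would need to consume each event only once.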

To understand how different demographics responded to each offer, I used the aforementioned clustering method on user profile data, then analyzed how likely each cluster was to complete an offer after viewing it. 

#### Since this challenge assumes one product, transaction data cannot be mapped to an offer
Multiple offers can be active at the same time, and can be completed out of order. This makes it difficult to attribute any transaction revenue to a single offer. 

## Metrics

Based on this analysis, I determined a set of rules stating which demographic clusters should receive each offer.

To measure the effectiveness of demographic clustering, I computed the following metrics for each offer twice: once on the sample data as given, and once on the subset of users selected by the rules derived from each cluster's interactions with the offer. I then compared the two results.

#### (modified) Incremental response rate: 
*(# purchased in offer group / # in offer group) - (# purchased in control group / # in control group)*
    * The 'control' group: all users who received the offer but did not view it
    * The 'offer' group: all users who received and viewed the offer


We'd like to maximize this metric:
    * We'd like to maximize the number of users that respond to this offer by completing it
    * We'd also like to minimize the number of users that receive the reward without responding to the offer.  
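
The metric above can be computed as in the sketch below. The rows are made up, `incremental_response_rate` is a hypothetical helper, and "purchased" is read here as "completed the offer", which is an assumption about the intended definition.

```python
import pandas as pd

# Made-up rows, one per offer received: was it viewed, and was it completed?
toy_offers = pd.DataFrame({
    'viewed':    [True, True, True, True, False, False, False, False],
    'completed': [True, True, True, False, True, False, False, False],
})


def incremental_response_rate(offers_df):
    """Completion rate of viewers minus completion rate of non-viewers."""
    offer_group = offers_df[offers_df['viewed']]       # received and viewed
    control_group = offers_df[~offers_df['viewed']]    # received, never viewed
    # mean() of a boolean column is the share of True values;
    # it is NaN when a group is empty.
    return offer_group['completed'].mean() - control_group['completed'].mean()


irr = incremental_response_rate(toy_offers)
print(irr)
```

In the toy data, 3 of 4 viewers and 1 of 4 non-viewers completed the offer, so the rate is 0.75 - 0.25 = 0.5.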



#### (modified) Net incremental revenue per offer received:
*[(# offers completed after viewed × (offer difficulty - offer reward)) - (# offers completed without viewed)] / (# offers received)*
    * The offer difficulty is the amount needed to be spent to complete this offer
        * Since transactions cannot be mapped directly to each offer, the offer difficulty is used as a proxy for the amount spent while a customer is influenced by an offer. 
        
We'd like to maximize this metric.
    * Normalizing for number of offers received is necessary since using the offer 'rules' would reduce the number of offers given out
    * Note: this metric does not work for informational offers, which have no difficulty or reward
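
A short worked example of this metric follows. It reads the formula as the whole difference divided by the number of offers received, which is an interpretation of the expression above. The function name and the counts are hypothetical. The difficulty of 20 and reward of 5 match offer 5 in the portfolio.

```python
def net_incremental_revenue_per_offer(n_received, n_completed_after_view,
                                      n_completed_unviewed, difficulty, reward):
    """Net incremental revenue per offer received, as defined in the text above."""
    if n_received == 0:
        raise ValueError('the metric is undefined when no offers were received')
    # Difficulty minus reward is the proxy for revenue from an influenced user
    gained = n_completed_after_view * (difficulty - reward)
    # Completions without a view are subtracted as a count, as in the formula
    return (gained - n_completed_unviewed) / n_received


# Hypothetical counts: 100 received, 30 completed after viewing, 10 unviewed
nir = net_incremental_revenue_per_offer(
    n_received=100, n_completed_after_view=30, n_completed_unviewed=10,
    difficulty=20, reward=5)
print(nir)
```

With these counts the result is (30 × 15 - 10) / 100 = 4.4 per offer received.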

# Analysis

## Data Exploration

#### Original Dataset Definitions

**portfolio.json**
* id (string) - offer id
* offer_type (string) - type of offer, i.e. BOGO, discount, informational
* difficulty (int) - minimum required spend to complete an offer
* reward (int) - reward given for completing an offer
* duration (int) - time for offer to be open, in days
* channels (list of strings) - channels through which the offer is sent (web, email, mobile, social)


**profile.json**
* age (int) - age of the customer
* became_member_on (int) - date when customer created an app account
* gender (str) - gender of the customer (note some entries contain 'O' for other rather than M or F)
* id (str) - customer id
* income (float) - customer's income


**transcript.json**
* event (str) - record description (i.e. transaction, offer received, offer viewed, etc.)
* person (str) - customer id
* time (int) - time in hours since start of test. The data begins at time t=0
* value (dict of strings) - either an offer id or transaction amount depending on the record
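
Because `value` is a dictionary, it is easier to work with once it is split into separate columns, as in the `transcript_df` output shown earlier. The sketch below is one way to do that, not necessarily how the notebook did it: `flatten_value` is a hypothetical helper, the rows are made up, and the `reward` key on completed offers is an assumption. Both `'offer id'` and `'offer_id'` are handled because both spellings appear in the transcript output.

```python
import pandas as pd

# Made-up rows covering the three shapes the value dict can take
toy_transcript = pd.DataFrame({
    'person': ['u1', 'u1', 'u1'],
    'event': ['offer received', 'transaction', 'offer completed'],
    'value': [{'offer id': 'A'}, {'amount': 0.74}, {'offer_id': 'A', 'reward': 2}],
    'time': [168, 414, 552],
})


def flatten_value(transcript_df):
    """Split the value dict into offer_id, amount and reward columns."""
    out = transcript_df.copy()
    # The offer key is spelled with a space or an underscore depending on the event
    out['offer_id'] = out['value'].apply(lambda v: v.get('offer id', v.get('offer_id')))
    out['amount'] = out['value'].apply(lambda v: v.get('amount'))
    out['reward'] = out['value'].apply(lambda v: v.get('reward'))
    # Use underscores in event names, e.g. 'offer received' -> 'offer_received'
    out['event'] = out['event'].str.replace(' ', '_', regex=False)
    return out.drop(columns='value')


flat_transcript = flatten_value(toy_transcript)
print(flat_transcript)
```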

#### Data Characteristics and Visualizations

There are 4 different event types that are logged in the transcript data:
* offer received: When the user received the offer. The user does not have control over this, and may have no knowledge of this event.
* offer viewed: When the user viewed the offer. 
* offer completed: When the user's transactions reached the 'difficulty' level of this offer. This event is recorded separately from the transaction itself.
* transaction: When the user made a transaction.

In [None]:
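# note - show event types
import plotly.graph_objects as go

# assumes event_types lists the events in the same order as the counts below
event_type_df_lengths = [len(offer_received), len(offer_viewed), len(transactions), len(offer_completed)]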
data = [
    go.Bar(x=event_types, y=event_type_df_lengths)
]
layout={
    'title':'Number of Rows for each Event Type'
}
fig = go.Figure(data, layout)
fig.show()

In [6]:
offer_map = pd.read_pickle(os.path.abspath('') + '/Data/offer_map.pkl')

In [10]:
transcript[transcript['person'] == profile.loc[0, 'id']]

Unnamed: 0,person,event,value,time
53174,68be06ca386d4c31939f3a4f0e3dd783,offer received,{'offer id': '2906b810c7d4411798c6938adc9daaa5'},168
85290,68be06ca386d4c31939f3a4f0e3dd783,offer viewed,{'offer id': '2906b810c7d4411798c6938adc9daaa5'},216
110828,68be06ca386d4c31939f3a4f0e3dd783,offer received,{'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'},336
130147,68be06ca386d4c31939f3a4f0e3dd783,offer viewed,{'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'},348
135224,68be06ca386d4c31939f3a4f0e3dd783,transaction,{'amount': 0.35000000000000003},360
150596,68be06ca386d4c31939f3a4f0e3dd783,offer received,{'offer id': 'fafdcd668e3743c1bb461111dcafc2a4'},408
163374,68be06ca386d4c31939f3a4f0e3dd783,offer viewed,{'offer id': 'fafdcd668e3743c1bb461111dcafc2a4'},408
167626,68be06ca386d4c31939f3a4f0e3dd783,transaction,{'amount': 0.74},414
182544,68be06ca386d4c31939f3a4f0e3dd783,transaction,{'amount': 1.8900000000000001},444
201570,68be06ca386d4c31939f3a4f0e3dd783,offer received,{'offer id': '2298d6c36e964ae4a3e7e9706d1fb8c2'},504
