<a href="https://colab.research.google.com/github/xmpuspus/Lectures/blob/master/notebooks/IntroCustomerLifetimeValue.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Customer Lifetime Value
Compute customer lifetime value (CLV) from RFMA (recency, frequency, monetary value, age) segments.

We follow Starbucks' method of calculating CLV.


### Import Packages

In [0]:
# xlrd 2.0+ reads only .xls files; recent pandas versions use openpyxl for .xlsx
!pip install xlrd openpyxl
# import packages
import pandas as pd
import datetime
import numpy as np

# suppress warnings
import warnings
warnings.filterwarnings('ignore')



### Read Data

In [0]:
# load dataset
data = pd.read_excel('http://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx')

# Define Sales Column
data['Sales'] = data['Quantity'] * data['UnitPrice']
data.head()




| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | Sales |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom | 15.3 |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom | 22.0 |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
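
The `Sales` column is signed, which matters for the totals computed later. As a rough sketch on made-up rows (the toy values and the name `net_sales` are illustrative, not part of the notebook): according to the UCI description of this dataset, an invoice number starting with `C` marks a cancellation, and such rows carry a negative `Quantity`, so a purchase followed by its cancellation nets out to zero.

```python
import pandas as pd

# Toy line items (made-up values); the 'C' prefix marks a cancelled invoice.
toy = pd.DataFrame({
    'InvoiceNo': ['1001', 'C1002', '1003'],
    'CustomerID': [1, 1, 2],
    'Quantity': [10, -10, 3],
    'UnitPrice': [2.0, 2.0, 5.0],
})

# Same definition as in the notebook: line-item sales = quantity * unit price.
toy['Sales'] = toy['Quantity'] * toy['UnitPrice']

# Net sales per customer: the cancellation offsets the original purchase.
net_sales = toy.groupby('CustomerID')['Sales'].sum()
print(net_sales)
```

This is one plausible reason a customer can show several transactions but a total spend of 0.0.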


### Create RFMA Segments

These are the only four things we need to build our RFMA segments:
1. *customers*: the column that identifies each customer
2. *dates*: the transaction dates
3. *transactions*: the transaction (invoice) number
4. *prices*: the amount spent on each line item

In [0]:
customers = 'CustomerID'
dates = 'InvoiceDate'
transactions = 'InvoiceNo'
prices = 'Sales'

In [0]:
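# keep UK customers only; .copy() avoids SettingWithCopyWarning on the assignment below
data = data[data['Country'] == "United Kingdom"].copy()

# reference date for recency (wall-clock time, so recency grows on every rerun)
NOW = datetime.datetime.now()

# duplicate the date column so it can be aggregated twice (recency and age)
data['transaction_date'] = data[dates]

rfmaTable = data.groupby(customers).agg({
    dates: lambda x: (NOW - x.max()).days,                   # recency: days since last purchase
    transactions: lambda x: len(x),                          # frequency: number of invoice lines
    prices: lambda x: x.sum(),                               # monetary value: total sales
    'transaction_date': lambda x: (x.max() - x.min()).days,  # age: days from first to last purchase
})
rfmaTable[dates] = rfmaTable[dates].astype(int)
rfmaTable.rename(columns={dates: 'recency',
                          transactions: 'frequency',
                          prices: 'monetary_value',
                          'transaction_date': 'age'}, inplace=True)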
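
To make the four aggregates concrete, here is a minimal sketch on a made-up table. It deviates from the cell above in two ways, both of which are assumptions rather than the notebook's method: it uses a fixed, hypothetical `snapshot` date instead of the current time so the result is reproducible, and it counts distinct invoices for `frequency` instead of invoice lines.

```python
import pandas as pd

# Made-up transactions: customer 1 has two invoices (A has two lines), customer 2 has one.
toy = pd.DataFrame({
    'CustomerID': [1, 1, 1, 2],
    'InvoiceNo': ['A', 'A', 'B', 'C'],
    'InvoiceDate': pd.to_datetime(['2011-01-01', '2011-01-01', '2011-03-02', '2011-02-01']),
    'Sales': [10.0, 5.0, 20.0, 7.0],
})

# Hypothetical fixed reference date (the notebook uses datetime.datetime.now()).
snapshot = pd.Timestamp('2011-04-01')

grouped = toy.groupby('CustomerID')
first = grouped['InvoiceDate'].min()
last = grouped['InvoiceDate'].max()

toy_rfma = pd.DataFrame({
    'recency': (snapshot - last).dt.days,        # days since the last purchase
    'frequency': grouped['InvoiceNo'].nunique(),  # distinct invoices, not lines
    'monetary_value': grouped['Sales'].sum(),     # total spend
    'age': (last - first).dt.days,                # days between first and last purchase
})
print(toy_rfma)
```

Customer 2 bought on a single day, so `age` is 0 for that row.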

In [0]:
rfmaTable.head()

| CustomerID | recency | frequency | monetary_value | age |
|---|---|---|---|---|
| 12346.0 | 2858 | 2 | 0.0 | 0 |
| 12747.0 | 2535 | 103 | 4196.01 | 366 |
| 12748.0 | 2533 | 4642 | 29072.1 | 372 |
| 12749.0 | 2536 | 231 | 3868.2 | 209 |
| 12820.0 | 2536 | 59 | 942.34 | 323 |


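### CLV from RFMA
Compute an annualized CLV: divide each customer's total spend by the number of days between their first and last purchase (`age`), then multiply by 365. Customers whose purchases all fall on a single day have an `age` of 0, so their CLV comes out as NaN or infinite.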

In [0]:
rfmaTable['CLV'] = 365 * (rfmaTable['monetary_value'] / rfmaTable['age'])
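
A small worked example of the same formula may help, using made-up numbers (the `toy` table is purely illustrative). It shows the ordinary case plus the two edge cases that arise when `age` is 0.

```python
import numpy as np
import pandas as pd

# Made-up customers: a regular one, a single-day buyer, and a fully refunded single-day buyer.
toy = pd.DataFrame(
    {'monetary_value': [730.0, 100.0, 0.0], 'age': [365, 0, 0]},
    index=pd.Index([1, 2, 3], name='CustomerID'),
)

# Same formula as the cell above: annualized spend per day of customer age.
toy['CLV'] = 365 * (toy['monetary_value'] / toy['age'])

# Customer 1: 365 * 730 / 365 = 730.0
# Customer 2: 100 / 0 gives inf in pandas (no exception is raised)
# Customer 3: 0 / 0 gives NaN
print(toy)
```

Because pandas returns `inf` or `NaN` rather than raising on division by zero, these rows need to be handled explicitly before any averaging.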

In [0]:
rfmaTable.head()

| CustomerID | recency | frequency | monetary_value | age | CLV |
|---|---|---|---|---|---|
| 12346.0 | 2858 | 2 | 0.0 | 0 | NaN |
| 12747.0 | 2535 | 103 | 4196.01 | 366 | 4184.545492 |
| 12748.0 | 2533 | 4642 | 29072.1 | 372 | 28525.044355 |
| 12749.0 | 2536 | 231 | 3868.2 | 209 | 6755.4689 |
| 12820.0 | 2536 | 59 | 942.34 | 323 | 1064.873375 |


Average CLV across customers, excluding infinite and missing values:

In [0]:
rfmaTable['CLV'].replace([np.inf, -np.inf], np.nan).dropna().mean()

7801.51890516493
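
The `replace(...).dropna()` step in the cell above is what keeps the average finite. A minimal sketch on a made-up series (the names `naive_mean` and `clean_mean` are illustrative):

```python
import numpy as np
import pandas as pd

# Made-up CLV values, including the inf and NaN produced by customers with age 0.
clv = pd.Series([730.0, np.inf, np.nan, 1270.0])

# mean() skips NaN by default but keeps inf, so the naive average is infinite.
naive_mean = clv.mean()

# Turning +/-inf into NaN first leaves only the finite values: (730 + 1270) / 2.
clean_mean = clv.replace([np.inf, -np.inf], np.nan).dropna().mean()

print(naive_mean, clean_mean)
```

Note that this drops single-day customers from the average entirely, which is a modeling choice rather than a neutral cleanup.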

Typically, this average serves as the ceiling on what you should spend to acquire a customer in the following years.