**Customer Lifetime Value Prediction Using BG-NBD**

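* BG-NBD stands for the Beta Geometric / Negative Binomial Distribution model.
* It belongs to the family of “Buy Till You Die” (BTYD) models, which assume each customer keeps buying until they “die” (become permanently inactive).
* Given a customer's purchase history, it gives the conditional expected number of transactions in a future period.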
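To make “conditional expected number of transactions” concrete, here is a minimal plain-Python sketch of the closed-form expectation from the original BG/NBD paper (Fader, Hardie and Lee, 2005). It assumes already fitted parameters `r`, `alpha`, `a`, `b` (the values below are hypothetical, not fitted to this dataset). `x` is the number of repeat purchases, `t_x` the time of the last purchase measured from the first, and `T` the customer's age. lifetimes computes this for you in `conditional_expected_number_of_purchases_up_to_time`; the `hyp2f1` helper here is a simple series evaluation written for this sketch, not the library's implementation.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-12, max_terms=10_000):
    """Gauss hypergeometric function 2F1(a, b; c; z) by its power series (needs 0 <= z < 1)."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        # Ratio between consecutive terms of the series
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def bgnbd_expected_purchases(t, x, t_x, T, r, alpha, a, b):
    """Expected number of purchases in the next t periods for one customer.

    x: repeat purchases, t_x: time of last purchase (first purchase at 0),
    T: customer age. t, t_x and T share one time unit (weeks in this notebook).
    """
    first = (a + b + x - 1) / (a - 1)
    z = t / (alpha + T + t)
    second = 1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp2f1(r + x, b + x, a + b + x - 1, z)
    # Correction for the chance the customer is no longer alive; only applies after a repeat purchase
    denominator = 1.0
    if x > 0:
        denominator += (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return first * second / denominator


# Hypothetical parameters and customer: 2 repeat purchases, last one at week 30, age 40 weeks
params = dict(r=0.5, alpha=5.0, a=2.0, b=3.0)
for horizon in (1, 4, 12):
    print(horizon, round(bgnbd_expected_purchases(horizon, 2, 30.0, 40.0, **params), 4))
```

The expectation is zero at `t = 0` and grows with the horizon; the denominator is what pulls the forecast down for customers whose last purchase was long ago relative to their age.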

In [1]:
!pip install lifetimes
!pip install openpyxl

In [2]:
import pandas as pd
import datetime as dt
#lifetimes implements "Buy Till You Die" models, which estimate whether a customer is still alive (active) and how often they will buy
from lifetimes import BetaGeoFitter
from lifetimes import GammaGammaFitter
from lifetimes.plotting import plot_period_transactions

In [3]:
#The dataset contains data between 2009 and 2011. We’ll use the 2010–2011 sheet.
data = pd.read_excel('../input/uci-online-retail-ii-data-set/online_retail_II.xlsx',sheet_name='Year 2010-2011')
data.head()

In [4]:
data.isnull().sum()

In [5]:
#Drop rows with missing values (e.g. transactions without a Customer ID)
data.dropna(inplace=True)

In [6]:
#Keep only purchases; negative quantities are returns/cancellations
data = data[data["Quantity"] > 0].copy()
#Revenue per invoice line is Quantity * Price
data['TotalPrice'] = data['Price'] * data['Quantity']
data.head()
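The notebook removes returns only through `Quantity > 0`. A slightly stricter cleaning step could also drop cancellation invoices explicitly (the dataset's documentation says invoice codes starting with "C" are cancellations) and rows with a non-positive price. The sketch below runs on a few made-up rows that use the notebook's column names; the `Price > 0` filter and the `clean_transactions` helper are assumptions of this example, not part of the original notebook.

```python
import pandas as pd

# Made-up rows with the same column names as the Online Retail II sheet
raw = pd.DataFrame({
    "Invoice": ["489434", "489434", "C489449", "489450", "489451"],
    "Quantity": [12, 6, -12, 3, 2],
    "Price": [6.95, 2.10, 2.95, 0.0, 4.25],
    "Customer ID": [13085.0, 13085.0, 16321.0, None, 12345.0],
})


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Keep identifiable, non-cancelled purchases and add a line-level revenue column."""
    df = df.dropna(subset=["Customer ID"])
    # Invoice codes starting with "C" mark cancellations
    df = df[~df["Invoice"].astype(str).str.startswith("C")]
    # Assumption of this sketch: also drop zero or negative prices
    df = df[(df["Quantity"] > 0) & (df["Price"] > 0)].copy()
    df["TotalPrice"] = df["Quantity"] * df["Price"]
    return df


clean = clean_transactions(raw)
print(clean)
```

The cancelled row, the row without a customer, and the zero-price row are removed; the remaining three rows get a `TotalPrice` column.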

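* Recency (R): time between a customer's first and last purchase. This differs from classic RFM analysis, where recency is the time since the last purchase.
* Frequency (F): number of purchases, counted here as distinct invoices. (lifetimes itself defines frequency as the number of *repeat* purchases, i.e. one fewer than the total.)
* Monetary (M): average revenue per purchase (total revenue divided by frequency).
* T: time between a customer's first purchase and the end of the period under study (the customer's "age").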
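A worked example helps with these definitions, since in BG-NBD recency is measured from the first to the last purchase rather than as the time since the last purchase. The dates below are hypothetical; the computation mirrors the notebook's lambdas (first-to-last span for recency, first purchase to analysis date for T, both divided by 7) and also shows the repeat-purchase count that lifetimes' documentation uses for frequency.

```python
from datetime import datetime

# Hypothetical purchase dates for one customer, plus the notebook's analysis date
purchases = [datetime(2011, 1, 10), datetime(2011, 3, 7), datetime(2011, 6, 20)]
today_date = datetime(2012, 1, 1)

first, last = min(purchases), max(purchases)
recency_weeks = (last - first).days / 7      # span between first and last purchase
T_weeks = (today_date - first).days / 7      # customer age at the analysis date
frequency = len(set(purchases))              # distinct purchases, as counted in the notebook
repeat_frequency = frequency - 1             # "repeat purchases" convention used by lifetimes

print(f"recency={recency_weeks:.1f}w  T={T_weeks:.1f}w  frequency={frequency}  repeat={repeat_frequency}")
```

This prints `recency=23.0w  T=50.9w  frequency=3  repeat=2`: the customer was active for 23 weeks out of a 50.9-week observation window.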

In [7]:
today_date = dt.datetime(2012, 1, 1)

cltv = data.groupby('Customer ID').agg({
    'InvoiceDate': [
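        lambda x: (x.max() - x.min()).days,  #recency: days between first and last purchase
        lambda x: (today_date - x.min()).days  #T: days since first purchase
    ],
    'Invoice': lambda x: x.nunique(),  #frequency: number of distinct invoices
    'TotalPrice': lambda x: x.sum()  #monetary: total revenue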
})

cltv.columns = cltv.columns.droplevel(0)
cltv.columns = ['recency', 'T', 'frequency', 'monetary']
cltv = cltv[cltv['monetary'] > 0]

#Monetary value is average earning per transaction
cltv['monetary'] = cltv['monetary'] / cltv['frequency']

#Transforming days to weeks
cltv['recency'] = cltv['recency'] / 7
cltv['T'] = cltv['T'] / 7

#Keep repeat customers only (more than one invoice)
cltv = cltv[cltv['frequency'] > 1]

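* The models depend entirely on these recency, frequency and T inputs (plus monetary value for Gamma-Gamma), so computing them correctly is essential for effective modeling.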
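The aggregation in cell In [7] can also be written with pandas named aggregation, which produces flat column names directly and removes the need for `droplevel` and positional renaming. The sketch below uses a handful of made-up transactions with the notebook's column names and keeps the notebook's definitions (distinct invoices for frequency, average revenue per invoice for monetary).

```python
import pandas as pd

# Made-up transactions with the notebook's column names
tx = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2, 2, 3],
    "Invoice": ["A1", "A1", "A2", "B1", "B2", "C1"],
    "InvoiceDate": pd.to_datetime([
        "2011-01-03", "2011-01-03", "2011-02-14", "2011-05-02", "2011-11-28", "2011-07-01",
    ]),
    "TotalPrice": [20.0, 10.0, 45.0, 100.0, 60.0, 15.0],
})
today_date = pd.Timestamp("2012-01-01")

cltv = tx.groupby("Customer ID").agg(
    first=("InvoiceDate", "min"),
    last=("InvoiceDate", "max"),
    frequency=("Invoice", "nunique"),
    monetary=("TotalPrice", "sum"),
)
cltv["recency"] = (cltv["last"] - cltv["first"]).dt.days / 7   # weeks between first and last purchase
cltv["T"] = (today_date - cltv["first"]).dt.days / 7           # customer age in weeks
cltv = cltv[cltv["monetary"] > 0]
cltv["monetary"] = cltv["monetary"] / cltv["frequency"]        # average revenue per invoice
cltv = cltv.loc[cltv["frequency"] > 1, ["recency", "T", "frequency", "monetary"]]
print(cltv)
```

Customer 3 has a single invoice and is dropped, matching the notebook's `frequency > 1` filter.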

In [8]:
cltv

In [9]:
#For small sample sizes the fitted parameters can become implausibly large; adding an L2 penalty (penalizer_coef) to the likelihood keeps them in check.
bg_model = BetaGeoFitter(penalizer_coef=0.001)
bg_model.fit(cltv['frequency'], cltv['recency'], cltv['T'])
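The `penalizer_coef` argument adds an L2 term to the objective that the fitter minimizes. The sketch below writes out the BG-NBD negative log-likelihood, averaged over customers, with that penalty, to show what the coefficient acts on. It follows our reading of how lifetimes' `BetaGeoFitter` structures its objective but is not a copy of the library code and includes no optimizer. The customer tuples and parameter values are hypothetical, and `x` must be the repeat-purchase count.

```python
import math


def bgnbd_penalized_nll(params, customers, penalizer_coef=0.0):
    """Mean negative BG-NBD log-likelihood plus an L2 penalty on (r, alpha, a, b).

    customers: iterable of (x, t_x, T) tuples, x = repeat purchases.
    """
    r, alpha, a, b = params
    total, n = 0.0, 0
    for x, t_x, T in customers:
        a1 = math.lgamma(r + x) - math.lgamma(r) + r * math.log(alpha)
        a2 = (math.lgamma(a + b) + math.lgamma(b + x)
              - math.lgamma(b) - math.lgamma(a + b + x))
        a3 = -(r + x) * math.log(alpha + T)
        if x > 0:
            # Second likelihood term: the customer "died" right after the last purchase
            a4 = math.log(a) - math.log(b + x - 1) - (r + x) * math.log(alpha + t_x)
            m = max(a3, a4)  # log-sum-exp for numerical stability
            log_sum = m + math.log(math.exp(a3 - m) + math.exp(a4 - m))
        else:
            log_sum = a3
        total += a1 + a2 + log_sum
        n += 1
    return -total / n + penalizer_coef * sum(p ** 2 for p in params)


customers = [(2, 30.0, 40.0), (0, 0.0, 20.0), (5, 45.0, 50.0)]
params = (0.5, 5.0, 2.0, 3.0)
print(bgnbd_penalized_nll(params, customers), bgnbd_penalized_nll(params, customers, 0.001))
```

The penalty grows with the squared parameter values, so a larger `penalizer_coef` pushes the optimizer toward smaller parameters at the cost of a slightly worse fit.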

In [10]:
#Top 10 customers expected to make the most purchases in a week
bg_model.conditional_expected_number_of_purchases_up_to_time(1, #One Week
                                                        cltv['frequency'],
                                                        cltv['recency'],
                                                        cltv['T']).sort_values(ascending=False).head(10)

In [11]:
#Total number of transactions expected from all customers in the next 4 weeks (about one month)
bg_model.conditional_expected_number_of_purchases_up_to_time(4, #4 weeks
                                                        cltv['frequency'],
                                                        cltv['recency'],
                                                        cltv['T']).sum()

* BG-NBD forecasts are most reliable over short horizons; they tend to become less accurate for long-term predictions.

In [12]:
plot_period_transactions(bg_model);

The plot compares the actual and expected number of customers who made a given number of repeat transactions in the calibration period; the closer the two sets of bars, the better the model fits.
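The "actual" side of this comparison is essentially a count of customers per number of repeat transactions. The helper below is a hypothetical pandas sketch of that tally, useful for inspecting the numbers behind the bars; the cap of 7 and the choice to lump larger counts into the last bin are assumptions of this sketch rather than documented library behaviour.

```python
import pandas as pd


def repeat_transaction_counts(frequency: pd.Series, max_frequency: int = 7) -> pd.Series:
    """Count customers per repeat-transaction value, grouping values >= max_frequency into the last bin."""
    capped = frequency.clip(upper=max_frequency)
    counts = capped.value_counts().reindex(range(max_frequency + 1), fill_value=0)
    return counts.sort_index()


# Hypothetical repeat-purchase counts for eight customers
freq = pd.Series([0, 0, 1, 2, 2, 2, 9, 12])
print(repeat_transaction_counts(freq).to_dict())
```

On the notebook's data you would pass `cltv['frequency']`, keeping in mind that it counts all distinct invoices rather than repeat purchases only.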

In [13]:
ggf_model = GammaGammaFitter(penalizer_coef=0.01)
ggf_model.fit(cltv['frequency'], cltv['monetary'])
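The Gamma-Gamma model assumes that a customer's average transaction value is independent of how often they buy, so it is worth checking the correlation between `frequency` and `monetary` before relying on it; a value close to zero is consistent with the assumption. The table below is a hypothetical stand-in, and on the real data the same `corr` call would run on `cltv`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's cltv table
cltv_demo = pd.DataFrame({
    "frequency": [2, 3, 5, 2, 8, 4],
    "monetary": [35.0, 60.0, 42.0, 80.0, 50.0, 38.0],
})

# Pearson correlation between purchase count and average spend per purchase
corr = cltv_demo["frequency"].corr(cltv_demo["monetary"])
print(f"frequency vs monetary correlation: {corr:.3f}")
```

Here the result is about -0.249, a weak negative relationship.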

In [14]:
#Top 10 customers by expected average profit per transaction
ggf_model.conditional_expected_average_profit(cltv['frequency'], cltv['monetary']).sort_values(ascending=False).head(10)
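`conditional_expected_average_profit` shrinks each customer's observed average spend toward the population mean, trusting the customer's own average more as their number of transactions grows. The sketch below writes out the Gamma-Gamma expectation E[M | x, m_x] = p(v + x·m_x) / (p·x + q - 1) as that weighted average. The names `p`, `q`, `v` are intended to match the fitted parameters in `ggf_model.params_`; the numbers used here are hypothetical.

```python
def gg_expected_average_profit(x, m_x, p, q, v):
    """Gamma-Gamma conditional expected average transaction value for one customer.

    x: number of transactions behind m_x, m_x: observed average value,
    p, q, v: fitted parameters (q > 1).
    """
    population_mean = p * v / (q - 1)   # expected average value for a customer with no history
    weight = p * x / (p * x + q - 1)    # grows towards 1 as x increases
    return (1 - weight) * population_mean + weight * m_x


# Hypothetical parameters; a customer with 2 transactions averaging 100
print(round(gg_expected_average_profit(2, 100.0, p=6.0, q=4.0, v=15.0), 2))  # 86.0
```

With these values the population mean is 30 and the customer's weight is 0.8, so the estimate lands at 86, between the two.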

In [15]:
#Expected customer lifetime value over the next 3 months
cltv['cltv_pred_3_months'] = ggf_model.customer_lifetime_value(bg_model,
                                   cltv['frequency'],
                                   cltv['recency'],
                                   cltv['T'],
                                   cltv['monetary'],
                                   time=3,  #3 months
                                   freq="W",  #time unit of recency and T; 'W' because they were converted to weeks
                                   discount_rate=0.01)  #monthly discount rate
cltv.head()
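`customer_lifetime_value` combines the two models. For each month in the horizon, it takes the BG-NBD expected number of purchases in that month, multiplies by the Gamma-Gamma expected average profit, and discounts the result at the monthly `discount_rate`. Because the BG-NBD inputs here are in weeks (`freq="W"`), months must be converted to weeks; as we understand the library, it uses roughly 4.345 weeks per month. The sketch below reproduces that logic with a hypothetical `expected_purchases` callable (a constant purchase rate) standing in for the fitted model, so treat it as an illustration rather than the library's code.

```python
def clv_sketch(expected_purchases, avg_profit, months=3, weeks_per_month=4.345, monthly_discount=0.01):
    """Discounted CLV over `months` months for one customer.

    expected_purchases(t): cumulative expected purchases by week t (hypothetical stand-in for BG-NBD).
    avg_profit: expected average profit per purchase (e.g. from Gamma-Gamma).
    """
    clv = 0.0
    for month in range(1, months + 1):
        t_end = month * weeks_per_month
        t_start = t_end - weeks_per_month
        # Expected purchases falling inside this month
        purchases = expected_purchases(t_end) - expected_purchases(t_start)
        clv += avg_profit * purchases / (1 + monthly_discount) ** month
    return clv


# Hypothetical customer: 0.5 purchases per week, 40 average profit per purchase
def constant_rate(t):
    return 0.5 * t


print(round(clv_sketch(constant_rate, 40.0, monthly_discount=0.0), 2))  # 260.7
print(round(clv_sketch(constant_rate, 40.0), 2))
```

Without discounting, each month contributes 0.5 × 4.345 × 40 = 86.9; the 1% monthly discount makes later months count slightly less.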