Question: Will issuing refunds for late orders affect customers' lifetime value (LTV)?
reference: https://colab.research.google.com/drive/1QHi9egj3uXEqcD7_EtRxOVnxM26IzKAC?usp=sharing#scrollTo=0771d00e-1342-4d1f-888a-4d9e24263323



Model:
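$
Y_i = \beta_0 + \beta_1 (R_i - c) + \beta_2 \mathbf{1}_{R_i \geq c} + \beta_3 (R_i - c) \mathbf{1}_{R_i \geq c} + \epsilon_i
$
where $Y_i$ is the LTV of the customer who placed order $i$, $c$ is the cutoff, and $R_i$ is the order lateness (in minutes).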
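
To see why $\beta_2$ is the coefficient of interest, compare the expected outcome just above and just below the cutoff. This is a short derivation from the model above, assuming the error term has mean zero given $R_i$:
$
\lim_{r \downarrow c} E[Y_i \mid R_i = r] = \beta_0 + \beta_2, \qquad
\lim_{r \uparrow c} E[Y_i \mid R_i = r] = \beta_0
$
so the jump in expected LTV at the cutoff is
$
\lim_{r \downarrow c} E[Y_i \mid R_i = r] - \lim_{r \uparrow c} E[Y_i \mid R_i = r] = \beta_2 .
$
The slope is $\beta_1$ below the cutoff and $\beta_1 + \beta_3$ above it, so $\beta_3$ only captures a change in slope, not the size of the jump.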

In [1]:
import pandas as pd
import numpy as np
np.random.seed(42)

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import statsmodels.formula.api as smf

# data generation

In [2]:
# parameter values
LOWER, CUTOFF, UPPER = 0, 30, 60
beta0, beta1, beta2, beta3 = 50, -0.8, 10, -0.1

In [8]:
def generate_data(n, std, b0=beta0,  b1=beta1, b2=beta2, b3=beta3, lower=LOWER,
                 upper=UPPER, cutoff=CUTOFF):
    
    #generate order lateness time
    min_late = np.random.uniform(lower,upper,n)
    
    # label the refunded order
    refunded = np.where(min_late < cutoff, 0, 1)
    
    # generate error term
    errors = np.random.normal(0, std, n)
    
    # simulate LTV from the model
    ltv = (b0
           +b1 * (min_late - cutoff)
           +b2 * refunded
           +b3 * (min_late - cutoff) * refunded
           +errors
    )
    
    # create the dataset
    df = pd.DataFrame({'min_late': min_late, 'ltv': ltv})
    
    # center the min_late variable around cutoff
    df['min_late_centered'] = df['min_late'] - cutoff
    
    # create indicator for treatment (refund)
    df['refunded'] = (df['min_late'] >= cutoff).astype(int)
    
    return df
    
    

In [9]:
# generate data for 2000 late orders
df = generate_data(2000, 10)
df.head()

    min_late        ltv  min_late_centered  refunded
0  24.279325  45.723276          -5.720675         0
1  25.74288   45.679327          -4.25712          0
2  13.606173  42.905038         -16.393827         0
3  27.352955  53.817083          -2.647045         0
4  57.195291  33.915188          27.195291         1
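
Before fitting, it can help to check that the discontinuity is visible in the raw data. The sketch below is one way to do that without a plot: it averages LTV in 5-minute lateness bins, with one bin edge placed exactly at the cutoff. The helper `binned_means` and the seed are illustrative assumptions, not part of the notebook, and the data is re-simulated with the notebook's parameter values, so the numbers will not match the rows shown above.

```python
import numpy as np

def binned_means(x, y, edges):
    """Mean of y within each half-open bin [edges[k], edges[k+1]) of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.digitize returns k + 1 for values in [edges[k], edges[k+1])
    idx = np.digitize(x, edges) - 1
    means = np.full(len(edges) - 1, np.nan)  # NaN marks an empty bin
    for k in range(len(edges) - 1):
        in_bin = idx == k
        if in_bin.any():
            means[k] = y[in_bin].mean()
    return means

# Hypothetical re-simulation using the notebook's parameter values.
rng = np.random.default_rng(0)
cutoff = 30
min_late = rng.uniform(0, 60, 2000)
refunded = (min_late >= cutoff).astype(int)
ltv = (50
       - 0.8 * (min_late - cutoff)
       + 10 * refunded
       - 0.1 * (min_late - cutoff) * refunded
       + rng.normal(0, 10, 2000))

edges = np.arange(0, 65, 5)  # 0, 5, ..., 60: the cutoff (30) is a bin edge
for lo, m in zip(edges[:-1], binned_means(min_late, ltv, edges)):
    print(f"{int(lo):2d}-{int(lo) + 5:2d} min: mean LTV = {m:6.2f}")
```

If the simulation behaves as intended, the bin means should decline steadily with lateness, step up between the 25-30 and 30-35 minute bins, and then decline again.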


In [10]:
# model fitting (wls without explicit weights uses unit weights, so this is equivalent to OLS)
model = smf.wls('ltv ~ min_late_centered * refunded', df).fit()

In [11]:
# model summary
model.summary()

Dep. Variable:      ltv                 R-squared:            0.521
Model:              WLS                 Adj. R-squared:       0.52
Method:             Least Squares       F-statistic:          722.4
Date:               Wed, 06 Apr 2022    Prob (F-statistic):   6.16e-318
Time:               14:01:19            Log-Likelihood:       -7457.7
No. Observations:   2000                AIC:                  14920.0
Df Residuals:       1996                BIC:                  14950.0
Df Model:           3
Covariance Type:    nonrobust

                                 coef   std err         t   P>|t|    [0.025    0.975]
Intercept                     49.6451     0.625    79.428   0.000    48.419    50.871
min_late_centered             -0.8230     0.036   -22.970   0.000    -0.893    -0.753
refunded                      10.1037     0.898    11.249   0.000     8.342    11.865
min_late_centered:refunded    -0.0181     0.051    -0.353   0.724    -0.119     0.083

Omnibus:          1.02     Durbin-Watson:       1.995
Prob(Omnibus):    0.601    Jarque-Bera (JB):    0.999
Skew:             0.055    Prob(JB):            0.607
Kurtosis:         3.006    Cond. No.            90.7


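The coefficient on refunded is the estimated treatment effect at the cutoff: customers who receive a refund have an LTV about 10.1 dollars higher than customers just below the cutoff (95% CI: 8.3 to 11.9), close to the true value $\beta_2 = 10$ used in the simulation. The interaction coefficient (-0.018, p = 0.724) is not distinguishable from zero, so this sample does not detect the small slope change ($\beta_3 = -0.1$) built into the data.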
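
A common robustness check in regression discontinuity analyses is to refit the same model using only observations within a bandwidth of the cutoff and see whether the estimated jump is stable. The sketch below is one way to do that, not the notebook's method: it uses plain least squares via `np.linalg.lstsq` instead of statsmodels, and the function name `rdd_jump`, the bandwidth values, and the seed are illustrative assumptions. The data is re-simulated with the notebook's parameter values.

```python
import numpy as np

def rdd_jump(running, outcome, cutoff, bandwidth):
    """Estimated jump at the cutoff from a linear fit with separate slopes
    on each side, using only rows with |running - cutoff| <= bandwidth."""
    r = np.asarray(running, dtype=float) - cutoff  # centered running variable
    y = np.asarray(outcome, dtype=float)
    keep = np.abs(r) <= bandwidth
    r, y = r[keep], y[keep]
    d = (r >= 0).astype(float)  # treatment indicator
    # Design matrix columns: intercept, slope, jump, slope change
    X = np.column_stack([np.ones_like(r), r, d, r * d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[2]

# Hypothetical re-simulation using the notebook's parameter values.
rng = np.random.default_rng(0)
cutoff = 30
min_late = rng.uniform(0, 60, 2000)
refunded = (min_late >= cutoff).astype(int)
ltv = (50
       - 0.8 * (min_late - cutoff)
       + 10 * refunded
       - 0.1 * (min_late - cutoff) * refunded
       + rng.normal(0, 10, 2000))

for h in (30, 20, 10, 5):  # bandwidth in minutes; 30 uses the full sample
    print(f"bandwidth {h:2d} min: estimated jump = {rdd_jump(min_late, ltv, cutoff, h):.2f}")
```

Narrower bandwidths rely less on the linear functional form but use fewer observations, so the estimates become noisier as the bandwidth shrinks. In this simulation the true model is linear on both sides, so every bandwidth targets the same jump of 10.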