# Use historical sales data to predict future adoptions (Practice)

## Estimate $p, q, M$

Bass model predictions for cumulative adopters $\hat{A}(t)$ and new adopters $\hat{N}(t)$:
$$\hat{A}(t) = M\cdot\frac{1-\exp(-(p+q)t)}{1+\frac{q}{p}\exp(-(p+q)t)}$$ <br>
$$\hat{N}(t) = \hat{A}(t)-\hat{A}(t-1)$$ <br>
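
As a quick sanity check on the two formulas, the sketch below evaluates them for illustrative parameter values (the numbers `0.03`, `0.38`, and `1000` are assumptions, not estimates from the dataset, and the function names `bass_cumulative` and `bass_new` are hypothetical). Because $\hat{A}(0)=0$, the new adopters over periods $1,\dots,T$ should add up to $\hat{A}(T)$.

```python
import numpy as np

def bass_cumulative(t, p, q, M):
    """Cumulative adopters A_hat(t) under the Bass model."""
    decay = np.exp(-(p + q) * t)
    return M * (1 - decay) / (1 + (q / p) * decay)

def bass_new(t, p, q, M):
    """New adopters N_hat(t) = A_hat(t) - A_hat(t - 1)."""
    return bass_cumulative(t, p, q, M) - bass_cumulative(t - 1, p, q, M)

# Illustrative parameter values only; they are not estimated from the dataset.
p, q, M = 0.03, 0.38, 1000.0
periods = np.arange(1, 15)            # t = 1, ..., 14

new_adopters = bass_new(periods, p, q, M)
cumulative_at_end = bass_cumulative(periods[-1], p, q, M)

# The differences telescope, so the two numbers below should agree
# up to floating-point rounding.
print(new_adopters.sum(), cumulative_at_end)
```

Passing a NumPy array for `t` evaluates every period at once, which avoids an explicit loop.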


The GitHub link for the dataset: [3-2-2 BassModelEstimatePQM2.csv](https://github.com/zoutianxin1992/MarketingAnalyticsPython/blob/main/Marketing%20Analytics%20in%20Python/Bass%20model/Dataset/3-2-2%20BassModelEstimatePQM2.csv). The dataset contains the product's historical sales for 14 periods. Assuming all sales come from new adoptions, predict and plot the new adoption curve through the 30th period.

**Remember**: when loading data from GitHub, make sure the URL points to the **raw dataset**, not the GitHub page that displays it!
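
One way to get the raw link is to rewrite the page link by hand or with a small helper. The sketch below uses a hypothetical function `to_raw_url` and assumes the usual `github.com/<user>/<repo>/blob/<branch>/<path>` layout; it is an illustration, not an official GitHub API.

```python
def to_raw_url(blob_url: str) -> str:
    """Turn a github.com '/blob/' page link into a raw.githubusercontent.com link."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError("expected a https://github.com/<user>/<repo>/blob/... link")
    # Drop the host, then remove the first '/blob' path segment.
    path = blob_url[len(prefix):].replace("/blob/", "/", 1)
    return "https://raw.githubusercontent.com/" + path

blob_url = (
    "https://github.com/zoutianxin1992/MarketingAnalyticsPython/blob/main/"
    "Marketing%20Analytics%20in%20Python/Bass%20model/Dataset/"
    "3-2-2%20BassModelEstimatePQM2.csv"
)
raw_url = to_raw_url(blob_url)
print(raw_url)
```

The resulting `raw_url` can then be passed to `pd.read_csv`.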

In [None]:
# input your code here
# load packages and datasets, rename datasets

In [1]:
# import pandas as pd
# import numpy as np
# from matplotlib import pyplot as plt
# from scipy.optimize import least_squares                 # nonlinear least squares (NLS) solver

# # import historical data 
# url = "https://raw.githubusercontent.com/zoutianxin1992/MarketingAnalyticsPython/main/Marketing%20Analytics%20in%20Python/Bass%20model/Dataset/3-2-2%20BassModelEstimatePQM2.csv"
# df = pd.read_csv(url) 
# # Rename the variables to t and N_t
# df.rename(columns = {df.columns[0]:"t",df.columns[1]:"N"}, inplace = True)  # "inplace" apply the name change to df itself 
# df.head()

In [None]:
# input your code here
# define A_hat(t) and N_hat(t)

In [4]:
# # define A_hat(t) and N_hat(t)

# def A_hat(t,p,q,M):  # t: time; p, q, M: the Bass parameters
#     return M * (1 - np.exp(-(p+q)*t))/(1 + q / p* np.exp(-(p+q)*t))

# # define N_hat(t) 
# def N_hat(t,p,q,M):  
#     return A_hat(t,p,q,M) - A_hat(t-1,p,q,M)  # We can use the A_hat function instead of manually typing the formula

The formula for SSE is 
$$SSE = \sum_{t=1}^{T}(N(t)-\hat{N}(t))^2$$

Once we know the prediction errors $e(t) = N(t)-\hat{N}(t)$, we can calculate $SSE$. Remember that the errors are determined by the Bass parameters ($p,q,M$) and the historical sales data ($t,N(t)$). We already know the historical sales data, so the only moving parts are $p,q,M$. In other words, the prediction errors $(e(1),e(2),\ldots,e(T))$ are a **function** of $p,q,M$.<br> 
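
To make the arithmetic concrete, here is a minimal sketch with made-up numbers (three periods, not taken from the dataset). The arrays `observed` and `predicted` are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np

# Made-up numbers for illustration; they are not from the dataset.
observed = np.array([10.0, 20.0, 30.0])    # N(t)
predicted = np.array([12.0, 18.0, 33.0])   # N_hat(t) for some choice of p, q, M

errors = observed - predicted              # e(t) = N(t) - N_hat(t)
sse = np.sum(errors ** 2)                  # (-2)^2 + 2^2 + (-3)^2 = 17

print(errors, sse)
```

A different choice of $p,q,M$ would change `predicted`, and therefore `errors` and `sse`, while `observed` stays fixed.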

SciPy's NLS solver needs a function that returns the prediction error for each period. So, we first construct the prediction errors as a function of $p,q,M$.

In [None]:
# your code here
# construct the prediction_error function

In [5]:
# # define prediction errors as a function of p,q,M
# T = len(df["N"])   # number of periods for historical data


# def prediction_error(params):   # Note that we input p,q,M as a 1*3 array "params." This is required by Python's NLS solver we will use. 
#     p = params[0]
#     q = params[1]
#     M = params[2]
#     Nhat = [N_hat(t,p,q,M) for t in range(1,T+1)]            # Given p,q,M, generate Bass prediction for each period
#     return df["N"] - Nhat                                 # Prediction error for each period
                            

Our next task is to find the $p,q,M$ that minimize $SSE$ using NLS. We will use the `scipy.optimize.least_squares` NLS solver. The solver takes the `prediction_error` function as its input, together with an initial guess for the parameters.
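
Before running the solver on real data, it can help to try it on synthetic data where the answer is known. The sketch below generates noise-free "sales" from assumed parameters (`0.03`, `0.38`, `1000`; these are illustrative, not the dataset's values) and then fits them with `least_squares`. The names `bass_new`, `residuals`, and `start` are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def bass_new(t, p, q, M):
    """New adopters N_hat(t) = A_hat(t) - A_hat(t - 1) under the Bass model."""
    def cumulative(s):
        decay = np.exp(-(p + q) * s)
        return M * (1 - decay) / (1 + (q / p) * decay)
    return cumulative(t) - cumulative(t - 1)

# Synthetic "sales" generated from assumed parameters (not the real dataset).
true_params = (0.03, 0.38, 1000.0)
t = np.arange(1, 15)
sales = bass_new(t, *true_params)

def residuals(params):
    """Prediction error e(t) = N(t) - N_hat(t) for every period."""
    p, q, M = params
    return sales - bass_new(t, p, q, M)

# Initial guess: small p, moderate q, M set to three times the observed sales.
start = [0.01, 0.16, 3 * sales.sum()]
fit = least_squares(residuals, start, bounds=(0, np.inf))

start_sse = np.sum(residuals(start) ** 2)
final_sse = 2 * fit.cost                   # least_squares reports cost = 0.5 * SSE
print(fit.x, start_sse, final_sse)
```

With noise-free data one would expect `fit.x` to land near the assumed parameters, although convergence can depend on the starting point. Comparing `final_sse` with `start_sse` is a simple way to confirm the solver improved on the initial guess.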

In [None]:
# your code here
# run NLS to estimate p,q,M

In [6]:
# # estimate p,q,M using least_squares
# # Bass model requires p>0, q>0, M>0, so we need to add the constraints
# A_t = sum(df['N'])           # cumulative adopters through the last observed period
# params0 = [0.01,0.16,3*A_t]  # initial guess for p,q,M. Required by least_squares
# estim_results= least_squares(prediction_error, params0, bounds = (0,np.inf) )

# #########################
# # prediction_error: function returning the array of prediction errors for each period
# # params0: initial guesses
# # bounds: the bounds for p,q,M. In our case, a lower bound of 0 and no upper bound
# #########################
# estim_results
# # Success should be True
# # "x" is the estimated parameters (what we want).


In [None]:
# your code here
# store estimation results 

In [7]:
# p_estim = estim_results.x[0]
# q_estim = estim_results.x[1]
# M_estim = estim_results.x[2]


## Predict future sales for 30 periods

In [4]:
# your code here
# predict future sales for 30 periods

In [5]:
# T_pred = 30  # number of periods for prediction
# predictA = [A_hat(t,p_estim,q_estim,M_estim) for t in range(1,T_pred+1)]  # predict cumulative adopters for T_pred periods
# predictN = [N_hat(t,p_estim,q_estim,M_estim) for t in range(1,T_pred+1)]  # predict new adopters for T_pred periods

In [6]:
# Your code here
# Plot predicted new adoptions in 30 periods

In [10]:
# # Plot the trajectory of new adopters

# plt.rcParams['figure.figsize'] = [12,8]  # set figure size to 12 x 8 inches
# plt.plot(range(1,T_pred+1),predictN)
# plt.xticks(range(1,T_pred+1,2), fontsize = 18)
# plt.yticks(fontsize = 18)
# plt.ylabel("New adopters",fontsize = 18)
# plt.xlabel("time", fontsize = 18)