# SOLUTION Tutorial 7: Panel data and mixed logit

We will analyze a marketing dataset on brand choice for catsup (a.k.a. ketchup, or 'tomato sauce' in Australia).

The data covers two famous brands of catsup (Heinz and Hunts) and three package sizes, which combine into the four alternatives listed below.

A description of the dataset can be found [here](https://www.tandfonline.com/doi/pdf/10.1080/07350015.1994.10524547?casa_token=r4LpjVvgDW4AAAAA:FVG8mEexsQ37tJ2bvk7oxZZ9K_jvvMJ2WxglLzBaHQD0_0REkXmKGsPPxXw_LRGwN3YHY8-L-k8U)

# Description of the dataset

* **id**: household identifier.
* **choice**: the chosen alternative, one of heinz41, heinz32, heinz28, hunts32.
* **disp_x**: is there a display for brand x?
* **feat_x**: is there a newspaper feature advertisement for brand x?
* **price_x**: price of brand x.

---
---

# Preparing the environment
*The preparation and dataset loading code is given to the students*

In [None]:
!pip install biogeme

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting biogeme
  Downloading biogeme-3.2.10.tar.gz (1.8 MB)
Building wheels for collected packages: biogeme
  Building wheel for biogeme (setup.py) ... done
  Created wheel for biogeme: filename=biogeme-3.2.10-cp37-cp37m-linux_x86_64.whl size=4253804 sha256=2edbce30a47809201b98bf9c89bc4e00a20ef42f1adf4314ce34fec6775779b8
  Stored in directory: /root/.cache/pip/wheels/5b/92/9b/63caa7ad9b2cd582de77d3701d10f7e8d041466f4a9d07d554
Successfully built biogeme
Installing collected packages: biogeme
Successfully installed biogeme-3.2.10


Load the packages, feel free to change the names.

In [None]:
import pandas  as pd
import numpy as np
import matplotlib.pyplot as plt

import biogeme.database as db
import biogeme.biogeme as bio
import biogeme.models as models
import biogeme.expressions as exp
import biogeme.tools as tools
import biogeme.distributions as dist

# Load the dataset

In [None]:
path = 'https://raw.githubusercontent.com/pmontman/pub-choicemodels/main/data/catsup.csv'
catsup_pd = pd.read_csv(path)


Notice the `id` variable, which identifies each household: the dataset contains several choice situations per household, and the number of observations differs from one household to another.

In [None]:
catsup_pd.head(25)

Unnamed: 0,id,disp_heinz41,disp_heinz32,disp_heinz28,disp_hunts32,feat_heinz41,feat_heinz32,feat_heinz28,feat_hunts32,price_heinz41,price_heinz32,price_heinz28,price_hunts32,choice
0,1,0,0,0,0,0,0,0,0,4.6,3.7,5.2,3.4,heinz28
1,1,0,0,0,0,0,0,0,0,4.6,4.3,5.2,4.4,heinz28
2,1,0,0,0,0,0,1,0,0,4.6,2.5,4.6,4.8,heinz28
3,1,0,0,0,0,0,0,0,0,4.6,3.7,5.2,3.4,heinz28
4,1,0,0,0,0,0,0,1,0,4.6,3.0,4.6,4.8,heinz28
5,1,0,0,0,0,0,0,0,0,5.0,3.0,4.7,3.0,heinz28
6,1,0,0,0,1,0,0,0,1,5.1,3.1,4.6,4.1,heinz28
7,1,0,0,0,0,0,0,0,0,4.6,3.4,4.7,3.1,heinz41
8,1,0,0,0,0,0,0,0,0,5.0,3.4,4.7,3.1,heinz28
9,1,0,0,0,1,0,0,0,0,5.0,3.4,5.0,2.8,heinz28


# Auxiliary function

In [None]:
def qbus_update_globals_bgm(pd_df):
   globals().update(db.Database('tmp_bg_bgm_for_glob', pd_df).variables)

# Data cleaning: Preparing the dataset for Biogeme

Encode the choice variable (a string) as integers with the `factorize` function.
We look at the codetable to see how the integers map to the alternatives. The position of each label in the codetable is its code:
0: heinz28, 1: heinz41, 2: heinz32, 3: hunts32.

In [None]:
catsup_pd['choice'], codetable = catsup_pd['choice'].factorize()

In [None]:
codetable

Index(['heinz28', 'heinz41', 'heinz32', 'hunts32'], dtype='object')
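The numbering rule behind the codetable can be illustrated without pandas. The sketch below uses a hypothetical helper, `factorize_like`, and a made-up toy list; it mimics the rule that labels are numbered in order of first appearance, and it is not the pandas implementation itself.

```python
def factorize_like(values):
    """Return (codes, uniques), numbering labels in order of first appearance."""
    uniques = []      # plays the role of the codetable
    position = {}     # label -> integer code
    codes = []
    for label in values:
        if label not in position:
            position[label] = len(uniques)
            uniques.append(label)
        codes.append(position[label])
    return codes, uniques

# Hypothetical toy sequence of choices (not rows from the dataset)
toy_choices = ['heinz28', 'heinz28', 'heinz41', 'heinz32', 'hunts32', 'heinz28']
codes, uniques = factorize_like(toy_choices)

# Inverting the codetable gives a lookup from label to code
label_to_code = {label: code for code, label in enumerate(uniques)}
print(codes)
print(label_to_code)
```

A lookup such as `label_to_code` is handy later on, when the dictionary of utilities has to be keyed by exactly these integers.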

In [None]:
catsup_pd['choice']

0       0
1       0
2       0
3       0
4       0
       ..
2793    1
2794    1
2795    3
2796    0
2797    0
Name: choice, Length: 2798, dtype: int64

# Capturing agent effect through mixed logit

In [None]:

# Define level of verbosity
import biogeme.messaging as msg
logger = msg.bioMessage()
logger.setSilent()


We need some additional biogeme functionality that the auxiliary function does not cover, so we create the biogeme database manually.

In [None]:
database = db.Database("catsup", catsup_pd)

This line of code tells biogeme to use the variable `id` as the identifier of the individuals, so that the data is treated as panel data (otherwise, biogeme treats the observations as independent).

In [None]:
database.panel("id")

We now declare the coefficients of our model. We keep the model simple:
just the alternative-specific constants and the variables disp, feat and price.

In [None]:
ASC_heinz41 = exp.Beta('ASC_heinz41',0,None,None,0)
ASC_heinz32 = exp.Beta('ASC_heinz32',0,None,None,0)
ASC_heinz28 = exp.Beta('ASC_heinz28',0,None,None,0)
ASC_hunts32 = exp.Beta('ASC_hunts32',0,None,None,1)

B_disp = exp.Beta('B_disp',0,None,None,0)
B_feat = exp.Beta('B_feat',0,None,None,0)
B_price = exp.Beta('B_price',0,None,None,0)

Now comes the **important part**, the definition of the random effects in the model!

We want to consider a simple constant agent effect.
Recall the definition:

$$V_{jit} = \beta X_{jit} + \alpha_{ij}$$
and we have to specify the distribution of the $\alpha$. In this case we will use the normal, so $\alpha_{ij} \sim N(\mu_j, \sigma_j^2)$.
The ingredients are:
 1) Probability distribution: we set it ourselves, here the normal (we could have chosen others, such as the uniform or the lognormal).
 2) Parameters of the probability distribution (the $\mu$ and $\sigma$): they will be estimated from the data.

If we think for a moment, the $\mu_j$ of this distribution cannot be separated from the alternative-specific constants (ASCs), since changing $\mu_j$ essentially means adding a constant to every value drawn from that distribution. We can think of the mean as being 'absorbed' by the ASCs.
Again, this is just another convention!

What we end up doing in biogeme is declaring only the parameter for the standard deviation, so each normal distribution has mean 0 and a standard deviation to be determined from the data.


In terms of code, we declare the std. devs (the $\sigma_j$) just like any other parameter. We have to fix one of them to 0 to act as the reference (as with the ASCs, only differences between utilities matter, so one alternative serves as the baseline).

In [None]:

SIGMA_heinz41 = exp.Beta('SIGMA_heinz41',0,0,None,0)
SIGMA_heinz32 = exp.Beta('SIGMA_heinz32',0,0,None, 1)
SIGMA_heinz28 = exp.Beta('SIGMA_heinz28',0,0,None,0)
SIGMA_hunts32 = exp.Beta('SIGMA_hunts32',0,0,None,0)


The following code is how we tell biogeme that these terms are random.
The `EC_`s are the agent effects; EC stands for Error Component. The key function that indicates randomness is the biogeme function `exp.bioDraws`, which declares that the values are drawn from a probability distribution. The second argument specifies the distribution; possible values include `'NORMAL'` and `'UNIFORMSYM'`.

In [None]:

# Define random parameters, normally distributed across individuals,
# designed to be used for Monte-Carlo simulation
EC_heinz41 = SIGMA_heinz41 * exp.bioDraws('EC_heinz41','NORMAL')
EC_heinz32 = SIGMA_heinz32 * exp.bioDraws('EC_heinz32','NORMAL')
EC_heinz28 = SIGMA_heinz28 * exp.bioDraws('EC_heinz28','NORMAL')
EC_hunts32 = SIGMA_hunts32 * exp.bioDraws('EC_hunts32','NORMAL')
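The construction `SIGMA * draw` can be checked numerically without biogeme. The sketch below is only an illustration with hypothetical values for the ASC and the standard deviation: it scales standard normal draws by `sigma`, shifts them by `asc`, and confirms that the result behaves like a normal with mean `asc` and standard deviation `sigma`, which is why the mean does not need its own parameter.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the sketch is reproducible
asc = 2.0                # hypothetical alternative-specific constant
sigma = 1.5              # hypothetical standard deviation of the agent effect

# Standard normal draws, playing the role of bioDraws(..., 'NORMAL')
draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# ASC + SIGMA * draw: the mean comes from the ASC, the spread from SIGMA
alpha = [asc + sigma * z for z in draws]

alpha_mean = statistics.mean(alpha)
alpha_std = statistics.stdev(alpha)
print(round(alpha_mean, 2), round(alpha_std, 2))
```

The sample mean and standard deviation should land close to `asc` and `sigma`, up to simulation noise.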

And this is what the specification of the utility functions looks like in the end.
The Betas (disp, feat and price) and the ASCs are fixed effects, while the ECs are random.

In [None]:
globals().update(database.variables)
# Definition of the utility functions
V_heinz41 = ASC_heinz41 + B_disp *disp_heinz41 + B_feat * feat_heinz41 + B_price * price_heinz41 + EC_heinz41
V_heinz32 = ASC_heinz32 + B_disp *disp_heinz32 + B_feat * feat_heinz32 + B_price * price_heinz32 + EC_heinz32
V_heinz28 = ASC_heinz28 + B_disp *disp_heinz28 + B_feat * feat_heinz28 + B_price * price_heinz28 + EC_heinz28
V_hunts32 = ASC_hunts32 + B_disp *disp_hunts32 + B_feat * feat_hunts32 + B_price * price_hunts32 + EC_hunts32

We create the dictionary that maps the codes to the alternatives. **Remember to be careful here.** The numbers must match the alternatives as assigned by the `factorize()` transformation at the beginning of the notebook.
Availabilities are not considered here, so we set them all to 1.
Finally, we specify the logit model, as usual.
These steps are the same as for the multinomial logit.

In [None]:
# Associate utility functions with the numbering of alternatives
V = {0: V_heinz28,
     1: V_heinz41,
     2: V_heinz32,
     3: V_hunts32}

av = {0: 1,
     1: 1,
     2: 1,
     3: 1}

# Conditional to the random variables, the likelihood of one observation is
# given by the logit model (called the kernel)
obsprob = models.logit(V,av, choice)


The difference from the usual declaration of the MNL comes now.

We have to do two new steps:

1. Tell biogeme to consider the panel nature of the data.
2. Tell biogeme to calculate the choice probabilities by simulation. This is how we deal with the random parameters: we draw values from their distribution, calculate the likelihood for each draw, and average over the draws.

Step 1: we can do it by modifying the model with the expression `exp.PanelLikelihoodTrajectory`.

In [None]:
condprobIndiv = exp.PanelLikelihoodTrajectory(obsprob)

For Step 2, we take that expression and wrap it in `exp.MonteCarlo`. The final `exp.log` takes the logarithm, which gives the log-likelihood.

In [None]:
logprob = exp.log(exp.MonteCarlo(condprobIndiv))
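To make the two steps concrete, here is a plain-Python sketch of a simulated panel log-likelihood. It is an illustration under stated assumptions, not biogeme's implementation: the function names, the toy utilities and the toy choices are all hypothetical. For each person it draws the agent effects once per draw, multiplies the logit probabilities over that person's whole sequence of choices (the trajectory), averages over draws, and takes the log.

```python
import math
import random

def logit_prob(utilities, chosen):
    """Logit probability of alternative `chosen` given a list of utilities."""
    top = max(utilities)  # subtract the max for numerical stability
    expu = [math.exp(u - top) for u in utilities]
    return expu[chosen] / sum(expu)

def simulated_panel_loglik(v_by_person, choices_by_person, sigmas, n_draws, seed):
    """Sum over persons of log(average over draws of prod_t P_it(draw))."""
    rng = random.Random(seed)
    loglik = 0.0
    for v_rows, choices in zip(v_by_person, choices_by_person):
        acc = 0.0
        for _ in range(n_draws):
            # One vector of agent effects per draw, shared by ALL rows of this person
            alpha = [s * rng.gauss(0.0, 1.0) for s in sigmas]
            trajectory = 1.0
            for v, c in zip(v_rows, choices):
                trajectory *= logit_prob([vj + aj for vj, aj in zip(v, alpha)], c)
            acc += trajectory
        loglik += math.log(acc / n_draws)
    return loglik

# Hypothetical toy data: two persons, two alternatives
v_by_person = [[[0.5, 0.0], [0.2, 0.0], [0.1, 0.0]], [[-0.3, 0.0], [0.4, 0.0]]]
choices_by_person = [[0, 0, 1], [1, 1]]

ll_random = simulated_panel_loglik(v_by_person, choices_by_person, [1.0, 0.0], 250, seed=1)
ll_fixed = simulated_panel_loglik(v_by_person, choices_by_person, [0.0, 0.0], 250, seed=1)
print(ll_random, ll_fixed)
```

With all standard deviations set to 0 the draws have no effect, so `ll_fixed` reduces to the ordinary multinomial logit log-likelihood of the toy data.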

Since we are using simulation, we have to tell biogeme how many draws from the distribution to generate. More draws give a more accurate estimate, but at a higher computational cost.

We also set a seed, so that we get the same results if we run the notebook again (setting a seed is a good habit in general).

In [None]:

# Create the Biogeme object
biogeme  = bio.BIOGEME(database,logprob,numberOfDraws=250, seed=1)
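The effect of the number of draws and of the seed can be seen on a small stand-alone example. The sketch below does not use biogeme; `simulated_prob` is a hypothetical helper that approximates a binary logit probability with a normal agent effect, using made-up values for the utility and the standard deviation.

```python
import math
import random
import statistics

def simulated_prob(v, sigma, n_draws, seed):
    """Monte Carlo average of a binary logit probability over normal draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += 1.0 / (1.0 + math.exp(-(v + sigma * rng.gauss(0.0, 1.0))))
    return total / n_draws

# Same seed, same draws, same answer
same_a = simulated_prob(0.5, 2.0, 250, seed=1)
same_b = simulated_prob(0.5, 2.0, 250, seed=1)

# Spread of the estimate across 200 different seeds, for few and for many draws
spread_small = statistics.stdev([simulated_prob(0.5, 2.0, 25, seed=s) for s in range(200)])
spread_large = statistics.stdev([simulated_prob(0.5, 2.0, 2500, seed=s) for s in range(200)])
print(same_a == same_b, spread_small, spread_large)
```

The spread across seeds is the simulation noise. It shrinks as the number of draws grows, which is the accuracy versus cost trade-off described above.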




Estimation and results as usual.

In [None]:

# Estimate the parameters.
results = biogeme.estimate()

We take a look at the results.

In [None]:
results.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_heinz28,3.801648,0.232028,16.384436,0.0
ASC_heinz32,2.453806,0.214897,11.418533,0.0
ASC_heinz41,2.346258,0.277187,8.464534,0.0
B_disp,1.075613,0.138652,7.757636,8.65974e-15
B_feat,1.23168,0.154484,7.972841,1.554312e-15
B_price,-1.936178,0.100835,-19.201538,0.0
SIGMA_heinz28,1.80512,0.120974,14.921575,0.0
SIGMA_heinz41,1.951914,0.198682,9.824297,0.0
SIGMA_hunts32,2.142731,0.185798,11.532568,0.0


Interestingly, simulation for panel data is not implemented! No problem:
we can still run simulations by setting up a scenario that does not use the panel structure.

In [None]:
#biogeme.simulate(results.getBetaValues())

# Compare Panel vs Not using the panel information

We now compare against the results we get if we simply ignore the panel information. We can do this by leaving `exp.PanelLikelihoodTrajectory` out of the model.

In math, this would be the specification:

$$V_{jit} = \beta X_{jit} + \alpha_{ij \color{red}{t}}$$

As opposed to the panel specification
$$V_{jit} = \beta X_{jit} + \alpha_{ij}$$

Notice the extra subindex $t$. Each row is identified by the indices $i$ and $t$ together. With $\alpha_{ijt}$, a new agent effect is drawn for every row; when we drop the $t$, all rows of the same individual (household, in our dataset) share the same agent effect.
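The same contrast can be written in terms of likelihoods. This is a sketch in notation that does not appear elsewhere in the notebook: $P_{it}(\alpha)$ stands for the logit probability of the choice observed for household $i$ at time $t$ given the agent effects $\alpha$, and $f$ for their normal density. The panel specification integrates the product over a household's whole sequence of choices:

$$L_i^{\text{panel}} = \int \prod_{t} P_{it}(\alpha_{i}) \, f(\alpha_{i}) \, d\alpha_{i}$$

whereas ignoring the panel structure integrates each choice situation separately:

$$L_i^{\text{non-panel}} = \prod_{t} \int P_{it}(\alpha_{it}) \, f(\alpha_{it}) \, d\alpha_{it}$$

In the first case the choices of a household are correlated through the shared $\alpha_i$; in the second they are treated as independent.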

In [None]:
database_nonpanel = db.Database("catsup", catsup_pd)



# We integrate over the random variables using Monte-Carlo
logprob_nonpanel = exp.log(exp.MonteCarlo(obsprob))


# Create the Biogeme object
biogeme_nonpanel  = bio.BIOGEME(database_nonpanel,logprob_nonpanel,numberOfDraws=250, seed=1)


# Estimate the parameters.
results_nonpanel = biogeme_nonpanel.estimate()


In [None]:
results_nonpanel.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_heinz28,2.781881,0.168652,16.494795,0.0
ASC_heinz32,1.685863,0.086775,19.428079,0.0
ASC_heinz41,1.078731,0.383199,2.815068,0.004876699
B_disp,1.137851,0.146708,7.755866,8.881784e-15
B_feat,1.360532,0.200471,6.786691,1.147349e-11
B_price,-2.170979,0.158542,-13.69338,0.0
SIGMA_heinz28,2.476644,0.285309,8.680576,0.0
SIGMA_heinz41,2.161251,0.461285,4.685286,2.795695e-06
SIGMA_hunts32,0.134805,0.178947,0.753323,0.4512558


Simulation of the choice probabilities is similar to the MNL; we just have to wrap each logit probability in `exp.MonteCarlo`.

In [None]:
tgt_nonpanel = {
    0: exp.MonteCarlo(models.logit(V, av, 0)),
    1: exp.MonteCarlo(models.logit(V, av, 1)),
    2: exp.MonteCarlo(models.logit(V, av, 2)),
    3: exp.MonteCarlo(models.logit(V, av, 3))
    }


In [None]:

sim_nonpanel = bio.BIOGEME(database_nonpanel, tgt_nonpanel, numberOfDraws=250, seed=1)
preds = sim_nonpanel.simulate(theBetaValues=results_nonpanel.getBetaValues())
preds

Unnamed: 0,0,1,2,3
0,0.204173,0.111835,0.502622,0.181371
1,0.313457,0.227677,0.399662,0.059205
2,0.044618,0.010198,0.944877,0.000306
3,0.176578,0.116181,0.523072,0.184169
4,0.319196,0.053934,0.624516,0.002354
...,...,...,...,...
2793,0.232625,0.246194,0.383579,0.137602
2794,0.482266,0.049863,0.378509,0.089361
2795,0.104730,0.020722,0.120057,0.754491
2796,0.176490,0.048615,0.333622,0.441273


We can approximate a simulation for the panel data by using the betas estimated in our panel model *on the nonpanel* database.

In [None]:
sim_panel = bio.BIOGEME(database_nonpanel, tgt_nonpanel, numberOfDraws=250, seed=1)
preds = sim_panel.simulate(theBetaValues=results.getBetaValues())
preds

Unnamed: 0,0,1,2,3
0,0.200879,0.151947,0.447840,0.199333
1,0.310939,0.262697,0.333597,0.092767
2,0.047984,0.019909,0.930378,0.001729
3,0.183895,0.153179,0.484741,0.178185
4,0.359196,0.079345,0.547679,0.013780
...,...,...,...,...
2793,0.230007,0.294237,0.327943,0.147814
2794,0.500670,0.064819,0.311110,0.123401
2795,0.157138,0.057858,0.271537,0.513468
2796,0.204456,0.082957,0.381347,0.331240


# Compare to the Multinomial Logit (fixed effects) without agent effect

Just remove the random effect when specifying the utility functions.

In [None]:
V_heinz41_mnl = ASC_heinz41 + B_disp *disp_heinz41 + B_feat * feat_heinz41 + B_price * price_heinz41 #+ EC_heinz41
V_heinz32_mnl = ASC_heinz32 + B_disp *disp_heinz32 + B_feat * feat_heinz32 + B_price * price_heinz32 #+ EC_heinz32
V_heinz28_mnl = ASC_heinz28 + B_disp *disp_heinz28 + B_feat * feat_heinz28 + B_price * price_heinz28 #+ EC_heinz28
V_hunts32_mnl = ASC_hunts32 + B_disp *disp_hunts32 + B_feat * feat_hunts32 + B_price * price_hunts32 #+ EC_hunts32

In [None]:
V_mnl = {0: V_heinz28_mnl,
     1: V_heinz41_mnl,
     2: V_heinz32_mnl,
     3: V_hunts32_mnl}

In [None]:
# Use a separate name so the mixed logit expression `logprob` is not overwritten
logprob_mnl = models.loglogit(V_mnl, av, choice)
bgm_model = bio.BIOGEME(database_nonpanel, logprob_mnl)
results_mnl = bgm_model.estimate()

In [None]:
results_mnl.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_heinz28,2.42598,0.090696,26.748619,0.0
ASC_heinz32,1.501277,0.063949,23.476326,0.0
ASC_heinz41,1.353676,0.114964,11.774747,0.0
B_disp,0.875589,0.102257,8.562616,0.0
B_feat,0.908575,0.120177,7.560316,4.019007e-14
B_price,-1.402389,0.056094,-25.000504,0.0


In [None]:
tgt_mnl = {
    0: (models.logit(V_mnl, av, 0)),
    1: (models.logit(V_mnl, av, 1)),
    2: (models.logit(V_mnl, av, 2)),
    3: (models.logit(V_mnl, av, 3))
    }

sim_mnl = bio.BIOGEME(database_nonpanel, tgt_mnl)
preds_mnl = sim_mnl.simulate(theBetaValues=results_mnl.getBetaValues())
preds_mnl

Unnamed: 0,0,1,2,3
0,0.162657,0.129124,0.528756,0.179463
1,0.288464,0.228995,0.404244,0.078296
2,0.049712,0.017012,0.929956,0.003319
3,0.162657,0.129124,0.528756,0.179463
4,0.374181,0.051618,0.564130,0.010071
...,...,...,...,...
2793,0.213049,0.257590,0.395221,0.134140
2794,0.472546,0.052663,0.377899,0.096891
2795,0.089003,0.030458,0.251466,0.629073
2796,0.174481,0.059711,0.428469,0.337340
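The first row of `preds_mnl` can be reproduced by hand, which is a useful check that the dictionary of utilities is keyed correctly. The sketch below copies the MNL estimates and the prices of row 0 from the tables printed in this notebook (row 0 has no display and no feature, so only the ASCs and the prices enter) and applies the logit formula directly. The variable names are chosen here for illustration.

```python
import math

# Estimates copied from results_mnl (ASC_hunts32 is fixed to 0)
asc = {'heinz28': 2.42598, 'heinz41': 1.353676, 'heinz32': 1.501277, 'hunts32': 0.0}
b_price = -1.402389

# Prices in row 0 of the dataset
price = {'heinz28': 5.2, 'heinz41': 4.6, 'heinz32': 3.7, 'hunts32': 3.4}

# Alternatives in codetable order, i.e. codes 0, 1, 2, 3
order = ['heinz28', 'heinz41', 'heinz32', 'hunts32']

# Utilities, then the logit formula exp(V_j) / sum_k exp(V_k)
v = [asc[a] + b_price * price[a] for a in order]
denom = sum(math.exp(x) for x in v)
probs = [math.exp(x) / denom for x in v]
print([round(p, 4) for p in probs])
```

Up to the rounding of the printed estimates, the four values agree with the first row of the prediction table above. Row 3 has the same prices and no promotions, which is why it shows identical probabilities.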


---
---

# Exercise: Capturing dynamics. Add the last choice as an additional variable (assume that the data was observed in order), add it with fixed parameters and estimate a mixed logit.
Basically we add a new variable and repeat the process for estimating the mixed logit.

The first step is given to us: In the following cells we are going to create a new dataset that has an additional covariate representing the alternative that was chosen before each choice situation.


In [None]:
catsup_past = catsup_pd.copy()

This function takes a column, removes the last observation and adds a -1 at the beginning. This is how we create the lagged variable.

In [None]:
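def last_choice(x):
  # Prepend -1 (no previous choice) and drop the last observation.
  # pd.concat replaces Series.append, which was removed in pandas 2.0.
  return pd.concat([pd.Series([-1]), x.iloc[:-1]])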

We apply the function `last_choice` to the dataset, but we group the dataset by the id of the household.

In [None]:
lchoice =  catsup_past.groupby('id')['choice'].apply(last_choice).reset_index()#.head(25)
catsup_past['last_choice'] = lchoice['choice']
catsup_past.head(17)

Unnamed: 0,id,disp_heinz41,disp_heinz32,disp_heinz28,disp_hunts32,feat_heinz41,feat_heinz32,feat_heinz28,feat_hunts32,price_heinz41,price_heinz32,price_heinz28,price_hunts32,choice,last_choice
0,1,0,0,0,0,0,0,0,0,4.6,3.7,5.2,3.4,0,-1
1,1,0,0,0,0,0,0,0,0,4.6,4.3,5.2,4.4,0,0
2,1,0,0,0,0,0,1,0,0,4.6,2.5,4.6,4.8,0,0
3,1,0,0,0,0,0,0,0,0,4.6,3.7,5.2,3.4,0,0
4,1,0,0,0,0,0,0,1,0,4.6,3.0,4.6,4.8,0,0
5,1,0,0,0,0,0,0,0,0,5.0,3.0,4.7,3.0,0,0
6,1,0,0,0,1,0,0,0,1,5.1,3.1,4.6,4.1,0,0
7,1,0,0,0,0,0,0,0,0,4.6,3.4,4.7,3.1,1,0
8,1,0,0,0,0,0,0,0,0,5.0,3.4,4.7,3.1,0,1
9,1,0,0,0,1,0,0,0,0,5.0,3.4,5.0,2.8,0,0
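The lag logic can be checked on plain lists. The sketch below is a pure-Python illustration, not the pandas code used above: `lag_within_groups` is a hypothetical helper, and it assumes the rows are already sorted by household id. The first ten choices are those of household 1 as printed above; the three rows of household 2 are made up to show the reset at a household boundary.

```python
from itertools import groupby

def lag_within_groups(ids, choices, fill=-1):
    """Previous choice within each id; `fill` marks the first observation."""
    lagged = []
    # groupby only merges consecutive equal ids, hence the sorting assumption
    for _, rows in groupby(zip(ids, choices), key=lambda row: row[0]):
        previous = fill
        for _, choice in rows:
            lagged.append(previous)
            previous = choice
    return lagged

# Household 1: first ten choices from the table above; household 2: hypothetical
toy_ids = [1] * 10 + [2] * 3
toy_choices = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0] + [3, 3, 2]
lagged = lag_within_groups(toy_ids, toy_choices)
print(lagged)
```

The first ten values should match the `last_choice` column shown above, and the lag restarts at -1 when the household changes, so no household inherits the last choice of the previous one.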


But now it is up to you how the new variable is added to the model!
Transformations? Dummy encoding? Per-alternative parameters?

We could dummy encode last_choice and add the dummies to the utilities with per-alternative coefficients. If you think about it, last_choice is a characteristic of the household rather than an attribute of an alternative, so it should be modelled per alternative (or via an interaction). For example, if the last choice was heinz28, then the current choice is more likely to be heinz28 again, so the dummy should raise the utility of heinz28.

In [None]:
pd.get_dummies(catsup_past['last_choice'],prefix='lc')

Unnamed: 0,lc_-1,lc_0,lc_1,lc_2,lc_3
0,1,0,0,0,0
1,0,1,0,0,0
2,0,1,0,0,0
3,0,1,0,0,0
4,0,1,0,0,0
...,...,...,...,...,...
2793,0,0,0,0,1
2794,0,0,1,0,0
2795,0,0,1,0,0
2796,0,0,0,0,1


In [None]:
catsup_past = pd.concat([catsup_past, pd.get_dummies(catsup_past['last_choice'],prefix='lc')], axis=1)
catsup_past

Unnamed: 0,id,disp_heinz41,disp_heinz32,disp_heinz28,disp_hunts32,feat_heinz41,feat_heinz32,feat_heinz28,feat_hunts32,price_heinz41,price_heinz32,price_heinz28,price_hunts32,choice,last_choice,lc_-1,lc_0,lc_1,lc_2,lc_3
0,1,0,0,0,0,0,0,0,0,4.6,3.7,5.2,3.4,0,-1,1,0,0,0,0
1,1,0,0,0,0,0,0,0,0,4.6,4.3,5.2,4.4,0,0,0,1,0,0,0
2,1,0,0,0,0,0,1,0,0,4.6,2.5,4.6,4.8,0,0,0,1,0,0,0
3,1,0,0,0,0,0,0,0,0,4.6,3.7,5.2,3.4,0,0,0,1,0,0,0
4,1,0,0,0,0,0,0,1,0,4.6,3.0,4.6,4.8,0,0,0,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2793,300,0,0,0,0,0,0,0,0,3.5,3.3,4.4,3.0,1,3,0,0,0,0,1
2794,300,0,0,0,0,0,0,0,0,5.1,3.8,4.3,3.7,1,1,0,0,1,0,0
2795,300,0,0,0,1,0,0,0,0,5.0,3.6,5.0,2.5,3,1,0,0,1,0,0
2796,300,0,0,0,0,0,0,0,0,5.0,3.7,5.0,2.8,0,3,0,0,0,0,1


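Now we add the last_choice variables to the model. There are quite a few of them, so we add them block by block. It is a bit cumbersome, but essentially we are adding per-alternative coefficients to the reference specification. Note that in the code below the error components are commented out and the model is estimated with `models.loglogit`, so what is actually estimated is a multinomial logit with last-choice variables.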

In [None]:
db_lastchoice = db.Database("catsup_lastchoice", catsup_past)
globals().update(db_lastchoice.variables)

We start adding the last_choice_was_0 (zero means heinz28) variable to the model

In [None]:
B_lc_zero_h28 = exp.Beta('B_lc_zero_h28',0,None,None,0)
B_lc_zero_h41 = exp.Beta('B_lc_zero_h41',0,None,None,0)
B_lc_zero_h32 = exp.Beta('B_lc_zero_h32',0,None,None,0)
B_lc_zero_hunts32 = exp.Beta('B_lc_zero_hunts32',0,None,None,0)

V_lc_heinz41 = B_lc_zero_h41 * lc_0
V_lc_heinz32 = B_lc_zero_h32 * lc_0
V_lc_heinz28 = B_lc_zero_h28 * lc_0
V_lc_hunts32 = B_lc_zero_hunts32 * lc_0

We add the last_choice_was_1 (1 means heinz41) variable to the model





In [None]:
B_lc_one_h28 = exp.Beta('B_lc_one_h28',0,None,None,0)
B_lc_one_h41 = exp.Beta('B_lc_one_h41',0,None,None,0)
B_lc_one_h32 = exp.Beta('B_lc_one_h32',0,None,None,0)
B_lc_one_hunts32 = exp.Beta('B_lc_one_hunts32',0,None,None,0)

V_lc_heinz41 = V_lc_heinz41 + B_lc_one_h41 * lc_1
V_lc_heinz32 = V_lc_heinz32 + B_lc_one_h32 * lc_1
V_lc_heinz28 = V_lc_heinz28 + B_lc_one_h28 * lc_1
V_lc_hunts32 = V_lc_hunts32 + B_lc_one_hunts32 * lc_1

We add the last_choice_was_2 (2 means heinz32) variable to the model

In [None]:
B_lc_two_h28 = exp.Beta('B_lc_two_h28',0,None,None,0)
B_lc_two_h41 = exp.Beta('B_lc_two_h41',0,None,None,0)
B_lc_two_h32 = exp.Beta('B_lc_two_h32',0,None,None,0)
B_lc_two_hunts32 = exp.Beta('B_lc_two_hunts32',0,None,None,0)

V_lc_heinz41 = V_lc_heinz41 + B_lc_two_h41 * lc_2
V_lc_heinz32 = V_lc_heinz32 + B_lc_two_h32 * lc_2
V_lc_heinz28 = V_lc_heinz28 + B_lc_two_h28 * lc_2
V_lc_hunts32 = V_lc_hunts32 + B_lc_two_hunts32 * lc_2

We add the last_choice_was_3 (3 means hunts32) variable to the model

In [None]:
B_lc_three_h28 = exp.Beta('B_lc_three_h28',0,None,None,0)
B_lc_three_h41 = exp.Beta('B_lc_three_h41',0,None,None,0)
B_lc_three_h32 = exp.Beta('B_lc_three_h32',0,None,None,0)
B_lc_three_hunts32 = exp.Beta('B_lc_three_hunts32',0,None,None,0)

V_lc_heinz41 = V_lc_heinz41 + B_lc_three_h41 * lc_3
V_lc_heinz32 = V_lc_heinz32 + B_lc_three_h32 * lc_3
V_lc_heinz28 = V_lc_heinz28 + B_lc_three_h28 * lc_3
V_lc_hunts32 = V_lc_hunts32 + B_lc_three_hunts32 * lc_3
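Taken together, the four blocks add to each utility the term "sum over k of B[k][alternative] * lc_k", and since at most one dummy equals 1, this is simply a table lookup. The sketch below illustrates that in plain Python. The helper `lc_terms` and the short alternative labels are hypothetical; the only coefficients filled in are those of the "last choice was 1" block, copied from the results table printed further down, and the other blocks would be filled in the same way.

```python
ALTS = ['h28', 'h41', 'h32', 'hunts32']   # alternatives coded 0, 1, 2, 3

# Block for "last choice was 1 (heinz41)", copied from the results table;
# blocks 0, 2 and 3 are left out of this sketch.
coefs = {1: {'h28': -0.154153, 'h41': 3.319333, 'h32': -0.634252, 'hunts32': -0.56878}}

def lc_terms(coefs, last_choice):
    """Return sum_k B[k][alt] * lc_k for each alternative."""
    terms = {}
    for alt in ALTS:
        total = 0.0
        for k, block in coefs.items():
            lc_k = 1 if last_choice == k else 0   # the dummy lc_k
            total += block[alt] * lc_k
        terms[alt] = total
    return terms

after_heinz41 = lc_terms(coefs, last_choice=1)
first_purchase = lc_terms(coefs, last_choice=-1)
print(after_heinz41)
print(first_purchase)
```

For the first observation of each household (`last_choice` equal to -1) all of `lc_0` to `lc_3` are zero, so the last-choice terms vanish and the utilities fall back to the ASCs and the display, feature and price variables.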

Finally, we add the ASCs and the display, feature and price variables.

In [None]:
V_lc_heinz41 = V_lc_heinz41 + ASC_heinz41 + B_disp *disp_heinz41 + B_feat * feat_heinz41 + B_price * price_heinz41 #+ EC_heinz41
V_lc_heinz32 = V_lc_heinz32 + ASC_heinz32 + B_disp *disp_heinz32 + B_feat * feat_heinz32 + B_price * price_heinz32 #+ EC_heinz32
V_lc_heinz28 = V_lc_heinz28 + ASC_heinz28 + B_disp *disp_heinz28 + B_feat * feat_heinz28 + B_price * price_heinz28 #+ EC_heinz28
V_lc_hunts32 = V_lc_hunts32 + ASC_hunts32 + B_disp *disp_hunts32 + B_feat * feat_hunts32 + B_price * price_hunts32

In [None]:
V_lc = {0: V_lc_heinz28,
     1: V_lc_heinz41,
     2: V_lc_heinz32,
     3: V_lc_hunts32}

In [None]:
logprob_lc = models.loglogit(V_lc, av, choice)
bgm_model_lc = bio.BIOGEME(db_lastchoice, logprob_lc)
results_lc = bgm_model_lc.estimate()

There are quite a few parameters, but the last_choice coefficients that pair each alternative with itself (zero for heinz28, one for heinz41, two for heinz32 and three for hunts32) should be **strongly positive**, meaning that if the last choice was heinz28, then this choice is much more likely to be heinz28 again.

In [None]:
results_lc.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_heinz28,2.748118,0.231804,11.855339,0.0
ASC_heinz32,1.354018,0.173756,7.792658,6.661338e-15
ASC_heinz41,-0.391002,0.531958,-0.735025,0.4623242
B_disp,0.876904,0.109217,8.029034,8.881784e-16
B_feat,1.033625,0.133609,7.736218,1.021405e-14
B_lc_one_h28,-0.154153,0.228291,-0.675248,0.4995182
B_lc_one_h32,-0.634252,0.21162,-2.997128,0.002725366
B_lc_one_h41,3.319333,0.420747,7.889146,3.108624e-15
B_lc_one_hunts32,-0.56878,0.272321,-2.08864,0.03674015
B_lc_three_h28,-0.761364,0.223385,-3.408308,0.0006536713


In [None]:
def qbus_likeli_ratio_test_bgm(results_complex, results_reference, signif_level):
  return tools.likelihood_ratio_test( (results_complex.data.logLike, results_complex.data.nparam),
                                     (results_reference.data.logLike, results_reference.data.nparam), signif_level)

We also compare the likelihood of the basic model without last choices against the one with last choices: the latter fits much better. We can reject the basic model even at very small significance levels.

In [None]:
qbus_likeli_ratio_test_bgm(results_lc, results_mnl, 0.001)

LRTuple(message='H0 can be rejected at level 0.1%', statistic=526.9212248203894, threshold=39.252354790768464)
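The numbers in this output can be sanity-checked by hand. The sketch below is an illustration, not what `tools.likelihood_ratio_test` does internally. It assumes 16 degrees of freedom, inferred from the 4 x 4 last-choice coefficients added to the basic model (this count is not printed in the output above), and it uses a closed form for the chi-square upper tail that only holds for an even number of degrees of freedom. The helper names are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only holds for a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def lr_statistic(loglike_complex, loglike_reference):
    """Likelihood ratio statistic: -2 * (LL_reference - LL_complex)."""
    return -2.0 * (loglike_reference - loglike_complex)

statistic = 526.9212248203894     # reported by the test above
threshold = 39.252354790768464    # reported by the test above
df = 16                           # assumption: 16 extra last-choice coefficients

tail_at_threshold = chi2_sf_even_df(threshold, df)
p_value = chi2_sf_even_df(statistic, df)
print(tail_at_threshold, p_value)
```

Under this assumption the tail probability at the reported threshold comes out at about 0.001, the 0.1% level requested, and the tail probability at the reported statistic is vanishingly small.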