# Types of questions for the final exam

We will see:
 * Nested logit
 * Ordered logit
 * Panel data and Mixed logit



---
---

# Preparing the environment
*The preparation and dataset loading code is given to the students*

In [None]:
!pip install biogeme



Load the packages, feel free to change the names.

In [None]:
import pandas  as pd
import numpy as np
import matplotlib.pyplot as plt

import biogeme.database as db
import biogeme.biogeme as bio
import biogeme.models as models
import biogeme.expressions as exp
import biogeme.tools as tools
import biogeme.distributions as dist

# Load the dataset

In [None]:
path = 'https://raw.githubusercontent.com/pmontman/pub-choicemodels/main/data/fishing.csv'
data_pd = pd.read_csv(path)

A simple look at the dataset.

In [None]:
data_pd.head(5)

Unnamed: 0,mode,price_beach,price_pier,price_boat,price_charter,catch_beach,catch_pier,catch_boat,catch_charter,income
0,4,157.93,157.93,157.93,182.93,0.0678,0.0503,0.2601,0.5391,7083.3317
1,4,15.114,15.114,10.534,34.534,0.1049,0.0451,0.1574,0.4671,1249.9998
2,3,161.874,161.874,24.334,59.334,0.5333,0.4522,0.2413,1.0266,3749.9999
3,2,15.134,15.134,55.93,84.93,0.0678,0.0789,0.1643,0.5391,2083.3332
4,3,106.93,106.93,41.514,71.014,0.0678,0.0503,0.1082,0.324,4583.332


---
---

# Auxiliary functions

The first function takes the dictionary of utilities, a pandas dataframe, and the name of the column that holds the observed choice. It returns the biogeme object with the model and the estimated `results` object (the one we query for parameter values, likelihoods, etc.).
We also attach the dictionary of utilities to the biogeme object, in case we need it later.

In [None]:
def qbus_estimate_bgm(V, pd_df, tgtvar_name, modelname='bgmdef'):
 av_auto = V.copy()
 for key, value in av_auto.items():
   av_auto[key] = 1
 bgm_db = db.Database(modelname + '_db', pd_df)
 globals().update(bgm_db.variables)
 logprob = models.loglogit (V , av_auto , bgm_db.variables[tgtvar_name] )
 bgm_model = bio.BIOGEME ( bgm_db, logprob )
 bgm_model.utility_dic = V.copy()
 return bgm_model, bgm_model.estimate()

The next function calculates the predictions for a biogeme object that was estimated with `qbus_estimate_bgm`. The output is the array of choice probabilities, which can then be used to calculate accuracies, confusion matrices and the outcome of what-if scenarios.

In [None]:
def qbus_simulate_bgm(qbus_bgm_model, betas, pred_pd_df):
  av_auto = None
  targets = None
  if hasattr(qbus_bgm_model, 'ord_probs'):
    av_auto = qbus_bgm_model.ord_probs.copy()
    targets = qbus_bgm_model.ord_probs.copy()
  else:
    av_auto = qbus_bgm_model.utility_dic.copy()
    targets = qbus_bgm_model.utility_dic.copy()

  for key, value in av_auto.items():
    av_auto[key] = 1



  for key, value in targets.items():
    if hasattr(qbus_bgm_model, 'nest_tuple'):
      targets[key] = models.nested(qbus_bgm_model.utility_dic, av_auto, qbus_bgm_model.nest_tuple, key)
    else:
      if hasattr(qbus_bgm_model, 'ord_probs'):
       # Ordered logit: targets already holds the per-alternative probabilities
       pass
      else:
       targets[key] = models.logit(qbus_bgm_model.utility_dic, av_auto, key)

  bgm_db = db.Database('simul', pred_pd_df)
  globals().update(bgm_db.variables)
  bgm_pred_model = bio.BIOGEME(bgm_db, targets)
  simulatedValues = bgm_pred_model.simulate(betas)
  return simulatedValues

The function `qbus_calc_accu_confusion` calculates the accuracy and the confusion matrix from the predicted choice probabilities, a pandas dataframe, and the name of the column in that dataframe that contains the actual choices.

In [None]:
def qbus_calc_accu_confusion(sim_probs, pd_df, choice_var):
  which_max = sim_probs.idxmax(axis=1)
  data = {'y_Actual':   pd_df[choice_var],
          'y_Predicted': which_max
        }

  df = pd.DataFrame(data, columns=['y_Actual','y_Predicted'])
  confusion_matrix = pd.crosstab(df['y_Actual'], df['y_Predicted'], rownames=['Actual'], colnames=['Predicted'])
  accu = np.mean(which_max == pd_df[choice_var])
  return accu, confusion_matrix
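
As a quick illustration of what this function computes, here is a minimal sketch on made-up numbers. The probabilities and the `choice` column below are hypothetical, not taken from the fishing data; the logic mirrors the function body (pick the alternative with the highest probability, then compare with the actual choice).

```python
import numpy as np
import pandas as pd

# Hypothetical choice probabilities: 4 observations, 3 alternatives.
# Column labels are the alternative ids, as in the simulation output.
toy_probs = pd.DataFrame({1: [0.7, 0.2, 0.1, 0.5],
                          2: [0.2, 0.5, 0.3, 0.3],
                          3: [0.1, 0.3, 0.6, 0.2]})
toy_data = pd.DataFrame({'choice': [1, 2, 3, 2]})  # hypothetical actual choices

# Predicted alternative = the column with the largest probability in each row
predicted = toy_probs.idxmax(axis=1)

# Accuracy = share of rows where prediction and actual choice coincide
accuracy = np.mean(predicted == toy_data['choice'])

# Confusion matrix: rows are actual choices, columns are predictions
confusion = pd.crosstab(toy_data['choice'], predicted,
                        rownames=['Actual'], colnames=['Predicted'])
print(accuracy)
print(confusion)
```

Only the last observation is misclassified here (actual 2, predicted 1), so it shows up off the diagonal of the confusion matrix.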

The next function runs the likelihood ratio test with a bit less code than the default biogeme function. The arguments are the results objects of the two models to be compared. The first is the more complex model and the second is the reference model (**the order is important!**). The third argument is the significance level for the test.

In [None]:
def qbus_likeli_ratio_test_bgm(results_complex, results_reference, signif_level):
  return tools.likelihood_ratio_test( (results_complex.data.logLike, results_complex.data.nparam),
                                     (results_reference.data.logLike, results_reference.data.nparam), signif_level)

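The next function just loads the columns of a dataframe into the global namespace as biogeme variables, so we can use them by name when writing the utility functions.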

In [None]:
def qbus_update_globals_bgm(pd_df):
   globals().update(db.Database('tmp_bg_bgm_for_glob', pd_df).variables)

The next function estimates the nested logit version. Its use is similar to the multinomial logit one, with an extra argument for the nest structure.

In [None]:
def qbus_estimate_nested_bgm(V, pd_df, nests,  tgtvar_name, modelname='bgmdef'):
 av_auto = V.copy()
 for key, value in av_auto.items():
   av_auto[key] = 1
 bgm_db = db.Database(modelname + '_db', pd_df)
 globals().update(bgm_db.variables)
 logprobnest = models.lognested (V, av_auto , nests , bgm_db.variables[tgtvar_name] )
 #logprob = models.loglogit (V , av_auto , bgm_db.variables[tgtvar_name] )
 bgm_model = bio.BIOGEME ( bgm_db, logprobnest )
 bgm_model.utility_dic = V.copy()
 bgm_model.nest_tuple = nests
 return bgm_model, bgm_model.estimate()

The auxiliary function for the ordered logit. Its use is slightly different from the basic multinomial logit!
* The `V_ord` argument is just the expression of a single utility function, not the dictionary mapping alternative ids to utility functions.
* The argument `ord_alt_ids` is a list with the ids of the alternatives **in the order that we want to impose**. This is the parameter to pay attention to.

The rest of the arguments are used as usual: `pd_df` is the pandas dataframe, `tgtvar_name` the name of the variable with the choices, and `modelname` is optional.

In [None]:
def qbus_estimate_ordered_bgm(V_ord, ord_alt_ids, pd_df, tgtvar_name, modelname='ord_bgm'):
 bgm_db = db.Database(modelname + '_db', pd_df)
 globals().update(bgm_db.variables)

 taus_map = {ord_alt_ids[0]: exp.Beta('tau1', -1, None, None, 0) }
 i = 1
 for id in ord_alt_ids[1:-1]:
  taus_map[id] = taus_map[ ord_alt_ids[i-1] ] + exp.Beta('delta_'+ str(i + 1), i, 0, None, 0)
  i = i + 1

 alt_probs_map = {ord_alt_ids[0]: dist.logisticcdf( taus_map[ord_alt_ids[0] ] - V_ord) }

 i = 1
 for id in ord_alt_ids[1:-1]:
  alt_probs_map[id] = dist.logisticcdf( taus_map[id] - V_ord) - dist.logisticcdf( taus_map[ ord_alt_ids[i-1] ] - V_ord)
  i = i + 1

 alt_probs_map[ord_alt_ids[i] ] = 1 - dist.logisticcdf( taus_map[ord_alt_ids[i-1]] - V_ord)

 logprob = exp.log(exp.Elem(alt_probs_map, bgm_db.variables[tgtvar_name]))

 #logprob = models.loglogit (V , av_auto , bgm_db.variables[tgtvar_name] )
 bgm_model = bio.BIOGEME ( bgm_db, logprob )
 bgm_model.utility_dic = V_ord
 bgm_model.ord_probs = alt_probs_map.copy()
 return bgm_model, bgm_model.estimate()

The auxiliary function for the mixed logit, optionally with panel data (pass `panelvar_name` to enable it).



In [None]:
def qbus_estimate_mixed_bgm(V, pd_df, tgtvar_name, panelvar_name=None, n_draws=50, seed=1, modelname='bgmdef'):
 do_panel = panelvar_name is not None

 av_auto = V.copy()
 for key, value in av_auto.items():
   av_auto[key] = 1
 bgm_db = db.Database(modelname + '_db', pd_df)
 if (do_panel):
   bgm_db.panel(panelvar_name)

 globals().update(bgm_db.variables)
 #logprob = models.loglogit (V , av_auto , bgm_db.variables[tgtvar_name] )
 obsprob = models.logit(V, av_auto, bgm_db.variables[tgtvar_name])
 if (do_panel):
  condprobIndiv = exp.PanelLikelihoodTrajectory(obsprob)
 else:
  condprobIndiv = obsprob
 logprob = exp.log(exp.MonteCarlo(condprobIndiv))
 bgm_model  = bio.BIOGEME(bgm_db,logprob,numberOfDraws=n_draws, seed=seed)
 bgm_model.utility_dic = V.copy()
 return bgm_model, bgm_model.estimate()

---
---

# Estimating a baseline model

In [None]:
ASC_beach = exp.Beta ( 'ASC_beach' ,0, None , None ,0)
ASC_pier = exp.Beta ( 'ASC_pier' ,0, None , None ,0)
ASC_boat = exp.Beta ( 'ASC_boat' ,0, None , None ,0)
ASC_charter = exp.Beta ( 'ASC_charter' ,0, None , None ,1)
B_price = exp.Beta ( 'B_price' ,0, None , None ,0)
B_catch = exp.Beta ( 'B_catch' ,0, None , None ,0)

In [None]:
qbus_update_globals_bgm(data_pd)

In [None]:
V_beach = ASC_beach + B_price*price_beach + B_catch*catch_beach
V_pier = ASC_pier + B_price*price_pier + B_catch*catch_pier
V_boat = ASC_boat + B_price*price_boat + B_catch*catch_boat
V_charter = ASC_charter + B_price*price_charter + B_catch*catch_charter

V_base = {1: V_beach,
     2: V_pier,
     3: V_boat,
     4: V_charter}

In [None]:
model_base, results_base = qbus_estimate_bgm(V_base, data_pd, 'mode', 'fish')



In [None]:
results_base.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_beach,-1.498888,0.129703,-11.556271,0.0
ASC_boat,-0.627513,0.116828,-5.37125,7.819277e-08
ASC_pier,-1.191833,0.128328,-9.287409,0.0
B_catch,0.377169,0.119247,3.162911,0.001561999
B_price,-0.02479,0.002329,-10.645095,0.0


---
---

# Nested logit Question: Estimate the model
For the nested logit, we declare the nest parameter, which represents the 'strength' of the nesting.

We use the biogeme expression (same as for the Beta coefficients). In biogeme, the larger the parameter, the stronger the nesting. Its minimum value should be 1; the maximum can be a large enough number such as 100. This comes from the $\lambda$ in the theory part, which is between 0 and 1: biogeme uses $\mu = 1/\lambda$, so $\mu$ should be larger than 1 if $\lambda$ is between 0 and 1.
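
To make the relation between the two parametrizations concrete, here is a small sketch. The helper `mu_to_lambda` is a hypothetical name introduced only for illustration, and the value 2.28 is just an example input.

```python
def mu_to_lambda(mu):
    """Convert a biogeme-style nest parameter mu (>= 1) into lambda = 1 / mu."""
    if mu < 1:
        raise ValueError("mu below 1 would give a lambda above 1")
    return 1.0 / mu

lambda_no_nesting = mu_to_lambda(1.0)     # lower bound: collapses to the multinomial logit
lambda_upper_bound = mu_to_lambda(100.0)  # upper bound used for the Beta below
lambda_example = mu_to_lambda(2.28)       # illustrative value of mu

print(lambda_no_nesting, lambda_upper_bound, round(lambda_example, 4))
```

So the bounds `1` and `100` on $\mu$ correspond to $\lambda$ between 1 and 0.01.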

In [None]:
MU_nest_A = exp.Beta('MU_nest_A', 1, 1, 100, 0)

Next comes the nest structure. In biogeme, this is done via a python tuple of two elements:
1. The first element is the nest parameter.
2. The second element is a python list of ids of the alternatives that are affected by the grouping. **Remember to map the ids to the alternatives correctly!**

In the following cell, we create a grouping for beach (alt. 1), pier (alt. 2) and private boat (alt. 3). This is the grouping that will be affected by `MU_nest_A`. The other group, in this case, is formed only by the charter boat.
Since there is only one alternative in that group, we can set its grouping parameter to 1.
Finally, we put all the tuples together to create the full grouping specification that is passed to biogeme.

In [None]:
nest_A = MU_nest_A, [1, 2, 3]
nest_B = 1.0, [4]
nests_struct = nest_A, nest_B
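
Since mapping the ids wrongly is an easy mistake, a small sanity check can help. The sketch below is not part of biogeme; `check_nests` is a hypothetical helper, and strings stand in for the Beta expressions so that the example runs on its own.

```python
def check_nests(nests, alt_ids):
    """Return True if every alternative id appears in exactly one nest."""
    seen = []
    for _param, members in nests:
        seen.extend(members)
    return sorted(seen) == sorted(alt_ids)

# Same shape as the structure above: (parameter, list of alternative ids)
example_nests = (('MU_nest_A', [1, 2, 3]), (1.0, [4]))
ok = check_nests(example_nests, [1, 2, 3, 4])

# Alternative 3 forgotten, and alternative 3 listed twice
missing = check_nests((('MU_nest_A', [1, 2]), (1.0, [4])), [1, 2, 3, 4])
duplicated = check_nests((('MU_nest_A', [1, 2, 3]), (1.0, [3, 4])), [1, 2, 3, 4])

print(ok, missing, duplicated)
```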

Then we will estimate and check the model parameters.

In [None]:
model_nest, results_nest = qbus_estimate_nested_bgm(V_base, data_pd, nests_struct, 'mode', 'data_nest' )



The estimation of the model gives similar coefficients to the paper.

In [None]:
results_nest.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_beach,-0.391807,0.210031,-1.865477,0.06211454
ASC_boat,-0.063101,0.144733,-0.43598,0.6628515
ASC_pier,-0.241667,0.188909,-1.279281,0.200798
B_catch,0.414444,0.098224,4.21935,2.450072e-05
B_price,-0.014049,0.002377,-5.909227,3.437167e-09
MU_nest_A,2.280203,0.494235,4.613601,3.957525e-06


To check if the nested model is a statistically significant improvement over the multinomial logit version, we can do a likelihood ratio test, as usual when the simpler model is a special case of the complex one. If the only thing that changes is the inclusion of the 'nesting parameter', we are testing whether nesting is useful. We can test several modifications at once, but then we would not be able to separate the nesting part from the rest.

In [None]:
qbus_likeli_ratio_test_bgm(results_nest, results_base, 0.05)

LRTuple(message='H0 can be rejected at level 5.0%', statistic=27.92948273622096, threshold=3.841458820694124)
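
The statistic and threshold in this output can be reproduced by hand. The sketch below assumes the usual chi-square form of the test and recovers the two log-likelihoods from the AIC values reported further down for the same models (AIC = 2k - 2LL), with 5 and 6 estimated parameters as listed in the two parameter tables.

```python
from scipy.stats import chi2

def loglike_from_aic(aic, n_params):
    """Invert AIC = 2k - 2LL to recover the final log-likelihood."""
    return (2 * n_params - aic) / 2

ll_base = loglike_from_aic(2471.567660831006, 5)  # multinomial logit
ll_nest = loglike_from_aic(2445.638178094785, 6)  # nested logit

# Likelihood ratio statistic: -2 * (LL_reference - LL_complex)
lr_stat = -2 * (ll_base - ll_nest)

# Degrees of freedom = difference in the number of parameters
threshold = chi2.ppf(1 - 0.05, df=6 - 5)
reject_h0 = lr_stat > threshold

print(lr_stat, threshold, reject_h0)
```

The statistic is far above the threshold, which is why H0 (no improvement from nesting) is rejected at the 5% level.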

---
---

# Nested logit question: Find/Discuss the nest structure in the data

Apart from knowing how to estimate the nested logit and interpret the results, we need to be able to compare the different nesting structures that might be applicable to the data, in order to pick the better one. In practice, there are sometimes several 'reasonable' nesting structures.

In [None]:
MU_nest_B = exp.Beta('MU_nest_B', 1, 1, 100, 0)

In [None]:
alt_nest_A = MU_nest_A, [1, 2, 4]
alt_nest_B = MU_nest_B, [3]
alt_nests_struct = alt_nest_A, alt_nest_B

In [None]:
alt_model_nest, alt_results_nest = qbus_estimate_nested_bgm(V_base, data_pd, alt_nests_struct, 'mode', 'data_nest' )



In [None]:
alt_results_nest.getEstimatedParameters()

Unnamed: 0,Value,Active bound,Rob. Std err,Rob. t-test,Rob. p-value
ASC_beach,-1.234326,0.0,0.3087089,-3.998351,6.378526e-05
ASC_boat,-0.602353,0.0,0.1108114,-5.43584,5.453882e-08
ASC_pier,-0.976821,0.0,0.259727,-3.760953,0.000169267
B_catch,0.375118,0.0,0.1110467,3.378018,0.0007301037
B_price,-0.021707,0.0,0.004288243,-5.061904,4.150904e-07
MU_nest_A,1.209214,0.0,0.2791121,4.332362,1.475183e-05
MU_nest_B,1.0,1.0,1.797693e+308,5.562685e-309,1.0


When comparing nested logit models, we cannot always use the likelihood ratio test, because the models to compare are not always nested (one is not always included in the other). When models are very similar, differing only in the nest structure and with the same number of parameters, we can compare the likelihoods directly (the larger the better). This will not give us a nice p-value, but it can help decide among nest structures.

**However,** it is more general to use the Akaike information criterion (AIC) directly. Remember that lower means a better model, and that it penalizes complexity: since $AIC = 2k - 2\ln L$, an added parameter has to raise the log-likelihood by more than 1 to pay for itself.

In [None]:
# The likelihood ratio test is rarely applicable to compare between nest structures
#qbus_likeli_ratio_test_bgm(alt_results_nest, results_nest, 0.05)

In [None]:
#alt_results_nest.data.logLike

In [None]:
#results_nest.data.logLike

We should go directly for the Akaike information criterion (the lower the better).

In [None]:
alt_results_nest.data.akaike

2474.0549750251025

In [None]:
results_nest.data.akaike

2445.638178094785

The results from the log-likelihood and the AIC are consistent in this case.

In [None]:
results_base.data.akaike

2471.567660831006
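
The three AIC values above can be ranked in a couple of lines. The sketch also illustrates the penalty with made-up log-likelihoods (the values -1000, -999.5 and -998 are hypothetical); the dictionary labels are just descriptive names for the three models.

```python
def aic(loglike, n_params):
    """Akaike information criterion: 2k - 2LL (lower is better)."""
    return 2 * n_params - 2 * loglike

# Hypothetical log-likelihoods: one extra parameter only pays off
# if it raises the log-likelihood by more than 1.
aic_before = aic(-1000.0, 5)
aic_small_gain = aic(-999.5, 6)   # gain of 0.5: AIC gets worse
aic_large_gain = aic(-998.0, 6)   # gain of 2.0: AIC gets better

# AIC values reported by biogeme in the cells above
reported_aic = {
    'nested [1, 2, 3] + [4]': 2445.638178094785,
    'nested [1, 2, 4] + [3]': 2474.0549750251025,
    'multinomial logit': 2471.567660831006,
}
ranking = sorted(reported_aic, key=reported_aic.get)
print(ranking)
```

The alternative nest structure ends up slightly behind the plain multinomial logit, so its extra parameters do not pay for themselves.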

---
---

# Ordered Logit question: Estimate the model and interpretation

The ordered logit considers only one utility function, not one per alternative.
The main 'challenge' in the ordered logit is realizing that the problem we are dealing with can be modeled with the ordered logit, that the alternatives have some sort of 'natural' order.

In [None]:
ASC_ord = exp.Beta('ASC_ord', 0, None, None, 0)

In [None]:
V_ord = ASC_ord + B_price*price_beach + B_catch*catch_beach + B_price*price_pier + B_catch*catch_pier

We use the auxiliary function `qbus_estimate_ordered_bgm`. Notice the argument `[1,3,2,4]` specifying the desired order of the alternatives. In this case, it means we think that the order is:

 beach < private boat < pier < charter boat.

 **Please pay attention to the order that we specify! Otherwise we specify the model incorrectly.**



In [None]:
qord_model, qord_results = qbus_estimate_ordered_bgm(V_ord, [1,3,2,4], data_pd, 'mode', )



In [None]:
qord_results.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_ord,0.455413,0.060284,7.554519,4.196643e-14
B_catch,-0.046513,0.17475,-0.26617,0.7901083
B_price,0.000794,0.000289,2.744395,0.006062262
delta_2,1.921623,0.087543,21.950682,0.0
delta_3,0.609231,0.04297,14.177971,0.0
tau1,-1.455413,0.060284,-24.142783,0.0


The interpretation of the coefficients is slightly different because there is only one utility; therefore, positive coefficients, which increase the utility, make the alternatives later in our specified order more likely. As for the cutoff points, tau1 is the first cutoff, then tau2 = tau1 + delta_2, tau3 = tau2 + delta_3, and so on. Given the expression of the logit, it is difficult to get a clear intuition from these differences; it is better to look at the predictions/simulations directly.
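
To see how the cutoffs turn into probabilities, the following sketch redoes the calculation by hand for the first row of the fishing dataset, using the rounded estimates from the table above. It follows the same construction as `qbus_estimate_ordered_bgm` (differences of logistic CDFs at consecutive cutoffs); small deviations from the biogeme output are expected because of rounding.

```python
import numpy as np

def logistic_cdf(x):
    return 1.0 / (1.0 + np.exp(-x))

# Rounded estimates from the parameter table
asc_ord, b_catch, b_price = 0.455413, -0.046513, 0.000794
tau1, delta_2, delta_3 = -1.455413, 1.921623, 0.609231

# Cutoffs: tau1, tau2 = tau1 + delta_2, tau3 = tau2 + delta_3
taus = np.cumsum([tau1, delta_2, delta_3])

# First row of the fishing dataset (same variables as in V_ord)
price_beach, price_pier = 157.93, 157.93
catch_beach, catch_pier = 0.0678, 0.0503
v = asc_ord + b_price * (price_beach + price_pier) + b_catch * (catch_beach + catch_pier)

# Probability of each ordered category = difference of consecutive CDF values
cdf = logistic_cdf(taus - v)
probs = np.diff(np.concatenate(([0.0], cdf, [1.0])))

order = [1, 3, 2, 4]  # the order imposed in the estimation
probs_by_alt = dict(zip(order, probs))
print(taus)
print(probs_by_alt)
```

The result should agree, up to rounding, with the first row of the simulation output in the next cell.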

In [None]:
qbus_simulate_bgm(qord_model, qord_results.getBetaValues(), data_pd)

Unnamed: 0,1,3,2,4
0,0.103759,0.337879,0.150960,0.407403
1,0.126988,0.371454,0.147900,0.353657
2,0.106971,0.343087,0.150741,0.399201
3,0.126968,0.371428,0.147904,0.353699
4,0.111536,0.350158,0.150301,0.388005
...,...,...,...,...
1177,0.108750,0.345887,0.150587,0.394777
1178,0.096313,0.325031,0.151135,0.427521
1179,0.122575,0.365765,0.148710,0.362950
1180,0.127509,0.372105,0.147799,0.352587


---
---

# Ordered Logit question: Finding the order

When several possible orders seem 'reasonable', we can compare them. It is a bit far-fetched here, but it is also simple to test. We can use the AIC.

In [None]:
qord_model_na, qord_results_na = qbus_estimate_ordered_bgm(V_ord, [2,4,3,1], data_pd, 'mode')



In [None]:
qord_results_na.getGeneralStatistics()

{'Number of estimated parameters': GeneralStatistic(value=6, format=''),
 'Sample size': GeneralStatistic(value=1182, format=''),
 'Excluded observations': GeneralStatistic(value=0, format=''),
 'Init log likelihood': GeneralStatistic(value=-1778.7769617307715, format='.7g'),
 'Final log likelihood': GeneralStatistic(value=-1486.9546396886644, format='.7g'),
 'Likelihood ratio test for the init. model': GeneralStatistic(value=583.6446440842142, format='.7g'),
 'Rho-square for the init. model': GeneralStatistic(value=0.16405784891556074, format='.3g'),
 'Rho-square-bar for the init. model': GeneralStatistic(value=0.1606847447383164, format='.3g'),
 'Akaike Information Criterion': GeneralStatistic(value=2985.909279377329, format='.7g'),
 'Bayesian Information Criterion': GeneralStatistic(value=3016.359058565125, format='.7g'),
 'Final gradient norm': GeneralStatistic(value=2.0708735125967265e-07, format='.4E'),
 'Nbr of threads': GeneralStatistic(value=2, format='')}

We see the new order is better than the original (lower AIC), but both orders are really bad ideas: both are far worse than the baseline multinomial logit.

In [None]:
qord_results_na.data.akaike

2985.909279377329

In [None]:
qord_results.data.akaike

2998.5399576804766

In [None]:
results_base.data.akaike

2471.567660831006

---
---


# Panel and mixed logit questions

Here we estimate the mixed model manually, step by step, instead of calling an auxiliary function. We will go back to the Swissmetro dataset because it shows a nice result.

# Panel and mixed logit questions: Panel vs Nonpanel

What happens when we include the panel information?

Toggle `do_PANEL` and use Runtime -> Run after to re-run the notebook from here and check the differences.

**IMPORTANT: We subsample the first 500 rows of the dataset, otherwise we get into numerical problems in the optimization process!**

In [None]:
do_PANEL = True

In [None]:

data_pd = pd.read_csv('http://transp-or.epfl.ch/data/swissmetro.dat', sep='\t').head(500)

#data_pd

Basic cleanup of the Swissmetro dataset: remove invalid choices (0s in the `CHOICE` variable).

In [None]:
data_pd = data_pd[ data_pd['CHOICE'] > 0]
data_bgm = db.Database("swiss", data_pd)

In [None]:
globals().update(data_bgm.variables)

In [None]:
if (do_PANEL):
 data_bgm.panel("ID")

In [None]:
ASC_CAR = exp.Beta ( 'ASC_CAR' ,0, None , None ,0)
ASC_TRAIN = exp.Beta ( 'ASC_TRAIN' ,0, None , None ,0)
ASC_SM = exp.Beta ( 'ASC_SM' ,0, None , None ,1)
B_TIME = exp.Beta ( 'B_TIME' ,0, None , None ,0)
B_COST = exp.Beta ( 'B_COST' ,0, None , None ,0)

In [None]:
V1 = ASC_TRAIN + B_TIME * TRAIN_TT + B_COST * TRAIN_CO
V2 = ASC_SM + B_TIME * SM_TT + B_COST * SM_CO
V3 = ASC_CAR + B_TIME * CAR_TT + B_COST * CAR_CO

In [None]:
V = {1: V1 ,
2: V2 ,
3: V3 }

In [None]:
av = {1: TRAIN_AV,
2: SM_AV,
3: CAR_AV }

In [None]:
logprob = None
if (do_PANEL):
  obsprob = models.logit(V,av, CHOICE)
  condprobIndiv = exp.PanelLikelihoodTrajectory(obsprob)
  logprob = exp.log((condprobIndiv))
else:
  logprob = models.loglogit (V , av , CHOICE )

bgm_model = bio.BIOGEME ( data_bgm, logprob )
results = bgm_model.estimate()
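
To get some intuition for the panel branch, here is a minimal sketch with plain pandas. It assumes that the trajectory likelihood of an individual is the product of the probabilities of that individual's observed choices; the `ID` and `prob` values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical probabilities of the chosen alternative, one row per observation
obs = pd.DataFrame({'ID':   [1, 1, 2, 2, 2],
                    'prob': [0.5, 0.4, 0.9, 0.5, 0.2]})

# Panel view: one likelihood per individual (product over that person's rows)
traj = obs.groupby('ID')['prob'].prod()
panel_loglike = np.log(traj).sum()

# Non-panel view: every row is treated as a separate observation
pooled_loglike = np.log(obs['prob']).sum()

print(traj.to_dict(), panel_loglike, pooled_loglike)
```

With fixed coefficients the two log-likelihoods coincide, because the log of a product is the sum of the logs. The grouping starts to change the likelihood once the per-individual product is averaged over random draws before taking the log, as in the mixed logit.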



In [None]:
results.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_CAR,-2.842717,0.825747,-3.442599,0.0005761535
ASC_TRAIN,-1.694028,0.22997,-7.366289,1.754152e-13
B_COST,-0.000131,0.000239,-0.550126,0.5822333
B_TIME,0.001569,0.000739,2.122723,0.03377707


---
---

# Mixed logit question: Estimation

In [None]:
SIGMA_B_COST = exp.Beta('SIGMA_B_COST',0.5,0.00001,None,0)

In [None]:
EC_B_COST = SIGMA_B_COST * exp.bioDraws('EC_B_COST','NORMAL')

In [None]:
m_V1 = ASC_TRAIN + B_TIME * TRAIN_TT + (B_COST + EC_B_COST )  * TRAIN_CO
m_V2 = ASC_SM + B_TIME * SM_TT + (B_COST + EC_B_COST ) * SM_CO
m_V3 = ASC_CAR + B_TIME * CAR_TT + (B_COST+ EC_B_COST ) * CAR_CO

In [None]:
m_V = {1: m_V1, 2:m_V2, 3:m_V3}

In [None]:
m_obsprob = models.logit(m_V, av, CHOICE)
m_condprobIndiv = None

In [None]:
if (do_PANEL):
 m_condprobIndiv = exp.PanelLikelihoodTrajectory(m_obsprob)
else:
 m_condprobIndiv = m_obsprob

In [None]:
#condprobIndiv = exp.PanelLikelihoodTrajectory(obsprob)

Next, we take the conditional probability and wrap it in the expression `exp.MonteCarlo`, which averages it over the draws. The final `exp.log` gives the log-likelihood.

In [None]:
m_logprob = exp.log(exp.MonteCarlo(m_condprobIndiv))

We are using simulation, so we have to tell biogeme how many draws from the distribution we are going to generate. The more draws, the more accurate the estimation, but the more computationally costly it becomes.

We also set a seed, so we can get the same results if we run the notebook again (setting a seed is a good habit in general).
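
The role of the draws and the seed can be illustrated outside biogeme with a small numpy sketch. It approximates a mixed logit choice probability by averaging ordinary logit probabilities over normal draws of the cost coefficient. All numbers (`costs`, `b_cost`, `sigma`) are hypothetical, and numpy's random generator is not the one biogeme uses, so this only mirrors the idea.

```python
import numpy as np

def simulated_logit_prob(utilities_fn, chosen, n_draws, seed):
    """Average the logit probability of `chosen` over standard normal draws."""
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal(n_draws)
    probs = np.empty(n_draws)
    for r, xi in enumerate(draws):
        v = utilities_fn(xi)
        ev = np.exp(v - v.max())          # subtract the max for numerical stability
        probs[r] = ev[chosen] / ev.sum()
    return probs.mean()

# Hypothetical costs of three alternatives and a random cost coefficient
costs = np.array([50.0, 60.0, 80.0])
b_cost, sigma = -0.0035, 0.033

def utilities(xi):
    # Random coefficient: mean b_cost plus sigma times the draw
    return (b_cost + sigma * xi) * costs

p_50 = simulated_logit_prob(utilities, chosen=0, n_draws=50, seed=1)
p_50_again = simulated_logit_prob(utilities, chosen=0, n_draws=50, seed=1)
p_5000 = simulated_logit_prob(utilities, chosen=0, n_draws=5000, seed=1)

print(p_50, p_50_again, p_5000)
```

The same seed reproduces the same estimate, while more draws reduce the simulation noise at the price of more computation.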

In [None]:

# Create the Biogeme object
m_biogeme  = bio.BIOGEME(data_bgm,m_logprob,numberOfDraws=50, seed=1)




In [None]:
results_mixed = m_biogeme.estimate()



In [None]:
results_mixed.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_CAR,-3.099354,0.724967,-4.275167,1.9e-05
ASC_TRAIN,-2.02906,0.236163,-8.591762,0.0
B_COST,-0.00346,0.001993,-1.736249,0.08252
B_TIME,0.001224,0.000872,1.404523,0.160163
SIGMA_B_COST,0.033324,0.021567,1.545143,0.122312


---
---

# Mixed Logit question: Interpretation of the distribution of a parameter

* The main interpretation is the understanding that the parameters are 'random', so the output is an estimate of the probability distribution of the parameters. In this case, the cost parameter follows (approximately, for the panel version) a normal distribution $N(-0.00346, 0.033324^2)$, using the values of `B_COST` and `SIGMA_B_COST` from the table above.

* Secondary interpretations are doing 'something' with the distribution.
 For example: **What percentage of people in the population have a cost parameter that is positive?**
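
For a normally distributed coefficient, this share has a closed form. Writing $\mu$ for the estimate of `B_COST` and $\sigma$ for the estimate of `SIGMA_B_COST`, and $\Phi$ for the standard normal CDF (notation introduced here only for this derivation):

$$
P(\beta_{cost} > 0) = 1 - \Phi\left(\frac{0 - \mu}{\sigma}\right) = \Phi\left(\frac{\mu}{\sigma}\right)
$$

With the estimates reported above, $\mu / \sigma = -0.00346 / 0.033324 \approx -0.104$, so the share is $\Phi(-0.104) \approx 0.459$.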

In [None]:
from scipy.stats import norm

1 - norm.cdf(0, results_mixed.getBetaValues()['B_COST'], results_mixed.getBetaValues()['SIGMA_B_COST'])

0.45864864585512044