# CIV1538 Tutorial: Ordered Regression Models
### Sanjana Hossain, Patrick Loa, Felita Ong

February 6, 2024

## Example of an ordered logit model 

**Dataset:** uber_survey_data.csv \
**Dependent variable:** RHFreq

Biogeme website: https://biogeme.epfl.ch/

1. Load the required packages

In [1]:
import pandas as pd                                                    
import biogeme.database as db                                          
import biogeme.biogeme as bio                                         
import biogeme.distributions as dist                                  
from biogeme.expressions import Beta, Variable, log, Elem

2. Load and prepare the data

In [2]:
# Read the data from the csv file
df = pd.read_csv("uber_survey_data.csv") 

# Prepare the database for biogeme
data = db.Database("uber_survey_data",df)   

In [3]:
# Allow the names of the columns in the dataset to be treated as variable names
globals().update(data.variables)

3. Specify the parameters to be estimated

In [4]:
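# Beta(name, starting value, lower bound, upper bound, status)
# A status of 0 means the parameter is estimated (not held fixed)
B_age = Beta('B_age', 0, None, None, 0)
B_male = Beta('B_male', 0, None, None, 0)

# Parameters for the ordered logit
# mu1 is the first threshold; its upper bound is set to 0
mu1 = Beta('mu1', -1, None, 0, 0)

# Each delta has a lower bound of 0, so the thresholds stay ordered:
# mu1 <= mu2 <= mu3 <= mu4 <= mu5
delta2 = Beta('delta2', 2, 0, None, 0)
mu2 = mu1 + delta2

delta3 = Beta('delta3', 4, 0, None, 0)
mu3 = mu1 + delta2 + delta3

delta4 = Beta('delta4', 6, 0, None, 0)
mu4 = mu1 + delta2 + delta3 + delta4

delta5 = Beta('delta5', 7, 0, None, 0)
mu5 = mu1 + delta2 + delta3 + delta4 + delta5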

4. Define the utility function

In [5]:
U = B_age * Age + B_male * Male

5. Associate each discrete indicator with an interval

| RHFreq | Interval |
|--------|-------------|
| 0 | - ∞ → mu1 |
| 1 | mu1 → mu2 |
| 2 | mu2 → mu3 |
| 3 | mu3 → mu4 |
| 4 | mu4 → mu5 |
| 5 | mu5 → + ∞ |

In [6]:
#   0: -infinity -> mu1
#   1: mu1 -> mu2
#   2: mu2 -> mu3
#   3: mu3 -> mu4
#   4: mu4 -> mu5
#   5: mu5 -> +infinity
ChoiceProba = {
    0: 1 - dist.logisticcdf(U - mu1),
    1: dist.logisticcdf(U - mu1) - dist.logisticcdf(U - mu2),
    2: dist.logisticcdf(U - mu2) - dist.logisticcdf(U - mu3),
    3: dist.logisticcdf(U - mu3) - dist.logisticcdf(U - mu4),
    4: dist.logisticcdf(U - mu4) - dist.logisticcdf(U - mu5),
    5: dist.logisticcdf(U - mu5)}
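
The dictionary above can be checked by hand outside Biogeme. The following is a minimal sketch in plain Python that mirrors its structure for a single respondent and confirms that the six category probabilities are positive and sum to 1. The function names, the utility value, and the thresholds are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
import math

def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

def ordered_logit_probs(u, thresholds):
    """Return [P(y=0), ..., P(y=K)] for utility u and K increasing thresholds.

    Same layout as ChoiceProba: the first category uses 1 - F(u - mu1),
    the last uses F(u - muK), and the categories in between use the
    difference of two consecutive CDF values.
    """
    cdf = [logistic_cdf(u - mu) for mu in thresholds]
    probs = [1.0 - cdf[0]]
    probs += [cdf[k - 1] - cdf[k] for k in range(1, len(cdf))]
    probs.append(cdf[-1])
    return probs

# Hypothetical utility and thresholds (illustration only)
example_thresholds = [-2.0, -1.0, 0.0, 1.0, 2.0]
example_probs = ordered_logit_probs(0.5, example_thresholds)

print([round(p, 4) for p in example_probs])
print("sum:", sum(example_probs))
```

Because the thresholds are increasing, `u - mu` decreases from one threshold to the next, so every difference of CDF values is positive. This is the property that the non-negative `delta` parameters are meant to protect.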

6. Define the choice probability and the contribution of each observation to the log-likelihood function

In [7]:
logprob = log(Elem(ChoiceProba, RHFreq))
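
`Elem(ChoiceProba, RHFreq)` selects, for each respondent, the dictionary entry whose key equals the observed `RHFreq`, and `log` turns that probability into a log-likelihood contribution. The sketch below reproduces the same lookup in plain Python. The three respondents and their probabilities are made up for illustration and do not come from the survey data.

```python
import math

# Hypothetical category probabilities for three respondents (each row sums to 1)
probs_by_person = [
    {0: 0.50, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.04, 5: 0.01},
    {0: 0.10, 1: 0.15, 2: 0.40, 3: 0.20, 4: 0.10, 5: 0.05},
    {0: 0.05, 1: 0.05, 2: 0.10, 3: 0.20, 4: 0.25, 5: 0.35},
]

# Hypothetical observed responses, playing the role of RHFreq
observed = [0, 2, 5]

# Per-observation contribution: log of the probability of the observed category
log_contribs = [math.log(p[y]) for p, y in zip(probs_by_person, observed)]

# The sample log likelihood is the sum of the contributions
log_likelihood = sum(log_contribs)

print(log_contribs)
print(log_likelihood)
```

Estimation then searches for the parameter values that make this sum as large as possible.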

7. Estimate the parameters

In [8]:
# Create the Biogeme object
the_biogeme = bio.BIOGEME(data, logprob)

# Name the model
the_biogeme.modelName = 'Ordered Logit Model'

# Apply the estimate() method
results = the_biogeme.estimate()

8. Print the outputs

In [9]:
# Display the estimated parameters
results.getEstimatedParameters(onlyRobust=False)

| Parameter | Value | Std err | t-test | p-value | Rob. Std err | Rob. t-test | Rob. p-value |
|-----------|-------|---------|--------|---------|--------------|-------------|--------------|
| B_age | -0.053224 | 0.004539 | -11.726415 | 0.0 | 0.004462 | -11.927598 | 0.0 |
| B_male | 0.33119 | 0.131689 | 2.514932 | 0.01190553 | 0.133495 | 2.480909 | 0.01310477 |
| delta2 | 0.352545 | 0.042577 | 8.280194 | 2.220446e-16 | 0.042518 | 8.291761 | 2.220446e-16 |
| delta3 | 0.750639 | 0.060899 | 12.325939 | 0.0 | 0.061091 | 12.287284 | 0.0 |
| delta4 | 1.189586 | 0.088207 | 13.486317 | 0.0 | 0.088714 | 13.409203 | 0.0 |
| delta5 | 1.426991 | 0.156735 | 9.104495 | 0.0 | 0.156652 | 9.109318 | 0.0 |
| mu1 | -2.346957 | 0.206241 | -11.37967 | 0.0 | 0.206484 | -11.36627 | 0.0 |
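
The output reports `mu1` and the increments `delta2` to `delta5`, not the thresholds `mu2` to `mu5` themselves. Since each threshold was defined as the previous one plus a `delta`, they can be recovered with a running sum. The snippet below is a small sketch that does this with the point estimates copied from the output above; the variable names are illustrative.

```python
from itertools import accumulate

# Point estimates copied from the ordered logit output
mu1_hat = -2.346957
delta_hats = [0.352545, 0.750639, 1.189586, 1.426991]  # delta2 .. delta5

# mu_k = mu1 + delta2 + ... + delta_k, i.e. a cumulative sum
thresholds_hat = list(accumulate([mu1_hat] + delta_hats))

for k, mu in enumerate(thresholds_hat, start=1):
    print(f"mu{k} = {mu:.6f}")
```

This gives approximately -2.347, -1.994, -1.244, -0.054, and 1.373 for `mu1` to `mu5`. Standard errors for the recovered thresholds are not obtained this way, because they depend on the covariances between the estimates.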


In [10]:
# results.data.betaNames
# results.data.betas[0].tTest

In [11]:
# Print the goodness-of-fit statistics 
summary = results.getGeneralStatistics()

for key, value in summary.items():
    print(key, ":", value[0])

Number of estimated parameters : 7
Sample size : 860
Excluded observations : 0
Init log likelihood : -1218.477425585327
Final log likelihood : -1218.477425585327
Likelihood ratio test for the init. model : -0.0
Rho-square for the init. model : 0.0
Rho-square-bar for the init. model : -0.005744874589397764
Akaike Information Criterion : 2450.954851170654
Bayesian Information Criterion : 2484.2533778953866
Final gradient norm : 0.0009487598229823916
Nbr of threads : 8
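
The information criteria and rho-square values in this output follow directly from the log likelihoods, the number of parameters, and the sample size. As a cross-check, the sketch below recomputes them from the printed numbers using the standard definitions (AIC = 2k - 2LL, BIC = k ln(n) - 2LL). The variable names are illustrative.

```python
import math

# Values copied from the ordered logit output
n_params = 7
sample_size = 860
init_loglike = -1218.477425585327
final_loglike = -1218.477425585327

# Information criteria: smaller is better
aic = 2 * n_params - 2 * final_loglike
bic = n_params * math.log(sample_size) - 2 * final_loglike

# Likelihood ratio index relative to the initial model, and its
# version penalized for the number of parameters
rho_square = 1 - final_loglike / init_loglike
rho_square_bar = 1 - (final_loglike - n_params) / init_loglike

print("AIC:", aic)
print("BIC:", bic)
print("Rho-square:", rho_square)
print("Rho-square-bar:", rho_square_bar)
```

The reported initial and final log likelihoods are identical, which is why rho-square is 0 and rho-square-bar is slightly negative. This suggests that the run shown here started from parameter values that were already at the optimum, not from the starting values given in the `Beta` definitions. That reading is an inference from the output; the source does not confirm it.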


## Example of an ordered probit model 

**Dataset:** uber_survey_data.csv \
**Dependent variable:** RHFreq

1. Load the required packages

In [12]:
import pandas as pd                                                    
import biogeme.database as db                                          
import biogeme.biogeme as bio                                         
import biogeme.distributions as dist                                  
from biogeme.expressions import Beta, log, Elem, bioNormalCdf 

2. Load and prepare the data

In [13]:
# Read the data from the csv file
df = pd.read_csv("uber_survey_data.csv") 

# Prepare the database for biogeme
data = db.Database("uber_survey_data",df) 

In [14]:
# Allow the names of the columns in the dataset to be treated as variable names
globals().update(data.variables)

3. Specify the parameters to be estimated

In [15]:
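# Beta(name, starting value, lower bound, upper bound, status)
# A status of 0 means the parameter is estimated (not held fixed)
B_age = Beta('B_age', 0, None, None, 0)
B_male = Beta('B_male', 0, None, None, 0)

# Parameters for the ordered probit
# mu1 is the first threshold; its upper bound is set to 0
mu1 = Beta('mu1', -1, None, 0, 0)

# Each delta has a lower bound of 0, so the thresholds stay ordered:
# mu1 <= mu2 <= mu3 <= mu4 <= mu5
delta2 = Beta('delta2', 2, 0, None, 0)
mu2 = mu1 + delta2

delta3 = Beta('delta3', 4, 0, None, 0)
mu3 = mu1 + delta2 + delta3

delta4 = Beta('delta4', 6, 0, None, 0)
mu4 = mu1 + delta2 + delta3 + delta4

delta5 = Beta('delta5', 7, 0, None, 0)
mu5 = mu1 + delta2 + delta3 + delta4 + delta5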

4. Define the utility function

In [16]:
U = B_age * Age + B_male * Male

5. Associate each discrete indicator with an interval

| RHFreq | Interval |
|--------|-------------|
| 0 | - ∞ → mu1 |
| 1 | mu1 → mu2 |
| 2 | mu2 → mu3 |
| 3 | mu3 → mu4 |
| 4 | mu4 → mu5 |
| 5 | mu5 → + ∞ |

In [17]:
#   0: -infinity -> mu1
#   1: mu1 -> mu2
#   2: mu2 -> mu3
#   3: mu3 -> mu4
#   4: mu4 -> mu5
#   5: mu5 -> +infinity
ChoiceProba = {
    0: 1 - bioNormalCdf(U - mu1),
    1: bioNormalCdf(U - mu1) - bioNormalCdf(U - mu2),
    2: bioNormalCdf(U - mu2) - bioNormalCdf(U - mu3),
    3: bioNormalCdf(U - mu3) - bioNormalCdf(U - mu4),
    4: bioNormalCdf(U - mu4) - bioNormalCdf(U - mu5),
    5: bioNormalCdf(U - mu5)}
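
The first category is written as `1 - bioNormalCdf(U - mu1)`, which equals the normal CDF evaluated at `mu1 - U` because the standard normal density is symmetric around zero. The sketch below checks that identity numerically and evaluates the six probit probabilities with a normal CDF built from `math.erf`. It is plain Python, not Biogeme code; the function names, the utility value, and the thresholds are hypothetical.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Symmetry check: P(eps > x) equals P(eps < -x) for a standard normal eps
x = 0.7  # arbitrary test point
symmetry_gap = abs((1.0 - normal_cdf(x)) - normal_cdf(-x))

def ordered_probit_probs(u, thresholds):
    """Return [P(y=0), ..., P(y=K)], laid out like the ChoiceProba dictionary."""
    cdf = [normal_cdf(u - mu) for mu in thresholds]
    middle = [cdf[k - 1] - cdf[k] for k in range(1, len(cdf))]
    return [1.0 - cdf[0]] + middle + [cdf[-1]]

# Hypothetical utility and thresholds (illustration only)
probit_probs = ordered_probit_probs(-0.5, [-1.5, -1.0, -0.5, 0.0, 0.5])

print("symmetry gap:", symmetry_gap)
print([round(p, 4) for p in probit_probs])
```

The only difference from the ordered logit specification is the CDF: the logistic CDF is replaced by the standard normal CDF, and everything else in the model definition stays the same.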

6. Define the choice probability and the contribution of each observation to the log-likelihood function

In [18]:
logprob = log(Elem(ChoiceProba, RHFreq))

7. Estimate the parameters

In [19]:
# Create the Biogeme object
the_biogeme = bio.BIOGEME(data, logprob)

# Name the model
the_biogeme.modelName = 'Ordered Probit Model'

# Apply the estimate() method
results = the_biogeme.estimate()

8. Print the outputs

In [20]:
# Display the estimated parameters
results.getEstimatedParameters(onlyRobust=False)

| Parameter | Value | Std err | t-test | p-value | Rob. Std err | Rob. t-test | Rob. p-value |
|-----------|-------|---------|--------|---------|--------------|-------------|--------------|
| B_age | -0.031751 | 0.002621 | -12.115185 | 0.0 | 0.002611 | -12.158936 | 0.0 |
| B_male | 0.200478 | 0.077413 | 2.589729 | 0.009605 | 0.078137 | 2.565715 | 0.010296 |
| delta2 | 0.213466 | 0.025657 | 8.319855 | 0.0 | 0.025638 | 8.326099 | 0.0 |
| delta3 | 0.454249 | 0.036433 | 12.468023 | 0.0 | 0.036533 | 12.433778 | 0.0 |
| delta4 | 0.688732 | 0.049427 | 13.934245 | 0.0 | 0.049998 | 13.775299 | 0.0 |
| delta5 | 0.722229 | 0.073629 | 9.809084 | 0.0 | 0.074023 | 9.756767 | 0.0 |
| mu1 | -1.394786 | 0.119809 | -11.64171 | 0.0 | 0.120693 | -11.556433 | 0.0 |
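
The probit estimates are smaller in magnitude than the logit estimates, but the two sets are not directly comparable because the error terms have different scales: the standard logistic distribution has a standard deviation of π/√3 ≈ 1.81, while the standard normal has a standard deviation of 1. The sketch below computes the ratio of each logit estimate to its probit counterpart, using the values printed in the two outputs. The dictionary names are illustrative, and the comparison is an addition to the tutorial, not part of the original analysis.

```python
# Point estimates copied from the two outputs
logit_estimates = {
    "B_age": -0.053224, "B_male": 0.33119,
    "delta2": 0.352545, "delta3": 0.750639,
    "delta4": 1.189586, "delta5": 1.426991,
    "mu1": -2.346957,
}
probit_estimates = {
    "B_age": -0.031751, "B_male": 0.200478,
    "delta2": 0.213466, "delta3": 0.454249,
    "delta4": 0.688732, "delta5": 0.722229,
    "mu1": -1.394786,
}

# Ratio of logit to probit estimate for each parameter
ratios = {
    name: logit_estimates[name] / probit_estimates[name]
    for name in logit_estimates
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

The ratios fall between roughly 1.65 and 2.0, so the two models tell a similar story once the difference in scale is taken into account. Signs and relative magnitudes are the quantities to compare across the two models, not the raw coefficients.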


In [21]:
# Print the goodness-of-fit statistics 
summary = results.getGeneralStatistics()

for key, value in summary.items():
    print(key, ":", value[0])

Number of estimated parameters : 7
Sample size : 860
Excluded observations : 0
Init log likelihood : -1219.0759815246795
Final log likelihood : -1219.0759815246795
Likelihood ratio test for the init. model : -0.0
Rho-square for the init. model : 0.0
Rho-square-bar for the init. model : -0.005742053904831401
Akaike Information Criterion : 2452.151963049359
Bayesian Information Criterion : 2485.4504897740917
Final gradient norm : 0.00448312491069844
Nbr of threads : 8
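
Both models use the same data and the same seven parameters, so their fit can be compared directly through the final log likelihood or the AIC. The sketch below does this with the values printed in the two outputs. Choosing between the models on AIC is one common convention and is assumed here for illustration; the tutorial itself does not state a selection rule.

```python
# Fit statistics copied from the two outputs
fit = {
    "ordered logit": {
        "loglike": -1218.477425585327,
        "aic": 2450.954851170654,
    },
    "ordered probit": {
        "loglike": -1219.0759815246795,
        "aic": 2452.151963049359,
    },
}

# Lower AIC indicates the preferred model under this convention
better_by_aic = min(fit, key=lambda name: fit[name]["aic"])

# With equal numbers of parameters, the AIC gap is twice the log-likelihood gap
aic_gap = fit["ordered probit"]["aic"] - fit["ordered logit"]["aic"]
loglike_gap = fit["ordered logit"]["loglike"] - fit["ordered probit"]["loglike"]

print("Lower AIC:", better_by_aic)
print("AIC gap:", aic_gap)
print("Log-likelihood gap:", loglike_gap)
```

The ordered logit has the slightly lower AIC, with a gap of about 1.2, which is small. On these numbers the two specifications fit the data almost equally well.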
