In [1]:
# If needed, run !pip install -r ../requirements.txt or !pip install biogeme

**Model 0**


In [9]:
import biogeme.database as db
import biogeme.biogeme as bio
from biogeme import models
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from biogeme.expressions import Beta, Variable, log, exp
from biogeme.results_processing import get_pandas_estimated_parameters, html_output


In [3]:
DATA_FOLDER = 'data/'

In [4]:
# Load the .dat file into a pandas DataFrame
data = pd.read_csv(DATA_FOLDER + 'lpmc09.dat', sep='\t')

In [8]:
data.head()

Unnamed: 0,trip_id,household_id,person_n,trip_n,travel_mode,purpose,fueltype,faretype,bus_scale,survey_year,...,dur_pt_access,dur_pt_rail,dur_pt_bus,dur_pt_int,pt_interchanges,dur_driving,cost_transit,cost_driving_fuel,cost_driving_ccharge,driving_traffic_percent
0,13,1,1,1,4,3,1,5,0.0,1,...,0.241389,0.0,0.122222,0.0,0,0.132222,0.0,0.5,0.0,0.065126
1,43,10,0,0,3,3,6,1,1.0,1,...,0.072778,0.0,0.344722,0.120556,1,0.167778,3.0,0.44,0.0,0.145695
2,46,12,0,0,4,3,1,5,0.0,1,...,0.136389,0.0,0.070278,0.0,0,0.072222,0.0,0.24,0.0,0.107692
3,53,12,1,3,4,3,2,1,1.0,1,...,0.0825,0.0,0.061944,0.0,0,0.0625,1.5,0.17,0.0,0.124444
4,65,13,1,3,3,5,1,5,0.0,1,...,0.050833,0.216667,0.590556,0.237778,2,0.863889,0.0,2.6,0.0,0.675884


# **Part 1** - Model 0 [1.5 points]

#### Develop a model specification that includes alternative specific constant, cost and travel time for each alternative. Cost and travel time are associated with generic parameters. Present both the specification (i.e., the utility functions) and the estimation results (parameter values, t-tests or p-values, null and final log-likelihoods). [1 point]
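
The specification requested above can be written out as follows. This is a transcription of the utilities coded in the cells below, not an independent derivation. Walking is the reference alternative (no constant), and the public transport time is the sum of the bus, access and interchange durations, exactly as in the code. The symbol names are chosen here for readability.

```latex
\begin{aligned}
V_{\text{walk}}  &= \beta_{\text{time}} \cdot \text{dur\_walking} \\
V_{\text{cycle}} &= \text{ASC}_{\text{cycling}} + \beta_{\text{time}} \cdot \text{dur\_cycling} \\
V_{\text{pt}}    &= \text{ASC}_{\text{pt}} + \beta_{\text{time}} \cdot (\text{dur\_pt\_bus} + \text{dur\_pt\_access} + \text{dur\_pt\_int}) + \beta_{\text{cost}} \cdot \text{cost\_transit} \\
V_{\text{drive}} &= \text{ASC}_{\text{driving}} + \beta_{\text{time}} \cdot \text{dur\_driving} + \beta_{\text{cost}} \cdot (\text{cost\_driving\_fuel} + \text{cost\_driving\_ccharge}) \\[6pt]
P(i) &= \frac{e^{V_i}}{\sum_{j \in \{\text{walk},\,\text{cycle},\,\text{pt},\,\text{drive}\}} e^{V_j}}
\end{aligned}
```

The last line is the standard logit choice probability, whose logarithm is what `models.loglogit` contributes to the log-likelihood for each observation.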

In [10]:
# Create the Biogeme database
database = db.Database("lpmc09", data)

# Define the variables
travel_mode = Variable('travel_mode')
dur_walking = Variable('dur_walking')
dur_cycling = Variable('dur_cycling')
dur_pt_bus = Variable('dur_pt_bus')
dur_pt_access = Variable('dur_pt_access')
dur_pt_int = Variable('dur_pt_int')
dur_driving = Variable('dur_driving')
cost_transit = Variable('cost_transit')
cost_driving_fuel = Variable('cost_driving_fuel')
cost_driving_ccharge = Variable('cost_driving_ccharge')

In [11]:
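# Create new variables
# Total public transport duration: bus + access + interchange time.
# Note: dur_pt_rail, which is also present in the data, is not included in this sum.
dur_pt_tot = dur_pt_bus + dur_pt_access + dur_pt_int
# Total driving cost: fuel + congestion charge
cost_drive = cost_driving_fuel + cost_driving_ccharge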

In [12]:
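# Define the ASCs to be estimated (walking is the reference alternative, so it has no ASC)
# Beta(name, initial value, lower bound, upper bound, status); status 0 means "estimate"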
asc_pt = Beta('asc_pt', 0, None, None, 0)
asc_cycling = Beta('asc_cycling', 0, None, None, 0)
asc_driving = Beta('asc_driving', 0, None, None, 0)

# Define the Betas to be estimated
beta_cost = Beta('beta_cost', 0, None, None, 0)
beta_time = Beta('beta_time', 0, None, None, 0)

In [13]:
# Define the utility functions
v_walking = dur_walking * beta_time
v_cycling = asc_cycling + dur_cycling * beta_time
v_pt = asc_pt + dur_pt_tot * beta_time + cost_transit * beta_cost
v_drive = asc_driving + dur_driving * beta_time + cost_drive * beta_cost

In [14]:
# Define the association between alternatives and utility functions
V = {1: v_walking,
     2: v_cycling,
     3: v_pt,
     4: v_drive}
# Log of the logit choice probability; None means all alternatives are always available
logprob = models.loglogit(V, None, travel_mode)

In [15]:
# Initialisation of the Biogeme object
biogeme = bio.BIOGEME(database, logprob)
biogeme.model_name = 'model_0'

File biogeme.toml has been created


In [16]:
results = biogeme.estimate()
print(results.print_general_statistics())

Number of estimated parameters             5
Sample size                                5000
Excluded observations                      0
Init log likelihood                        -6931.472
Final log likelihood                       -4552.633
Likelihood ratio test for the init. model  4757.678
Rho-square for the init. model             0.343
Rho-square-bar for the init. model         0.342
Akaike Information Criterion               9115.265
Bayesian Information Criterion             9147.851
Final gradient norm                        1.8474E-05
Bootstrapping time                         None


In [17]:
get_pandas_estimated_parameters(estimation_results=results)

| | Name | Value | Robust std err. | Robust t-stat. | Robust p-value |
|---|---|---|---|---|---|
| 0 | beta_time | -4.374888 | 0.164549 | -26.587163 | 0.0 |
| 1 | asc_cycling | -3.351221 | 0.099973 | -33.521236 | 0.0 |
| 2 | asc_pt | -0.641355 | 0.058471 | -10.968706 | 0.0 |
| 3 | beta_cost | -0.137783 | 0.013934 | -9.888477 | 0.0 |
| 4 | asc_driving | -0.735068 | 0.067722 | -10.854174 | 0.0 |


#### Comment on the estimation results (statistical significance and sign of all parameters). [0.5 point]


#### **Overall Model Fit**
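- **Log-likelihood**: The initial log-likelihood is **-6931.472** and the final log-likelihood is **-4552.633**. Since all parameters start at zero, the initial value is that of the equal-probability model (5000 × ln(1/4) ≈ -6931.47), so the estimated model fits the data much better than assigning a probability of 0.25 to each of the four modes.
- **Likelihood Ratio Test**: The statistic is -2 × (-6931.472 - (-4552.633)) = **4757.678**. With 5 estimated parameters, it far exceeds the 5% critical value of the chi-square distribution with 5 degrees of freedom (11.07), so the equal-probability model is rejected in favor of Model 0.
- **Rho-square**: The **Rho-square for the init. model (0.343)** and the **Rho-square-bar for the init. model (0.342)** measure the relative improvement in log-likelihood over the initial model (1 - final/initial log-likelihood, with a penalty for the number of parameters in the adjusted version). Unlike R² in linear regression, they are not a share of explained variance; they are mainly useful for comparing models estimated on the same data.
- **AIC and BIC**: The **Akaike Information Criterion (9115.265)** and **Bayesian Information Criterion (9147.851)** are reported for model comparison. Lower values indicate a better trade-off between fit and number of parameters, so they only become informative once other specifications are estimated on the same data.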
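
As a sanity check, the summary statistics above can be recomputed by hand from the two log-likelihoods. The sketch below uses only values copied from the estimation output and the usual textbook formulas; it assumes these are the formulas behind the reported figures, which the matching numbers suggest but which is not verified against the Biogeme source.

```python
import math

# Values copied from the estimation output above
ll_init = -6931.472    # log-likelihood with all parameters at zero
ll_final = -4552.633   # log-likelihood at the estimated parameters
n_params = 5
n_obs = 5000
n_alternatives = 4

# Equal-probability model: every observation contributes ln(1/4)
ll_equal = n_obs * math.log(1 / n_alternatives)

# Likelihood ratio statistic against the initial model
lr_stat = -2 * (ll_init - ll_final)

# McFadden rho-square and its parameter-adjusted version
rho2 = 1 - ll_final / ll_init
rho2_bar = 1 - (ll_final - n_params) / ll_init

# Information criteria
aic = 2 * n_params - 2 * ll_final
bic = n_params * math.log(n_obs) - 2 * ll_final

print(f"Equal-probability LL: {ll_equal:.3f}")
print(f"LR statistic:         {lr_stat:.3f}")
print(f"Rho-square:           {rho2:.3f}")
print(f"Rho-square-bar:       {rho2_bar:.3f}")
print(f"AIC:                  {aic:.3f}")
print(f"BIC:                  {bic:.3f}")
```

Small differences in the last digit relative to the Biogeme output come from using the rounded log-likelihoods as inputs.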

#### **Interpretation of Parameter Signs**
- **beta_time (-4.37)**: The negative sign indicates that utility decreases as travel time increases. This is expected: all else being equal, travelers prefer faster alternatives.
- **asc_cycling (-3.35)**: Walking is the reference alternative (its constant is normalized to zero). The strongly negative constant means that, for equal travel time and cost, cycling is much less attractive than walking; the constant captures the average effect of everything not included in the model.
- **asc_pt (-0.64)**: The negative constant indicates that, for equal travel time and cost, public transport (PT) is less attractive than walking.
- **beta_cost (-0.14)**: The negative sign indicates that utility decreases as cost increases. This is expected: all else being equal, travelers prefer cheaper alternatives.
- **asc_driving (-0.74)**: The negative constant indicates that, for equal travel time and cost, driving is less attractive than walking.
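
To make these signs and magnitudes more concrete, here is a minimal sketch that derives two quantities from the estimates: the time/cost trade-off implied by the two generic coefficients, and the logit probabilities in a purely hypothetical situation where every alternative has zero travel time and zero cost, so that only the constants matter. The helper `logit_probabilities` is an illustrative name, not part of Biogeme, and the units of the ratio depend on the units of the time and cost variables, which are not stated in this notebook.

```python
import math

# Estimates copied from the parameter table above
beta_time = -4.374888
beta_cost = -0.137783
asc = {
    "walking": 0.0,        # reference alternative, constant normalized to zero
    "cycling": -3.351221,
    "pt": -0.641355,
    "driving": -0.735068,
}

# Trade-off between time and cost: cost units a traveler would give up
# to save one unit of travel time (units depend on the data, not checked here)
value_of_time = beta_time / beta_cost


def logit_probabilities(utilities):
    """Return logit choice probabilities for a dict of utilities (illustrative helper)."""
    max_u = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {alt: math.exp(u - max_u) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: e / total for alt, e in exp_u.items()}


# Hypothetical case: all times and costs equal to zero, so V equals the ASC
baseline = logit_probabilities(asc)

print(f"Implied value of time: {value_of_time:.2f} cost units per time unit")
for alt, p in baseline.items():
    print(f"{alt:>8}: {p:.3f}")
```

In this artificial baseline, walking gets the largest probability and cycling the smallest, which is simply the ordering of the constants expressed as probabilities.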

#### **Statistical Significance**
- **Robust p-values**: All parameters have p-values reported as **0.0**, meaning they are smaller than the displayed precision. Every parameter is therefore **statistically significant** at any conventional level (10%, 5%, 1%).
- **Robust t-statistics**: The smallest absolute t-statistic is 9.89 (beta_cost), far above the 1.96 threshold for the 5% level, which confirms that each parameter is significantly different from zero.
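
The link between the columns of the parameter table can be checked directly: each t-statistic is the estimate divided by its robust standard error, and the two-sided p-value follows from it. The sketch below assumes the p-values are based on the standard normal distribution, which is the usual asymptotic approximation; the inputs are the rounded values from the table, so the t-statistics match the reported ones only up to rounding.

```python
import math

# (estimate, robust standard error) copied from the parameter table above
estimates = {
    "beta_time": (-4.374888, 0.164549),
    "asc_cycling": (-3.351221, 0.099973),
    "asc_pt": (-0.641355, 0.058471),
    "beta_cost": (-0.137783, 0.013934),
    "asc_driving": (-0.735068, 0.067722),
}


def two_sided_p_value(t_stat):
    """Two-sided p-value under a standard normal approximation (assumption)."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


t_stats = {name: value / std_err for name, (value, std_err) in estimates.items()}
p_values = {name: two_sided_p_value(t) for name, t in t_stats.items()}

for name in estimates:
    print(f"{name:>12}: t = {t_stats[name]:8.3f}, p = {p_values[name]:.2e}")
```

The p-values printed in scientific notation show why the table displays 0.0: they are far below any rounding threshold.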