A JAX-first Python package for hierarchical model fitting and metaheuristic-driven model structure search. Supports count, CMF, duration, and linear models with random parameters, latent classes, zero-inflation, and heterogeneity in means — all with one unified API.
- Install
- Notebooks — Start Here
- Quick Start
- What the package does
- Data loaders
- ExperimentBuilder API
- ModelConstraints API
- Role codes
- Search algorithms
- Model families
- Latent class models
- Output and saving results
- Help system
- Running on HPC clusters
## Install

```bash
pip install metacountregressor
pip install jax jaxlib jaxopt   # JAX backend
```

Quick import check:

```bash
python -c "from metacountregressor import __version__, load_example16_3_raw_data; print(__version__, load_example16_3_raw_data().shape)"
```

## Notebooks — Start Here

The fastest way to learn the package is to open the notebooks in order. Each one builds on the previous and uses the bundled Example 16-3 crash-frequency data, so you can run everything without sourcing your own dataset.
| # | Notebook | What you learn |
|---|---|---|
| 00 | 00_quickstart.ipynb | Install, load data, first search run in under 10 minutes |
| 01 | 01_crash_frequency_search.ipynb | Mixed Negative Binomial search — constraints, roles, re-fit |
| 02 | 02_latent_class_fc_validation.ipynb | 2-class LC model — fit, extract class probabilities, validate against FC |
| 03 | 03_cmf_aadt_search.ipynb | CMF model — baseline + AADT-interaction structure search |
| 04 | 04_linear_speed_prediction.ipynb | Gaussian linear model search (speed prediction) |
| 05 | 05_batch_script_tutorial.ipynb | Batch scripts, parallel runs, PBS/SLURM HPC job templates |
Tip: open

```bash
jupyter lab metacountregressor/metacountregressor/templates/
```

to browse all notebooks together.
## Quick Start

```python
import numpy as np
from metacountregressor import (
    ExperimentBuilder,
    ModelConstraints,
    SearchOutputConfig,
    load_example16_3_model_data,
    get_help,
)

# ── 1. Load the bundled crash-frequency dataset ──────────────────────────────
df = load_example16_3_model_data()
exposure = df['LENGTH'] * df['AADT'] * 365 / 1e8
df['OFFSET'] = np.log(exposure.clip(lower=1e-9))

# ── 2. Build constraints ──────────────────────────────────────────────────────
c = (
    ModelConstraints()
    .force_include('OFFSET')
    .no_zi('LENGTH', 'CURVES', 'WIDTH', 'SLOPE')
    .no_random('URB')
    .allow_random('CURVES', distributions=['lognormal'])
)

# ── 3. Create the experiment ──────────────────────────────────────────────────
builder = ExperimentBuilder(df, id_col='ID', y_col='FREQ', offset_col='OFFSET')
builder.describe()           # print data summary
get_help('crash_frequency')  # print end-to-end workflow guide

# ── 4. Build the structure evaluator ─────────────────────────────────────────
evaluator = builder.build_evaluator(
    variables=['AADT', 'LENGTH', 'SPEED', 'CURVES', 'URB', 'AVEPRE'],
    constraints=c,
    default_roles=[0, 1, 2, 3, 5],
    max_latent_classes=1,
    R=200,
)

# ── 5. Run the search ─────────────────────────────────────────────────────────
result = builder.run(
    evaluator,
    algo='sa',  # 'sa' | 'de' | 'hs'
    max_iter=1000,
    seed=42,
    output_config=SearchOutputConfig(output_dir='results', experiment_name='demo'),
)
print('Best BIC:', result.best_score)
print('Saved to:', result.saved_to)

# ── 6. Re-fit with more draws ─────────────────────────────────────────────────
fit = builder.fit_manual_model(manual_spec=result.best_spec, model='nb', R=500)
print(fit)
```

## What the package does

metacountregressor solves two related problems:

1. **Structure search** — automatically discover which variables to include, whether each coefficient should be fixed or random, and whether the model needs latent classes, zero-inflation, or heterogeneity in means. The search is driven by metaheuristic algorithms (SA, DE, HS) that minimise BIC.

2. **Model estimation** — fit the discovered (or manually specified) model structure using JAX-accelerated simulation-based maximum likelihood with Halton draws.
The same API handles crash-frequency count models, CMF (Crash Modification Factor) models, duration models, and linear (Gaussian) models.
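The "Halton draws" used for simulation-based maximum likelihood are points from a low-discrepancy sequence, which covers the unit interval more evenly than pseudo-random draws and so needs fewer simulation draws R for a stable likelihood. A minimal sketch of the sequence itself (illustrative only — not the package's internal generator):

```python
def halton(n, base=2):
    """First n points of the Halton sequence in the given prime base."""
    seq = []
    for i in range(1, n + 1):
        f, x = 1.0, 0.0
        # Reverse the base-`base` digits of i across the radix point
        while i > 0:
            f /= base
            x += f * (i % base)
            i //= base
        seq.append(x)
    return seq

draws = halton(5)  # base-2 sequence: 0.5, 0.25, 0.75, 0.125, 0.625
```

In practice each random parameter gets its own prime base, and the uniform draws are transformed through the chosen distribution (normal, lognormal, ...) before entering the simulated likelihood.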
## Data loaders

All loaders return a pandas.DataFrame.

```python
from metacountregressor import (
    load_example16_3_raw_data,                # Example 16-3: original 31 columns
    load_example16_3_model_data,              # + OFFSET, FC_ENCODED, FC_LABEL
    load_example_crash_data,                  # alias for load_example16_3_model_data
    load_example_duration_data,               # synthetic duration target from Ex 16-3
    load_example_linear_data,                 # synthetic linear target from Ex 16-3
    load_example_platform_speed_data,         # speed relative to platform
    load_example_platform_gap_duration_data,  # time until next speeding event
    load_example_panel_data,                  # panel-structure example
)
```

load_example16_3_raw_data() returns the original source columns:
| Group | Columns |
|---|---|
| Identifiers | ID |
| Outcome | FREQ |
| Geometry | LENGTH, WIDTH, INCLANES, DECLANES, MEDWIDTH, MIMEDSH, MXMEDSH |
| Speed / grade | SPEED, MIGRADE, MXGRADE, MXGRDIFF, SLOPE |
| Traffic | AADT, SINGLE, DOUBLE, TRAIN, PEAKHR, ADTLANE |
| Road class | URB, FC, ACCESS, TANGENT, CURVES, MINRAD, GRADEBR |
| Friction / weather | FRICTION, INTECHAG, AVEPRE, AVESNOW |
load_example16_3_model_data() adds OFFSET, FC_ENCODED, FC_LABEL.
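The OFFSET column is a log-exposure term, built exactly as in the Quick Start from LENGTH and AADT. A self-contained sketch on a toy two-row frame (values made up for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'LENGTH': [1.2, 0.8], 'AADT': [12000, 4500]})

# Exposure in 100-million vehicle-miles per year, floored so the log stays finite
exposure = toy['LENGTH'] * toy['AADT'] * 365 / 1e8
toy['OFFSET'] = np.log(exposure.clip(lower=1e-9))
```

Passing this column as offset_col fixes its coefficient at 1, so the count model estimates a rate per unit of exposure rather than a raw frequency.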
## ExperimentBuilder API

```python
from metacountregressor import ExperimentBuilder

builder = ExperimentBuilder(
    df=df,
    id_col='ID',          # required — observation identifier
    y_col='FREQ',         # required — outcome variable
    offset_col='OFFSET',  # optional — log-exposure offset (count models)
    group_id_col='FC',    # optional — group/panel identifier
)
```

| Method | Purpose |
|---|---|
| builder.describe() | Print data summary: N, outcome stats, variable types |
| builder.suggest_config(max_latent_classes=2) | Print recommended ExperimentBuilder settings |
| builder.build_evaluator(...) | Build a structure evaluator (see below) |
| builder.build_count_evaluator(...) | Shortcut for count models |
| builder.run(evaluator, algo, max_iter, seed, ...) | Run metaheuristic search |
| builder.run_search(evaluator, ...) | Alias for run() |
| builder.make_manual_spec(...) | Build a model spec dict manually |
| builder.fit_manual_model(manual_spec, model, R) | Fit a manually specified structure |
| builder.compute_latent_class_probabilities(fit, true_class_col) | Get class membership probabilities |
| ExperimentBuilder.get_family_capabilities() | Static: list supported model families |
| ExperimentBuilder.get_search_argument_guide() | Static: full argument documentation |
```python
evaluator = builder.build_evaluator(
    variables=['AADT', 'LENGTH', 'SPEED', 'CURVES'],  # candidate columns
    constraints=c,                  # ModelConstraints object
    model_family='count',           # 'count' | 'cmf' | 'duration' | 'linear'
    default_roles=[0, 1, 2, 3, 5],  # roles the search may assign
    max_latent_classes=2,           # 1 = standard, 2 = allow LC
    mode='single',                  # 'single' = minimise BIC
    R=200,                          # Halton simulation draws
    # CMF-only arguments:
    aadt_col='AADT',
    baseline_vars=['URB', 'ACCESS'],
    local_vars=['CURVES', 'WIDTH'],
    # Duration-only:
    budget_col='AADT',
)
```

## ModelConstraints API

ModelConstraints restricts which roles and distributions each variable may take. All methods return self for chaining.
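The chaining style works because each constraint method records its rule and returns self. A stripped-down illustration of the pattern (hypothetical mini-class, not the package's actual internals):

```python
class TinyConstraints:
    """Toy fluent-interface builder, for illustration only."""

    def __init__(self):
        self.rules = {}

    def force_include(self, *cols):
        for col in cols:
            self.rules[col] = 'force_include'
        return self  # returning self is what enables chaining

    def no_random(self, *cols):
        for col in cols:
            self.rules[col] = 'no_random'
        return self

c = TinyConstraints().force_include('OFFSET').no_random('URB', 'GRADEBR')
```

Because every call returns the same object, the constraints can be built in one expression, as in the real example below.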
```python
from metacountregressor import ModelConstraints

c = (
    ModelConstraints()
    .force_include('OFFSET')                              # cannot be excluded
    .force_fixed('AADT')                                  # only fixed or excluded
    .no_zi('LENGTH', 'CURVES', 'SLOPE', 'WIDTH')          # cannot be ZI term
    .no_random('URB', 'GRADEBR')                          # no random parameter
    .allow_random('CURVES', distributions=['lognormal'])  # restrict distribution
    .membership_only('FC_ENCODED')                        # drives class prob only
    .allow_membership('SPEED')                            # may also enter membership
    .outcome_only('AADT')                                 # no membership role
    .exclude('YEAR', 'ID')                                # removed from search
    .set_roles('WIDTH', [0, 1, 2])                        # low-level override
)
print(c)     # display all constraints
c.summary()  # same as print(c)
```

Get detailed API documentation:

```python
from metacountregressor import get_help
get_help('constraints')
```

## Role codes

| Code | Name | Description |
|---|---|---|
| 0 | Excluded | Variable not in the model |
| 1 | Fixed | Same coefficient for every observation |
| 2 | Random (ind.) | Individual random effect, independent draws |
| 3 | Random (corr.) | Individual random effect, correlated with others |
| 4 | Grouped | Group-level random effect (shared within group) |
| 5 | Heterogeneity | Explains variation in random-parameter means |
| 6 | Zero inflation | Enters the zero-inflation probability equation |
| 7 | Membership only | Drives latent-class probability — not the outcome |
| 8 | Membership + Fixed | Drives class membership AND has class-specific outcome effect |
Random-parameter distributions: normal, lognormal, triangular, uniform.
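A candidate structure can be thought of as one role code per candidate variable. A small sketch of decoding such a vector (the name mapping mirrors the table above; the vector representation itself is illustrative, not the package's internal encoding):

```python
ROLE_NAMES = {
    0: 'excluded', 1: 'fixed', 2: 'random (ind.)', 3: 'random (corr.)',
    4: 'grouped', 5: 'heterogeneity', 6: 'zero inflation',
    7: 'membership only', 8: 'membership + fixed',
}

def decode(variables, roles):
    """Pair each candidate variable with its assigned role name."""
    return {v: ROLE_NAMES[r] for v, r in zip(variables, roles)}

spec = decode(['AADT', 'CURVES', 'URB'], [1, 2, 0])
```

Constraints simply shrink the set of codes each variable may take, which is why a tight ModelConstraints object speeds up the search.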
```python
get_help('roles')  # full reference with examples
```

## Search algorithms

| Alias | Algorithm | Best for |
|---|---|---|
| 'sa' | Simulated Annealing | Robust default — escapes local minima via cooling schedule |
| 'de' | Differential Evolution | Thorough population-based search — use when SA converges early |
| 'hs' | Harmony Search | Fast initial convergence — good for a quick first pass |
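All three algorithms treat the structure as a discrete vector and the BIC as the score to minimise. A toy simulated-annealing loop over integers shows the accept/reject mechanics behind 'sa' (toy objective and linear cooling schedule for illustration — the package's SA has its own schedule and move set):

```python
import math
import random

def anneal(score, start, steps=500, t0=5.0, seed=42):
    """Minimise `score` over integers by annealed local moves."""
    rng = random.Random(seed)
    x, best = start, start
    for k in range(steps):
        t = t0 * (1 - k / steps) + 1e-9      # temperature cools towards zero
        cand = x + rng.choice([-1, 1])       # random neighbour move
        delta = score(cand) - score(x)
        # Accept improvements always; accept worsening moves with prob e^(-delta/t),
        # which lets the search escape local minima early on
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = cand
        if score(x) < score(best):
            best = x
    return best

best = anneal(lambda v: (v - 3) ** 2, start=20)
```

The same score function could be "fit this structure and return its BIC", which is essentially what builder.run does at each iteration.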
```python
# Run the same evaluator with different algorithms
result_sa = builder.run(evaluator, algo='sa', max_iter=2000, seed=42)
result_de = builder.run(evaluator, algo='de', max_iter=2000, seed=42)
result_hs = builder.run(evaluator, algo='hs', max_iter=2000, seed=42)

get_help('metaheuristics')  # full parameter reference
```

## Model families

### Count models

```python
evaluator = builder.build_count_evaluator(
    variables=['AADT', 'LENGTH', 'SPEED', 'CURVES', 'URB', 'AVEPRE'],
    constraints=c,
    default_roles=[0, 1, 2, 3, 5],
    max_latent_classes=1,
    R=200,
)
result = builder.run(evaluator, algo='sa', max_iter=2000, seed=42)
fit = builder.fit_manual_model(manual_spec=result.best_spec, model='nb', R=500)
```

Manual spec:

```python
spec = builder.make_manual_spec(
    fixed_terms=['AADT', 'LENGTH', 'SPEED'],
    rdm_terms=['CURVES:normal'],
    rdm_cor_terms=['TANGENT:normal', 'SLOPE:lognormal'],
    hetro_in_means=['AVEPRE'],
    zi_terms=['ACCESS'],
    membership_terms=['URB'],
    dispersion=1,
    latent_classes=2,
)
fit = builder.fit_manual_model(manual_spec=spec, model='nb', R=200)
```

### CMF models

```python
from metacountregressor import CMFExperimentBuilder

cmf = CMFExperimentBuilder(
    df=df,
    y_col='FREQ',
    aadt_col='AADT',
    baseline_vars=['URB', 'ACCESS', 'GRADEBR', 'CURVES'],
    local_vars=['CURVES', 'WIDTH'],
)

# Route A: full JAX flexibility (random params, LC, ZI)
builder_jax, evaluator_jax, meta = cmf.build_jax_count_evaluator(
    id_col='ID', offset_col='OFFSET', constraints=c, max_latent_classes=1, R=200)
result = builder_jax.run(evaluator_jax, algo='sa', max_iter=500, seed=42)

# Route B: classic GA search (fast, two-component structure)
search = cmf.run_search(R=200)
fit = cmf.fit_best_model(search, final_R=500)
cmf.print_report(search, fit)

get_help('cmf')  # full workflow guide
```

### Duration models

```python
from metacountregressor import load_example_duration_data

duration_df = load_example_duration_data()
duration_builder = ExperimentBuilder(
    df=duration_df, id_col='ID', y_col='DURATION', group_id_col='FC')
evaluator = duration_builder.build_evaluator(
    variables=['WIDTH', 'CURVES', 'SLOPE', 'URB', 'FC_ENCODED'],
    model_family='duration',
    default_roles=[0, 1, 2, 3],
    max_latent_classes=1, R=200,
)
result = duration_builder.run(evaluator, algo='sa', max_iter=500, seed=42)
fit = duration_builder.fit_manual_model(manual_spec=result.best_spec,
                                        model='lognormal', R=500)
```

### Linear models

```python
from metacountregressor import load_example_platform_speed_data

speed_df = load_example_platform_speed_data()
speed_builder = ExperimentBuilder(
    df=speed_df, id_col='PLATFORM_ID', y_col='SPEED', offset_col=None)
evaluator = speed_builder.build_evaluator(
    variables=['DIST_TO_PLATFORM', 'POSTED_SPEED', 'APPROACH_ACCEL',
               'PLATFORM_HEIGHT', 'PLATFORM_WIDTH'],
    model_family='linear',
    default_roles=[0, 1, 2, 3],  # no ZI for linear
    max_latent_classes=1, R=200,
)
result = speed_builder.run(evaluator, algo='sa', max_iter=500, seed=42)
fit = speed_builder.fit_manual_model(manual_spec=result.best_spec,
                                     model='gaussian', R=500)
```

## Latent class models

```python
# 1. Constrain FC_ENCODED to drive class membership only
c = (
    ModelConstraints()
    .membership_only('FC_ENCODED')
    .force_include('OFFSET')
    .no_zi('LENGTH', 'CURVES', 'WIDTH', 'SLOPE')
    .no_random('URB', 'GRADEBR')
)

# 2. Build LC evaluator (max_latent_classes=2, include roles 7 & 8)
evaluator = builder.build_evaluator(
    variables=['URB', 'ACCESS', 'GRADEBR', 'CURVES', 'LENGTH',
               'SPEED', 'WIDTH', 'SLOPE', 'AVEPRE', 'FC_ENCODED'],
    constraints=c,
    default_roles=[0, 1, 2, 3, 5, 7, 8],
    max_latent_classes=2,
    R=150,
)

# 3. Run search
result = builder.run(evaluator, algo='sa', max_iter=500, seed=1)

# 4. Manually fit a specific structure
spec = builder.make_manual_spec(
    fixed_terms=['AADT', 'SPEED', 'LENGTH'],
    rdm_cor_terms=['CURVES:normal', 'SLOPE:normal'],
    hetro_in_means=['AVEPRE'],
    membership_terms=['URB', 'ACCESS', 'GRADEBR'],
    dispersion=1, latent_classes=2,
)
fit = builder.fit_manual_model(manual_spec=spec, model='nb', R=200)

# 5. Extract class membership probabilities
class_probs = builder.compute_latent_class_probabilities(
    fit, true_class_col='FC_ENCODED')
print(class_probs.head())

# 6. Compare predicted class vs actual FC
class_probs['predicted'] = (
    class_probs[['class_1_prob', 'class_2_prob']].to_numpy().argmax(axis=1))
agreement = (class_probs['predicted'] == class_probs['FC_ENCODED']).mean()
print(f'Agreement with FC: {agreement:.1%}')
```

Pre-specified reference model:

```python
from metacountregressor import (
    load_book_latent_class_spec, describe_book_latent_class_spec)

describe_book_latent_class_spec()
spec = load_book_latent_class_spec()
fit = builder.fit_manual_model(manual_spec=spec, model='nb', R=200)

get_help('latent_class')  # full workflow guide
```

## Output and saving results

```python
from metacountregressor import SearchOutputConfig
```
```python
output_config = SearchOutputConfig(
    output_dir='results',
    experiment_name='example16_3_count',
    search_description='NB count model search on Example 16-3',
    save_json=True,
)
result = builder.run(evaluator, algo='sa', max_iter=2000,
                     output_config=output_config)
print('Saved to:', result.saved_to)
```

Each saved JSON file contains the experiment name, description, model family, algorithm, best BIC, and the best structural specification.
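A saved file might look roughly like this. Only best_score and algorithm are keys the collection snippet in this section actually reads; the remaining field names are illustrative and may differ from the real output:

```json
{
  "experiment_name": "example16_3_count",
  "search_description": "NB count model search on Example 16-3",
  "model_family": "count",
  "algorithm": "sa",
  "best_score": 1234.56,
  "best_spec": {"fixed_terms": ["AADT", "LENGTH"]}
}
```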
Collect results from multiple runs:
```python
import json, pathlib

results = sorted(
    [json.load(open(f)) for f in pathlib.Path('results').glob('*.json')],
    key=lambda r: r.get('best_score', float('inf'))
)
print('Best BIC:', results[0]['best_score'])
print('Algorithm:', results[0]['algorithm'])
```

## Help system

The package includes a built-in interactive help system:
```python
from metacountregressor import get_help

get_help()                   # list all topics
get_help('roles')            # role code table + distributions
get_help('constraints')      # ModelConstraints API
get_help('metaheuristics')   # algorithm comparison and parameters
get_help('crash_frequency')  # count model workflow
get_help('latent_class')     # latent class workflow
get_help('cmf')              # CMF workflow
get_help('linear')           # linear model workflow
get_help('duration')         # duration model workflow
get_help('batch')            # batch script and HPC guide
```

## Running on HPC clusters

On PBS/Torque or SLURM, the package reads the scheduler walltime automatically and uses it as a max_time limit — the search stops cleanly before the job is killed.
| Scheduler | Environment variable | Format |
|---|---|---|
| PBS/Torque | PBS_WALLTIME | HH:MM:SS |
| SLURM | SLURM_TIME_LIMIT | seconds or HH:MM:SS |
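Converting either format to a second count is a one-liner worth of parsing. A sketch of the kind of conversion involved (illustrative — the package detects and parses these variables itself):

```python
def walltime_seconds(value):
    """Accept plain seconds ('3600') or HH:MM:SS ('04:00:00')."""
    if ':' in value:
        h, m, s = (int(part) for part in value.split(':'))
        return h * 3600 + m * 60 + s
    return int(value)
```

The resulting second count is what a max_time-style budget would be compared against while the search runs.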
Set manually for local testing:
```python
result = builder.run(evaluator, algo='sa', max_iter=99999, max_time=3600)
```

Example PBS job script:

```bash
#!/bin/bash
#PBS -N metacount_sa
#PBS -l nodes=1:ppn=4
#PBS -l walltime=04:00:00
#PBS -l mem=16gb
#PBS -j oe
#PBS -o logs/sa_seed42.log

module load python/3.11
cd $PBS_O_WORKDIR
source venv/bin/activate

# Walltime auto-detected from PBS_WALLTIME
python run_experiment.py sa 42 200 99999
```
Example SLURM array job:

```bash
#!/bin/bash
#SBATCH --job-name=metacount
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=04:00:00
#SBATCH --mem=16G
#SBATCH --output=logs/%j.log
#SBATCH --array=1-10

module load python/3.11
source venv/bin/activate
python run_experiment.py sa $SLURM_ARRAY_TASK_ID 200 99999
```

See 05_batch_script_tutorial.ipynb for a complete worked example, including a reusable run_experiment.py template and result-collection scripts.
```python
get_help('batch')  # inline guide
```