# Estimating Auto Ownership

This notebook illustrates how to re-estimate ActivitySim's auto ownership model.  The steps in the process are:
  - Run ActivitySim in estimation mode to read household travel survey files, run the households through the auto ownership model step, and write an estimation data bundle (EDB) that contains the model utility specifications, coefficients, chooser data, and alternatives data.
  - Read and transform the EDB into the format required by the model estimation package [larch](https://larch.newman.me) and then re-estimate the model coefficients.  No changes to the model specification will be made.
  - Update the ActivitySim model coefficients and re-run the model in simulation mode.
  
The basic estimation workflow is shown below and explained in the next steps.

![estimation workflow](https://github.com/RSGInc/activitysim/raw/develop/docs/images/estimation_example.jpg)

# Load libraries

In [1]:
import os
import larch  # !conda install larch #for estimation
import pandas as pd
import larch.util.activitysim as larch_asim  

# Required Inputs

In addition to a working ActivitySim model setup, estimation mode requires a household travel survey in ActivitySim format.  The survey tables are very similar to ActivitySim's simulation model tables:

 - households
 - persons
 - tours
 - joint_tour_participants
 - trips (not yet implemented)

Examples of the ActivitySim format household travel survey are included in the [example_estimation data folders](https://github.com/RSGInc/activitysim/tree/develop/activitysim/examples/example_estimation).  The user is responsible for formatting their household travel survey into the appropriate format.  
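
Since formatting the survey is left to the user, a quick consistency check before running estimation mode can catch linkage problems early. The sketch below is only an illustration: it uses the standard library, a few made-up rows, and a hypothetical helper `orphan_persons` to find persons whose `household_id` does not appear in the households table. The column names follow the sample tables in this notebook.

```python
import csv
import io

# Tiny stand-ins for the survey files; the rows are made up for illustration.
households_csv = """household_id,TAZ,income,hhsize,auto_ownership
1,16,144100,2,0
2,134,48000,2,2
"""
persons_csv = """person_id,household_id,age,PNUM
10,1,54,1
11,1,52,2
12,2,46,1
13,3,30,1
"""

def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def orphan_persons(households, persons):
    """Return person_ids whose household_id is missing from the households table."""
    known = {row["household_id"] for row in households}
    return [p["person_id"] for p in persons if p["household_id"] not in known]

households = read_rows(households_csv)
persons = read_rows(persons_csv)
orphans = orphan_persons(households, persons)
print("persons without a household record:", orphans)
```

The same pattern extends to tours (each `person_id` should exist in persons) and to joint tour participants (each `tour_id` should exist in tours).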

After creating an ActivitySim format household travel survey, the `scripts/infer.py` script is run to append additional calculated fields.  An example of an additional calculated field is the `household:joint_tour_frequency`, which is calculated based on the `tours` and `joint_tour_participants` tables.  
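
To give a feel for what a calculated field involves, the sketch below counts distinct joint tours per household from participant records. It is not the logic in `scripts/infer.py`, and the real `joint_tour_frequency` field is a categorical label (such as `0_tours` or `1_Main`) rather than a bare count. The helper name `joint_tours_per_household` is hypothetical, and the rows are copied from the joint tour participants sample shown later in this notebook.

```python
# (tour_id, household_id, person_id) rows from the participants sample
participants = [
    (220958283, 2223759, 5389226),
    (220958283, 2223759, 5389227),
    (144295087, 1606646, 3519392),
    (144295087, 1606646, 3519393),
]

def joint_tours_per_household(rows):
    """Count distinct joint tours for each household."""
    tours_by_household = {}
    for tour_id, household_id, _person_id in rows:
        tours_by_household.setdefault(household_id, set()).add(tour_id)
    return {hh: len(tours) for hh, tours in tours_by_household.items()}

joint_tour_counts = joint_tours_per_household(participants)
print(joint_tour_counts)
```

Households that never appear in the participants table would have a count of zero, which is consistent with most households in the sample showing `0_tours`.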

The input survey files are below.

### Survey households

In [2]:
pd.read_csv("../data_sf/survey_data/override_households.csv")

Unnamed: 0,household_id,TAZ,income,hhsize,HHT,auto_ownership,num_workers,joint_tour_frequency
0,2223759,16,144100,2,1,0,2,1_Main
1,990869,134,48000,2,1,2,2,0_tours
2,125886,113,25900,1,4,1,1,0_tours
3,727893,8,26100,2,1,0,1,0_tours
4,2741769,150,121600,4,1,2,1,0_tours
...,...,...,...,...,...,...,...,...
1995,663493,110,19180,1,6,1,1,0_tours
1996,569375,20,7400,1,6,1,0,0_tours
1997,1445193,17,75000,1,4,0,1,0_tours
1998,2833455,69,0,1,0,0,0,0_tours


### Survey persons

In [3]:
pd.read_csv("../data_sf/survey_data/override_persons.csv")

Unnamed: 0,person_id,household_id,age,PNUM,sex,pemploy,pstudent,ptype,school_taz,workplace_taz,free_parking_at_work,cdap_activity,mandatory_tour_frequency,_escort,_shopping,_othmaint,_othdiscr,_eatout,_social,non_mandatory_tour_frequency
0,166,166,54,1,2,3,3,4,-1,-1,False,N,,0,0,0,0,1,0,4
1,197,197,46,1,2,3,3,4,-1,-1,False,N,,0,1,0,0,0,0,16
2,268,268,46,1,1,3,3,4,-1,-1,False,N,,0,0,1,1,0,0,9
3,375,375,54,1,2,3,3,4,-1,-1,False,N,,0,0,1,0,0,0,8
4,387,387,44,1,2,3,3,4,-1,-1,False,N,,1,0,0,1,0,0,33
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4401,7554799,2863464,93,1,2,3,3,5,-1,-1,False,N,,0,0,0,1,0,0,1
4402,7554818,2863483,68,1,1,3,3,5,-1,-1,False,N,,0,0,1,1,0,0,9
4403,7555141,2863806,93,1,2,3,3,5,-1,-1,False,N,,0,2,0,1,0,0,17
4404,7555853,2864518,71,1,1,3,3,5,-1,-1,False,N,,0,0,0,0,0,1,2


### Survey tours

In [4]:
pd.read_csv("../data_sf/survey_data/override_tours.csv")

Unnamed: 0,tour_id,survey_tour_id,person_id,household_id,tour_type,tour_category,destination,origin,start,end,tour_mode,survey_parent_tour_id,parent_tour_id,composition,tdd,atwork_subtour_frequency
0,25820,258200,629,629,school,mandatory,133.0,131.0,12.0,15.0,WALK,,,,115,
1,52265,522650,1274,1274,school,mandatory,188.0,166.0,9.0,15.0,WALK_LOC,,,,76,
2,1117937,11179370,27266,27266,school,mandatory,133.0,9.0,17.0,18.0,WALK_HVY,,,,163,
3,1148523,11485230,28012,28012,school,mandatory,12.0,10.0,17.0,22.0,WALK_LRF,,,,167,
4,1208547,12085470,29476,29476,school,mandatory,13.0,16.0,8.0,15.0,WALK_LOC,,,,61,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5768,302942627,3029426270,7388844,2750003,maint,atwork,5.0,7.0,14.0,14.0,WALK,3.029426e+09,302942643.0,,135,
5769,305120465,3051204650,7441962,2758909,maint,atwork,110.0,2.0,12.0,13.0,SHARED2FREE,3.051205e+09,305120481.0,,113,
5770,308000655,3080006550,7512211,2820876,eat,atwork,14.0,1.0,12.0,13.0,WALK,3.080007e+09,308000690.0,,113,
5771,308073840,3080738400,7513996,2822661,eat,atwork,69.0,107.0,8.0,16.0,SHARED3FREE,3.080739e+09,308073875.0,,62,


### Survey joint tour participants

In [5]:
pd.read_csv("../data_sf/survey_data/survey_joint_tour_participants.csv")

Unnamed: 0,participant_id,tour_id,household_id,person_id,participant_num
0,22095828301,220958283,2223759,5389226,1
1,22095828302,220958283,2223759,5389227,2
2,14429508701,144295087,1606646,3519392,1
3,14429508702,144295087,1606646,3519393,2
4,28367651801,283676518,2628704,6918939,1
...,...,...,...,...,...
226,16297928102,162979281,1769918,3975105,2
227,16297928103,162979281,1769918,3975106,3
228,16297928104,162979281,1769918,3975107,4
229,26353054902,263530549,2519358,6427575,1


# Example Setup if Needed

To avoid duplication of inputs, especially model settings and expressions, the `example_estimation` depends on the `example`.  The following commands create an example setup and then an example estimation setup for use.  The locations of these example setups (i.e. the folders) are important because the paths are referenced in this notebook.  

Make sure to add skims.omx from the [mtc box account](https://mtcdrive.app.box.com/v/activitysim/folder/7484860689) for the SF county example to the data_sf folder before running the estimation example.  This large file is not included in the repository.

In [6]:
assert os.path.exists("../data_sf/skims.omx")

In [7]:
# create examples
!activitysim create -e example_mtc -d test

copying files from example_mtc...
copied! new project files are in /Users/jeffnewman/Git/activitysim/activitysim/examples/example_estimation/notebooks/test/example_mtc


# Run the Estimation Example

The next step is to run the model with an `estimation.yaml` settings file configured as follows, so that it outputs the EDB for auto ownership:

In [8]:
with open("../configs/auto_ownership/estimation.yaml", 'rt') as f: print(f.read())


enable: True

bundles:
  - auto_ownership

survey_tables:
  households:
    file_name: survey_data/override_households.csv
    index_col: household_id
  persons:
    file_name:  survey_data/override_persons.csv
    index_col: person_id
  tours:
    file_name:  survey_data/override_tours.csv
  joint_tour_participants:
    file_name:  survey_data/override_joint_tour_participants.csv
  
estimation_table_recipes:
  simple_simulate:
    omnibus_tables:
      values_combined:
        - choices
        - override_choices
        - expression_values
        - choosers
    omnibus_tables_append_columns: [values_combined]

model_estimation_table_types:
  auto_ownership: simple_simulate



This enables the estimation mode functionality, identifies which models to run and their output estimation data bundles (EDBs), and specifies the input survey tables, which include the override settings for each model choice.  
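
The settings above link each bundle to a table type and each table type to a recipe. The sketch below checks that linkage on a plain dict standing in for the parsed YAML. Whether ActivitySim itself enforces this is an assumption here, and the helper `bundles_without_recipe` is hypothetical.

```python
# A dict standing in for the parsed estimation.yaml shown above
# (recipe contents abbreviated; only the keys matter for this check).
settings = {
    "enable": True,
    "bundles": ["auto_ownership"],
    "estimation_table_recipes": {"simple_simulate": {}},
    "model_estimation_table_types": {"auto_ownership": "simple_simulate"},
}

def bundles_without_recipe(settings):
    """Return bundles whose table type is missing or has no matching recipe."""
    types = settings.get("model_estimation_table_types", {})
    recipes = settings.get("estimation_table_recipes", {})
    return [b for b in settings.get("bundles", []) if types.get(b) not in recipes]

problems = bundles_without_recipe(settings)
print("bundles without a recipe:", problems)
```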

With this setup, the model will output an EDB with the following files:
  - model settings - auto_ownership_model_settings.yaml
  - coefficients - auto_ownership_coefficients.csv
  - utilities specification - auto_ownership_SPEC.csv
  - chooser and alternatives data - auto_ownership_values_combined.csv
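
Once the run has finished, it can be worth confirming that all four files listed above are present before trying to read them. The sketch below does so with a hypothetical helper `missing_edb_files`. It is demonstrated against a throwaway directory rather than the real output folder so that it runs anywhere.

```python
import os
import tempfile

EXPECTED_EDB_FILES = [
    "auto_ownership_model_settings.yaml",
    "auto_ownership_coefficients.csv",
    "auto_ownership_SPEC.csv",
    "auto_ownership_values_combined.csv",
]

def missing_edb_files(directory, expected=EXPECTED_EDB_FILES):
    """Return the expected file names that are not present in the directory."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(directory, name))
    ]

# Demonstration: create only the first three files, then look for gaps.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in EXPECTED_EDB_FILES[:3]:
        open(os.path.join(demo_dir, name), "w").close()
    missing = missing_edb_files(demo_dir)

print("missing files:", missing)
```

In this notebook the real directory would be `../output/estimation_data_bundle/auto_ownership/`.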
  
The following code runs the software in estimation mode, inheriting the settings from the simulation setup and using the San Francisco county data setup.  It produces the auto_ownership model EDB but runs all the model steps identified in the inherited settings file.  

In [9]:
# run from the notebook folder
!activitysim run -c ../configs/auto_ownership -c test/configs -d ../data_sf -d test/data -o ../output

Configured logging using basicConfig
INFO:activitysim:Configured logging using basicConfig
INFO:activitysim.cli.run:using configs_dir: ['../configs/auto_ownership', 'test/configs']
INFO:activitysim.cli.run:using data_dir: ['../data_sf', 'test/data']
INFO:activitysim.cli.run:using output_dir: ['../output']
INFO - Read logging configuration from: ../configs/auto_ownership/logging.yaml
INFO - setting households_sample_size: 0
INFO - setting chunk_size: 0
INFO - setting multiprocess: None
INFO - setting num_processes: None
INFO - setting resume_after: None
INFO - run single process simulation
INFO - open_pipeline
INFO - Set random seed base to 0
INFO - Time to execute open_pipeline : 0.014 seconds (0.0 minutes)
INFO - preload_injectables
INFO - Time to execute preload_injectables : 0.0 seconds (0.0 minutes)
INFO - Reading CSV file ../data_sf/land_use.csv
INFO - renaming columns: {'ZONE': 'TAZ', 'COUNTY': 'county_id'}
INFO - keeping columns: ['DISTRICT', 'SD', 'county_id', 'TOTHH', 'TOTPOP'

DEBUG - auto_ownership: write_omnibus_choosers: ../output/estimation_data_bundle/auto_ownership/auto_ownership_values_combined.csv
INFO - auto_ownership: end estimation
INFO - auto_ownership top 10 value counts:
1    886
0    616
2    387
3     74
4     37
Name: auto_ownership, dtype: int64
INFO - Running free_parking with 2583 persons
INFO - Running chunk 1 of 1 size 2583
INFO - Time to execute  eval_utilities : 0.002 seconds (0.0 minutes)
INFO - free_parking top 10 value counts:
False    4273
True      133
Name: free_parking_at_work, dtype: int64
INFO - Pre-building cdap specs
INFO - Time to execute build_cdap_spec hh_size 2 : 0.154 seconds (0.0 minutes)
INFO - Time to execute build_cdap_spec hh_size 3 : 0.468 seconds (0.0 minutes)
INFO - Time to execute build_cdap_spec hh_size 4 : 1.038 seconds (0.0 minutes)
INFO - Time to execute build_cdap_spec hh_size 5 : 2.175 seconds (0.0 minutes)
INFO - Running cdap_simulate with 4406 persons
INFO - Running chunk 1 of 1 with 4406 persons
INFO 

INFO - Running segment 'shopping' of 19 joint_tours 190 alternatives
INFO - running non_mandatory_tour_destination.sample.shopping with 19 tours
INFO - Running chunk 1 of 1 size 19
INFO - Running eval_interaction_utilities on 3610 rows
INFO - Running non_mandatory_tour_destination.logsums.shopping with 458 rows
INFO - Running chunk 1 of 1 size 458
INFO - Time to execute  eval_utilities : 0.361 seconds (0.0 minutes)
INFO - Running tour_destination_simulate with 19 persons
INFO - Running chunk 1 of 1 size 19
INFO - Running eval_interaction_utilities on 458 rows
INFO - Running segment 'othmaint' of 26 joint_tours 190 alternatives
INFO - running non_mandatory_tour_destination.sample.othmaint with 26 tours
INFO - Running chunk 1 of 1 size 26
INFO - Running eval_interaction_utilities on 4940 rows
INFO - Running non_mandatory_tour_destination.logsums.othmaint with 635 rows
INFO - Running chunk 1 of 1 size 635
INFO - Time to execute  eval_utilities : 0.331 seconds (0.0 minutes)
INFO - Running 

INFO - Running non_mandatory_tour_destination.logsums.escort with 9761 rows
INFO - Running chunk 1 of 1 size 9761
INFO - Time to execute  eval_utilities : 0.503 seconds (0.0 minutes)
INFO - Running tour_destination_simulate with 404 persons
INFO - Running chunk 1 of 1 size 404
INFO - Running eval_interaction_utilities on 9761 rows
INFO - Running non_mandatory_tour_scheduling with 5323 tours
DEBUG - @inject timetable
INFO - non_mandatory_tour_scheduling.vectorize_tour_scheduling.tour_1 schedule_tours running 1679 tour choices
INFO - Running chunk 1 of 1 size 1679
INFO - non_mandatory_tour_scheduling.vectorize_tour_scheduling.tour_1 schedule_tours running 1679 tour choices
INFO - Running chunk 1 of 1 size 1679
INFO - Running eval_interaction_utilities on 240940 rows
INFO - non_mandatory_tour_scheduling.vectorize_tour_scheduling.tour_2 schedule_tours running 562 tour choices
INFO - Running chunk 1 of 1 size 562
INFO - non_mandatory_tour_scheduling.vectorize_tour_scheduling.tour_2 schedule

INFO - Running eval_interaction_utilities on 161 rows
INFO - Running atwork_subtour_mode_choice with 451 subtours
INFO - atwork_subtour_mode_choice tour_type top 10 value counts:
eat         346
maint        53
business     52
Name: tour_type, dtype: int64
INFO - Running chunk 1 of 1 size 451
INFO - Time to execute  eval_utilities : 0.425 seconds (0.0 minutes)
INFO - atwork_subtour_mode_choice choices top 10 value counts:
WALK              223
SHARED2FREE        68
DRIVEALONEFREE     63
SHARED3FREE        38
TNC_SINGLE         19
WALK_LOC           11
BIKE                9
WALK_LRF            7
TNC_SHARED          6
TAXI                5
Name: tour_mode, dtype: int64
INFO - Time to execute run_model (24 models) : 69.654 seconds (1.2 minutes)
INFO - close_pipeline
INFO - Time to execute all models : 69.676 seconds (1.2 minutes)


# Read the EDB

The next step is to read the EDB, including the coefficients, model settings, utilities specification, and chooser and alternative data.

In [10]:
edb_directory = "../output/estimation_data_bundle/auto_ownership/"

def read_csv(filename, **kwargs):
    return pd.read_csv(os.path.join(edb_directory, filename), **kwargs)

In [11]:
coefficients = read_csv("auto_ownership_coefficients.csv", index_col='coefficient_name')
spec = read_csv("auto_ownership_SPEC.csv")
chooser_data = read_csv("auto_ownership_values_combined.csv")

### Coefficients

In [12]:
coefficients

Unnamed: 0_level_0,value,constrain
coefficient_name,Unnamed: 1_level_1,Unnamed: 2_level_1
coef_cars1_drivers_2,0.0000,T
coef_cars1_drivers_3,0.0000,T
coef_cars1_persons_16_17,0.0000,T
coef_cars234_asc_marin,0.0000,T
coef_cars1_persons_25_34,0.0000,T
...,...,...
coef_cars4_drivers_3,5.2080,F
coef_cars3_drivers_3,5.5131,F
coef_cars2_drivers_4_up,6.3662,F
coef_cars3_drivers_4_up,8.5148,F
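
The `constrain` column separates coefficients that are held at their current value (`T`) from those that are free to be re-estimated (`F`). This reading is inferred from the later outputs in this notebook, where constrained coefficients are reported as fixed values. The sketch below splits a few rows from the table above using only the standard library.

```python
import csv
import io

# A few rows copied from the coefficients table above.
coefficients_csv = """coefficient_name,value,constrain
coef_cars1_drivers_2,0.0000,T
coef_cars1_drivers_3,0.0000,T
coef_cars4_drivers_3,5.2080,F
coef_cars3_drivers_3,5.5131,F
"""

rows = list(csv.DictReader(io.StringIO(coefficients_csv)))

# Assumption: "T" marks a coefficient that estimation should leave untouched.
fixed = [r["coefficient_name"] for r in rows if r["constrain"] == "T"]
free = [r["coefficient_name"] for r in rows if r["constrain"] == "F"]

print("held fixed:", fixed)
print("free to estimate:", free)
```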


### Utility specification

In [13]:
spec

Unnamed: 0,Label,Description,Expression,cars0,cars1,cars2,cars3,cars4
0,util_drivers_2,2 Adults (age 16+),num_drivers==2,,coef_cars1_drivers_2,coef_cars2_drivers_2,coef_cars3_drivers_2,coef_cars4_drivers_2
1,util_drivers_3,3 Adults (age 16+),num_drivers==3,,coef_cars1_drivers_3,coef_cars2_drivers_3,coef_cars3_drivers_3,coef_cars4_drivers_3
2,util_drivers_4_up,4+ Adults (age 16+),num_drivers>3,,coef_cars1_drivers_4_up,coef_cars2_drivers_4_up,coef_cars3_drivers_4_up,coef_cars4_drivers_4_up
3,util_persons_16_17,Persons age 16-17,num_children_16_to_17,,coef_cars1_persons_16_17,coef_cars2_persons_16_17,coef_cars34_persons_16_17,coef_cars34_persons_16_17
4,util_persons_18_24,Persons age 18-24,num_college_age,,coef_cars1_persons_18_24,coef_cars2_persons_18_24,coef_cars34_persons_18_24,coef_cars34_persons_18_24
5,util_persons_25_34,Persons age 35-34,num_young_adults,,coef_cars1_persons_25_34,coef_cars2_persons_25_34,coef_cars34_persons_25_34,coef_cars34_persons_25_34
6,util_presence_children_0_4,Presence of children age 0-4,num_young_children>0,,coef_cars1_presence_children_0_4,coef_cars234_presence_children_0_4,coef_cars234_presence_children_0_4,coef_cars234_presence_children_0_4
7,util_presence_children_5_17,Presence of children age 5-17,(num_children_5_to_15+num_children_16_to_17)>0,,coef_cars1_presence_children_5_17,coef_cars2_presence_children_5_17,coef_cars34_presence_children_5_17,coef_cars34_presence_children_5_17
8,util_num_workers_clip_3,"Number of workers, capped at 3",@df.num_workers.clip(upper=3),,coef_cars1_num_workers_clip_3,coef_cars2_num_workers_clip_3,coef_cars3_num_workers_clip_3,coef_cars4_num_workers_clip_3
9,util_hh_income_0_30k,"Piecewise Linear household income, $0-30k","@df.income_in_thousands.clip(0, 30)",,coef_cars1_hh_income_0_30k,coef_cars2_hh_income_0_30k,coef_cars3_hh_income_0_30k,coef_cars4_hh_income_0_30k


### Chooser and alternatives data

In [14]:
chooser_data

Unnamed: 0,household_id,model_choice,override_choice,util_drivers_2,util_drivers_3,util_drivers_4_up,util_persons_16_17,util_persons_18_24,util_persons_25_34,util_presence_children_0_4,...,OPRKCST,area_type,HSENROLL,COLLFTE,COLLPTE,TOPOLOGY,TERMINAL,household_density,employment_density,density_index
0,166,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00000,2,0.0,0.00000,0.00000,1,3.21263,24.783133,31.566265,13.883217
1,197,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,116.00000,2,0.0,0.00000,0.00000,1,3.68156,56.783784,10.459459,8.832526
2,268,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00000,1,0.0,3598.08521,0.00000,1,3.29100,11.947644,45.167539,9.448375
3,375,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,68.00000,1,0.0,0.00000,0.00000,1,4.11499,73.040169,28.028350,20.255520
4,387,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00000,3,0.0,227.78223,41.22827,1,3.83527,26.631579,45.868421,16.848945
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,2863464,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,314.01431,0,0.0,72.14684,0.00000,1,5.52555,38.187500,978.875000,36.753679
1996,2863483,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,225.00000,1,0.0,0.00000,0.00000,3,3.99027,39.838272,71.693001,25.608291
1997,2863806,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,202.24750,2,0.0,0.00000,0.00000,1,4.27539,51.675676,47.216216,24.672699
1998,2864518,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.00000,1,0.0,0.00000,0.00000,1,25.52083,15.938148,551.353820,15.490363
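
The `model_choice` column holds the simulated choice and `override_choice` holds the observed survey choice. A quick way to see how far the current coefficients are from the survey is to compare the two. The rows in the sketch below are illustrative only (one is deliberately altered to show a mismatch) and are not taken from the real bundle.

```python
from collections import Counter

# (household_id, model_choice, override_choice); illustrative values only
rows = [
    (166, 0, 0),
    (197, 0, 0),
    (268, 1, 1),
    (375, 2, 1),  # simulated 2 cars, survey reports 1
    (387, 1, 1),
]

# Share of households where the simulated choice equals the survey choice
matches = sum(1 for _hh, model, override in rows if model == override)
agreement_share = matches / len(rows)

# Frequency of each observed (survey) auto ownership level
override_counts = Counter(override for _hh, _model, override in rows)

print(f"agreement: {agreement_share:.0%}")
print("survey choice counts:", dict(override_counts))
```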


# Data Processing and Estimation Setup

The next step is to transform the EDB into the form larch needs for model re-estimation.  

In [15]:
from larch import P, X

altnames = list(spec.columns[3:])
altcodes = range(len(altnames))

In [16]:
m = larch.Model()

One of the alternatives is coded as 0, so
we need to explicitly initialize the MNL nesting graph
and set the root_id to a value other than zero.

In [17]:
m.initialize_graph(alternative_codes=altcodes, root_id=99)
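
For reference, the sketch below reproduces the alternative names and codes derived above from the spec columns, without needing larch or the EDB. It also confirms that the chosen root id does not collide with any alternative code. The column list is copied from the spec table, and the collision check is an illustration rather than something larch requires to be written out.

```python
# Column names of the spec file, as shown in the utility specification table
spec_columns = [
    "Label", "Description", "Expression",
    "cars0", "cars1", "cars2", "cars3", "cars4",
]

# The first three columns describe the expression; the rest are alternatives.
altnames = spec_columns[3:]
altcodes = list(range(len(altnames)))
name_to_code = dict(zip(altnames, altcodes))

# The root of the nesting graph needs an id distinct from every alternative.
root_id = 99
root_is_safe = root_id not in altcodes

print(name_to_code)
print("root id is distinct from alternative codes:", root_is_safe)
```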

### Utility specifications

In [18]:
m.utility_co = larch_asim.dict_of_linear_utility_from_spec(
    spec, 'Label', dict(zip(altnames,altcodes)),
)
m.utility_co

alt,formula
0,<Empty LinearFunction_C>
1,P.coef_cars1_drivers_2 * X.util_drivers_2 + P.coef_cars1_drivers_3 * X.util_drivers_3 + P.coef_cars1_drivers_4_up * X.util_drivers_4_up + P.coef_cars1_persons_16_17 * X.util_persons_16_17 + P.coef_cars1_persons_18_24 * X.util_persons_18_24 + P.coef_cars1_persons_25_34 * X.util_persons_25_34 + P.coef_cars1_presence_children_0_4 * X.util_presence_children_0_4 + P.coef_cars1_presence_children_5_17 * X.util_presence_children_5_17 + P.coef_cars1_num_workers_clip_3 * X.util_num_workers_clip_3 + P.coef_cars1_hh_income_0_30k * X.util_hh_income_0_30k + P.coef_cars1_hh_income_30_up * X.util_hh_income_30_75k + P.coef_cars1_hh_income_30_up * X.util_hh_income_75k_up + P.coef_cars1_density_0_10_no_workers * X.util_density_0_10_no_workers + P.coef_cars1_density_10_up_no_workers * X.util_density_10_up_no_workers + P.coef_cars1_density_0_10_no_workers * X.util_density_0_10_workers + P.coef_cars1_density_10_up_workers * X.util_density_10_up_workers + P.coef_cars1_asc * X.util_asc + P.coef_cars1_asc_san_francisco * X.util_asc_san_francisco + P.coef_cars1_asc_county * X.util_asc_solano + P.coef_cars1_asc_county * X.util_asc_napa + P.coef_cars1_asc_county * X.util_asc_sonoma + P.coef_cars1_asc_marin * X.util_asc_marin + P.coef_retail_auto_no_workers * X.util_retail_auto_no_workers + P.coef_retail_auto_workers * X.util_retail_auto_workers + P.coef_retail_transit_no_workers * X.util_retail_transit_no_workers + P.coef_retail_transit_workers * X.util_retail_transit_workers + P.coef_retail_non_motor * X.util_retail_non_motor_no_workers + P.coef_retail_non_motor * X.util_retail_non_motor_workers + P.coef_cars1_auto_time_saving_per_worker * X.util_auto_time_saving_per_worker
2,P.coef_cars2_drivers_2 * X.util_drivers_2 + P.coef_cars2_drivers_3 * X.util_drivers_3 + P.coef_cars2_drivers_4_up * X.util_drivers_4_up + P.coef_cars2_persons_16_17 * X.util_persons_16_17 + P.coef_cars2_persons_18_24 * X.util_persons_18_24 + P.coef_cars2_persons_25_34 * X.util_persons_25_34 + P.coef_cars234_presence_children_0_4 * X.util_presence_children_0_4 + P.coef_cars2_presence_children_5_17 * X.util_presence_children_5_17 + P.coef_cars2_num_workers_clip_3 * X.util_num_workers_clip_3 + P.coef_cars2_hh_income_0_30k * X.util_hh_income_0_30k + P.coef_cars2_hh_income_30_up * X.util_hh_income_30_75k + P.coef_cars2_hh_income_30_up * X.util_hh_income_75k_up + P.coef_cars2_density_0_10_no_workers * X.util_density_0_10_no_workers + P.coef_cars2_density_10_up_no_workers * X.util_density_10_up_no_workers + P.coef_cars2_density_0_10_no_workers * X.util_density_0_10_workers + P.coef_cars2_density_10_up_no_workers * X.util_density_10_up_workers + P.coef_cars2_asc * X.util_asc + P.coef_cars2_asc_san_francisco * X.util_asc_san_francisco + P.coef_cars2_asc_county * X.util_asc_solano + P.coef_cars2_asc_county * X.util_asc_napa + P.coef_cars2_asc_county * X.util_asc_sonoma + P.coef_cars234_asc_marin * X.util_asc_marin + P.coef_retail_auto_no_workers * X.util_retail_auto_no_workers + P.coef_retail_auto_workers * X.util_retail_auto_workers + P.coef_retail_transit_no_workers * X.util_retail_transit_no_workers + P.coef_retail_transit_workers * X.util_retail_transit_workers + P.coef_retail_non_motor * X.util_retail_non_motor_no_workers + P.coef_retail_non_motor * X.util_retail_non_motor_workers + P.coef_cars2_auto_time_saving_per_worker * X.util_auto_time_saving_per_worker
3,P.coef_cars3_drivers_2 * X.util_drivers_2 + P.coef_cars3_drivers_3 * X.util_drivers_3 + P.coef_cars3_drivers_4_up * X.util_drivers_4_up + P.coef_cars34_persons_16_17 * X.util_persons_16_17 + P.coef_cars34_persons_18_24 * X.util_persons_18_24 + P.coef_cars34_persons_25_34 * X.util_persons_25_34 + P.coef_cars234_presence_children_0_4 * X.util_presence_children_0_4 + P.coef_cars34_presence_children_5_17 * X.util_presence_children_5_17 + P.coef_cars3_num_workers_clip_3 * X.util_num_workers_clip_3 + P.coef_cars3_hh_income_0_30k * X.util_hh_income_0_30k + P.coef_cars3_hh_income_30_up * X.util_hh_income_30_75k + P.coef_cars3_hh_income_30_up * X.util_hh_income_75k_up + P.coef_cars34_density_0_10_no_workers * X.util_density_0_10_no_workers + P.coef_cars34_density_10_up_no_workers * X.util_density_10_up_no_workers + P.coef_cars34_density_0_10_no_workers * X.util_density_0_10_workers + P.coef_cars34_density_10_up_no_workers * X.util_density_10_up_workers + P.coef_cars3_asc * X.util_asc + P.coef_cars34_asc_san_francisco * X.util_asc_san_francisco + P.coef_cars34_asc_county * X.util_asc_solano + P.coef_cars34_asc_county * X.util_asc_napa + P.coef_cars34_asc_county * X.util_asc_sonoma + P.coef_cars234_asc_marin * X.util_asc_marin + P.coef_retail_auto_no_workers * X.util_retail_auto_no_workers + P.coef_retail_auto_workers * X.util_retail_auto_workers + P.coef_retail_transit_no_workers * X.util_retail_transit_no_workers + P.coef_retail_transit_workers * X.util_retail_transit_workers + P.coef_retail_non_motor * X.util_retail_non_motor_no_workers + P.coef_retail_non_motor * X.util_retail_non_motor_workers + P.coef_cars3_auto_time_saving_per_worker * X.util_auto_time_saving_per_worker
4,P.coef_cars4_drivers_2 * X.util_drivers_2 + P.coef_cars4_drivers_3 * X.util_drivers_3 + P.coef_cars4_drivers_4_up * X.util_drivers_4_up + P.coef_cars34_persons_16_17 * X.util_persons_16_17 + P.coef_cars34_persons_18_24 * X.util_persons_18_24 + P.coef_cars34_persons_25_34 * X.util_persons_25_34 + P.coef_cars234_presence_children_0_4 * X.util_presence_children_0_4 + P.coef_cars34_presence_children_5_17 * X.util_presence_children_5_17 + P.coef_cars4_num_workers_clip_3 * X.util_num_workers_clip_3 + P.coef_cars4_hh_income_0_30k * X.util_hh_income_0_30k + P.coef_cars4_hh_income_30_up * X.util_hh_income_30_75k + P.coef_cars4_hh_income_30_up * X.util_hh_income_75k_up + P.coef_cars34_density_0_10_no_workers * X.util_density_0_10_no_workers + P.coef_cars34_density_10_up_no_workers * X.util_density_10_up_no_workers + P.coef_cars34_density_0_10_no_workers * X.util_density_0_10_workers + P.coef_cars34_density_10_up_no_workers * X.util_density_10_up_workers + P.coef_cars4_asc * X.util_asc + P.coef_cars34_asc_san_francisco * X.util_asc_san_francisco + P.coef_cars34_asc_county * X.util_asc_solano + P.coef_cars34_asc_county * X.util_asc_napa + P.coef_cars34_asc_county * X.util_asc_sonoma + P.coef_cars234_asc_marin * X.util_asc_marin + P.coef_retail_auto_no_workers * X.util_retail_auto_no_workers + P.coef_retail_auto_workers * X.util_retail_auto_workers + P.coef_retail_transit_no_workers * X.util_retail_transit_no_workers + P.coef_retail_transit_workers * X.util_retail_transit_workers + P.coef_retail_non_motor * X.util_retail_non_motor_no_workers + P.coef_retail_non_motor * X.util_retail_non_motor_workers + P.coef_cars4_auto_time_saving_per_worker * X.util_auto_time_saving_per_worker
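
The table above pairs each coefficient (`P.`) with a data column (`X.`) for every alternative, following the layout of the spec file. The sketch below mimics that mapping in plain Python for two spec rows and three alternatives. It is an illustration of the idea, not the implementation of `dict_of_linear_utility_from_spec`, and the helpers `utility_terms` and `format_terms` are hypothetical.

```python
# Two rows of the spec, reduced to the columns needed here.
spec_rows = [
    {"Label": "util_drivers_2", "cars0": "",
     "cars1": "coef_cars1_drivers_2", "cars2": "coef_cars2_drivers_2"},
    {"Label": "util_drivers_3", "cars0": "",
     "cars1": "coef_cars1_drivers_3", "cars2": "coef_cars2_drivers_3"},
]
altnames = ["cars0", "cars1", "cars2"]

def utility_terms(spec_rows, altnames):
    """Map each alternative code to a list of (coefficient, variable) pairs."""
    terms = {code: [] for code in range(len(altnames))}
    for row in spec_rows:
        for code, alt in enumerate(altnames):
            coef = row.get(alt, "")
            if coef:  # an empty cell means no term for this alternative
                terms[code].append((coef, row["Label"]))
    return terms

def format_terms(pairs):
    """Render the pairs in the P.coef * X.variable notation used above."""
    return " + ".join(f"P.{coef} * X.{var}" for coef, var in pairs)

terms = utility_terms(spec_rows, altnames)
for code in terms:
    print(code, format_terms(terms[code]) or "<empty>")
```

The empty utility for alternative 0 reflects its role as the reference alternative: the spec leaves the `cars0` column blank in every row shown.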


In [19]:
larch_asim.apply_coefficients(coefficients, m)

### Coefficients

In [20]:
m.pf

Unnamed: 0,value,initvalue,nullvalue,minimum,maximum,holdfast,note
coef_cars1_asc,1.1865,0.0,0.0,,,0,
coef_cars1_asc_county,-0.5660,0.0,0.0,,,0,
coef_cars1_asc_marin,-0.2434,0.0,0.0,,,0,
coef_cars1_asc_san_francisco,0.4259,0.0,0.0,,,0,
coef_cars1_auto_time_saving_per_worker,0.4707,0.0,0.0,,,0,
...,...,...,...,...,...,...,...
coef_retail_auto_no_workers,0.0626,0.0,0.0,,,0,
coef_retail_auto_workers,0.1646,0.0,0.0,,,0,
coef_retail_non_motor,-0.0300,0.0,0.0,,,1,
coef_retail_transit_no_workers,-0.3053,0.0,0.0,,,0,


In [21]:
d = larch.DataFrames(
    co=chooser_data,
    alt_codes=altcodes,
    alt_names=altnames,
    av=True,
)

In [22]:
m.dataservice = d

### Survey choice

In [23]:
m.choice_co_code = 'override_choice'
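
The survey choice is what the likelihood is evaluated against. As a rough illustration of the quantity that estimation maximizes, the sketch below computes multinomial logit probabilities and the resulting log-likelihood for three made-up cases. The utilities are invented numbers and the functions are hypothetical helpers, not larch internals.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities for one decision maker."""
    shift = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - shift) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(cases):
    """Sum of log probabilities of the chosen alternatives.

    cases: iterable of (utilities, chosen_index) pairs.
    """
    return sum(math.log(mnl_probabilities(u)[chosen]) for u, chosen in cases)

# Made-up utilities for three households and three alternatives
cases = [
    ([0.0, 1.2, 0.4], 1),
    ([0.0, -0.3, 0.8], 2),
    ([0.0, 0.0, 0.0], 0),
]

ll = log_likelihood(cases)

# Reference point: every alternative equally likely for every household
ll_equal = log_likelihood([([0.0, 0.0, 0.0], chosen) for _u, chosen in cases])

print(f"log-likelihood: {ll:.4f}")
print(f"equal-shares log-likelihood: {ll_equal:.4f}")
```

Estimation searches for the coefficient values that make the first number as large (as close to zero) as possible.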

# Estimate

With the model set up for estimation, the next step is to estimate the model coefficients.  Make sure to use a sufficiently large household sample and set of zones to avoid an over-specified model, which does not have a numerically stable likelihood-maximizing solution.

In [24]:
m.estimate()

req_data does not request avail_ca or avail_co but it is set and being provided


Unnamed: 0,value,initvalue,nullvalue,minimum,maximum,holdfast,note,best
coef_cars1_asc,4.501130,0.0,0.0,,,0,,4.501130
coef_cars1_asc_county,-0.566000,0.0,0.0,,,0,,-0.566000
coef_cars1_asc_marin,-0.243397,0.0,0.0,,,0,,-0.243397
coef_cars1_asc_san_francisco,3.740529,0.0,0.0,,,0,,3.740529
coef_cars1_auto_time_saving_per_worker,1.278630,0.0,0.0,,,0,,1.278630
...,...,...,...,...,...,...,...,...
coef_retail_auto_no_workers,-0.579288,0.0,0.0,,,0,,-0.579288
coef_retail_auto_workers,-0.609870,0.0,0.0,,,0,,-0.609870
coef_retail_non_motor,-0.030000,0.0,0.0,,,1,,-0.030000
coef_retail_transit_no_workers,-0.349321,0.0,0.0,,,0,,-0.349321



Unnamed: 0,0
coef_cars1_asc,4.501130
coef_cars1_asc_county,-0.566000
coef_cars1_asc_marin,-0.243397
coef_cars1_asc_san_francisco,3.740529
coef_cars1_auto_time_saving_per_worker,1.278630
coef_cars1_density_0_10_no_workers,0.000000
coef_cars1_density_10_up_no_workers,-0.006883
coef_cars1_density_10_up_workers,-0.015798
coef_cars1_drivers_2,0.000000
coef_cars1_drivers_3,0.000000


In [25]:
m.parameter_summary()

Unnamed: 0,Value,Std Err,t Stat,Signif,Like Ratio,Null Value,Constrained
coef_cars1_asc,4.5,2.68,1.68,,,0.0,
coef_cars1_asc_county,-0.566,,,[],0.0,0.0,
coef_cars1_asc_marin,-0.243,,,[],0.0,0.0,
coef_cars1_asc_san_francisco,3.74,2.68,1.39,,,0.0,
coef_cars1_auto_time_saving_per_worker,1.28,0.652,1.96,*,,0.0,
coef_cars1_density_0_10_no_workers,0.0,,,,,0.0,fixed value
coef_cars1_density_10_up_no_workers,-0.00688,0.00515,-1.34,,,0.0,
coef_cars1_density_10_up_workers,-0.0158,0.00391,-4.04,***,,0.0,
coef_cars1_drivers_2,0.0,,,,,0.0,fixed value
coef_cars1_drivers_3,0.0,,,,,0.0,fixed value
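
The t statistics in the summary can be checked by hand using the conventional definition: the estimate minus its null value, divided by the standard error. The sketch below applies that formula to three rows from the table above. Small differences in the last digit can occur for other rows because the table displays rounded values.

```python
def t_statistic(value, std_err, null_value=0.0):
    """Conventional t statistic: distance from the null value in standard errors."""
    return (value - null_value) / std_err

# (value, standard error) pairs copied from the parameter summary above
summary_rows = {
    "coef_cars1_asc": (4.5, 2.68),
    "coef_cars1_auto_time_saving_per_worker": (1.28, 0.652),
    "coef_cars1_density_10_up_workers": (-0.0158, 0.00391),
}

t_stats = {
    name: round(t_statistic(value, std_err), 2)
    for name, (value, std_err) in summary_rows.items()
}

for name, t in t_stats.items():
    print(f"{name}: t = {t}")
```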


In [26]:
m.estimation_statistics()

Statistic,Aggregate,Per Case
Number of Cases,2000,2000
Log Likelihood at Convergence,-1723.45,-0.86
Log Likelihood at Null Parameters,-3172.89,-1.59
Rho Squared w.r.t. Null Parameters,0.457,0.457
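
The rho squared value follows directly from the two log-likelihoods: it is one minus the ratio of the log-likelihood at convergence to the log-likelihood at the null parameters. The sketch below reproduces the figures in the table above from its own inputs.

```python
# Values copied from the estimation statistics table above
ll_converged = -1723.45
ll_null = -3172.89
n_cases = 2000

# Rho squared: share of the null log-likelihood "explained" by the model
rho_squared = 1 - ll_converged / ll_null

# Per-case values are the aggregates divided by the number of cases
ll_converged_per_case = ll_converged / n_cases
ll_null_per_case = ll_null / n_cases

print(f"rho squared: {rho_squared:.3f}")
print(f"per case: {ll_converged_per_case:.2f} (converged), {ll_null_per_case:.2f} (null)")
```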


# Output Estimation Results

In [27]:
est_names = [j for j in coefficients.index if j in m.pf.index]
coefficients.loc[est_names,'value'] = m.pf.loc[est_names, 'value']
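
The update above touches only coefficients that appear both in the original file and in the estimated model, and leaves everything else as it was. The sketch below shows the same logic with plain dictionaries. The values and the two `coef_only_in_*` names are made up for illustration.

```python
# Original coefficient values, keyed by name (illustrative)
original = {
    "coef_cars1_drivers_2": 0.0,
    "coef_cars4_drivers_3": 5.2080,
    "coef_only_in_file": 1.0,  # hypothetical name absent from the model
}

# Values after estimation (illustrative)
estimated = {
    "coef_cars1_drivers_2": 0.0,
    "coef_cars4_drivers_3": 5.1,
    "coef_only_in_model": 2.0,  # hypothetical name absent from the file
}

# Names present in both, in the order of the original file
est_names = [name for name in original if name in estimated]

updated = dict(original)
for name in est_names:
    updated[name] = estimated[name]

print(updated)
```

Names that exist only in the model are not added to the file, so the revised file keeps the same rows as the original.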

In [28]:
# Write out replacement coefficients file and model summaries
os.makedirs(os.path.join(edb_directory,'estimated'), exist_ok=True)

### Write the re-estimated coefficients file

In [29]:
coefficients.reset_index().to_csv(
    os.path.join(edb_directory,'estimated',"auto_ownership_coefficients_revised.csv"), 
    index=False,
)

### Write the model estimation report, including coefficient t-statistics and log-likelihood

In [30]:
m.to_xlsx(
    os.path.join(edb_directory,'estimated',"auto_ownership_model_estimation.xlsx"), 
)

<larch.util.excel.ExcelWriter at 0x7fae180508b0>

# Next Steps

At this stage, the user should review the model results shown in this notebook or in the estimation report, confirm that the new parameters are satisfactory, and check that there are no anomalies that need to be addressed.  If the results are satisfactory, the final step is to copy the `auto_ownership_coefficients_revised.csv` file to the configs folder (manually or with a script), rename it to `auto_ownership_coeffs.csv`, and run ActivitySim in simulation mode.

In [31]:
pd.read_csv(os.path.join(edb_directory,'estimated',"auto_ownership_coefficients_revised.csv"))

Unnamed: 0,coefficient_name,value,constrain
0,coef_cars1_drivers_2,0.000000,T
1,coef_cars1_drivers_3,0.000000,T
2,coef_cars1_persons_16_17,0.000000,T
3,coef_cars234_asc_marin,0.000000,T
4,coef_cars1_persons_25_34,0.000000,T
...,...,...,...
62,coef_cars4_drivers_3,548.564223,F
63,coef_cars3_drivers_3,5.144248,F
64,coef_cars2_drivers_4_up,6.946767,F
65,coef_cars3_drivers_4_up,8.287967,F
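
A value such as 548.564223 for `coef_cars4_drivers_3` in the listing above is far from its starting value of 5.2080 and may warrant a closer look before the file is put into use. The sketch below combines a simple screening step with the copy-and-rename step. The threshold of 50 is arbitrary, the helper names are hypothetical, and the demonstration writes to a temporary folder rather than to the real configs folder.

```python
import csv
import os
import shutil
import tempfile

def suspicious_coefficients(path, max_abs=50.0):
    """Return names of coefficients whose magnitude exceeds max_abs."""
    with open(path, newline="") as f:
        return [
            row["coefficient_name"]
            for row in csv.DictReader(f)
            if abs(float(row["value"])) > max_abs
        ]

def install_coefficients(src, dst, max_abs=50.0):
    """Copy src to dst only if no coefficient looks anomalous.

    Returns the list of flagged names (empty when the copy was made).
    """
    flagged = suspicious_coefficients(src, max_abs)
    if not flagged:
        shutil.copyfile(src, dst)
    return flagged

# Demonstration with two rows from the listing above
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "auto_ownership_coefficients_revised.csv")
    dst = os.path.join(tmp, "auto_ownership_coeffs.csv")
    with open(src, "w", newline="") as f:
        f.write(
            "coefficient_name,value,constrain\n"
            "coef_cars4_drivers_3,548.564223,F\n"
            "coef_cars3_drivers_3,5.144248,F\n"
        )
    flagged = install_coefficients(src, dst)
    copied = os.path.exists(dst)

print("flagged:", flagged)
print("copied:", copied)
```

A flagged coefficient does not prove the estimate is wrong. It is a prompt to revisit the sample size or the specification, in line with the over-specification caution given in the Estimate section.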
