# Real estate price model workflow

Sam Maurer, Feb 2018  
Python 3.6, intended to be backward compatible with 2.7

In [1]:
from __future__ import print_function  # __future__ imports must come first
import orca

## Bootstrap Orca with some legacy registrations

This exercise starts from a point where data is already registered in Orca. Eventually, data will be loaded based on config files in the 'data' directory.

For now, the 'legacy' directory contains some code from Paul Sohn's [urbansim_parcels](https://github.com/urbansim/urbansim_parcels) project. Importing 'datasources.py' and 'models.py' registers a handful of Orca objects.

In [2]:
import os
os.chdir('../legacy')  # the legacy modules are imported from this directory
import datasources
import models

## Explore the Orca registrations

In [3]:
orca.list_tables()

['households', 'buildings', 'parcels', 'jobs']

In [4]:
orca.list_columns()

[('households', 'node_id'), ('buildings', 'node_id'), ('jobs', 'node_id')]

In [5]:
orca.list_broadcasts()

[('parcels', 'buildings'), ('buildings', 'households'), ('buildings', 'jobs')]

In [6]:
orca.list_injectables()

['settings', 'store', 'net_store']

In [7]:
orca.list_steps()

['build_networks', 'neighborhood_vars']

## Explore the data

Orca doesn't execute the code that loads a registered object until the object is first needed; requesting the table below is what triggers the load.

In [8]:
orca.get_table('households').to_frame().describe()

| | building_id | tenure | persons | workers | age_of_head | income | children | race_id | cars | base_luz | segmentation_col | node_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 58671.0 | 58671.0 | 58671.0 | 58671.0 | 58671.0 | 58671.0 | 58671.0 | 58671.0 | 58671.0 | 58671.0 | 58671.0 | 58671.0 |
| mean | 370371.030339 | 2.406913 | 2.156057 | 1.156534 | 44.336742 | 64154.87 | 0.434099 | 2.257282 | 1.363859 | 92.911353 | 1.0 | 43541.001227 |
| std | 79639.958079 | 0.916539 | 1.299009 | 0.798054 | 16.097489 | 67859.93 | 0.876846 | 1.478598 | 0.865866 | 5.21015 | 0.0 | 4304.886766 |
| min | 5120.0 | 0.0 | 1.0 | 0.0 | 16.0 | -9999.0 | 0.0 | 1.0 | 0.0 | 88.0 | 1.0 | 36354.0 |
| 25% | 352274.5 | 1.0 | 1.0 | 1.0 | 31.0 | 24000.0 | 0.0 | 1.0 | 1.0 | 89.0 | 1.0 | 40118.0 |
| 50% | 363553.0 | 3.0 | 2.0 | 1.0 | 41.0 | 45000.0 | 0.0 | 2.0 | 1.0 | 93.0 | 1.0 | 42831.0 |
| 75% | 380838.5 | 3.0 | 3.0 | 2.0 | 55.0 | 82500.0 | 0.0 | 2.0 | 2.0 | 93.0 | 1.0 | 46564.0 |
| max | 679716.0 | 4.0 | 11.0 | 5.0 | 93.0 | 1125300.0 | 6.0 | 8.0 | 4.0 | 108.0 | 1.0 | 52612.0 |
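
The deferred loading described in this section can be pictured with a small stand-in written in plain Python. This is a conceptual sketch only, not Orca's implementation; the `LazyRegistry` class and `load_households` function are hypothetical, and caching the result after the first load is a choice made for the sketch.

```python
# Simplified stand-in for deferred loading; this is NOT Orca's implementation
class LazyRegistry:
    def __init__(self):
        self._loaders = {}  # name -> function that builds the object
        self._cache = {}    # name -> object, filled on first access

    def register(self, name, loader):
        self._loaders[name] = loader  # store the function, do not call it yet

    def get(self, name):
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()  # load on first request
        return self._cache[name]


load_log = []

def load_households():
    load_log.append('households')  # record that the loader actually ran
    return [{'persons': 2}, {'persons': 3}]

registry = LazyRegistry()
registry.register('households', load_households)
print(load_log)  # still empty: registering did not load anything

registry.get('households')
registry.get('households')
print(load_log)  # the loader ran once, on the first request
```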


In [9]:
print(len(orca.get_table('households').local_columns))  # native columns only
print(len(orca.get_table('households').to_frame().columns))  # native plus virtual

11
12


## Generate accessibility measures for the price model

The network accessibility metrics are not stored on disk; for now we'll generate them using legacy code.

In [10]:
orca.run(['build_networks'])

Running step 'build_networks'
Time to execute step 'build_networks': 0.29 s
Total time to execute iteration 1 with iteration value None: 0.29 s


In [12]:
%%capture
orca.run(['neighborhood_vars'])

In [13]:
orca.list_tables()

['households', 'buildings', 'parcels', 'jobs', 'nodes']

In [14]:
orca.get_table('nodes').to_frame().columns.tolist()

['ave_parcel_size',
 'mean_nonres_rent_2000m',
 'jobs_1500m',
 'jobs_800m',
 'jobs_400m',
 'ave_income',
 'ave_age_of_head_1500m',
 'ave_children_1500m',
 'ave_year_built_1500m',
 'population_400m',
 'jobs_3000m',
 'households_3000m',
 'residential_units_3000m',
 'residential_units_1500m',
 'residential_units_800m']