Spatial Analysis and Data Extraction


Data processing and imputation steps for UrbanSim. Transform messy urban data into spatial insight!

Spandex is a library of functions that operate on pandas DataFrames and spatial data. It also includes functions for loading data from common data sources. A sequence of spandex functions can form a data ingestion, cleaning, and imputation pipeline. The library aims to let urban modelers compose complex data cleaning pipelines with succinct, readable code. Once an interoperable set of core functions is in place, they can be applied and composed in different ways depending on the needs of each region and model.
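
As a minimal sketch of that composition idea, the example below chains small single-purpose steps into one pipeline. It uses only the Python standard library, with a list of dicts standing in for a dataframe. The step names (`drop_missing_ids`, `impute_median_sqft`), the `run_pipeline` helper, and the column names are hypothetical illustrations, not part of the spandex API.

```python
from functools import reduce
from statistics import median


def drop_missing_ids(records):
    """Drop records that have no parcel identifier (hypothetical step)."""
    return [r for r in records if r.get("parcel_id") is not None]


def impute_median_sqft(records):
    """Fill missing square footage with the median of known values (hypothetical step)."""
    known = [r["sqft"] for r in records if r.get("sqft") is not None]
    fill = median(known) if known else None
    return [
        {**r, "sqft": r.get("sqft") if r.get("sqft") is not None else fill}
        for r in records
    ]


def run_pipeline(records, steps):
    """Apply each step in order; every step takes records and returns new records."""
    return reduce(lambda data, step: step(data), steps, records)


# Toy raw data: one record lacks an ID, another lacks square footage.
raw = [
    {"parcel_id": 1, "sqft": 1200},
    {"parcel_id": 2, "sqft": None},
    {"parcel_id": None, "sqft": 900},
    {"parcel_id": 3, "sqft": 1800},
]

clean = run_pipeline(raw, [drop_missing_ids, impute_median_sqft])
```

Each step returns new records rather than modifying its input, so `raw` stays unaltered and the same steps can be reordered or reused for another region.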

Every region developing the data for an UrbanSim model goes through a similar process. Large amounts of raw data must be processed into a form usable for model estimation and simulation (i.e., an ETL step). Most data problems are common across regions, yet existing solutions are often ad hoc and difficult to replicate. As part of the spandex library, we propose to develop a generally useful approach to assembling and cleaning urban data for modeling. Automating and streamlining the data processing stage is the first step towards an end-to-end urban analytics pipeline: process data, estimate models, calibrate and validate models, make predictions.

The spandex library consists of functions for conveniently:

  • Loading common data formats
  • Cleaning and correcting geometry
  • Applying spatial operations typical in UrbanSim data development
  • Applying data imputation operations typical in UrbanSim data development
  • Automatically detecting and correcting errors
  • Verifying, validating, and reconciling data (QA)
  • Exporting to UrbanSim HDF5, synchronizing with the UrbanCanvas database, and exporting to other common formats

Motivating principles include:

  • The ability to reproduce results should be a key element of an urban modeling workflow. All data manipulation and analysis should be repeatable and automated.
  • The process used to generate UrbanSim data should be fully transparent.
  • Assumptions/choices in the data processing stage of model development influence model results, so data regeneration should be considered part of the modeling.

By automating data processing and cleaning steps end to end, spandex will support a workflow that follows best practices for reproducible research. The resulting urban modeling data workflow will be more transparent and less error-prone. Because certain allocation steps have a random component, the random number generator seed used in each regeneration will be saved, along with the git version of the code used. Testing will be emphasized to ensure that code functions as expected. The data pipeline will always be run from raw data, so that the input is preserved in an unaltered state and all manipulations, from the earliest data processing steps onwards, are repeatable. Because data cleaning methodology influences model results, it should be considered part of the modeling.
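
One way to record the seed and code version for a run is sketched below, using only the Python standard library. The `start_run` and `current_git_version` helpers and the manifest layout are assumptions made for illustration; they are not spandex functions.

```python
import json
import random
import subprocess


def current_git_version():
    """Return the current commit hash, or None when git or a checkout is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def start_run(seed=None):
    """Create a seeded generator plus a manifest describing the run (hypothetical helper)."""
    if seed is None:
        # Draw a fresh seed, but keep it so the run can be replayed later.
        seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(seed)
    manifest = {"seed": seed, "git_version": current_git_version()}
    return rng, manifest


# First run: allocation steps draw from the seeded generator.
rng, manifest = start_run(seed=42)
sample = [rng.random() for _ in range(3)]

# Replay: reusing the saved seed reproduces the same random draws.
replay_rng, _ = start_run(seed=manifest["seed"])
replay = [replay_rng.random() for _ in range(3)]

# The manifest can be stored next to the pipeline output.
manifest_json = json.dumps(manifest)
```

Passing one explicit generator to every step that needs randomness, instead of relying on global random state, keeps the saved seed sufficient to replay the run.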

You can’t perform that action at this time.