Pulling and Summarizing Raw Trip Records

Alex Bettinardi edited this page Aug 28, 2017 · 6 revisions

Background and Purpose

A typical SWIM output, and a typical request, is the travel patterns the model produces. The travel component within SWIM, PT, is a tour-based model. It produces synthetic trip and tour records for the entire model population for the average model day. This results in around 20 million trip records for the base year, and nearly double that (~40 million) for future years 20-30 years out. This rich dataset can be used to answer many questions, so knowing how to access and summarize it is important.

The example snippets of R script below show how to access the trip records and how to filter and summarize the information.

Example script

Before any data can be pulled, the SWIM scenario(s) of interest must be run. Assuming that has been completed, the analyst then needs to identify the scenario(s) and the specific years of interest. SWIM scenarios typically contain 10 years with travel demand model data (SWIM runs for 30 years, and the travel component runs every third year). Pulling and analyzing detailed trip tables (records) for every year is typically not warranted, so the analyst will usually need to sort through the SWIM run to determine a specific year or years of interest and pull trip records for just those years. See Assessing Local Land Use Growth Patterns for instructions on how to pull land use growth patterns. That information can and should be used to pick the correct years for further analysis.

This first step specifies the location of the scenario and the year of interest. Note that model years are numbered relative to the original build year of 1990, so the starting year (time zero) is "t0" and 2016, used in the example below, is found in the output folder "t26".

# first identify the scenario / year of interest
# in this case the reference scenario, year t26 (2016)
dataDir <- "L:/D_Copy/swim2/Reference/outputs/t26"
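Because folder names follow the years-since-1990 convention described above, the folder for a given calendar year can be derived rather than typed by hand. The following is a minimal sketch; `yearToFolder` and its `baseYear` argument are hypothetical names introduced here for illustration and are not part of SWIM.

```r
# Hypothetical helper: convert a calendar year to a SWIM output folder name,
# assuming folders are named "t" plus the number of years since 1990
yearToFolder <- function(year, baseYear = 1990) {
  paste0("t", year - baseYear)
}

folder2016 <- yearToFolder(2016)   # "t26", matching the example above

# Build the full output path from the scenario location used in this page
dataDirSketch <- file.path("L:/D_Copy/swim2/Reference/outputs", folder2016)
```

Whether a folder exists for a given year still depends on which years the scenario was run for, so checking with `dir.exists()` before reading is a reasonable precaution.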

Once the scenario and year are identified, the following lines can be used to pull in the three trip tables of interest. For those less familiar with SWIM, PT runs a short-distance travel (SDT) and a long-distance travel (LDT) component. The model handles each of these types of travel separately, so the output trip tables are found in two different files, which need to be summarized and assessed separately.

The Commercial Transport (CT, or truck) component also runs separately and produces its own output.

# read the trip records - note that sdt is a large file and takes several minutes to read
sdt <- read.csv(paste(dataDir,"/Trips_SDTPerson.csv",sep=""),as.is=T)
ldt <- read.csv(paste(dataDir,"/Trips_LDTVehicle.csv",sep=""),as.is=T)
ct <- read.csv(paste(dataDir,"/Trips_CTTruck.csv",sep=""),as.is=T)
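Since the large files take several minutes to read, it can help to confirm right after reading that the fields used later on this page are present. The sketch below runs against a tiny stand-in file written to a temporary folder, so it does not need real SWIM outputs. The object names (`toyFile`, `needed`, `missingCols`) are introduced here for illustration, and the field list is simply the set of columns the snippets on this page refer to.

```r
# Write a tiny stand-in file so the sketch runs without SWIM outputs
toyFile <- file.path(tempdir(), "Trips_SDTPerson.csv")
write.csv(data.frame(origin = c(1, 2),
                     destination = c(2, 1),
                     tripMode = c("DA", "WALK"),
                     tripPurpose = c("WORK", "OTHER")),
          toyFile, row.names = FALSE)

# Read it the same way as the real trip tables
toy <- read.csv(toyFile, as.is = TRUE)

# Fields that the later snippets on this page rely on
needed <- c("origin", "destination", "tripMode", "tripPurpose")
missingCols <- setdiff(needed, names(toy))
if (length(missingCols) > 0) {
  stop("Missing fields: ", paste(missingCols, collapse = ", "))
}
```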

Once the raw trip records are read in, different types of analysis can be conducted. One typical process is to keep only the trips made in vehicles and convert those person trips to vehicle trips. This can be done by creating a "Vehicle Factor" or fraction field based on the mode traveled, as in this example:

# Add a Vehicle Factor or Fraction field to the data
sdt$VehFac <- c("DA"=1,"SR2"=0.5,"SR3P"=.3, "BIKE"=0,"SCHOOL_BUS"=0,"WALK"=0,"WK_TRAN"=0)[as.character(sdt$tripMode)] 
ldt$VehFac <- c("DA"=1,"SR2"=0.5,"SR3P"=.3, "BIKE"=0,"SCHOOL_BUS"=0,"WALK"=0,"WK_TRAN"=0)[as.character(ldt$tripMode)] 
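One thing to watch with a named-vector lookup like this is that any mode missing from the lookup receives NA rather than 0, and an NA will carry into later sums. The sketch below uses a toy trip table to show how to list unmapped modes before summarizing. The mode "OTHER_MODE" is made up for illustration and is not a known SWIM mode.

```r
# Toy stand-in for a trip table; "OTHER_MODE" is a made-up mode
toy <- data.frame(tripMode = c("DA", "SR2", "WALK", "OTHER_MODE"),
                  stringsAsFactors = FALSE)

# Same vehicle factors as above, stored once so both tables could share them
vehFacLookup <- c("DA"=1, "SR2"=0.5, "SR3P"=.3, "BIKE"=0,
                  "SCHOOL_BUS"=0, "WALK"=0, "WK_TRAN"=0)
toy$VehFac <- unname(vehFacLookup[as.character(toy$tripMode)])

# List any modes that did not match the lookup (these have VehFac = NA)
unmapped <- sort(unique(toy$tripMode[is.na(toy$VehFac)]))
unmapped
```

If `unmapped` is not empty, the analyst can decide whether those modes should get a factor or be excluded before building the tabulations.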

Those factors can then be used to build zone by purpose matrices or similar tabulations (note that CT doesn't need a vehicle factor field added, because all trips in the CT trip table are individual truck trips).

# build trips by zone and purpose tables
sdt.ZnPu <- tapply(sdt$VehFac,list(sdt$destination,sdt$tripPurpose),sum)
sdt.ZnPu[is.na(sdt.ZnPu)] <- 0

ldt.ZnPu <- tapply(ldt$VehFac,list(ldt$destination,ldt$tripPurpose),sum)
ldt.ZnPu[is.na(ldt.ZnPu)] <- 0

ct.ZnPu <- table(ct$destination,ct$tripMode)
ct.ZnPu[is.na(ct.ZnPu)] <- 0
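The NA replacement after each `tapply` call matters because `tapply` returns NA for any zone and purpose combination that has no records. A small toy example (all values made up) shows the behavior:

```r
# Toy trip records: no "OTHER" trips go to zones 2 or 3
toy <- data.frame(destination = c(1, 1, 2, 3),
                  tripPurpose = c("WORK", "OTHER", "WORK", "WORK"),
                  VehFac      = c(1, 0.5, 1, 0.3),
                  stringsAsFactors = FALSE)

toy.ZnPu <- tapply(toy$VehFac, list(toy$destination, toy$tripPurpose), sum)
emptyCells <- sum(is.na(toy.ZnPu))   # combinations with no records

# Replace the empty combinations with zero, as in the snippets above
toy.ZnPu[is.na(toy.ZnPu)] <- 0
```

`table()` already returns 0 for empty combinations, so the NA replacement on the CT table is only a safeguard.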

Keep in mind that summary datasets from SDT, LDT, and CT will likely have differing dimensions. For example, there might not be commercial truck trips or long-distance trips to every zone in the model. Similarly, SDT does not deal with any travel to/from the external zones (5001-5012). The example datasets above therefore cannot simply be added together. The user first needs to create a common zone list and a data "tabulator" object that has all the zones and dimensions across the different summary objects:

# create a common zone list
zns <- sort(unique(c(sdt$destination,ldt$destination,ct$destination)))

# empty object to hold trip data 
trips.ZnPu <- array(0, dim=c(length(zns), length(unique(sdt$tripPurpose))+1), dimnames=list(zns,c(sort(unique(sdt$tripPurpose)),"TRUCK")))    

When summing up the trips from the three different datasets, some care needs to be taken to ensure that the purposes line up as intended (see example, note truck mode from above is collapsed in this example).

# add in sdt trips
trips.ZnPu[rownames(sdt.ZnPu),colnames(sdt.ZnPu)] <- trips.ZnPu[rownames(sdt.ZnPu),colnames(sdt.ZnPu)] + sdt.ZnPu 
# add in ldt trips
trips.ZnPu[rownames(ldt.ZnPu),c("OTHER"="OTHER","WORKRELATED"="WORK","HOUSEHOLD"="HOME")[colnames(ldt.ZnPu)]] <- trips.ZnPu[rownames(ldt.ZnPu),c("OTHER"="OTHER","WORKRELATED"="WORK","HOUSEHOLD"="HOME")[colnames(ldt.ZnPu)]] + ldt.ZnPu 
# add in ct trips
trips.ZnPu[rownames(ct.ZnPu),"TRUCK"] <- rowSums(ct.ZnPu) 
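The name-based indexing used above is what lets tables of different sizes be added into one object. A toy sketch with made-up numbers shows the mechanics; the objects `a`, `b`, and `out` are stand-ins introduced here, not SWIM outputs.

```r
# Two toy summaries with different zones and purposes
a <- matrix(c(1, 2, 3, 4), nrow = 2,
            dimnames = list(c("1", "2"), c("HOME", "WORK")))
b <- matrix(c(10, 20), nrow = 2,
            dimnames = list(c("2", "5001"), "WORK"))

# Common zone list and an empty holder covering every zone and purpose
zns <- sort(unique(c(rownames(a), rownames(b))))
out <- array(0, dim = c(length(zns), 2),
             dimnames = list(zns, c("HOME", "WORK")))

# Each table is added only into the rows and columns that it has
out[rownames(a), colnames(a)] <- out[rownames(a), colnames(a)] + a
out[rownames(b), colnames(b)] <- out[rownames(b), colnames(b)] + b

# The combined total should equal the sum of the inputs
totalsMatch <- isTRUE(all.equal(sum(out), sum(a) + sum(b)))
```

Comparing totals before and after, as in the last line, is a quick way to catch purposes or zones that did not line up.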

In addition to tallying up information such as trips, vehicles, VMT... by zone, another common summary / report is to compile OD patterns from the trip data.

A typical first step in developing OD patterns is to bring in a project-specific zone "district" aggregation layer. This is done because it typically does not make sense to look at a ~3000 x 3000 OD matrix, so the TAZs first need to be grouped into meaningful districts. One typical grouping is by County. To do this, a zone table with a County field for each zone needs to be brought in. Every scenario and output folder has a file called "AllZones.csv", which can be brought into the R workspace to help aggregate zones.

# Add AllZones.csv table for zone-to-county reference
azn <- read.csv(paste(dataDir,"/AllZones.csv",sep=""),as.is=T)
rownames(azn) <- azn$Azone

With the "azn" table, the County information can be brought over to the three trip tables as new County-specific origin and destination fields.

# Add Aggregate Origin and Destination fields to sdt
sdt$AggO <- azn[as.character(sdt$origin),"COUNTY"]  
sdt$AggD <- azn[as.character(sdt$destination),"COUNTY"] 

# Add Aggregate Origin and Destination fields to ldt
ldt$AggO <- azn[as.character(ldt$origin),"COUNTY"]  
ldt$AggD <- azn[as.character(ldt$destination),"COUNTY"] 

# Add Aggregate Origin and Destination fields to ct
ct$AggO <- azn[as.character(ct$origin),"COUNTY"]  
ct$AggD <- azn[as.character(ct$destination),"COUNTY"] 
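The `as.character()` call in these lookups is what makes R match on row names (zone numbers) rather than row positions. The toy sketch below illustrates the difference and shows one way to spot zones that do not appear in the zone table. The county labels and zone 9999 are made up for illustration.

```r
# Toy stand-in for AllZones.csv; county labels are made up
azn <- data.frame(Azone  = c(1, 2, 5001),
                  COUNTY = c("CountyA", "CountyB", "External"),
                  stringsAsFactors = FALSE)
rownames(azn) <- azn$Azone

# Toy trip table; zone 9999 is deliberately absent from the zone table
toy <- data.frame(origin = c(1, 2, 9999), destination = c(2, 5001, 1))
toy$AggO <- azn[as.character(toy$origin), "COUNTY"]
toy$AggD <- azn[as.character(toy$destination), "COUNTY"]

byName     <- azn["5001", "COUNTY"]   # matches the row named "5001"
byPosition <- azn[5001, "COUNTY"]     # asks for the 5001st row, which does not exist

# Zones that failed the lookup get NA and would be dropped by tapply()
unmatched <- unique(toy$origin[is.na(toy$AggO)])
```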

With the new aggregated origin and destination fields, district-to-district (here County-to-County) tabulations can be created.

# Origin and Destination sdt table
sdt.ZnZn <- tapply(sdt$VehFac, list(sdt$AggO, sdt$AggD), sum)
sdt.ZnZn[is.na(sdt.ZnZn)] <- 0

# Origin and Destination ldt table
ldt.ZnZn <- tapply(ldt$VehFac, list(ldt$AggO, ldt$AggD), sum)
ldt.ZnZn[is.na(ldt.ZnZn)] <- 0

# Origin and Destination ct table, no need to use tapply with "VehFac", since these are all vehicles - 
# just use a simple "table" function
ct.ZnZn <- table(ct$AggO, ct$AggD)
ct.ZnZn[is.na(ct.ZnZn)] <- 0

As above, there is no guarantee that the sdt, ldt, and ct tabulations will have the same district-to-district structure. Of specific note, sdt does not have external zones, so it is important to create a common data object to fill, as was done above. The exception is when the end analysis needs to look at sdt, ldt, and ct information separately. In that case the OD matrices from above can simply be exported as separate files and shared or used.

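# create a common district list
# (destinations are included in case a district never appears as an origin)
zns <- sort(unique(c(sdt$AggO,sdt$AggD,ldt$AggO,ldt$AggD,ct$AggO,ct$AggD)))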

# empty object to hold trip data 
trips.ZnZn <- array(0, dim=c(length(zns), length(zns)), dimnames=list(zns,zns))    

# add in sdt trips
trips.ZnZn[rownames(sdt.ZnZn),colnames(sdt.ZnZn)] <- trips.ZnZn[rownames(sdt.ZnZn),colnames(sdt.ZnZn)] + sdt.ZnZn 
# add in ldt trips
trips.ZnZn[rownames(ldt.ZnZn),colnames(ldt.ZnZn)] <- trips.ZnZn[rownames(ldt.ZnZn),colnames(ldt.ZnZn)] + ldt.ZnZn 
# add in ct trips
trips.ZnZn[rownames(ct.ZnZn),colnames(ct.ZnZn)] <- trips.ZnZn[rownames(ct.ZnZn),colnames(ct.ZnZn)] + ct.ZnZn  
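Once the combined OD matrix is built, two natural follow-up steps are to confirm that no trips were lost in the merge and to export the result. The sketch below does both on toy data. The district labels, values, and output file name are made up for illustration, and writing to `tempdir()` is only a placeholder for a real project folder.

```r
# Toy district-to-district tables; "EXT" stands in for an external district
sdt.toy <- matrix(c(5, 1, 2, 7), nrow = 2,
                  dimnames = list(c("A", "B"), c("A", "B")))
ct.toy  <- table(c("A", "EXT", "EXT"), c("EXT", "A", "A"))

# Common district list drawn from rows and columns of both tables
zns <- sort(unique(c(rownames(sdt.toy), colnames(sdt.toy),
                     rownames(ct.toy),  colnames(ct.toy))))
od <- array(0, dim = c(length(zns), length(zns)), dimnames = list(zns, zns))

od[rownames(sdt.toy), colnames(sdt.toy)] <-
  od[rownames(sdt.toy), colnames(sdt.toy)] + sdt.toy
od[rownames(ct.toy), colnames(ct.toy)] <-
  od[rownames(ct.toy), colnames(ct.toy)] + ct.toy

# Check that the combined total equals the sum of the inputs
stopifnot(isTRUE(all.equal(sum(od), sum(sdt.toy) + sum(ct.toy))))

# Export with origins as rows and destinations as columns
outFile <- file.path(tempdir(), "OD_example.csv")
write.csv(od, outFile)
```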

Here is a final example from a SWIM request that helps tie all the elements above into an actual example to provide output (information to a customer):

# Coming Soon