Tyler Byers
August 23, 2015
This is my final project for University of Washington's Methods for Data Analysis class, course #2 of 3 in the Data Science Certificate program. This project looks at public data from the Denver B-cycle program, which is merged with distance data from Google Maps and weather data from forecast.io.
The following project files are in this project directory:
- README.md -- This document, with project description.
- Denver_B-Cycle_2014.md -- Final project writeup.
- bcycle_final_script.R -- Production-level final script.
- exploring_bcycle_data.Rmd -- Contains code for data-set building (some processes are fairly complex and time-intensive and would not make sense to build in a production-level script) and initial data explorations.
- ./data -- Directory containing data files used in the scripts.
- ./figures -- Directory with figures loaded into the final project writeup.
- B-Cycle Rider Data: https://denver.bcycle.com/company, Denver B-Cycle Trip Data 2014 link at bottom (https://denver.bcycle.com/docs/librariesprovider34/default-document-library/2014denverbcycletripdata_public.xlsx?sfvrsn=2).
- B-Cycle Station Locations: From the Google Maps locations on https://denver.bcycle.com/. I was unable to easily access the map layer data programmatically, so I collected these addresses manually.
- Weather Data: Downloaded from forecast.io via the developer API.
- ggmap package in R: https://cran.r-project.org/web/packages/ggmap/index.html. Used to access between-station distances.
- Holidays 2014: opm.gov for federal holidays, plus a Google search for Cesar Chavez Day, which is a City of Denver public holiday.
All data analysis was done using R in RStudio. The following R packages are required to re-run the final R script and the data-building/exploratory analysis file exploring_bcycle_data.Rmd. Note that to fully run exploring_bcycle_data.Rmd, you will need your own forecast.io developer API key, and you will need to run the kiosk_pairs code chunk over the course of at least 3 days (the Google Distance Matrix API, which the mapdist function calls, allows at most 2500 calls per day).
library(ggplot2); library(dplyr); library(tidyr)
library(lubridate); library(xml2); library(readxl)
library(ggmap); library(logging); library(jsonlite)
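Because of the 2500-call daily limit mentioned above, the station-pair distance lookups have to be spread across several days. The following is a minimal sketch of one way to do that in base R, not the code from the kiosk_pairs chunk itself. The helper name `batch_pairs`, the column names, and the 80 placeholder kiosks are all hypothetical and chosen only for illustration.

```r
# Hypothetical helper: tag each station pair with a batch number so that
# no batch holds more pairs than the daily API quota allows.
batch_pairs <- function(pairs, daily_limit = 2500) {
  pairs$batch <- ceiling(seq_len(nrow(pairs)) / daily_limit)
  pairs
}

# Illustrative input only: 80 made-up kiosk labels, not the real station list.
kiosks <- paste("Kiosk", 1:80)

# All ordered pairs of distinct kiosks (80 * 79 = 6320 rows).
pairs <- expand.grid(from = kiosks, to = kiosks, stringsAsFactors = FALSE)
pairs <- pairs[pairs$from != pairs$to, ]

pairs <- batch_pairs(pairs)

# Number of pairs assigned to each day's batch.
table(pairs$batch)

# On a given day, only that day's batch would be sent to the API, e.g.
# (not run here; requires ggmap and network access):
# todays <- pairs[pairs$batch == 1, ]
# dists  <- ggmap::mapdist(todays$from, todays$to, mode = "bicycling")
```

With these placeholder numbers the pairs fall into three batches, which is consistent with the "at least 3 days" noted above, though the real number of days depends on the actual station count. Saving each day's results to ./data before moving to the next batch would avoid repeating calls if a run is interrupted.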
A tip of the hat to Denver B-Cycle's annual report. Likely because of differences in data processing and analysis, some of my figures (such as the number of trips and miles ridden) differ from those in the official B-cycle analysis. However, my analysis would not have been possible without first reading their analysis and learning about the data, nor without the public availability of their ridership data.