# Final Project Plan


## Introduction and Background

For my final project, I'm interested in learning more about how people park in Seattle. More specifically, I'd like to know whether actual parking data can reveal ways for drivers to improve their parking experience by taking advantage of underutilized parking spots, favorable times of day, or other interesting patterns in city parking. I hope these same patterns could also inform how local government optimizes its parking policies, including spot allocation, payment, zones, and ticketing/enforcement. Since I both live and own a car in Seattle, I'm interested in these dynamics on a practical level, and I think the results could be useful to others as well.

The City of Seattle sees the potential value in identifying parking trends: it has released paid parking studies annually since 2010 [1]. These studies are made possible by data collected from smart parking meters deployed city-wide, and were motivated by the need to set parking rates more effectively. Also in 2010, the Performance-Based Parking Pricing Program tied pricing to the performance goal of keeping one to two spaces open and available throughout the day [2]. The Seattle Department of Transportation has made over 250 changes to paid parking rates using these data over time, and proposed 32 more (17 rate increases, 14 rate decreases, 1 hour change) in its 2018 parking study.

A past DATA 512 student, Rex Thompson, looked at Seattle street parking data from 2012 to 2017 and found that drivers paid over 88 thousand dollars to park during peak hours when technically not allowed to do so, though the city has likely since fixed this issue [3]. While this is intriguing and hints at other "bugs in the system" that are worth exploring, in this project I plan to look more closely at location-specific city parking patterns within normal paid parking hours and policies.


## Data

The data set I'm using for this project is updated daily by the Seattle Department of Transportation (SDOT) and contains the last 30 days of paid parking occupancy in Seattle [4]. As government data, it has been released into the public domain, and it contains no personally identifiable information such as hashed or plaintext names, license numbers, or car makes/models. The license URL provided for all SDOT-released data is http://www.seattle.gov/transportation/projects-and-programs/programs/parking-program/maps-and-data. I downloaded the parking data on November 7th; although updates come in daily, processing introduces a 7-day delay before data is uploaded to the site. The data I'll be working with therefore starts on October 2nd, 2019, ends on October 31st, 2019, and contains 24.7 million individual paid parking records.

Each paid parking record has the following fields which I plan to use: Timestamp, Paid Occupancy (number of total paid cars on the block at that time), Block, Side of Street, Parking Time Limit, Number of Parking Spaces on the block, Neighborhood, Sub-neighborhood, Zone Type.  

Probably the biggest limitation of this dataset is that it necessarily excludes unpaid parking, whether legal or illegal. Complicating matters, drivers can park without paying both legally (Sundays, disabled tags, etc.) and illegally (bus stops during rush hour, food truck locations, etc.). However, by observing where people have paid and potentially comparing it with other public sources of data like neighborhood populations, I still hope to glean some interesting takeaways. The 30-day cap on the data also means I will miss any seasonal effects, but I can hopefully still pick up on weekly trends.

Ethically, I think using this data set is fairly benign, with two exceptions. First, if I find any obvious exploits for parking illegally in Seattle without being caught, I should probably keep those to myself. Second, making it easier to park in the city could encourage people to choose driving over public transportation more often, which could have negative environmental effects.



## Research Questions

My research questions are as follows: 

1. Which Seattle neighborhoods (and subneighborhoods) are the hardest and easiest in which to find paid parking spots?
    1.1. Are there any specific blocks that act as useful outliers within these neighborhoods? 
2. At which times of day and on which days of the week is it hardest and easiest to find paid parking spots, and how does this vary by neighborhood?
    2.1. Are there any specific blocks that act as useful outliers within these times?


## Methodology

My first step will be to calculate a measure of how hard it is to find parking: the number of paid-for spots divided by the total number of spots on a block. I'll call this "occupancy," after the term the city uses in its reports. Higher occupancy means parking is harder to find. Note that this is only a proxy for ease of parking, since I don't have the true occupancy numbers (which would include people who don't pay when they should). However, it should still allow coherent comparisons among times, blocks, and neighborhoods.
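
As a rough illustration of the occupancy calculation described above, here is a minimal sketch in plain Python. The record layout and the field names (`block`, `paid_occupancy`, `space_count`) are placeholders I made up for the example, not the dataset's actual column names, and the numbers are invented.

```python
def occupancy_ratio(paid_occupancy, space_count):
    """Return paid cars / total spaces, or None if the block lists no spaces."""
    if space_count <= 0:
        return None  # avoid dividing by zero on blocks with no listed spaces
    return paid_occupancy / space_count


# Made-up example records; field names are illustrative, not the dataset's.
records = [
    {"block": "Block A", "paid_occupancy": 6, "space_count": 8},
    {"block": "Block B", "paid_occupancy": 1, "space_count": 10},
    {"block": "Block C", "paid_occupancy": 3, "space_count": 0},
]

# Map each block to its occupancy ratio
ratios = {
    r["block"]: occupancy_ratio(r["paid_occupancy"], r["space_count"])
    for r in records
}

for block, ratio in ratios.items():
    print(block, ratio)
```

The ratio is deliberately left unclipped, so any record where paid cars exceed listed spaces would show up as a value above 1 rather than being hidden.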

Next, I plan to plot "occupancy" averaged over the month's data, sorted by neighborhood and subneighborhood. I will then repeat the analysis at the individual block level to try to find particularly under-occupied or over-occupied blocks, which I can show on a map in a figure.
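
The averaging step could look something like the sketch below, which groups records by an arbitrary key (neighborhood or block) and ranks the groups by mean occupancy. Again, the field names and the sample values are assumptions for illustration only, and plotting is left out.

```python
from collections import defaultdict


def mean_occupancy_by(records, key):
    """Average paid_occupancy / space_count per group, most occupied first."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if rec["space_count"] <= 0:
            continue  # skip records where the ratio is undefined
        sums[rec[key]] += rec["paid_occupancy"] / rec["space_count"]
        counts[rec[key]] += 1
    means = {group: sums[group] / counts[group] for group in sums}
    # Sort descending so the hardest places to park come first
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up records; names and numbers are illustrative only.
records = [
    {"neighborhood": "Belltown", "block": "A", "paid_occupancy": 8, "space_count": 8},
    {"neighborhood": "Belltown", "block": "A", "paid_occupancy": 4, "space_count": 8},
    {"neighborhood": "Ballard", "block": "B", "paid_occupancy": 2, "space_count": 8},
    {"neighborhood": "Ballard", "block": "B", "paid_occupancy": 0, "space_count": 8},
]

by_neighborhood = mean_occupancy_by(records, "neighborhood")
by_block = mean_occupancy_by(records, "block")
print(by_neighborhood)
print(by_block)
```

Using the same function for both levels keeps the neighborhood and block rankings directly comparable.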

Lastly, I will plot a time series of occupancy over the course of the average day for each day of the week, both city-wide and for some selected neighborhoods and blocks (likely some of the most and least occupied).
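
One possible way to build the day-of-week profile is to bucket records by weekday and hour and average the occupancy in each bucket, as sketched below. The timestamp format string and the field names are assumptions on my part and would need to be checked against the real file; the sample records are invented.

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout, e.g. "10/02/2019 08:00:00 AM"; verify against the data.
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_hour_profile(records):
    """Mean occupancy keyed by (day name, hour of day)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if rec["space_count"] <= 0:
            continue  # ratio undefined for blocks with no listed spaces
        ts = datetime.strptime(rec["timestamp"], TIMESTAMP_FORMAT)
        bucket = (DAY_NAMES[ts.weekday()], ts.hour)
        sums[bucket] += rec["paid_occupancy"] / rec["space_count"]
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sums}


# Made-up records for illustration.
records = [
    {"timestamp": "10/02/2019 08:00:00 AM", "paid_occupancy": 4, "space_count": 8},
    {"timestamp": "10/02/2019 08:01:00 AM", "paid_occupancy": 8, "space_count": 8},
    {"timestamp": "10/05/2019 02:00:00 PM", "paid_occupancy": 2, "space_count": 8},
]

profile = weekday_hour_profile(records)
for bucket, mean in sorted(profile.items()):
    print(bucket, mean)
```

Filtering `records` to one neighborhood or block before calling the function would give the per-area versions of the same profile.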


## Limitations and Commentary

Since filling out my first project proposal, I have become less worried about how to visualize my analysis or supplement it with additional datasets, and I think I will be able to answer my research questions with the data I already have. However, a few things now concern me. First, I worry that I'm duplicating findings that the city publishes in its 2019 annual report. Hopefully this is mitigated by three things I'm planning: making my analysis and code fully transparent and replicable, reporting on specific block outliers, and looking at day-of-the-week trends. Next, I worry that parsing, slicing, and dicing the data to get what I want (timestamp parsing, block lookup, etc.) will be so time-consuming that I will need to leave parts of my research questions underexplored, though I think the project's scope is now narrow enough that this is less scary. Finally, the data set itself is massive (almost 4GB), so I will likely need to work from a random sample in order to fit my full replicable analysis into a free GitHub repository.
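
For the random sample, one simple option is to stream the CSV row by row and keep each row with a fixed probability, so the full file never has to fit in memory. The sketch below uses a seeded generator so the sample is reproducible; the function name, the sampling fraction, and the tiny in-memory CSV are all placeholders for illustration.

```python
import csv
import io
import random


def sample_rows(lines, fraction, seed=0):
    """Keep each data row with probability `fraction`; the header is always kept."""
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    reader = csv.reader(lines)
    header = next(reader)
    kept = [row for row in reader if rng.random() < fraction]
    return header, kept


# Tiny stand-in for the real file; columns and values are made up.
fake_csv = "timestamp,paid_occupancy,space_count\n" + "".join(
    f"row{i},{i % 5},8\n" for i in range(1000)
)

header, sample = sample_rows(io.StringIO(fake_csv), fraction=0.1, seed=42)
print(header, len(sample))
```

With the real data, `lines` would be an open file handle instead of `io.StringIO`. Sampling individual rows keeps averages representative in expectation, but it thins out each block's time series, so sampling whole blocks might be the better choice for the block-level plots.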


## Sources
1. https://data.seattle.gov/api/views/rke9-rsvs/files/4c21d126-ae8f-49f3-8ab9-9e0281542534?download=true&filename=Paid_Parking_Occupancy_Metadata.pdf
2. https://www.seattle.gov/Documents/Departments/SDOT/ParkingProgram/PaidParking/SDOT_AnnualReport2018.pdf
3. https://github.com/rexthompson/DATA-512-Final-Project/blob/master/DATA-512-Final-Project.ipynb
4. https://data.seattle.gov/Transportation/Paid-Parking-Occupancy-Last-30-Days-/rke9-rsvs