Extract

Yogita edited this page Jun 27, 2021 · 16 revisions

Below is a summary of the data sources used in this project. A Python script downloads the files from Seattle Open Data to an Azure file share.

Paid Parking data for the city of Seattle is available as CSV files from 2012 to the present. Except for 2020 (the pandemic year), each year's file is about 42 GB. Downloading the files was not straightforward because each file has a unique code associated with it. To fully automate the ingestion process, a Python automation script extracts that code using Selenium and a headless Chrome browser.

| Method | Source | Feature/Key | Frequency | Description |
| --- | --- | --- | --- | --- |
| Python/Selenium | Seattle Open Data | 2012 Year-to-Date | Historic Once | Entire Paid Parking records for the year 2012 |
| Python/Selenium | Seattle Open Data | 2013 Year-to-Date | Historic Once | Entire Paid Parking records for the year 2013 |
| Python/Selenium | Seattle Open Data | 2014 Year-to-Date | Historic Once | Entire Paid Parking records for the year 2014 |
| Python/Selenium | Seattle Open Data | 2015 Year-to-Date | Historic Once | Entire Paid Parking records for the year 2015 |
| Python/Selenium | Seattle Open Data | 2016 Year-to-Date | Historic Once | Entire Paid Parking records for the year 2016 |
| Python/Selenium | Seattle Open Data | 2017 Year-to-Date | Historic Once | Entire Paid Parking records for the year 2017 |
| Python/Selenium | Seattle Open Data | 2018 Year-to-Date | Historic Once | Entire Paid Parking records for the year 2018 |
| Python/Selenium | Seattle Open Data | 2019 Year-to-Date | Historic Once | Entire Paid Parking records for the year 2019 |
| Python/Selenium | Seattle Open Data | 2020 Year-to-Date | Historic Once | Entire Paid Parking records for the year 2020 |
| Python/Selenium | Seattle Open Data | 2021 Year-to-Date | Delta Daily | Delta Paid Parking records for the year 2021 |
| Python/Selenium | Blockface | Daily | 2021 Year-to-Date Delta | Entire Paid Parking records for the year 2021 |
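The schedule in the table can be expressed as a small planning helper. This is a minimal sketch rather than the project's actual code: `build_extract_plan` and its dictionary keys are hypothetical names, and the search phrases are the two strings the extraction code types into the Seattle Open Data search bar.

```python
from datetime import date


def build_extract_plan(current_year, first_year=2012):
    """Return one entry per year describing how that year is ingested.

    Hypothetical helper: past years are one-time historic loads,
    the current year is a daily delta load.
    """
    plan = []
    for year in range(first_year, current_year + 1):
        is_current = year == current_year
        plan.append({
            "year": year,
            "load_type": "Delta" if is_current else "Historic",
            "frequency": "Daily" if is_current else "Once",
            "search_phrase": (
                "Paid Parking Last 30 days" if is_current
                else "{} Paid Parking".format(year)
            ),
        })
    return plan


if __name__ == "__main__":
    for entry in build_extract_plan(date.today().year):
        print(entry)
```

Keeping the plan as plain data makes it easy to log which years were requested before any browser automation starts.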

Extraction Code:

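# Excerpt from the extraction method. chrome_options, chromeDriver_Path, year and
# current_year are defined elsewhere in the script; the imports belong at module level.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: the driver path goes through Service, the options through `options`
driver = webdriver.Chrome(service=Service(chromeDriver_Path), options=chrome_options)

driver.get(self.seattle_open_data_url)
time.sleep(2)
# Enter '{Year} Paid Parking' in the search bar
search_data = driver.find_element(By.XPATH, self.search_dataByYear)
if year != current_year:
    search_data.send_keys("{} Paid Parking".format(year))
else:
    search_data.send_keys("Paid Parking Last 30 days")
# Give the search dropdown time to populate
time.sleep(4)

# Click the matching '{Year} Paid Parking' entry in the dropdown
option = driver.find_element(By.XPATH, self.parking_Occpn_Option)
print(option.text)
option.click()

time.sleep(10)
# Get the download URL of the Parking Occupancy data for the year
url = driver.find_element(By.XPATH, self.parking_Occpn_Option_ByYear).get_attribute("href")
global url_type, file_extn

# Archived years are served as .zip files, the latest data as .csv
if "Archive" in url:
    url_type = "Archive"
    file_extn = ".zip"
else:
    url_type = "Latest"
    file_extn = ".csv"

# The unique dataset code is the sixth '/'-separated part of the URL
urls = url.split("/")
code = urls[5]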
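The URL handling at the end of the extraction code can be isolated into a function that is testable without a browser. The sketch below is an illustration under assumptions: `parse_download_link` is a hypothetical name, the example URLs are made up, and the only rules taken from the code above are the "Archive" check and the use of index 5 after splitting on "/".

```python
def parse_download_link(url):
    """Split a download link into (code, url_type, file_extn).

    Mirrors the extraction code: the dataset code is assumed to be the
    sixth '/'-separated component of the full URL (index 5).
    """
    parts = url.split("/")
    if len(parts) <= 5 or not parts[5]:
        raise ValueError("No dataset code found in URL: {!r}".format(url))
    if "Archive" in url:
        url_type, file_extn = "Archive", ".zip"
    else:
        url_type, file_extn = "Latest", ".csv"
    return parts[5], url_type, file_extn


if __name__ == "__main__":
    # Made-up URLs, used only to exercise the function
    print(parse_download_link("https://example.org/api/views/abcd-1234/rows.csv"))
    print(parse_download_link("https://example.org/Archive/data/wxyz-5678/file.zip"))
```

Returning the three values instead of assigning module-level globals keeps the function free of side effects, and the explicit length check turns an unexpected URL shape into a clear error instead of an `IndexError`.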

Paid Parking Dataset (2012-2017)

Columns of interest for cleaning and transformation. The historic data must be transformed so that all years share a common format.

| Column | Description |
| --- | --- |
| occupancydatetime | The date and time (minute) of the transaction as recorded |
| paidoccupancy | The number of vehicles paid for parking at this time |
| blockfacename | Street segment: name of the street with the "from street" and "to street". Example: "1ST AVE BETWEEN BELL ST AND BATTERY ST" |
| sideofstreet | Options are: E, S, N, W, NE, SW, SE, NW |
| sourceelementkey | Unique identifier for the city street segment where the pay station is located |
| parkingtimelimitcategory | In minutes. Options are 120 (2-hour parking), 240 (4-hour parking), 30, or 600 (10-hour parking) |
| available_spots | Number of paid spaces on the blockface at the given date and time |
| paidparkingarea | The primary name of a paid parking neighborhood. Example: Commercial Core |
| paidparkingsubarea | A subset of a paid parking area; not all paid parking areas have subareas |
| paidparkingrate | Parking rate charged at the given date and time |
| parkingcategory | An overall description of the type of parking allowed on a blockface |
| latitude | Latitude of a location |
| longitude | Longitude of a location |

Paid Parking Dataset (2018-Present)

| Column | Description |
| --- | --- |
| occupancydatetime | The date and time (minute) of the transaction as recorded |
| paidoccupancy | The number of vehicles paid for parking at this time |
| blockfacename | Street segment: name of the street with the "from street" and "to street". Example: "1ST AVE BETWEEN BELL ST AND BATTERY ST" |
| sideofstreet | Options are: E, S, N, W, NE, SW, SE, NW |
| sourceelementkey | Unique identifier for the city street segment where the pay station is located |
| parkingtimelimitcategory | In minutes. Options are 120 (2-hour parking), 240 (4-hour parking), 30, or 600 (10-hour parking) |
| available_spots | Number of paid spaces on the blockface at the given date and time |
| paidparkingarea | The primary name of a paid parking neighborhood. Example: Commercial Core |
| paidparkingsubarea | A subset of a paid parking area; not all paid parking areas have subareas |
| paidparkingrate | Parking rate charged at the given date and time |
| parkingcategory | An overall description of the type of parking allowed on a blockface |
| Location | Calculated based on the known location of a pay station along the same blockface |
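The two schemas differ only in how position is stored: 2012-2017 carries separate latitude and longitude columns, while 2018 onward carries a single Location column. Below is a minimal sketch of bringing both into one row shape. Everything beyond the column names is an assumption: `to_common_format` is a hypothetical helper, and the Location value is assumed to be a WKT-style string such as `POINT (-122.33 47.61)` (longitude first), which should be checked against the real files.

```python
import re

# Columns that appear in both the 2012-2017 and 2018-present tables
SHARED_COLUMNS = [
    "occupancydatetime", "paidoccupancy", "blockfacename", "sideofstreet",
    "sourceelementkey", "parkingtimelimitcategory", "available_spots",
    "paidparkingarea", "paidparkingsubarea", "paidparkingrate",
    "parkingcategory",
]

# ASSUMPTION: Location looks like "POINT (<longitude> <latitude>)"
_POINT_RE = re.compile(
    r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)"
)


def _to_float(value):
    """Convert to float, returning None for missing or malformed values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def to_common_format(row):
    """Map a row (dict) from either schema onto shared columns plus lat/lon."""
    out = {name: row.get(name) for name in SHARED_COLUMNS}
    if "latitude" in row and "longitude" in row:
        # 2012-2017 layout: coordinates are already separate columns
        out["latitude"] = _to_float(row["latitude"])
        out["longitude"] = _to_float(row["longitude"])
    else:
        # 2018-present layout: parse the assumed POINT string
        match = _POINT_RE.search(row.get("Location") or "")
        out["longitude"] = _to_float(match.group(1)) if match else None
        out["latitude"] = _to_float(match.group(2)) if match else None
    return out
```

With rows read through `csv.DictReader`, the same function can be applied to every file regardless of year, and rows with missing coordinates come out as `None` instead of raising.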

Blockface

| Column | Description |
| --- | --- |
| station_id | Identifier of the pay station |
| station_address | Address of the pay station |
| side | Side of the street |
| block_nbr | Block number |
| parking_category | An overall description of the type of parking allowed on a blockface |
| wkd_rate1 | Weekday parking rate, period 1 |
| wkd_start1 | Start of weekday rate period 1 |
| wkd_end1 | End of weekday rate period 1 |
| wkd_rate2 | Weekday parking rate, period 2 |
| wkd_start2 | Start of weekday rate period 2 |
| wkd_end2 | End of weekday rate period 2 |
| wkd_rate3 | Weekday parking rate, period 3 |
| wkd_start3 | Start of weekday rate period 3 |
| wkd_end3 | End of weekday rate period 3 |
| sat_rate1 | Saturday parking rate, period 1 |
| sat_start1 | Start of Saturday rate period 1 |
| sat_end1 | End of Saturday rate period 1 |
| sat_rate2 | Saturday parking rate, period 2 |
| sat_start2 | Start of Saturday rate period 2 |
| sat_end2 | End of Saturday rate period 2 |
| sat_rate3 | Saturday parking rate, period 3 |
| sat_start3 | Start of Saturday rate period 3 |
| sat_end3 | End of Saturday rate period 3 |
| parking_time_limit | Parking time limit |
| subarea | A subset of a paid parking area |
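The Blockface columns repeat a rate/start/end pattern (for example wkd_rate1, wkd_start1, wkd_end1), which is easier to work with once grouped per period. The following is a sketch under assumptions, not the project's transformation code: `rate_periods` is a hypothetical helper, and because the units of the start and end values are not stated here, the values are passed through unchanged.

```python
def rate_periods(row, day_prefix, max_periods=3):
    """Collect (rate, start, end) tuples for one day type.

    day_prefix is the column prefix, e.g. "wkd" or "sat".
    Periods whose rate is missing or empty are skipped.
    """
    periods = []
    for i in range(1, max_periods + 1):
        rate = row.get("{}_rate{}".format(day_prefix, i))
        if rate in (None, ""):
            continue
        start = row.get("{}_start{}".format(day_prefix, i))
        end = row.get("{}_end{}".format(day_prefix, i))
        periods.append((rate, start, end))
    return periods


if __name__ == "__main__":
    # Made-up row, used only to show the shape of the result
    sample = {
        "station_id": "S1",
        "wkd_rate1": "1.0", "wkd_start1": "a", "wkd_end1": "b",
        "wkd_rate2": "", "wkd_start2": "", "wkd_end2": "",
        "wkd_rate3": "2.0", "wkd_start3": "c", "wkd_end3": "d",
    }
    print(rate_periods(sample, "wkd"))
```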