Extract
Below is a summary of the data sources in this project. A Python script downloads the files from Seattle Open Data to an Azure file share.
Paid Parking data for the city of Seattle is available as CSV files from 2012 to the present. Except for 2020 (the pandemic year), each year's file is about 42 GB. Downloading the files was not straightforward because each file has a unique code associated with it. To fully automate ingestion, that code is extracted by a Python automation script that uses Selenium and a headless Chrome browser.
Method | Source | Feature/Key | Frequency | Description |
---|---|---|---|---|
Python/Selenium | Seattle Open Data | 2012 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2012 |
Python/Selenium | Seattle Open Data | 2013 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2013 |
Python/Selenium | Seattle Open Data | 2014 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2014 |
Python/Selenium | Seattle Open Data | 2015 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2015 |
Python/Selenium | Seattle Open Data | 2016 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2016 |
Python/Selenium | Seattle Open Data | 2017 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2017 |
Python/Selenium | Seattle Open Data | 2018 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2018 |
Python/Selenium | Seattle Open Data | 2019 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2019 |
Python/Selenium | Seattle Open Data | 2020 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2020 |
Python/Selenium | Seattle Open Data | 2021 Year-to-Date Delta | Daily | Delta Paid Parking records for the year 2021 |
Python/Selenium | Blockface | 2021 Year-to-Date Delta | Daily | Entire Paid Parking records for the year 2021 |
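The schedule in the table above can be sketched as a small planning helper. This is only an illustration: `IngestionTask` and `build_ingestion_plan` are hypothetical names that do not appear in the project, and the rule (past years loaded once, current year refreshed daily) is read off the table rather than taken from the actual script.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IngestionTask:
    year: int
    feature: str    # "Historic" or "Delta", as in the table above
    frequency: str  # "Once" or "Daily"


def build_ingestion_plan(first_year: int, current_year: int) -> List[IngestionTask]:
    """Past years are loaded once; the current year is refreshed daily."""
    plan = []
    for year in range(first_year, current_year + 1):
        if year < current_year:
            plan.append(IngestionTask(year, "Historic", "Once"))
        else:
            plan.append(IngestionTask(year, "Delta", "Daily"))
    return plan


plan = build_ingestion_plan(2012, 2021)
for task in plan:
    print(task.year, task.feature, task.frequency)
```

Keeping the plan as data makes it easy to hand each task to the download step without hard-coding years.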
Extraction Code:

    driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=chromeDriver_Path)
    driver.get(self.seattle_open_data_url)
    time.sleep(2)
    # Enter the search '{Year} Paid Parking' in the search bar
    search_data = driver.find_element_by_xpath(self.search_dataByYear)
    if year != current_year:
        search_data.send_keys("{} Paid Parking".format(year))
    else:
        search_data.send_keys("Paid Parking Last 30 days")
    time.sleep(4)
    # Click on the search '{Year} Paid Parking' in the dropdown
    print(driver.find_element_by_xpath(self.parking_Occpn_Option).text)
    driver.find_element_by_xpath(self.parking_Occpn_Option).click()
    time.sleep(10)
    # Get the URL of Parking Occupancy Data by Year
    url = driver.find_element_by_xpath(self.parking_Occpn_Option_ByYear).get_attribute("href")
    global url_type, file_extn
    urls = url.split("/")
    if "Archive" in url:
        url_type = "Archive"
        file_extn = ".zip"
    else:
        url_type = "Latest"
        file_extn = ".csv"
    code = urls[5]
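The URL handling at the end of the extraction code could also be written as a pure function that returns its results instead of setting globals, which makes it testable without a browser. The sketch below is a possible restructuring, not the project's code: `DatasetLink` and `parse_dataset_url` are hypothetical names, and the example URLs are made up purely to show the split positions.

```python
from typing import NamedTuple


class DatasetLink(NamedTuple):
    url_type: str   # "Archive" or "Latest"
    file_extn: str  # ".zip" or ".csv"
    code: str       # unique dataset code taken from the URL


def parse_dataset_url(url: str) -> DatasetLink:
    """Classify a dataset URL and pull out its unique code."""
    parts = url.split("/")
    if len(parts) <= 5:
        raise ValueError("URL has too few segments: {}".format(url))
    # Same rule as the extraction code: element 5 of the split URL is the code.
    if "Archive" in url:
        return DatasetLink("Archive", ".zip", parts[5])
    return DatasetLink("Latest", ".csv", parts[5])


# Hypothetical URLs, for illustration only.
archive = parse_dataset_url("https://example.org/dataset/2019-Paid-Parking-Archive/abcd-1234")
latest = parse_dataset_url("https://example.org/dataset/Paid-Parking-Last-30-Days/wxyz-5678")
print(archive)
print(latest)
```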
Paid Parking Dataset (2012-2017)
Columns of interest for cleaning and transformation. Historic data needs to be transformed into a common format.
Column | Description |
---|---|
occupancydatetime | The date and time (minute) of the transaction as recorded |
paidoccupancy | The number of vehicles paid for parking at this time. |
blockfacename | Street segment: name of the street with the "from street" and "to street". Example is "1ST AVE BETWEEN BELL ST AND BATTERY ST" |
sideofstreet | Options are: E, S, N, W, NE, SW, SE, NW |
sourceelementkey | Unique identifier for the city street segment where the pay station is located |
parkingtimelimitcategory | In minutes. Options are 120 (2-hour parking), 240 (4-hour parking), 30, or 600 (10-hour parking) |
available_spots | Number of paid spaces on the blockface at the given date and time. |
paidparkingarea | The primary name of a paid parking neighborhood. Example is Commercial Core. |
paidparkingsubarea | A subset of a paid parking area—not all paid parking areas have subareas. |
paidparkingrate | Parking rate charged at date and time |
parkingcategory | An overall description of the type of parking allowed on a blockface |
latitude | Latitude of a location |
longitude | Longitude of a location |
Paid Parking Dataset (2018-Present)
Column | Description |
---|---|
occupancydatetime | The date and time (minute) of the transaction as recorded |
paidoccupancy | The number of vehicles paid for parking at this time. |
blockfacename | Street segment: name of the street with the "from street" and "to street". Example is "1ST AVE BETWEEN BELL ST AND BATTERY ST" |
sideofstreet | Options are: E, S, N, W, NE, SW, SE, NW |
sourceelementkey | Unique identifier for the city street segment where the pay station is located |
parkingtimelimitcategory | In minutes. Options are 120 (2-hour parking), 240 (4-hour parking), 30, or 600 (10-hour parking) |
available_spots | Number of paid spaces on the blockface at the given date and time. |
paidparkingarea | The primary name of a paid parking neighborhood. Example is Commercial Core. |
paidparkingsubarea | A subset of a paid parking area—not all paid parking areas have subareas. |
paidparkingrate | Parking rate charged at date and time |
parkingcategory | An overall description of the type of parking allowed on a blockface |
Location | Calculated based on the known location of a pay station along the same blockface. |
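The two layouts differ mainly in how the position is stored: separate latitude and longitude columns for 2012-2017, and a single Location column from 2018 onward. A minimal sketch of bringing both into one shape follows. It assumes Location holds text of the form `POINT (<longitude> <latitude>)`, which the source does not confirm, and `to_common_format` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Assumed format: "POINT (<longitude> <latitude>)". Not confirmed by the source.
_POINT_RE = re.compile(r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")


def to_common_format(row: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return a copy of row with latitude/longitude columns and no Location column."""
    out: Dict[str, Optional[str]] = dict(row)
    location = out.pop("Location", None)
    if "latitude" in out and "longitude" in out:
        return out  # 2012-2017 layout already has both columns
    match = _POINT_RE.search(location or "")
    # Unparseable or missing locations become None rather than raising.
    out["latitude"] = match.group(2) if match else None
    out["longitude"] = match.group(1) if match else None
    return out


# Made-up rows, for illustration only.
old_row = {"sourceelementkey": "1001", "latitude": "47.60", "longitude": "-122.33"}
new_row = {"sourceelementkey": "1001", "Location": "POINT (-122.33 47.60)"}
print(to_common_format(old_row))
print(to_common_format(new_row))
```

If the real Location values use a different encoding, only the regular expression should need to change.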
Blockface
Column | Description |
---|---|
station_id | Identifier of the pay station |
station_address | Address of the pay station |
side | Side of the street |
block_nbr | Block number |
parking_category | An overall description of the type of parking allowed on a blockface |
wkd_rate1 | Weekday parking rate for the first rate period |
wkd_start1 | Start time of the first weekday rate period |
wkd_end1 | End time of the first weekday rate period |
wkd_rate2 | Weekday parking rate for the second rate period |
wkd_start2 | Start time of the second weekday rate period |
wkd_end2 | End time of the second weekday rate period |
wkd_rate3 | Weekday parking rate for the third rate period |
wkd_start3 | Start time of the third weekday rate period |
wkd_end3 | End time of the third weekday rate period |
sat_rate1 | Saturday parking rate for the first rate period |
sat_start1 | Start time of the first Saturday rate period |
sat_end1 | End time of the first Saturday rate period |
sat_rate2 | Saturday parking rate for the second rate period |
sat_start2 | Start time of the second Saturday rate period |
sat_end2 | End time of the second Saturday rate period |
sat_rate3 | Saturday parking rate for the third rate period |
sat_start3 | Start time of the third Saturday rate period |
sat_end3 | End time of the third Saturday rate period |
parking_time_limit | Parking time limit on the blockface |
subarea | Paid parking subarea of the blockface |
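The numbered rate/start/end columns suggest up to three rate periods per day. One way to use them is to look up the rate that applies at a given time. The sketch below assumes the start and end columns hold minutes after midnight and that unused periods are empty; neither is stated in the source. `weekday_rate_at` is a hypothetical helper and the sample values are made up.

```python
from typing import Dict, Optional


def weekday_rate_at(blockface: Dict[str, Optional[float]], minute_of_day: int) -> Optional[float]:
    """Return the weekday rate whose period covers minute_of_day, or None."""
    for n in (1, 2, 3):
        rate = blockface.get("wkd_rate{}".format(n))
        start = blockface.get("wkd_start{}".format(n))
        end = blockface.get("wkd_end{}".format(n))
        if rate is None or start is None or end is None:
            continue  # this period is not defined for the blockface
        if start <= minute_of_day <= end:
            return rate
    return None  # outside all paid periods


# Made-up values; start/end assumed to be minutes after midnight.
sample = {
    "wkd_rate1": 1.0, "wkd_start1": 480, "wkd_end1": 659,
    "wkd_rate2": 2.5, "wkd_start2": 660, "wkd_end2": 959,
    "wkd_rate3": None, "wkd_start3": None, "wkd_end3": None,
}
print(weekday_rate_at(sample, 9 * 60))
print(weekday_rate_at(sample, 12 * 60))
print(weekday_rate_at(sample, 20 * 60))
```

The same loop would work for the Saturday columns by swapping the `wkd_` prefix for `sat_`.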