Extract
Below is a summary of the data sources in this project. A Python script downloads the files from Seattle Open Data to an Azure file share.
Paid Parking data for the city of Seattle is available as CSV files from 2012 to the present. Except for 2020 (pandemic), every year had a file size of about 42 GB. Downloading the files was not straightforward because each file has a unique code associated with it. To fully automate the ingestion process, the code was extracted by a Python automation script using Selenium and headless Chrome.
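The extraction script reads each dataset's unique code from the sixth `/`-separated segment of its link (`urls[5]`) and picks the file type from whether the link contains `Archive`. A minimal sketch of that parsing step as a pure function that returns the values instead of setting globals, assuming links shaped like the hypothetical `https://data.seattle.gov/<category>/<dataset-name>/<code>`; the `parse_dataset_link` helper and `DatasetLink` type are illustrative, not part of the project:

```python
from typing import NamedTuple


class DatasetLink(NamedTuple):
    url_type: str   # "Archive" or "Latest"
    file_extn: str  # ".zip" for archived links, ".csv" otherwise
    code: str       # unique dataset code needed to download the file


def parse_dataset_link(url: str) -> DatasetLink:
    """Split a dataset link into its type, file extension and unique code.

    Assumes (hypothetically) that the code is the sixth "/"-separated
    segment, mirroring the project's `urls[5]` lookup.
    """
    parts = url.split("/")
    if len(parts) < 6:
        raise ValueError(f"unexpected dataset link: {url!r}")
    if "Archive" in url:
        return DatasetLink("Archive", ".zip", parts[5])
    return DatasetLink("Latest", ".csv", parts[5])


# Hypothetical link, for illustration only
link = parse_dataset_link("https://data.seattle.gov/Transportation/2012-Paid-Parking/abcd-1234")
print(link)  # DatasetLink(url_type='Latest', file_extn='.csv', code='abcd-1234')
```

Returning a small immutable record keeps the parsing testable without a browser session.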
Method | Source | Feature/Key | Frequency | Description |
---|---|---|---|---|
Python/Selenium | Seattle Open Data | 2012 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2012 |
Python/Selenium | Seattle Open Data | 2013 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2013 |
Python/Selenium | Seattle Open Data | 2014 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2014 |
Python/Selenium | Seattle Open Data | 2015 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2015 |
Python/Selenium | Seattle Open Data | 2016 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2016 |
Python/Selenium | Seattle Open Data | 2017 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2017 |
Python/Selenium | Seattle Open Data | 2018 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2018 |
Python/Selenium | Seattle Open Data | 2019 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2019 |
Python/Selenium | Seattle Open Data | 2020 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2020 |
Python/Selenium | Seattle Open Data | 2021 Year-to-Date Delta | Daily | Delta Paid Parking records for the year 2021 |
Python/Selenium | Blockface | 2021 Year-to-Date Delta | Daily | Entire Paid Parking records for the year 2021 |
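The table follows one scheduling rule: each completed year is a one-time historic load, while the current year is refreshed daily. A minimal sketch of that schedule, reusing the search terms from the extraction code (`"{year} Paid Parking"` for past years, `"Paid Parking Last 30 days"` for the current year); the `plan_ingestion` helper and its `start_year` default are hypothetical:

```python
from typing import List, NamedTuple


class IngestionTask(NamedTuple):
    year: int
    search_term: str  # text typed into the Seattle Open Data search bar
    frequency: str    # "Once" for completed years, "Daily" for the current year


def plan_ingestion(current_year: int, start_year: int = 2012) -> List[IngestionTask]:
    """Build one ingestion task per year from start_year through current_year."""
    tasks = []
    for year in range(start_year, current_year + 1):
        if year != current_year:
            # Completed year: the full historic file is downloaded once
            tasks.append(IngestionTask(year, f"{year} Paid Parking", "Once"))
        else:
            # Current year: the rolling 30-day dataset is fetched every day
            tasks.append(IngestionTask(year, "Paid Parking Last 30 days", "Daily"))
    return tasks


for task in plan_ingestion(2021):
    print(task.year, task.frequency, task.search_term)
```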
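Extraction code:

```python
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Excerpt from a method of the extraction class: `self`, `year`, `current_year`,
# `chrome_options` and `chromeDriver_Path` are defined elsewhere in the project.
# Selenium 4 API: `find_element_by_xpath` and `chrome_options=` were removed.
driver = webdriver.Chrome(service=Service(executable_path=chromeDriver_Path), options=chrome_options)

# GET request to open the Seattle Open Data site in the headless browser
driver.get(self.seattle_open_data_url)
time.sleep(2)

# Enter '{year} Paid Parking' in the search bar; for the current year,
# search for the 'Last 30 days' dataset instead
search_data = driver.find_element(By.XPATH, self.search_dataByYear)
if year != current_year:
    search_data.send_keys("{} Paid Parking".format(year))
else:
    search_data.send_keys("Paid Parking Last 30 days")
time.sleep(4)

# Click the matching '{year} Paid Parking' entry in the dropdown
parking_option = driver.find_element(By.XPATH, self.parking_Occpn_Option)
print(parking_option.text)
parking_option.click()
time.sleep(10)

# Get the URL of the Parking Occupancy data for the year
url = driver.find_element(By.XPATH, self.parking_Occpn_Option_ByYear).get_attribute("href")
global url_type, file_extn
urls = url.split("/")
if "Archive" in url:
    # Archive links point to ZIP files
    url_type = "Archive"
    file_extn = ".zip"
else:
    url_type = "Latest"
    file_extn = ".csv"
# The dataset's unique code is the sixth path segment of the URL
code = urls[5]
```

Paid Parking Dataset (2012-2017)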
Columns of interest to clean and transform:
- occupancydatetime
- paidoccupancy
- blockfacename
- sideofstreet
- sourceelementkey
- parkingtimelimitcategory
- available_spots
- paidparkingarea
- paidparkingsubarea
- paidparkingrate
- parkingcategory
- latitude
- longitude
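Because each yearly file is about 42 GB, the columns of interest are best selected while streaming rather than after loading a whole file into memory. A minimal sketch using the standard `csv` module, assuming the CSV header uses the same lowercase names as the list above; the `select_columns` helper is hypothetical and the sample rows are made up:

```python
import csv
import io
from typing import Dict, Iterable, Iterator, TextIO

# Columns of interest for the 2012-2017 Paid Parking files
COLUMNS_2012_2017 = [
    "occupancydatetime", "paidoccupancy", "blockfacename", "sideofstreet",
    "sourceelementkey", "parkingtimelimitcategory", "available_spots",
    "paidparkingarea", "paidparkingsubarea", "paidparkingrate",
    "parkingcategory", "latitude", "longitude",
]


def select_columns(stream: TextIO, columns: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, keeping only the requested columns.

    Rows are read lazily, so memory use does not grow with file size.
    Raises KeyError on first iteration if a column is missing from the header.
    """
    wanted = list(columns)
    reader = csv.DictReader(stream)
    missing = [c for c in wanted if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not in header: {missing}")
    for row in reader:
        yield {c: row[c] for c in wanted}


# With a real file: with open(path, newline="") as f: rows = select_columns(f, COLUMNS_2012_2017)
sample = io.StringIO("occupancydatetime,paidoccupancy,extra\n2012-01-01 08:00,3,x\n")
rows = list(select_columns(sample, ["occupancydatetime", "paidoccupancy"]))
print(rows)  # [{'occupancydatetime': '2012-01-01 08:00', 'paidoccupancy': '3'}]
```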
Paid Parking Dataset (2018-Present)
- occupancydatetime
- paidoccupancy
- blockfacename
- sideofstreet
- sourceelementkey
- parkingtimelimitcategory
- available_spots
- paidparkingarea
- paidparkingsubarea
- paidparkingrate
- parkingcategory
- location
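The two column lists differ only in geography: the 2012-2017 files carry separate `latitude` and `longitude` columns, while the 2018-Present files carry a single `location` column. A minimal sketch that aligns the newer rows with the older schema, assuming (unverified) that `location` holds a WKT point such as `POINT (-122.33 47.61)` with longitude first; the `split_location` and `normalize_row` helpers are hypothetical and the sample row is made up:

```python
import re
from typing import Dict, Optional, Tuple

# Matches a WKT point like "POINT (-122.33 47.61)"; this format is an assumption
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def split_location(location: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from a WKT point, or None if unparseable."""
    match = _POINT_RE.match(location or "")
    if match is None:
        return None
    # WKT order is x y, i.e. longitude first, then latitude
    longitude, latitude = float(match.group(1)), float(match.group(2))
    return latitude, longitude


def normalize_row(row: Dict[str, str]) -> Dict[str, object]:
    """Replace a 2018-Present `location` field with `latitude`/`longitude` fields."""
    out: Dict[str, object] = {k: v for k, v in row.items() if k != "location"}
    coords = split_location(row.get("location", ""))
    out["latitude"], out["longitude"] = coords if coords else (None, None)
    return out


print(normalize_row({"blockfacename": "EXAMPLE ST", "location": "POINT (-122.33 47.61)"}))
# {'blockfacename': 'EXAMPLE ST', 'latitude': 47.61, 'longitude': -122.33}
```

Unparseable or empty locations become `None` rather than raising, so a single malformed row does not stop a multi-gigabyte load.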
Blockface