
Extract

Yogita edited this page Jun 27, 2021 · 16 revisions

Below is a summary of the data sources used in this project. A Python script downloads the files from Seattle Open Data to an Azure file share.

Paid Parking data for the city of Seattle is available as CSV files from 2012 to the present. Except for 2020 (the pandemic year), each year's file is about 42 GB. Downloading the files was not straightforward because each file has a unique code associated with it. To fully automate ingestion, the code is extracted by a Python automation script that drives a headless Chrome browser with Selenium.
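The unique code is read from the dataset link that Selenium finds on the portal, and that parsing step needs no browser. The sketch below restates the extraction script's URL handling as a pure function. The names `parse_dataset_url` and `DatasetLink` and the sample URLs are hypothetical. Like the script, it assumes the code is the sixth `/`-separated segment of the link and that archived years contain "Archive" in the URL.

```python
from typing import NamedTuple


class DatasetLink(NamedTuple):
    """Fields the extraction script derives from a dataset URL."""
    url_type: str   # "Archive" or "Latest"
    file_extn: str  # ".zip" for archived years, ".csv" otherwise
    code: str       # unique dataset code needed to download the file


def parse_dataset_url(url: str) -> DatasetLink:
    """Classify a dataset link and pull out its unique code.

    Assumes the layout https://<host>/<category>/<dataset-name>/<code>,
    so the code is element 5 of url.split("/"), as in the script.
    """
    parts = url.split("/")
    if len(parts) < 6:
        raise ValueError(f"unexpected dataset URL layout: {url!r}")
    if "Archive" in url:
        return DatasetLink("Archive", ".zip", parts[5])
    return DatasetLink("Latest", ".csv", parts[5])


# Hypothetical link, for illustration only
link = parse_dataset_url("https://data.example.org/Transportation/2019-Paid-Parking/abcd-1234")
print(link)  # DatasetLink(url_type='Latest', file_extn='.csv', code='abcd-1234')
```

Returning a tuple instead of setting module-level globals keeps the parsing testable without a running browser.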

| Method | Source | Feature/Key | Frequency | Description |
|---|---|---|---|---|
| Python/Selenium | Seattle Open Data | 2012 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2012 |
| Python/Selenium | Seattle Open Data | 2013 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2013 |
| Python/Selenium | Seattle Open Data | 2014 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2014 |
| Python/Selenium | Seattle Open Data | 2015 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2015 |
| Python/Selenium | Seattle Open Data | 2016 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2016 |
| Python/Selenium | Seattle Open Data | 2017 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2017 |
| Python/Selenium | Seattle Open Data | 2018 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2018 |
| Python/Selenium | Seattle Open Data | 2019 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2019 |
| Python/Selenium | Seattle Open Data | 2020 Year-to-Date Historic | Once | Entire Paid Parking records for the year 2020 |
| Python/Selenium | Seattle Open Data | 2021 Year-to-Date Delta | Daily | Delta Paid Parking records for the year 2021 |
| Python/Selenium | Blockface | 2021 Year-to-Date Delta | Daily | Entire Paid Parking records for the year 2021 |
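The table follows a simple rule: every past year is a one-time historic load, and the current year is refreshed daily as a delta. The sketch below encodes that rule. `plan_ingestion` and `IngestionTask` are hypothetical names. The search phrases are the ones the extraction script types into the portal's search bar.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IngestionTask:
    year: int
    load_type: str    # "Historic" (loaded once) or "Delta" (refreshed daily)
    frequency: str    # "Once" or "Daily"
    search_term: str  # phrase typed into the Seattle Open Data search bar


def plan_ingestion(current_year: int, first_year: int = 2012) -> List[IngestionTask]:
    """Build one task per year, mirroring the data-source table."""
    tasks = []
    for year in range(first_year, current_year + 1):
        if year != current_year:
            tasks.append(IngestionTask(year, "Historic", "Once", f"{year} Paid Parking"))
        else:
            # The current year comes from the rolling last-30-days dataset
            tasks.append(IngestionTask(year, "Delta", "Daily", "Paid Parking Last 30 days"))
    return tasks


for task in plan_ingestion(2021):
    print(task.year, task.load_type, task.frequency, repr(task.search_term))
```

A scheduler could run the "Once" tasks a single time and the "Daily" task on a daily trigger.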

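Extraction code (an excerpt from a method of the extraction script; `self`, `year`, `current_year`, `chrome_options` and `chromeDriver_Path` are defined elsewhere). It uses the Selenium 4 API, because `find_element_by_xpath` and the `chrome_options`/`executable_path` arguments are deprecated or removed in Selenium 4:

```python
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Start Chrome with the options and driver path configured elsewhere in the script
driver = webdriver.Chrome(options=chrome_options, service=Service(chromeDriver_Path))

# Open the Seattle Open Data portal
driver.get(self.seattle_open_data_url)
time.sleep(2)

# Search for '{year} Paid Parking'; the current year uses the rolling 30-day dataset
search_data = driver.find_element(By.XPATH, self.search_dataByYear)
if year != current_year:
    search_data.send_keys("{} Paid Parking".format(year))
else:
    search_data.send_keys("Paid Parking Last 30 days")
time.sleep(4)

# Click the matching entry in the search dropdown
option = driver.find_element(By.XPATH, self.parking_Occpn_Option)
print(option.text)
option.click()
time.sleep(10)

# Get the URL of the Paid Parking Occupancy dataset for the year
url = driver.find_element(By.XPATH, self.parking_Occpn_Option_ByYear).get_attribute("href")
global url_type, file_extn

# Archived years are downloaded as .zip, the latest data as .csv
if "Archive" in url:
    url_type = "Archive"
    file_extn = ".zip"
else:
    url_type = "Latest"
    file_extn = ".csv"

# The dataset's unique code is the sixth "/"-separated segment of the URL
code = url.split("/")[5]
```

Paid Parking Dataset (2012-2017)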

Columns of interest to clean and transform:

  • occupancydatetime
  • paidoccupancy
  • blockfacename
  • sideofstreet
  • sourceelementkey
  • parkingtimelimitcategory
  • available_spots
  • paidparkingarea
  • paidparkingsubarea
  • paidparkingrate
  • parkingcategory
  • latitude
  • longitude
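Each yearly file is tens of gigabytes, so a streaming reader that keeps only the columns of interest avoids loading a whole year into memory. The sketch below uses the standard-library `csv` module. The function name `select_columns` and the sample rows are hypothetical. Header names are matched case-insensitively because the list above is lowercase and the source headers may not be.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, TextIO

# Columns of interest for the 2012-2017 files
COLUMNS_2012_2017 = [
    "occupancydatetime", "paidoccupancy", "blockfacename", "sideofstreet",
    "sourceelementkey", "parkingtimelimitcategory", "available_spots",
    "paidparkingarea", "paidparkingsubarea", "paidparkingrate",
    "parkingcategory", "latitude", "longitude",
]


def select_columns(stream: TextIO, wanted: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row containing only the wanted columns.

    Header names are matched case-insensitively; wanted columns missing
    from the header (or from a short row) come back as empty strings.
    """
    reader = csv.DictReader(stream)
    lookup = {name.lower(): name for name in (reader.fieldnames or [])}
    wanted = list(wanted)
    for row in reader:
        out = {}
        for col in wanted:
            source = lookup.get(col)
            out[col] = (row.get(source) or "") if source else ""
        yield out


# Made-up sample rows, for illustration only
sample = io.StringIO(
    "OccupancyDateTime,PaidOccupancy,BlockfaceName,ExtraColumn\n"
    "01/01/2016 08:00:00 AM,3,BLOCK A,x\n"
)
rows = list(select_columns(sample, ["occupancydatetime", "paidoccupancy", "latitude"]))
print(rows)
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle in place of `sample`.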

Paid Parking Dataset (2018-Present)

  • occupancydatetime
  • paidoccupancy
  • blockfacename
  • sideofstreet
  • sourceelementkey
  • parkingtimelimitcategory
  • available_spots
  • paidparkingarea
  • paidparkingsubarea
  • paidparkingrate
  • parkingcategory
  • location
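The main schema difference between the two periods is that the 2018-Present files carry a single `location` column, whereas the older files have separate `latitude` and `longitude` columns. The sketch below converts newer rows to the older shape. It assumes `location` holds a WKT-style point such as `POINT (-122.33 47.61)` with longitude first. That format is an assumption, so check it against an actual file first. `split_location` and `to_legacy_schema` are hypothetical names.

```python
import re
from typing import Dict, Optional, Tuple

# ASSUMED format of the location column: "POINT (<longitude> <latitude>)"
_POINT = re.compile(r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")


def split_location(location: Optional[str]) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) parsed from a WKT point, or None."""
    match = _POINT.search(location or "")
    if match is None:
        return None
    longitude, latitude = float(match.group(1)), float(match.group(2))
    return latitude, longitude


def to_legacy_schema(row: Dict[str, str]) -> Dict[str, str]:
    """Replace 'location' with 'latitude'/'longitude' as in the 2012-2017 schema."""
    out = {k: v for k, v in row.items() if k != "location"}
    coords = split_location(row.get("location"))
    if coords is None:
        out["latitude"], out["longitude"] = "", ""
    else:
        out["latitude"], out["longitude"] = str(coords[0]), str(coords[1])
    return out


print(to_legacy_schema({"paidoccupancy": "3", "location": "POINT (-122.33 47.61)"}))
```

Unparseable locations become empty strings rather than raising an error, so one malformed row does not stop a multi-gigabyte load.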

Blockface
