# Google Project Sunroof - Web scraper assessment

Sunroof does not appear to have a public API. I've emailed their team to ask, and they are interested in our particular use case. In the meantime, I'll assess whether a web scraper can be built in a reasonable amount of time.

### &diams;&check; Download the chromedriver and set up an instance

In [11]:
# URL for the Selenium chromedriver downloads
url = 'https://chromedriver.chromium.org/downloads'
# download the version that matches your installed Chrome, put it in this
# directory and ensure it clears the security protocols on your local machine


In [68]:
# import libraries
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import pandas as pd

# import helper functions
from helper_functions import get_numeric
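
The source of `get_numeric` is not shown in this notebook. As a rough illustration of what such a helper might do, here is a minimal sketch that assumes the scraped text contains one number with optional thousands separators. The name `get_numeric_sketch` and the sample strings are hypothetical and are not taken from `helper_functions` or from the Sunroof page.

```python
import re


def get_numeric_sketch(text):
    """Return the first integer found in `text`, ignoring thousands separators.

    Hypothetical stand-in for `helper_functions.get_numeric`; the real helper
    may behave differently.
    """
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return int(match.group().replace(",", ""))


# Made-up sample strings, only to show the intended behaviour.
print(get_numeric_sketch("1,445 sq feet"))
print(get_numeric_sketch("$12,000 savings"))
```

Raising on a missing number, rather than returning `None`, makes a layout change on the page fail loudly instead of silently filling the DataFrame with empty values.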

### &diams; Create a DataFrame that will store our results

In [69]:
df = pd.DataFrame(columns=['Address', 'Annual Sunlight (Hours)', 'Roof Area (sq ft)', 'Savings ($)'])

In [70]:
# instantiate an automated browser
driver = webdriver.Chrome(service=Service('chromedriver'), options=webdriver.ChromeOptions())
driver.maximize_window()

### &diams;&check; Get the sunroof homepage and obtain the results for an address

In [71]:
address = '6968 Almondine Dr, Garden Grove CA 92845'

driver.get('https://sunroof.withgoogle.com/') 
address_input = driver.find_element(by=By.CSS_SELECTOR, value="md-autocomplete.address-input")
address_input.send_keys(address) # type the address into the search box
address_input.send_keys(Keys.RETURN) # open the dropdown menu of suggested addresses
address_input.send_keys(Keys.DOWN) # arrow down to the top result
address_input.send_keys(Keys.RETURN) # run the search and load the results page
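
When the cells are run by hand, the results page has time to load before the next cell reads it. An unattended scraper would need to wait explicitly. Below is a minimal sketch of a generic polling helper in plain Python; `wait_until` is a hypothetical name, and Selenium's own `WebDriverWait` is the built-in alternative for the same job.

```python
import time


def wait_until(predicate, timeout=10.0, poll_interval=0.25):
    """Call `predicate` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Possible use with the driver from this notebook (not executed here):
# find_elements returns an empty list until the panel has rendered.
# panel_facts = wait_until(
#     lambda: driver.find_elements(by=By.CLASS_NAME, value='panel-fact-text')
# )
```

Because `find_elements` returns an empty list rather than raising when nothing matches, it works directly as the predicate.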

### &diams; Parse the HTML for the numbers we need and store them in variables

In [74]:
# the results panel holds two facts: annual sunlight hours and usable roof area
panel_facts = driver.find_elements(by=By.CLASS_NAME, value='panel-fact-text')
sunlight_hours, sq_ft = [fact.text for fact in panel_facts]
# the estimated savings figure sits in its own element
savings = driver.find_element(by=By.CLASS_NAME, value='panel-estimate-savings').text
# convert the scraped strings to numbers
sunlight_hours, sq_ft, savings = map(get_numeric, (sunlight_hours, sq_ft, savings))
sq_ft

1445

### &diams; Add the results to the DataFrame

In [75]:
data = pd.DataFrame(
    {
    'Address': [address]
    , 'Annual Sunlight (Hours)': [sunlight_hours]
    , 'Roof Area (sq ft)': [sq_ft]
    , 'Savings ($)': [savings]
    }
)
df = pd.concat([df, data], ignore_index=True)
df

Unnamed: 0,Address,Annual Sunlight (Hours),Roof Area (sq ft),Savings ($)
0,"6968 Almondine Dr, Garden Grove CA 92845",1811,1445,12000
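
To judge feasibility beyond a single address, the same steps would have to run in a loop. The sketch below shows one way to collect rows for many addresses while recording failures instead of stopping. `collect_results` and the `fetch` callable are hypothetical: `fetch(address)` is assumed to wrap the Selenium steps above and return `(sunlight_hours, sq_ft, savings)`.

```python
def collect_results(addresses, fetch):
    """Run `fetch` for each address and gather one row per success.

    `fetch` is a hypothetical callable wrapping the browser steps; it should
    return (sunlight_hours, sq_ft, savings) or raise if the lookup fails.
    Returns (rows, failures), where failures holds (address, error message).
    """
    rows, failures = [], []
    for address in addresses:
        try:
            sunlight_hours, sq_ft, savings = fetch(address)
        except Exception as error:  # keep going; one bad address should not stop the run
            failures.append((address, str(error)))
            continue
        rows.append({
            'Address': address,
            'Annual Sunlight (Hours)': sunlight_hours,
            'Roof Area (sq ft)': sq_ft,
            'Savings ($)': savings,
        })
    return rows, failures
```

The rows can then be turned into a DataFrame in one step with `pd.DataFrame(rows)`, which avoids calling `pd.concat` once per address and keeps every column name defined in a single place.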
