## Quick README

Note: This code is run on Colab due to collaboration constraints on the Stanford team.

TODO 1-4.5 were exercises completed by the Stanford fellowship. The full working code is under FINAL CODE BELOW, in the function run_all().

#### Note: Some of our code relies on XPath parsing, and we have seen the XPaths change (along with other parts of the website). This code is up to date as of 7/16/24.
#### Note: The saved data schema differs slightly from the spec, but converting between the two is a trivial transformation.
#### Note: The code is robust to first order, although rerunning after an error seems finicky. We log often, so it should be possible to rerun from a log. Nothing clever from scraping best practices is used yet. We should probably log errors better, but it is hard to tell logic errors from network errors.
#### Note: A basic experiment showed 20+ seconds per result, which seems hopelessly slow. Possible quick fixes: 1. replacing the implicit waits with explicit waits (currently not working), 2. using driver.find_elements instead of try/except statements, 3. some multi-threading or batching, although as far as I know Selenium is not thread safe.
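
One likely contributor to the slowness: with an implicit wait set, a `find_element` call that finds nothing blocks for the whole timeout before raising, and every grantor/grantee loop in this notebook ends with exactly such a miss. Below is a minimal sketch of quick fix 2, assuming the name labels keep the ID suffixes seen in this notebook (`_lblGrantorFirstName`, `_lblGrantorLastName`) and that first-name and last-name labels appear in matching order. `collect_full_names`, `FakeElement`, and `FakeDriver` are hypothetical names; the fakes only stand in for a real driver so the sketch runs without a browser.

```python
class FakeElement:
    """Stand-in for a Selenium WebElement so the sketch runs without a browser."""
    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Stand-in driver: maps a CSS selector to the elements it would match."""
    def __init__(self, matches):
        self.matches = matches

    def find_elements(self, by, value):
        return self.matches.get(value, [])


def collect_full_names(driver, first_suffix, last_suffix):
    """Return "FIRST LAST" strings for every pair of name labels on the page."""
    # "css selector" is the string value behind selenium's By.CSS_SELECTOR.
    # [id$="..."] matches every element whose id ends with the given suffix,
    # so one call replaces the count-until-exception loop.
    firsts = driver.find_elements("css selector", f'[id$="{first_suffix}"]')
    lasts = driver.find_elements("css selector", f'[id$="{last_suffix}"]')
    names = []
    for first, last in zip(firsts, lasts):
        parts = [first.text.strip(), last.text.strip()]
        # Skip empty parts so an entity with no first name has no leading space.
        names.append(" ".join(p for p in parts if p))
    return names


demo_driver = FakeDriver({
    '[id$="_lblGrantorFirstName"]': [FakeElement("GIOVANNI"), FakeElement("LAURA")],
    '[id$="_lblGrantorLastName"]': [FakeElement("MATTEUCCI"), FakeElement("MATTEUCCI")],
})
print(collect_full_names(demo_driver, "_lblGrantorFirstName", "_lblGrantorLastName"))
```

With a real driver, `find_elements` still waits out the implicit timeout when nothing matches at all, so a document with no grantors would remain slow unless the implicit wait is kept short.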

In [None]:
%%shell
# Ubuntu no longer distributes chromium-browser outside of snap
#
# Proposed solution: https://askubuntu.com/questions/1204571/how-to-install-chromium-without-snap

# Add debian buster
cat > /etc/apt/sources.list.d/debian.list <<'EOF'
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
EOF

# Add keys
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A

apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg

# Prefer debian repo for chromium* packages only
# Note the double-blank lines between entries
cat > /etc/apt/preferences.d/chromium.pref << 'EOF'
Package: *
Pin: release a=eoan
Pin-Priority: 500


Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300


Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
EOF

# Install chromium and chromium-driver
apt-get update
apt-get install chromium chromium-driver

# Install selenium
pip install selenium

Executing: /tmp/apt-key-gpghome.HWLaFA9wnW/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
gpg: key DCC9EFBF77E11517: public key "Debian Stable Release Key (10/buster) <debian-release@lists.debian.org>" imported
gpg: Total number processed: 1
gpg:               imported: 1
Executing: /tmp/apt-key-gpghome.4WcFYlrSYk/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
gpg: key DC30D7C23CBBABEE: public key "Debian Archive Automatic Signing Key (10/buster) <ftpmaster@debian.org>" imported
gpg: Total number processed: 1
gpg:               imported: 1
Executing: /tmp/apt-key-gpghome.swzRUOLBjh/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A
gpg: key 4DFAB270CAA96DFA: public key "Debian Security Archive Automatic Signing Key (10/buster) <ftpmaster@debian.org>" imported
gpg: Total number processed: 1
gpg:               imported: 1
Get:1 http://deb.debian.org/debian buster InRelease [122 kB]
Get:2 http://deb.debian.org/debian bust



## Development

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service(executable_path=r'/usr/bin/chromedriver')
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

url = "https://rechart1.acgov.org/RealEstate/SearchEntry.aspx" # Go directly to the Real Estate page

driver = webdriver.Chrome(service=service, options=options)
driver.get(url)
driver.implicitly_wait(10)

In [None]:
from selenium.webdriver.common.by import By

# Ack the Disclaimer
accept_button = driver.find_element(By.ID, "cph1_lnkAccept")
assert accept_button.text == "Click here to acknowledge the disclaimer and enter the site."
accept_button.click()

Selenium Documentation
NAVIGATING & INTERACTING w/ WEBSITE: https://selenium-python.readthedocs.io/navigating.html

LOCATING STUFF on WEBSITE: https://selenium-python.readthedocs.io/locating-elements.html

WAITS: https://selenium-python.readthedocs.io/waits.html
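
The explicit waits described on the WAITS page boil down to polling a condition until it holds or a timeout expires. The sketch below shows that idea on its own, without Selenium, so the timeout logic is easy to test; `wait_until` is a hypothetical helper, not part of Selenium. Selenium's `WebDriverWait(driver, timeout).until(condition)` follows the same pattern and passes the driver to the condition.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have
    passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Demo: a condition that becomes true on its third call.
calls = []

def ready():
    calls.append(1)
    return len(calls) >= 3

print(wait_until(ready, timeout=1.0, poll=0.01), len(calls))
```

A condition such as "the link text has changed" fits this shape: pass a function that re-reads the element text and compares it, and the loop can no longer run forever.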

In [None]:
# TODO 1: create a function that inputs the from date and the to date and presses search
from selenium.webdriver.common.keys import Keys
# Dates should be in the format "mm/dd/yyyy"
def make_search(from_date, to_date):
  input_class_name = "igte_ElectricBlueEditInContainer"

  driver.implicitly_wait(10)
  from_date_element, to_date_element = driver.find_elements(By.CLASS_NAME, input_class_name)

  # Emulate real human key strokes
  from_date_element.send_keys(from_date.replace("/", ""))
  to_date_element.send_keys(to_date.replace("/", ""))

  """ It seems that running the JS alone is not sufficient here """
  # driver.execute_script(f"document.getElementById('{from_date_id}').value = '{from_date}';")
  # driver.execute_script(f"document.getElementById('{to_date_id}').value = '{to_date}';")

  search_button = driver.find_element(By.ID, "cphNoMargin_SearchButtons1_btnSearch")
  search_button.click()

# Example search
make_search("11/04/2004", "02/15/2024")

In [None]:
# TODO 2: print the transaction information (instrument_no (int), recording date (datetime), parcel ID (string), document type (string)) for the first 5 results
def print_information(num_results = 5):
  driver.implicitly_wait(25)
  test = driver.find_element(By.ID, "cphNoMargin_cphNoMargin_SearchCriteriaTop_Criteria")
  print (test.text)

  def fill_x_path (row, info_type):
    """Returns the correct x-path for the desired table row and column type"""
    if (info_type == "recording_date"):
      col = 8
    elif (info_type == "parcel_ID"):
      col = 18
    elif (info_type == "document_type"):
      col = 9
    else:
      raise ValueError(f"Unknown info_type: {info_type!r}")
    return f"/html/body/div[2]/form/div[3]/div[3]/table/tbody/tr[4]/td/div/div/table/tbody/tr[2]/td/table/tbody[2]/tr/td/div[2]/table/tbody/tr[{row+2}]/td[{col}]"

  transaction_info = []
  for i in range(num_results):
    transaction = {}
    transaction["instrument_no"] = driver.find_element(By.ID, f"ctl00_ctl00_cphNoMargin_cphNoMargin_g_G1_ctl00_it3_{i}_Label1").text
    # NOTE: this currently returns the date as text, not a datetime; the full recording date and time requires clicking into the transaction
    transaction["recording_date"] = driver.find_element(By.XPATH, fill_x_path(i, "recording_date")).text
    transaction["parcel_ID"] = driver.find_element(By.XPATH, fill_x_path(i, "parcel_ID")).text
    transaction["document_type"] = driver.find_element(By.XPATH, fill_x_path(i, "document_type")).text
    transaction_info.append(transaction)
  return transaction_info

print_information()

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="cphNoMargin_cphNoMargin_SearchCriteriaTop_Criteria"]"}
  (Session info: headless chrome=90.0.4430.212); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
#0 0x569fdb3c67f9 <unknown>
#1 0x569fdb3663b3 <unknown>
#2 0x569fdb0ae016 <unknown>
#3 0x569fdb0e281e <unknown>
#4 0x569fdb1188fb <unknown>
#5 0x569fdb105ded <unknown>
#6 0x569fdb1169e1 <unknown>
#7 0x569fdb105c93 <unknown>
#8 0x569fdb0d7ce4 <unknown>
#9 0x569fdb0d94d2 <unknown>
#10 0x569fdb392542 <unknown>
#11 0x569fdb3a1ce7 <unknown>
#12 0x569fdb3a19e4 <unknown>
#13 0x569fdb3a613a <unknown>
#14 0x569fdb3a25b9 <unknown>
#15 0x569fdb387e00 <unknown>
#16 0x569fdb3b95d2 <unknown>
#17 0x569fdb3b9778 <unknown>
#18 0x569fdb3d1a1f <unknown>
#19 0x7f6a2c83fac3 <unknown>
#20 0x7f6a2c8d1850 <unknown>
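
An error like the one above can come from a changed page layout or from a page that has not finished loading, which is part of why logic and network errors are hard to tell apart. One possible approach, sketched here under assumptions, is to retry only exception types that are treated as transient and to log every failure with its type. `with_retries` is a hypothetical helper, and the built-in `ConnectionError` and `TimeoutError` only stand in for "network-like" failures; which Selenium exceptions belong in that tuple is a judgment call left open here.

```python
import logging
import time

logger = logging.getLogger("scraper")

# Assumption: these built-in exceptions stand in for transient, network-like failures.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


def with_retries(action, attempts=3, delay=1.0, transient=TRANSIENT_ERRORS):
    """Run action(); retry on transient errors, re-raise anything else at once."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except transient as exc:
            logger.warning("transient error on attempt %d/%d: %r", attempt, attempts, exc)
            if attempt == attempts:
                raise  # Out of retries: surface the last transient error.
            time.sleep(delay)
        except Exception:
            # Anything else is treated as a logic error: log the traceback, do not retry.
            logger.exception("non-transient error, not retrying")
            raise


# Demo: fails twice with a transient error, then succeeds.
state = {"calls": 0}

def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("simulated network hiccup")
    return "ok"

print(with_retries(flaky, attempts=3, delay=0.01))
```

Reading the log afterwards then shows how many failures were retried away and which ones stopped the run.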


In [None]:
# TODO 3: click into the transaction and print the grantor and the grantee for the first result only
def select_first_transaction():
  search_column = driver.find_element(By.ID, 'ctl00_ctl00_cphNoMargin_cphNoMargin_g_G1_ctl00_it3_0_Label1')
  search_column.click()
  driver.implicitly_wait(5)

  personell = {}
  grantors = []

  still_more_grantors = True
  count = 0
  while still_more_grantors:
    try:
      first_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_DataList11_ctl0{count}_lblGrantorFirstName').text
      last_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_DataList11_ctl0{count}_lblGrantorLastName').text
      count += 1

      full_name = first_name + " " + last_name
      grantors.append (full_name)
    except:
      still_more_grantors = False

  personell["grantors"] = grantors
  personell["grantee"] = driver.find_element(By.ID, 'ctl00_cphNoMargin_f_oprTab_tmpl0_Datalist1_ctl00_lblGranteeLastName').text
  return personell

print(select_first_transaction())

{'grantors': ['GIOVANNI MATTEUCCI', 'LAURA MATTEUCCI'], 'grantee': 'CAL STATE 9 CREDIT UNION'}


In [None]:
# TODO 4: click into the transaction and print the grantor and the grantee for the first two results, HINT: use driver.back()
def select_first_few_transactions (n = 2):
  all_personell = []

  for i in range (n):
    personell = {}
    try:
      search_column = driver.find_element(By.ID, f'ctl00_ctl00_cphNoMargin_cphNoMargin_g_G1_ctl00_it3_{i}_Label1')
      personell['instrument_no'] = search_column.text
      search_column.click()
    except:
      driver.back()
      search_column = driver.find_element(By.ID, f'ctl00_ctl00_cphNoMargin_cphNoMargin_g_G1_ctl00_it3_{i}_Label1')
      personell['instrument_no'] = search_column.text
      search_column.click()

    driver.implicitly_wait(5)

    grantors = []
    still_more_grantors = True
    count = 0
    while still_more_grantors:
      try:
        first_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_DataList11_ctl0{count}_lblGrantorFirstName').text
        last_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_DataList11_ctl0{count}_lblGrantorLastName').text
        count += 1

        full_name = first_name + " " + last_name
        grantors.append (full_name)
      except:
        still_more_grantors = False

    personell["grantors"] = grantors
    personell["grantee"] = driver.find_element(By.ID, 'ctl00_cphNoMargin_f_oprTab_tmpl0_Datalist1_ctl00_lblGranteeLastName').text

    all_personell.append (personell)

  return all_personell

print(select_first_few_transactions())

[{'instrument_no': '2004493950', 'grantors': ['GIOVANNI MATTEUCCI', 'LAURA MATTEUCCI'], 'grantee': 'CAL STATE 9 CREDIT UNION'}, {'instrument_no': '2004493951', 'grantors': ['BRUCE A HIRONAKA', 'VALERIE L HIRONAKA'], 'grantee': 'HIRONAKA'}]


In [None]:
# TODO 4.5: Aaron & other if want, 1. handle the case of multiple grantees, 2. store the date

def multi_grantee(n = 5):
  all_personell = []

  for i in range(n):
      personell = {}
      try:
        search_column = driver.find_element(By.ID, f'ctl00_ctl00_cphNoMargin_cphNoMargin_g_G1_ctl00_it3_{i}_Label1')
        personell['instrument_no'] = search_column.text
        search_column.click()
      except:
        driver.back()
        search_column = driver.find_element(By.ID, f'ctl00_ctl00_cphNoMargin_cphNoMargin_g_G1_ctl00_it3_{i}_Label1')
        personell['instrument_no'] = search_column.text
        search_column.click()

      driver.implicitly_wait(5)


      date = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_documentInfoList_ctl00_DataLabel3').text
      grantors = []
      still_more_grantors = True
      count = 0
      while still_more_grantors:
        try:
          first_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_DataList11_ctl0{count}_lblGrantorFirstName').text
          last_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_DataList11_ctl0{count}_lblGrantorLastName').text
          count += 1

          full_name = first_name + " " + last_name
          grantors.append (full_name)
        except:
          still_more_grantors = False

      grantees = []
      still_more_grantees = True
      count = 0
      while still_more_grantees:
        try:
          first_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_Datalist1_ctl0{count}_lblGranteeFirstName').text
          last_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_Datalist1_ctl0{count}_lblGranteeLastName').text
          count += 1

          full_name = first_name + " " + last_name
          grantees.append(full_name)
        except:
          still_more_grantees = False

      print (grantees)


      personell["grantors"] = grantors
      personell["grantees"] = grantees
      personell["date"] = date
      all_personell.append(personell)

  return all_personell

print (multi_grantee())


#create a list of grantees with the condition that there are more grantors
#create a try function after each time the condition is satisfied
#create count variable that increases every time the number exists and is stored
#store the id number, date grantee(s) name,



[' CAL STATE 9 CREDIT UNION']
['BRUCE TR HIRONAKA', 'VALERIE TR HIRONAKA', 'VALERIE L TRUST HIRONAKA', 'BRUCE A TRUST HIRONAKA']
[' CHASE MANHATTAN BANK USA']
[' CHASE MANHATTAN BANK USA']
[' ']
[{'instrument_no': '2004493950', 'grantors': ['GIOVANNI MATTEUCCI', 'LAURA MATTEUCCI'], 'grantees': [' CAL STATE 9 CREDIT UNION'], 'date': '11/04/2004 08:30:00 AM'}, {'instrument_no': '2004493951', 'grantors': ['BRUCE A HIRONAKA', 'VALERIE L HIRONAKA'], 'grantees': ['BRUCE TR HIRONAKA', 'VALERIE TR HIRONAKA', 'VALERIE L TRUST HIRONAKA', 'BRUCE A TRUST HIRONAKA'], 'date': '11/04/2004 08:30:00 AM'}, {'instrument_no': '2004493952', 'grantors': ['BRUCE A HIRONAKA', 'VALERIE L HIRONAKA'], 'grantees': [' CHASE MANHATTAN BANK USA'], 'date': '11/04/2004 08:30:00 AM'}, {'instrument_no': '2004493953', 'grantors': ['JANE BRUNNER'], 'grantees': [' CHASE MANHATTAN BANK USA'], 'date': '11/04/2004 08:30:00 AM'}, {'instrument_no': '2004493954', 'grantors': ['TERI L CRUZ', 'DANIEL W CRUZ', ' CAL STATE 9 CREDIT 
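
The records printed above keep `instrument_no` and `date` as strings, while TODO 2 asks for an int and a datetime, and names built from an empty first name carry a leading space. A minimal sketch of the kind of transformation that could bridge this, assuming the date text always looks like `11/04/2004 08:30:00 AM`; `normalize_record`, `clean_names`, and the output key `recording_date` are hypothetical names, not taken from the spec.

```python
from datetime import datetime

# Assumption: every date string follows the format seen in the output above.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def clean_names(names):
    """Collapse extra whitespace in each name and drop names that end up empty."""
    cleaned = (" ".join(name.split()) for name in names)
    return [name for name in cleaned if name]


def normalize_record(record):
    """Convert one scraped record to typed fields."""
    return {
        "instrument_no": int(record["instrument_no"]),
        "recording_date": datetime.strptime(record["date"], DATE_FORMAT),
        "grantors": clean_names(record["grantors"]),
        "grantees": clean_names(record["grantees"]),
    }


raw = {
    "instrument_no": "2004493950",
    "grantors": ["GIOVANNI MATTEUCCI", "LAURA MATTEUCCI"],
    "grantees": [" CAL STATE 9 CREDIT UNION"],
    "date": "11/04/2004 08:30:00 AM",
}
print(normalize_record(raw))
```

Doing this as a separate step after scraping keeps the raw text available in case the site's date format turns out to vary.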

## FINAL CODE BELOW

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service(executable_path=r'/usr/bin/chromedriver')
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

url = "https://rechart1.acgov.org/RealEstate/SearchEntry.aspx" # Go directly to the Real Estate page

driver = webdriver.Chrome(service=service, options=options)
driver.get(url)

In [None]:
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

"""
IMPORTANT NOTE ON SEARCH BEHAVIOR
By default, if you click the document type boxes and then search, your search results
will be sorted by document type then by date.
For example, you will likely get all the DEED OF TRUST types across your entire date
range (which may be very large) before you get any other document types. If this is a
concern, you can either search one document type at a time or search without checking
boxes (which will sort results by date) and then filter by document type.
"""
def get_document_types(document_str="DEED"):
  type_xpath_str = "/html/body/div[2]/form/div[3]/div[3]/table/tbody/tr[3]/td/table/tbody/tr[6]/td[2]/div/table/tbody/tr[{}]/td/label"
  idx = 1
  deed_types = []
  while True:
    try:
      document_type = driver.find_element(By.XPATH, type_xpath_str.format(idx)).text
      idx += 1
      if (document_str.upper() in document_type.upper()):
        deed_types.append((document_type, idx))
    except:
      break
  return deed_types

# Dates should be in the format "mm/dd/yyyy"
def make_search(from_date, to_date, document_types = [], document_search_str = "DEED"):
  # Ack the disclaimer
  try:
    # WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "cph1_lnkAccept")))
    driver.implicitly_wait(10)
    accept_button = driver.find_element(By.ID, "cph1_lnkAccept")
    accept_button.click()
  except:
    pass

  input_class_name = "igte_ElectricBlueEditInContainer"
  # WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, input_class_name)))
  driver.implicitly_wait(10)
  from_date_element, to_date_element = driver.find_elements(By.CLASS_NAME, input_class_name)

  # Emulate real human key strokes
  from_date_element.send_keys(from_date.replace("/", ""))
  to_date_element.send_keys(to_date.replace("/", ""))

  # Check the document type boxes
  # TODO: quick logic to handle user-supplied document_types given as indices
  if not document_types:
    document_types = get_document_types(document_str=document_search_str)

  check_box_id= "cphNoMargin_f_dclDocType_{}"
  for _ ,idx in document_types:
    check_box = driver.find_element(By.ID, check_box_id.format(idx))
    check_box.click()

  search_button = driver.find_element(By.ID, "cphNoMargin_SearchButtons1_btnSearch")
  search_button.click()

In [None]:
import time
import pandas as pd
import os

def save_to_csv(transactions, filename):
    # Nothing to write; this also avoids creating an empty file with no header
    if not transactions:
        return
    df = pd.DataFrame(transactions)
    # Check if the file exists
    if os.path.isfile(filename):
        # Append to the existing file
        df.to_csv(filename, mode='a', header=False, index=False)
        print(f"Data appended to {filename}")
    else:
        # Create a new file
        df.to_csv(filename, index=False)
        print(f"Data saved to {filename}")

def get_file_len(filename):
    if os.path.isfile(filename):
        df = pd.read_csv(filename)
        return len(df)
    else:
        return 0

def fill_x_path (row, info_type):
    """Returns the correct x-path for the desired table row and column type"""
    if (info_type == "recording_date"):
      col = 8
    elif (info_type == "parcel_ID"):
      col = 18
    elif (info_type == "document_type"):
      col = 9
    else:
      raise ValueError(f"Unknown info_type: {info_type!r}")
    return f"/html/body/div[2]/form/div[3]/div[3]/table/tbody/tr[4]/td/div/div/table/tbody/tr[2]/td/table/tbody[2]/tr/td/div[2]/table/tbody/tr[{row+2}]/td[{col}]"

def run_all(from_date, to_date, document_types = [], document_search_str = "DEED", max_num_res = 20, save_increment = 10, should_make_search=True):
  """
  FROM_DATE/TO_DATE should be in the format "mm/dd/yyyy"
  DOCUMENT_TYPES should be a list of tuples (document_type, index)
  DOCUMENT_SEARCH_STR is a string used to filter and compute DOCUMENT_TYPES if DOCUMENT_TYPES is empty
  MAX_NUM_RES is the max number of additional transactions to record
  SAVE_INCREMENT is the number of transactions to save at a time
  SHOULD_MAKE_SEARCH will run the search if True; turn it off when re-running on the same driver browser session

  For some reason the explicit waits are not working correctly, so implicit waits are used instead.
  Can be run iteratively to add new data.
  """
  if should_make_search:
    make_search(from_date, to_date, document_types, document_search_str)
  outfilename = f'/content/drive/My Drive/AEMP/data/transactions_{from_date.replace("/","-")}_{to_date.replace("/","-")}_{document_search_str}.csv'
  transaction_info = []

  # Confirm we are on the search results page, useful for reruns
  try:
    driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_documentInfoList_ctl00_DataLabel3') # Only seen on the document page
    print ("On a document page, going back to the search results")
    driver.back() # Go back to the search results page
  except:
    pass

  # WebDriverWait(driver, 20).until(EC.presence_of_element_located(
  #     (By.ID, "cphNoMargin_cphNoMargin_SearchCriteriaTop_Criteria")))
  driver.implicitly_wait(20)

  # Pull up untruncated search
  try:
    full_list = driver.find_element(By.ID, "cphNoMargin_cphNoMargin_SearchCriteriaTop_FullCount1")
    full_list.click()
    while (full_list.text != 'count again'):
      time.sleep(1) # The ID doesn't change but the text does, so poll the text here instead of using a wait; might be clunky
  except:
    pass

  num_existing_transactions = get_file_len(outfilename)

  for j in range(num_existing_transactions + max_num_res):
    if j % 10 == 0: print (f"TID {j}")
    transaction = {}

    i = j % 25
    # Increment the page
    if j != 0 and i == 0:
      next_page_button = driver.find_element(By.ID, "OptionsBar2_imgNext")
      if ("disabled" in next_page_button.get_attribute('src')): # If the next page button is grayed out
        print ("No more pages")
        break
      next_page_button.click()
      print ("Loading all pages")
      driver.implicitly_wait(10) # Allow up to 10 s for elements on the new page to appear, not sure how to do this explicitly
    elif j == 0:
      # WebDriverWait(driver, 20).until(EC.presence_of_element_located(
      #   (By.ID, 'ctl00_ctl00_cphNoMargin_cphNoMargin_g_G1_ctl00_it3_0_Label1')))
      driver.implicitly_wait(10)

    # Skip if already logged
    if j < num_existing_transactions:
      continue

    # Get search info
    try:
      search_column = driver.find_element(By.ID, f'ctl00_ctl00_cphNoMargin_cphNoMargin_g_G1_ctl00_it3_{i}_Label1')
    except:
      print ("No more results")
      break

    transaction["instrument_no"] = search_column.text
    transaction["parcel_ID"] = driver.find_element(By.XPATH, fill_x_path(i, "parcel_ID")).text
    transaction["document_type"] = driver.find_element(By.XPATH, fill_x_path(i, "document_type")).text

    # Click into the transaction
    search_column.click()
    # WebDriverWait(driver, 5).until(EC.presence_of_element_located(
    #   (By.ID, 'ctl00_cphNoMargin_f_oprTab_tmpl0_documentInfoList_ctl00_DataLabel3')))
    driver.implicitly_wait(5)
    transaction["date"] = driver.find_element(By.ID, 'ctl00_cphNoMargin_f_oprTab_tmpl0_documentInfoList_ctl00_DataLabel3').text

    # Get all the grantor names
    grantors = []
    still_more_grantors = True
    count = 0
    while still_more_grantors:
      try:
        first_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_DataList11_ctl0{count}_lblGrantorFirstName').text
        last_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_DataList11_ctl0{count}_lblGrantorLastName').text
        count += 1

        full_name = first_name + " " + last_name
        grantors.append (full_name)
      except:
        still_more_grantors = False

    # Get all the grantee names
    grantees = []
    still_more_grantees = True
    count = 0
    while still_more_grantees:
      try:
        first_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_Datalist1_ctl0{count}_lblGranteeFirstName').text
        last_name = driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_Datalist1_ctl0{count}_lblGranteeLastName').text
        count += 1

        full_name = first_name + " " + last_name
        grantees.append(full_name)
      except:
        still_more_grantees = False

    transaction["grantors"] = grantors
    transaction["grantees"] = grantees
    driver.back()

    transaction_info.append(transaction)
    if j % save_increment == 0:
      save_to_csv(transaction_info, outfilename)
      transaction_info = []
  save_to_csv(transaction_info, outfilename)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
run_all("11/04/2004", "02/15/2024", should_make_search=False)

TID 0
0
1
2
3
4
5
6
7
8
9
TID 10
10
11
12
13
14
15
16
17
18
19
TID 20
20
No more results
Data appended to /content/drive/My Drive/AEMP/data/transactions_11-04-2004_02-15-2024_DEED.csv
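
When rerunning from a saved file, the number of rows already written determines where to pick up. The sketch below counts saved rows with the standard `csv` module and turns that count into a page and row position, assuming 25 results per page (the page size implied by `i = j % 25` in run_all) and a browser that starts on the first results page. `count_saved_rows` and `resume_position` are hypothetical helper names.

```python
import csv
import os

RESULTS_PER_PAGE = 25  # Assumption taken from `i = j % 25` in run_all


def count_saved_rows(path):
    """Return the number of data rows in the CSV, excluding the header row."""
    if not os.path.isfile(path):
        return 0
    with open(path, newline="") as f:
        total = sum(1 for _ in csv.reader(f))
    # An empty file has no header either, so never go below zero.
    return max(total - 1, 0)


def resume_position(num_saved, per_page=RESULTS_PER_PAGE):
    """Return (next_page_clicks, row_index) of the first result not yet saved."""
    return divmod(num_saved, per_page)


print(resume_position(53))
```

Here 53 saved rows means two clicks on the next-page button and then row index 3 on that page.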


In [None]:
def where_am_i():
  try:
    driver.find_element(By.ID, "cphNoMargin_SearchButtons1_btnSearch")
    print ("On the search page")
  except:
    try:
      driver.find_element(By.ID, "OptionsBar2_imgNext")
      print ("On the search results page")
    except:
      try:
        driver.find_element(By.ID, f'ctl00_cphNoMargin_f_oprTab_tmpl0_documentInfoList_ctl00_DataLabel3')
        print ("On individual res page")
      except:
        print ("Idk man")

# Maybe driver.back() accordingly
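
The nested try/except probes above can also be written as a table of marker IDs checked with `find_elements`, which returns an empty list instead of raising. This is a minimal sketch, assuming the three element IDs used in where_am_i still identify their pages; `detect_page`, `return_to_results`, and `FakeDriver` are hypothetical names, and the fake driver only exists so the sketch runs without a browser.

```python
# Marker element IDs copied from where_am_i; each is assumed unique to its page.
PAGE_MARKERS = [
    ("search", "cphNoMargin_SearchButtons1_btnSearch"),
    ("results", "OptionsBar2_imgNext"),
    ("document", "ctl00_cphNoMargin_f_oprTab_tmpl0_documentInfoList_ctl00_DataLabel3"),
]


class FakeDriver:
    """Stand-in driver that pretends a fixed set of element IDs is on the page."""
    def __init__(self, present_ids):
        self.present_ids = set(present_ids)
        self.back_calls = 0

    def find_elements(self, by, value):
        return [object()] if value in self.present_ids else []

    def back(self):
        self.back_calls += 1


def detect_page(driver, markers=PAGE_MARKERS):
    """Return the label of the first page whose marker element is present."""
    for label, element_id in markers:
        # "id" is the string value behind selenium's By.ID.
        if driver.find_elements("id", element_id):
            return label
    return "unknown"


def return_to_results(driver):
    """Go back once if the browser is on a document page; report whether it did."""
    if detect_page(driver) == "document":
        driver.back()
        return True
    return False


demo = FakeDriver({"OptionsBar2_imgNext"})
print(detect_page(demo))
```

With a real driver, each marker that is absent still costs one implicit-wait timeout, so a short implicit wait keeps this check cheap.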