# Texas Cosmetologist Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for cosmetologists!

## Setup: Import what you'll need to scrape the page

We'll be using Selenium for this, *not* BeautifulSoup and requests.

In [1]:
from bs4 import BeautifulSoup
import requests

import pandas as pd

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 expects the driver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))



Current google-chrome version is 96.0.4664
Get LATEST chromedriver version for 96.0.4664 google-chrome
Driver [/Users/tanazmeghjani/.wdm/drivers/chromedriver/mac64/96.0.4664.45/chromedriver] found in cache


## Starting your search

Starting from [here](https://www.tdlr.texas.gov/cimsfo/fosearch.asp), search for cosmetologist violations for people with the last name **Nguyen**.

In [2]:
driver.get("https://www.tdlr.texas.gov/cimsfo/fosearch.asp")

In [3]:
driver.find_element(By.ID, 'pht_lnm').send_keys('Nguyen')

In [4]:
driver.find_element(By.NAME, "B1").click()

## Scraping

Once you are on the results page, do this. **I step you through things bit by bit, so it's going to be a little different than we did in class.** Also, no `pd.read_html` allowed because this isn't actual tabular data!

> You can use either Selenium by itself or Selenium+BeautifulSoup to scrape the results page. The choice is up to you!

### Loop through each result and print the entire row

Okay wait, maybe not, it's a heck of a lot of rows. Use `[:10]` to only do the first ten! For example, if you saved the table rows into `results` you might do something like this:

```python
for result in results[:10]:
    print(result)
```

Although you'd want to print out the text from the row (I give example output below).

> *Tip: If you're using Selenium, `By.TAG_NAME` is used if you don't have a class or ID. If you're using BeautifulSoup, just do your normal thing.*

In [5]:
rows = driver.find_elements(By.TAG_NAME, "tr") 

In [6]:
for row in rows[:10]:
    print(row.text)

Name and Location Order Basis for Order
NGUYEN, THANH
City: FRISCO
County: COLLIN
Zip Code: 75034


License #: 790672

Complaint # COS20210004784 Date: 11/16/2021

Respondent is assessed an administrative penalty in the amount of $1,875. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day, the Department is charging 2 violations; Respondent operated a cosmetology salon without the appropriate license.
NGUYEN, DAI T
City: HOUSTON
County: Harris
Zip Code: 77034


License #: 765339

Complaint # COS20210005027 Date: 11/16/2021

Respondent is assessed an administrative penalty in the amount of $1,500. Respondent failed to follow whirlpool foot spas cleaning and sanitization procedures as required; Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used; Respondent failed to store eyelash extensions in a sealed bag or covered container and kept in a clean dry debris-free s

The result should look something like this:

```
Name and Location Order Basis for Order
NGUYEN, THANH
City: FRISCO
County: COLLIN
Zip Code: 75034


License #: 790672

Complaint # COS20210004784 Date: 11/16/2021

Respondent is assessed an administrative penalty in the amount of $1,875. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day, the Department is charging 2 violations; Respondent operated a cosmetology salon without the appropriate license.
NGUYEN, LONG D
City: SAN SABA
County: SAN SABA
Zip Code: 76877
```

### Loop through each result and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen?! If you want to ignore an error, you use code like this:

```python
try:
   # try to do something
except:
   print("It didn't work")
```

It should help you out. If you don't want to print anything when there's an error, you can type `pass` instead of the `print` statement.
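
A bare `except:` also swallows typos and keyboard interrupts, so it can be safer to catch only the error you expect. The sketch below is a stand-in, not the scraper itself: `FakeRow` and `MissingElement` are hypothetical names that mimic a row whose lookup fails, much as Selenium raises `NoSuchElementException` when `find_element` finds nothing.

```python
class MissingElement(Exception):
    """Hypothetical stand-in for Selenium's NoSuchElementException."""


class FakeRow:
    """Hypothetical stand-in for a table row; the header has no name."""

    def __init__(self, name=None):
        self._name = name

    def find_name(self):
        # Mimics a lookup that fails on a row with no name element
        if self._name is None:
            raise MissingElement("no name in this row")
        return self._name


def collect_names(rows):
    names = []
    for row in rows:
        try:
            names.append(row.find_name())
        except MissingElement:
            # Only the expected failure is handled; other errors still surface
            names.append("Doesn't have a name")
    return names


rows = [FakeRow(), FakeRow("NGUYEN, THANH"), FakeRow("NGUYEN, LONG D")]
for name in collect_names(rows):
    print(name)
```

With real Selenium rows, the matching clause would be `except NoSuchElementException:` after `from selenium.common.exceptions import NoSuchElementException`.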

**Why doesn't the first one have a name?**

Output should look like this:

```
Doesn't have a name
NGUYEN, THANH
NGUYEN, LONG D
NGUYEN, LUCIE HUONG
NGUYEN, CHINH
NGUYEN, JIMMY
```

* *Tip: The name has a class you can use. The class name is reused in a lot of places, but because it's the first one you don't have to worry about that!*
* *Tip: Instead of searching across the entire page – `driver.find_element` or `doc.select_one` – you should be doing your searching just inside of each **row** (I used this technique in the beginning of class with BeautifulSoup when we were scraping the books page)* 

In [7]:
for row in rows[:10]:
    try:
        print(row.find_element(By.CLASS_NAME, 'results_text').text)
    except:
        print("It didn't work")
    

It didn't work
NGUYEN, THANH
NGUYEN, DAI T
NGUYEN, LONG D
NGUYEN, LUCIE HUONG
NGUYEN, CHINH
NGUYEN, JIMMY
NGUYEN, NAM
NGUYEN, DUC
NGUYEN, THU THAO THI


### Loop through each result, printing each violation description ("Basis for order")

Your results should look something like:

```
Doesn't have a violation
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day, the Department is charging 2 violations; Respondent operated a cosmetology salon without the appropriate license.
Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used, the Department is charging 2 violations; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect manicure tables prior to use for each client.
...
```

> - *Tip: You'll get an error even if you're ALMOST right - which row is causing the problem?*
> - *Tip: If you're using Selenium by itself, you can get the HTML of something by doing `.get_attribute('innerHTML')` – that way it'll look like BeautifulSoup when you print it. It might help you diagnose your issue!*
> - *Tip: Or I guess you could just skip the one with the problem...*
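
One way to find the problem row is to check how many cells a row has before indexing into it. The sketch below uses plain lists in place of `row.find_elements(By.TAG_NAME, 'td')` and assumes the header row yields fewer than three `td` cells. The helper name `basis_for_order` and the abbreviated cell values are placeholders for illustration.

```python
def basis_for_order(cells):
    """Return the third cell's text, or None if the row is too short."""
    # Index 2 only exists when the row has at least three cells
    if len(cells) < 3:
        return None
    return cells[2]


# Stand-ins for the text of each row's <td> cells (assumed layout)
rows_as_cells = [
    [],  # header row: assumed to contain no <td> cells
    ["NGUYEN, THANH ...", "License #: 790672 ...", "Respondent failed to clean ..."],
]

for cells in rows_as_cells:
    description = basis_for_order(cells)
    print(description if description is not None else "Doesn't have a violation")
```

Printing `len(cells)` for each row is a quick way to confirm which row is the odd one out.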

In [8]:
for row in rows[1:10]:
    print(row.find_elements(By.TAG_NAME, 'td')[2].text)  
    print("-----")

Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day, the Department is charging 2 violations; Respondent operated a cosmetology salon without the appropriate license.
-----
Respondent failed to follow whirlpool foot spas cleaning and sanitization procedures as required; Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used; Respondent failed to store eyelash extensions in a sealed bag or covered container and kept in a clean dry debris-free storage area.
-----
Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used, the Department is charging 2 violations; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect manicure tables prior to use for each client.
-----
Respondent failed to keep a record of the

### Loop through each result, printing the complaint number

Output should look like this:

```
Doesn't have a complaint number
COS20210004784
COS20210009745
COS20210011484
...
```

- *Tip: Think about the order of the elements. Can you count from the opposite direction than you normally do?*
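
Negative indexes count from the end of a list, which helps when the number of items in front of the one you want can vary. The lists below are made-up stand-ins for the text of the elements found in two rows. The assumption, taken from the sample output rather than checked against the page, is that the complaint number always sits second to last, just before the date.

```python
# Made-up stand-ins for the text of the elements found in two rows.
# The second row has one extra item, so counting from the front would break.
row_a = ["NGUYEN, THANH", "FRISCO", "COLLIN", "75034",
         "790672", "COS20210004784", "11/16/2021"]
row_b = ["NGUYEN, LONG D", "SAN SABA", "SAN SABA", "76877",
         "760420", "1620583", "COS20210009745", "<date>"]


def complaint_number(texts):
    # -1 is the last item, -2 is the one just before it
    return texts[-2]


for texts in (row_a, row_b):
    print(complaint_number(texts))
```

Here `row_a[5]` and `row_b[5]` point at different kinds of values, while `[-2]` lands on the complaint number in both.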

In [9]:
for row in rows[:10]:
    try:
        print(row.find_elements(By.CLASS_NAME, 'results_text')[-2].text)
    except:
        print("It didn't work")

It didn't work
COS20210004784
COS20210005027
COS20210009745
COS20210011484
COS20210011721
COS20200007069
COS20210010530
COS20200007141
COS20200000839


## Saving the results

### Loop through each result to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number
- License Numbers
- Zip Code
- County
- City

Create a new dictionary for each result (except the header).

Based on what you print out, the output might look something like:

```
This row is broken: Name and Location Order Basis for Order
{'name': 'NGUYEN, THANH', 'city': 'FRISCO', 'county': 'COLLIN', 'zip_code': '75034', 'complaint_no': 'COS20210004784', 'license_numbers': '790672', 'complaint': 'Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day, the Department is charging 2 violations; Respondent operated a cosmetology salon without the appropriate license.'}
{'name': 'NGUYEN, LONG D', 'city': 'SAN SABA', 'county': 'SAN SABA', 'zip_code': '76877', 'complaint_no': 'COS20210009745', 'license_numbers': '760420, 1620583', 'complaint': 'Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used, the Department is charging 2 violations; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect manicure tables prior to use for each client.'}
```
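
If element lookups get fiddly, another option is to parse each row's text. The sketch below works on a string copied from the sample output earlier on this page rather than on a live `row.text`. The function name `parse_row_text` and the regular expressions are assumptions about how the labels appear, so check them against the real page before relying on them.

```python
import re

# Text of one result row, copied from the sample output (violation text omitted)
SAMPLE_ROW_TEXT = """NGUYEN, THANH
City: FRISCO
County: COLLIN
Zip Code: 75034

License #: 790672

Complaint # COS20210004784 Date: 11/16/2021
"""

# Key -> pattern; each pattern captures the value that follows its label
PATTERNS = {
    "city": r"^City: (.+)$",
    "county": r"^County: (.+)$",
    "zip_code": r"^Zip Code: (\d+)$",
    "license_numbers": r"^License #: (.+)$",
    "complaint_no": r"^Complaint # (\S+)",
}


def parse_row_text(text):
    """Turn one row's text into a dictionary; missing fields become None."""
    lines = text.strip().splitlines()
    # The name is assumed to be the first line of the row's text
    record = {"name": lines[0].strip() if lines else None}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        record[key] = match.group(1).strip() if match else None
    return record


print(parse_row_text(SAMPLE_ROW_TEXT))
```

A row whose labelled fields all come back as `None`, such as the header, can then be reported as broken instead of being appended to the list.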

> *Tip: If you want to ask for the "next sibling," you can't use `find_next_sibling` in Selenium. Instead, use `element.find_element(By.XPATH, "following-sibling::div")` to find the next div, or `element.find_element(By.XPATH, "following-sibling::*")` to find the next anything.*

In [16]:
list_of_dict = []

rows = driver.find_elements(By.TAG_NAME, "tr")

# Skip the header row (index 0), which has none of the result fields
for row in rows[1:10]:
    # Look up this row's fields once instead of once per key
    fields = row.find_elements(By.CLASS_NAME, 'results_text')
    record = {}  # not named `dict`, which would shadow the built-in type
    record['name:'] = fields[0].text
    record['city'] = fields[1].text
    record['county'] = fields[2].text
    record['zip_code'] = fields[3].text
    record['license_number'] = fields[4].text
    record['complaint_no'] = fields[5].text
    # The violation description is the third cell of the row
    record['complaint'] = row.find_elements(By.TAG_NAME, 'td')[2].text

    list_of_dict.append(record)
list_of_dict

NoSuchWindowException: Message: no such window: window was already closed
  (Session info: chrome=96.0.4664.55)


### Save that to a CSV named `output.csv`

The dataframe should look something like...

|index|name|city|county|zip_code|complaint_no|license_numbers|complaint|
|---|---|---|---|---|---|---|---|
|0|NGUYEN, THANH|FRISCO|COLLIN|75034|COS20210004784|790672|Respondent failed to clean and sanitize whirlp...|
|1|NGUYEN, LONG D|SAN SABA|SAN SABA|76877|COS20210009745|760420, 1620583|Respondent failed to keep a record of the date...|


- *Tip: If you send a list of dictionaries to `pd.DataFrame(...)`, it will create a dataframe out of that list!*
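
As a small illustration of that tip, the sketch below builds a dataframe from two hand-typed dictionaries (values copied from the expected output above, with the longer fields left out) and writes it to an in-memory buffer instead of `output.csv`. Passing `index=False` to `to_csv` is what keeps the row index from coming back as an `Unnamed: 0` column. Reading with `dtype=str` is an optional choice made here so zip codes stay as text.

```python
import io

import pandas as pd

records = [
    {"name": "NGUYEN, THANH", "city": "FRISCO", "zip_code": "75034"},
    {"name": "NGUYEN, LONG D", "city": "SAN SABA", "zip_code": "76877"},
]

# Each dictionary becomes a row; the keys become the column names
df = pd.DataFrame(records)

# Write without the index, then read the CSV text back in
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer, dtype=str)

print(list(round_trip.columns))
```

Leaving out `index=False` would add the index as an extra first column in the file, which `read_csv` then labels `Unnamed: 0`.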

In [13]:
df = pd.DataFrame(list_of_dict)
df

|index|name:|city|county|zip_code|license_number|complaint_no|complaint|
|---|---|---|---|---|---|---|---|
|0|NGUYEN, THANH|FRISCO|COLLIN|75034|790672|COS20210004784|Respondent failed to clean and sanitize whirlp...|
|1|NGUYEN, DAI T|HOUSTON|Harris|77034|765339|COS20210005027|Respondent failed to follow whirlpool foot spa...|
|2|NGUYEN, LONG D|SAN SABA|SAN SABA|76877|760420, 1620583|COS20210009745|Respondent failed to keep a record of the date...|
|3|NGUYEN, LUCIE HUONG|UVALDE|UVALDE|78801|762626, 1811788|COS20210011484|Respondent failed to keep a record of the date...|
|4|NGUYEN, CHINH|TEMPLE|BELL|76502|777067|COS20210011721|Respondent failed to follow whirlpool foot spa...|
|5|NGUYEN, JIMMY|ROWLETT|DALLAS|75088|796773|COS20200007069|Respondent failed to clean and sanitize whirlp...|
|6|NGUYEN, NAM|HOUSTON|HARRIS|77025|688039|COS20210010530|Respondents failed to follow proper sequential...|
|7|NGUYEN, DUC|ABILENE|TAYLOR|79605|758793|COS20200007141|Respondent failed to clean and sanitize whirlp...|
|8|NGUYEN, THU THAO THI|SAN ANTONIO|BEXAR|78244|802892, 1286737|COS20200000839|Respondent performed or attempted to perform a...|


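In [14]:
# index=False keeps the dataframe's row index out of the file
df.to_csv('output.csv', index=False)

### Open the CSV file and examine the first few rows. Make sure you didn't save an extra weird unnamed column.

In [15]:
# Read the saved file back in, rather than inspecting the in-memory dataframe
pd.read_csv('output.csv').head()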

|index|name:|city|county|zip_code|license_number|complaint_no|complaint|
|---|---|---|---|---|---|---|---|
|0|NGUYEN, THANH|FRISCO|COLLIN|75034|790672|COS20210004784|Respondent failed to clean and sanitize whirlp...|
|1|NGUYEN, DAI T|HOUSTON|Harris|77034|765339|COS20210005027|Respondent failed to follow whirlpool foot spa...|
|2|NGUYEN, LONG D|SAN SABA|SAN SABA|76877|760420, 1620583|COS20210009745|Respondent failed to keep a record of the date...|
|3|NGUYEN, LUCIE HUONG|UVALDE|UVALDE|78801|762626, 1811788|COS20210011484|Respondent failed to keep a record of the date...|
|4|NGUYEN, CHINH|TEMPLE|BELL|76502|777067|COS20210011721|Respondent failed to follow whirlpool foot spa...|
