# Mine Safety

We're interested in [US mine safety](https://arlweb.msha.gov/drs/drshome.htm). Thank goodness we can search for these things.

## Setup: Import what you'll need to search and scrape with Selenium

In [7]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
# Start a Chrome session controlled by Selenium and open the search page
driver = webdriver.Chrome()
driver.get('https://arlweb.msha.gov/drs/drshome.htm')

## Starting from `https://arlweb.msha.gov/drs/drshome.htm`, search for every operator with 'dirt' in their name, including abandoned mines.

> - *Tip: If you can't make an element work using name, class or ID, try to use the XPath*

In [8]:
# Find the operator-name search box by its `name` attribute
op_name = driver.find_element_by_name("OperSearch")

In [9]:
# Type the search term into the box
op_name.send_keys('dirt')

In [10]:
# Find the 'Abandoned' option, scroll it into view, then click it
button = driver.find_element_by_name('Abandoned')
driver.execute_script("arguments[0].scrollIntoView(true)", button)
button.click()

In [11]:
# Find the button for this search form by XPath, scroll it into view, then click it
button = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table/tbody/tr[7]/td[3]/input[1]')
driver.execute_script("arguments[0].scrollIntoView(true)", button)
button.click()

## Scrape the results page, saving it as `dirt-operators.csv`

> - *Tip: Think about what each row in your dataset will be, and start by looping through that*
> - *Tip: Printing is cool and good! Print everything! Move it into a dictionary later.*
> - *Tip: If you don't want a row, think about what's in the row that makes it different. You can use an `if` statement or list slicing to skip the ones you aren't interested in.*
> - *Tip: Make sure your dictionary and your loop variable have DIFFERENT NAMES*
> - *Tip: After you've made your dictionary (and printed it, of course), you'll want to add it to your list of rows*
> - *Tip: Be sure to import pandas to convert it to a dataframe*
> - *Tip: Make sure you don't include the index when saving your dataframe*

### Hopefully you know that each `tr` is supposed to be a row of your data. What is the index of the first row element that is actually a result?

> - *Tip: `.text` will help you here.*
> - *Tip: You aren't interested in annotations or anything, just mines and where they are from*
> - *Tip: Using `print("-----")` will help you keep track of different rows*
> - *Tip: If you have a list called `animals`, `animals[2:]` will skip the first two and start with the third. You can use this to skip ahead to the 'good' data if you want*
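The printing and slicing tips above can be tried without a browser. This is a minimal sketch in which `row_texts` is a hypothetical stand-in for the `.text` of each `tr` element; the number of junk rows is made up for illustration and is not the layout of the real results page.

```python
# Hypothetical stand-in for the `.text` of each `tr` element on a results page.
# The number of non-data rows here is illustrative only.
row_texts = [
    "Site banner",
    "Search form",
    "Column headers",
    "3503598 OR Newberg Rock & Dirt",
    "0502030 CO Allied Dirt Moving Company",
    "Page footer",
]

# Print every row with its index to see where the real results start and stop.
for index, text in enumerate(row_texts):
    print(index, text)
    print("-----")

# Skip the first three rows and drop the last one.
data_rows = row_texts[3:-1]
print("First result row:", data_rows[0])
```

Once the printed indexes show where the data starts and ends, the same kind of slice can be applied to the list of `tr` elements.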

In [64]:
results = driver.find_elements_by_tag_name('tr')
rows = []
# Skip the first seven rows and the last row, which are not results
for dirt in results[7:-1]:
    rows.append(dirt.text)

In [65]:
print("This is the first row that is actually a result: ", rows[0])

This is the first row that is actually a result:  3503598
OR  Newberg Rock & Dirt   Newberg Rock & Dirt Surface M  Active  Crushed, Broken Stone NEC 


### Loop through each operator result, printing its name

> - *Tip: If you have a list called `animals`, `animals[2:]` will skip the first two and start with the third.*
> - *Tip: You can use list slicing or an `if` statement to skip the non-data row(s). List slicing is probably easier, even if you aren't comfortable with it.*
> - *Tip: Or, honestly, you can use `try` and `except` if you know how it works.*
> - *Tip: Once you have the "right" rows of data, you're going to be looking for a certain tag inside*
> - *Tip: Sometimes you can't say "give me this class," and instead you have to say "give me all of the `div` elements, and then give me the third one."*
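The `try` / `except` tip and the "give me the third one" tip can be combined. The sketch below uses plain lists instead of Selenium elements: `rows_of_cells` is a hypothetical stand-in in which each inner list holds the text of the `td` cells found in one `tr`. Rows that are too short to have a third cell are skipped.

```python
# Hypothetical stand-in: each inner list is the text of the `td` cells in one `tr`.
rows_of_cells = [
    [],  # e.g. a header row with no `td` cells
    ["3503598", "OR", "Newberg Rock & Dirt"],
    ["0502030", "CO", "Allied Dirt Moving Company"],
    ["A note that spans the whole table"],  # a single-cell row
]

operator_names = []
for cells in rows_of_cells:
    try:
        # The operator name is the third cell, so its index is 2.
        operator_names.append(cells[2])
    except IndexError:
        # Rows with fewer than three cells are not data rows.
        continue

for name in operator_names:
    print("Operator:", name)
```

List slicing needs you to know in advance which rows are bad. The `try` / `except` version skips any row that lacks the cell you asked for, wherever it appears.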

In [78]:
results = driver.find_elements_by_tag_name('tr')
for dirt in results[7:-1]:
    # Each result row holds its values in `td` cells; the operator name is the third cell
    cells = dirt.find_elements_by_tag_name('td')
    print("Operator: ", cells[2].text)

Operator:  Newberg Rock & Dirt  
Operator:  Allied Dirt Moving Company  
Operator:  AM Dirtworks & Aggregate Sales  
Operator:  Atlas-Dirty Devil Mining  
Operator:  Atlas-Dirty Devil Mining  
Operator:  Babe's Dirt Work  
Operator:  Bar-Lin Dirt Company  
Operator:  Barber'S Dirt Pit  
Operator:  Bender Sand & Dirt  
Operator:  BERT'S DIRT  
Operator:  Big D Dirt Service Inc  
Operator:  Big Red Dirt Farm LLC  
Operator:  Big River Dirt Pit  
Operator:  Bob Harris Dirt Contracting  
Operator:  Bohannon Sand & Dirt  
Operator:  Bratcher'S Sand & Dirt  
Operator:  Brewer Dirt Works  
Operator:  Buck'S Dirt Pit  
Operator:  C & G Dirt Hauling  
Operator:  C N C Dirt Movers, Inc.  
Operator:  Cambridge Dirt Sand and Gravel LLC  
Operator:  Central Iowa Dirt & Demo LLC  
Operator:  Crowes Trucking & Dirt Pit Services  
Operator:  D & H Dirt  
Operator:  Diez Dirt & Sand Hauling Inc  
Operator:  Dirt Cheap  
Operator:  Dirt Company  
Operator:  Dirt Company  
Operator:  Dirt Company  
Opera

### Loop through each operator result, printing its ID

There should be ONE code per row, and NO empty rows between them.

In [81]:
results = driver.find_elements_by_tag_name('tr')
for dirt in results[7:-1]:
    # The ID is the first `td` cell in each result row
    cells = dirt.find_elements_by_tag_name('td')
    print("ID_: ", cells[0].text)

ID_:  3503598
ID_:  0502030
ID_:  4801789
ID_:  4201449
ID_:  4201450
ID_:  1002257
ID_:  1601167
ID_:  4103265
ID_:  1401575
ID_:  1700776
ID_:  1601251
ID_:  0301963
ID_:  1601082
ID_:  3401751
ID_:  1600916
ID_:  3401211
ID_:  0301267
ID_:  1600956
ID_:  2200033
ID_:  0504953
ID_:  3401929
ID_:  1302445
ID_:  1601106
ID_:  3400915
ID_:  1600983
ID_:  4503200
ID_:  3401266
ID_:  3401468
ID_:  5001797
ID_:  4608254
ID_:  1510279
ID_:  2103723
ID_:  0100776
ID_:  4104016
ID_:  2103914
ID_:  4104757
ID_:  0301729
ID_:  0404851
ID_:  2200734
ID_:  5002028
ID_:  1513393
ID_:  3800602
ID_:  3101630
ID_:  3200860
ID_:  3401762
ID_:  2103517
ID_:  2402626
ID_:  2103181
ID_:  1601124
ID_:  1601150
ID_:  4703427
ID_:  0801306
ID_:  2501216
ID_:  3200965
ID_:  2901371
ID_:  2901544
ID_:  2901709
ID_:  4102355
ID_:  4102420
ID_:  4102869
ID_:  4102951
ID_:  4102958
ID_:  4104876
ID_:  3003502
ID_:  4103258
ID_:  3901432
ID_:  2103556
ID_:  1601250
ID_:  1600908
ID_:  1600953
ID_:  4104185
ID_:  

## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Operator ID
- Operator name
- Mine name
- State
- Mine type
- Coal or metal
- Status
- Commodity

Create a new dictionary for each row.

> - *Tip: Start with an empty dictionary, then add the keys one at a time like we did during class*
> - *Tip: You might want to save all of the cells in a variable, then use indexes to get the second, third, fourth, etc.*
> - *Tip: I know you already skipped a bunch of rows, but one of them still might be bad! Which one is it? How can you skip it? You might need to slice out some of the end of your list, too. Use `print` to help you debug, or just look at the page closely.*
> - *Tip: Or, if you did the other homework already, `try` / `except` is also an option*
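As an alternative to adding keys one at a time, the column names can be paired with the cell values using `zip`. This is only a sketch: `cell_texts` is a hypothetical stand-in for the `.text` of the cells in each row, and the length check is one possible way to skip a bad row, not necessarily the one the page requires.

```python
# Column names, in the order the cells appear in each row.
COLUMNS = [
    "Operator_id", "State", "Operator_name", "Mine_Name",
    "Mine_type", "Coal_or_Metal", "Status", "Commodity",
]

# Hypothetical stand-in for the text of the cells in each row.
cell_texts = [
    ["3503598", "OR", "Newberg Rock & Dirt", "Newberg Rock & Dirt",
     "Surface", "M", "Active", "Crushed, Broken Stone NEC"],
    ["Not a data row"],
    ["0502030", "CO", "Allied Dirt Moving Company",
     "Allied Dirt Moving Co Pit & Plant", "Surface", "M", "Abandoned",
     "Construction Sand and Gravel"],
]

records = []
for cells in cell_texts:
    # A row with the wrong number of cells is not a result, so skip it.
    if len(cells) != len(COLUMNS):
        continue
    # Build a fresh dictionary for this row and add it to the list.
    record = dict(zip(COLUMNS, cells))
    print(record)
    records.append(record)
```

Note that the dictionary (`record`) and the loop variable (`cells`) have different names, as the earlier tip recommends.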

In [82]:
results = driver.find_elements_by_tag_name('tr')
rows = []
for dirt in results[7:-1]:
    # Start a new dictionary for every row
    dic_dirt = {}
    # Save all of the cells, then pick each one out by its index
    infos = dirt.find_elements_by_tag_name('td')
    dic_dirt['Operator_id'] = infos[0].text
    dic_dirt['State'] = infos[1].text
    dic_dirt['Operator_name'] = infos[2].text
    dic_dirt['Mine_Name'] = infos[3].text
    dic_dirt['Mine_type'] = infos[4].text
    dic_dirt['Coal_or_Metal'] = infos[5].text
    dic_dirt['Status'] = infos[6].text
    dic_dirt['Commodity'] = infos[7].text
    rows.append(dic_dirt)

### Save that to a CSV named `dirt-operators.csv`

In [83]:
import pandas as pd
# Convert the list of dictionaries into a dataframe, one row per dictionary
df = pd.DataFrame(rows)

In [84]:
df.to_csv("dirt-operators.csv", index=False)

### Open the CSV file and examine the first few rows

Make sure you didn't save that extra weird unnamed index column.

In [89]:
df=pd.read_csv("dirt-operators.csv")
df=df[['Operator_id','State', 'Operator_name', 'Mine_Name', 'Mine_type', 'Coal_or_Metal', 'Status', 'Commodity']]

In [90]:
df.head()

| | Operator_id | State | Operator_name | Mine_Name | Mine_type | Coal_or_Metal | Status | Commodity |
|---|---|---|---|---|---|---|---|---|
| 0 | 3503598 | OR | Newberg Rock & Dirt | Newberg Rock & Dirt | Surface | M | Active | Crushed, Broken Stone NEC |
| 1 | 502030 | CO | Allied Dirt Moving Company | Allied Dirt Moving Co Pit & Plant | Surface | M | Abandoned | Construction Sand and Gravel |
| 2 | 4801789 | ND | AM Dirtworks & Aggregate Sales | AM Dirtworks & Aggregate Sales | Surface | M | Abandoned | Construction Sand and Gravel |
| 3 | 4201449 | UT | Atlas-Dirty Devil Mining | Unit Train Loading Facility | Facility | C | Abandoned | Coal (Bituminous) |
| 4 | 4201450 | UT | Atlas-Dirty Devil Mining | Blackie Surface Mine & Prep Plant | Surface | C | Abandoned | Coal (Bituminous) |
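One thing to notice in the output: the ID scraped as `0502030` comes back from the CSV as `502030`, because `read_csv` reads that column as numbers and drops the leading zero. The sketch below shows, on a tiny made-up dataframe held in memory, how saving with the index produces an `Unnamed: 0` column and how passing `dtype` might keep the IDs as text. The `io.StringIO` buffer stands in for a file on disk and is an assumption of this example.

```python
import io

import pandas as pd

# A tiny stand-in dataframe using two IDs and states from the results above.
small = pd.DataFrame([
    {"Operator_id": "3503598", "State": "OR"},
    {"Operator_id": "0502030", "State": "CO"},
])

# Saving WITH the index adds an unnamed first column to the CSV text.
csv_with_index = small.to_csv()
reread_with_index = pd.read_csv(io.StringIO(csv_with_index))
print(list(reread_with_index.columns))

# Saving WITHOUT the index keeps only the real columns.
csv_without_index = small.to_csv(index=False)

# By default the ID column is read as integers, so the leading zero is lost.
as_numbers = pd.read_csv(io.StringIO(csv_without_index))
print(as_numbers["Operator_id"].tolist())

# Asking for strings keeps the IDs exactly as they were scraped.
as_text = pd.read_csv(io.StringIO(csv_without_index), dtype={"Operator_id": str})
print(as_text["Operator_id"].tolist())
```

Whether the leading zero matters depends on how the IDs are used later, for example when building a URL for a single mine.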
