# Mine Safety

We're interested in [US mine safety](https://arlweb.msha.gov/drs/drshome.htm). Thank goodness we can search for these things.

## Setup: Import what you'll need to search and scrape with Selenium

In [1]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait

## Starting from `https://arlweb.msha.gov/drs/drshome.htm`, search for every operator with 'dirt' in their name, including abandoned mines.

> - *Tip: If you can't make an element work using name, class or ID, try to use the XPath*

In [2]:
driver = webdriver.Firefox()
driver.get('https://arlweb.msha.gov/drs/drshome.htm')

In [3]:
operator_search_input = driver.find_element_by_name('OperSearch')
operator_search_input.send_keys('dirt')

abandoned_mines_checkbox = driver.find_element_by_name('Abandoned')
abandoned_mines_checkbox.click()

In [4]:
submit_button = driver.find_element_by_xpath('/html/body/div[5]/div/form[1]/table/tbody/tr[7]/td[3]/input[1]')
submit_button.click()

## Scrape the results page, saving it as `dirt-operators.csv`

> - *Tip: Think about what each row in your dataset will be, and start by looping through that*
> - *Tip: Printing is cool and good! Print everything! Move it into a dictionary later.*
> - *Tip: If you don't want a row, think about what's in the row that makes it different. You can use an `if` statement or list slicing to skip the ones you aren't interested in.*
> - *Tip: Make sure your dictionary and your loop variable have DIFFERENT NAMES*
> - *Tip: After you've made your dictionary (and printed it, of course), you'll want to add it to your list of rows*
> - *Tip: Be sure to import pandas to convert it to a dataframe*
> - *Tip: Make sure you don't include the index when saving your dataframe*

### Hopefully you know that each `tr` is supposed to be a row of your data. What is the index of the first row element that is actually a result?

`.text` will help you here.

In [6]:
data_table = driver.find_element_by_xpath('/html/body/div[5]/div/table[3]')

In [7]:
# index of first data row is 1

### Loop through each operator result, printing its name

You can use list slicing or an `if` statement to skip the non-data row(s).
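
As a minimal sketch of both approaches, the snippet below uses plain strings in place of the Selenium row elements. The sample values and the assumption that the first and last rows are non-data rows are hypothetical stand-ins, not taken from the real results page.

```python
# Hypothetical stand-ins for the .text of each <tr>; the real rows come from Selenium.
row_texts = [
    "header row",
    "1234567 Example Dirt Co",
    "7654321 Sample Dirt Pit",
    "footer row",
]

# Option 1: list slicing drops the first and last rows.
sliced = row_texts[1:-1]

# Option 2: an if-style filter keeps only rows whose text starts with a digit.
filtered = [text for text in row_texts if text[:1].isdigit()]

print(sliced)
print(filtered)
```

Slicing is shorter when you know exactly which positions to drop. The filter is safer when the non-data rows could appear anywhere.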

In [12]:
data_rows = data_table.find_elements_by_tag_name('tr')

# Skip the first and last rows, which are not operator results
for row in data_rows[1:-1]:
  operator_name_element = row.find_element_by_xpath('td[3]')
  print(operator_name_element.text)

Newberg Rock & Dirt  
Allied Dirt Moving Company  
AM Dirtworks & Aggregate Sales  
Atlas-Dirty Devil Mining  
Atlas-Dirty Devil Mining  
Babe's Dirt Work  
Bar-Lin Dirt Company  
Barber'S Dirt Pit  
Bender Sand & Dirt  
BERT'S DIRT  
Big D Dirt Service Inc  
Big Red Dirt Farm LLC  
Big River Dirt Pit  
Bob Harris Dirt Contracting  
Bohannon Sand & Dirt  
Bratcher'S Sand & Dirt  
Brewer Dirt Works  
Buck'S Dirt Pit  
C & G Dirt Hauling  
C N C Dirt Movers, Inc.  
Cambridge Dirt Sand and Gravel LLC  
Central Iowa Dirt & Demo LLC  
Crowes Trucking & Dirt Pit Services  
D & H Dirt  
Diez Dirt & Sand Hauling Inc  
Dirt Cheap  
Dirt Company  
Dirt Company  
Dirt Company  
Dirt Con  
Dirt Diggers Inc  
Dirt Doctor Inc  
Dirt Inc  
Dirt Pit  
Dirt Work Specialists LLC  
Dirt Works  
Dirtco Inc  
Dirtman Trucking  
DIRTWORKS, INC.  
Dirtworks, Inc.  
Dirty Coal  
Dorchester Dirt Company Inc  
Douglas Dirt Sand & Gravel Company  
Ell Dirt Works LLC.  
Floyd Smith Dirt Pit  
Gary Kelm Dirt Service  
Gerald Fenger/Rock & Dirt Const  
Gerald Illies Gravel & Dirt Company  
Guidry Sand & Dirt Pit Inc  
Harris Dirt Company Inc  
Hatchet Creek Rock & Dirt LLC  
Holley Dirt Company, Inc  
Iske Dirt Sand & Gravel  
J M Lynn Dirtwork  
Jake Diel Dirt & Paving Inc  
Jake Diel Dirt & Paving Inc  
Jake Diel Dirt & Paving Inc  
Jake Diel Dirt & Paving Inc  
Jake Diel Dirt & Paving Inc  
Jake Diel Dirt & Paving Inc  
Jake Diel Dirt & Paving Inc  
Jake Diel Dirt & Paving Inc  
Jarratt Dirt Work and Paving, Inc.  
JBS DIRT, INC.  
Jones Bros Dirt & Paving Contractors Inc  
Krueger Brothers Gravel & Dirt  
Krueger Dirt Werx, Inc.  
L I P Dirt & Trucking  
Lee'S Dirt Pit Inc  
Little-G-Dirt Pit  
Lone Star Dirt & Paving  
Loyd'S Dirt & Gravel  
M R Dirt  
M.C. Dirt LLC  
M.R. Dirt Inc.  
Maurice Dirt & Sand  
Mc Dirt Industries Inc  
Mike Duhon Dirt Pit  
Mike Duhon Dirt Pit  
Mike Duhon Dirt Pit  
Moss Dirt Company  
Moss Dirt Company  
Muckler Fill Dirt & Top Soil  
Nelson & Sons Dirt Haulers Inc  
Nelson'S Dirt Pit  
Nicholson Dirt Contracting  
Nitty Gritty Dirt LLC  
Northest Louisiana Dirt Contractors  
Orvil Carter Dirt Contractor Inc  
Orvil Carter Dirt Contractor Inc  
P B Dirt Movers Inc  
P B Dirt Movers Inc  
P B Dirt Movers Inc  
P B Dirt Movers Inc  
P B Dirt Movers Inc  
P B Dirt Movers Inc.  
P B Dirt Movers, Inc  
P B Dirt Movers, Inc.  
P.B. Dirtmovers  
PAPA'S DIRT WORKES  
Paydirt Exc Inc  
PB Dirt Movers  
PB Dirt Movers Inc.  
PB Dirt Movers, Inc  
Peveto Dirt Pit  
Phil-Dirt Industries, Inc  
Prescott Dirt, LLC  
R & R Dirtworks  
R D Blankenship Dirt Work LLC  
Reeves Dirt Pit Inc  
River Bottom Dirt  
Roe'S Dirt Pit  
Russell Trest-Dirt Contractor  
S J Stahr Dirt Movers Inc  
Sand & Dirt, Inc  
Sand and Dirt, Inc.  
Sierra Rock & Dirt, Inc.  
Simpson Dirtworx llc  
SIMPSON DIRTWORX LLC  
SIMPSON DIRTWORX LLC  
Slay'S Dirt Hauling  
Southside Dirt Company  
Spry's Dirt & Gravel, Inc.  
Stewart Dirt Pit  
Stewart Dirt Work, Inc.  
Sweat'S Dirt Hauling Inc  
Toler Roe Dirt Pit  
Tres Palacios Dirt, Sand & Gravel  
Vogt Dirt Service  
Watson Dirt Sand & Gravel Pit Inc  
Y B Dirt & Loam  
Yarbrough Dirt Pit Inc  


### Loop through each operator result, printing its ID

There should be ONE ID per row, with NO empty rows between them.

In [13]:
# Skip the first and last rows, which are not operator results
for row in data_rows[1:-1]:
  operator_id_element = row.find_element_by_xpath('td[1]')
  print(operator_id_element.text)

3503598
0502030
4801789
4201449
4201450
1002257
1601167
4103265
1401575
1700776
1601251
0301963
1601082
3401751
1600916
3401211
0301267
1600956
2200033
0504953
3401929
1302445
1601106
3400915
1600983
4503200
3401266
3401468
5001797
4608254
1510279
2103723
0100776
4104016
2103914
4104757
0301729
0404851
2200734
5002028
1513393
3800602
3101630
3200860
3401762
2103517
2402626
2103181
1601124
1601150
4703427
0801306
2501216
3200965
2901371
2901544
2901709
4102355
4102420
4102869
4102951
4102958
4104876
3003502
4103258
3901432
2103556
1601250
1600908
1600953
4104185
2901536
3609624
3800709
3609931
1601257
0801275
1601379
1601380
1601381
1601134
1601165
3901042
1601194
4104054
4801674
2402474
1600920
4102955
4103107
1512530
1515619
1518318
4405366
4407196
1519685
1519799
4407379
4407003
2602570
2402503
4407296
1519273
4407270
4102682
0801259
0203332
0302015
2901986
1601127
4105017
1600986
4103324
4202013
0801417
0801371
2402115
4300748
4300768
4300776
0103209
1601159
2302283
4102586
4104475
3800617
1601234
4104648
2103518
1601292
4103429
4103264


## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Operator ID
- Operator name
- Mine name
- State
- Mine type
- Coal or metal
- Status
- Commodity

Create a new dictionary for each row.
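
One way to build a dictionary per row is to pair column names with cell texts using `zip`. The sketch below is only an illustration: the column names are a shortened subset and the cell values are hypothetical placeholders for what each row's `td` elements would return.

```python
# Hypothetical column names and cell texts; the real values come from each row's <td> elements.
column_names = ["operator_id", "state", "operator_name"]
scraped_rows = [
    ["1234567", "XX", "Example Dirt Co"],
    ["7654321", "YY", "Sample Dirt Pit"],
]

rows = []
for cell_texts in scraped_rows:
    # zip pairs each column name with the cell in the same position
    record = dict(zip(column_names, cell_texts))
    rows.append(record)

print(rows[0])
```

Because a fresh `record` is created inside the loop, each row gets its own dictionary instead of overwriting a shared one.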

In [18]:
headers = [
  'operator_id',
  'state',
  'operator_name',
  'mine_name',
  'mine_type',
  'coal_or_metal',
  'status',
  'commodity'
]

# Build one dictionary per result row
compiled_list = []

# Loop through all rows except the first and the last, which are not results
for row in data_rows[1:-1]:

  # Get all cells in the row
  row_cells = row.find_elements_by_tag_name('td')

  # Pair each header with the text of the cell in the same position
  dictionary = {}
  for header, cell in zip(headers, row_cells):
    dictionary[header] = cell.text

  compiled_list.append(dictionary)

len(compiled_list)

132

### Save that to a CSV named `dirt-operators.csv`

In [17]:
import pandas as pd
df = pd.DataFrame(compiled_list)
df.to_csv('dirt-operators.csv', index=False)

### Open the CSV file and examine the first few rows.

Make sure you didn't save that extra weird unnamed index column.
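
A quick way to check is to look at the header row. The sketch below uses Python's standard `csv` module on hypothetical in-memory contents; the helper name `first_rows` and the sample values are assumptions for illustration. With the real file you would pass `open('dirt-operators.csv', newline='')` instead.

```python
import csv
import io

def first_rows(csv_file, count=3):
    """Return the header and up to `count` data rows of an open CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader)
    rows = [row for _, row in zip(range(count), reader)]
    return header, rows

# Hypothetical file contents standing in for the saved CSV
sample = io.StringIO("operator_id,state,operator_name\n1234567,XX,Example Dirt Co\n")
header, rows = first_rows(sample)

# A saved index shows up as a leading column with an empty name
# (pandas displays it as "Unnamed: 0" when reading the file back).
has_index_column = header[0] == "" or header[0].startswith("Unnamed")
print(header, rows, has_index_column)
```

If `has_index_column` comes back `True`, re-save the dataframe with `index=False`.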