# Hazardous Spots

The aim is to acquire the geographic and industry details of all entities that the [toxics release inventory](https://enviro.epa.gov/triexplorer/tri_release.facility) considers hazard release spots.

<br>

## Preliminaries


**Unzip**

To extract into a specific directory

* `unzip -u -q src.zip -d directory`

wherein the options `-u`, `-q`, & `-d` denote update, quietly, & specified directory, respectively. Note that `-d` must be immediately followed by the directory name. Additionally, to preview an archive's contents

* `unzip -l src.zip`
* `unzip -v src.zip`
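
For cases whereby the shell is unavailable, the same preview & extract steps can be sketched with Python's standard `zipfile` module. This is a minimal sketch; the function names `preview` & `extract`, and the throwaway archive, are illustrative assumptions. Unlike `unzip -u`, `extractall` always overwrites; it does not compare timestamps.

```python
import tempfile
import zipfile
from pathlib import Path


def preview(archive: Path) -> list:
    """List the member names of an archive, akin to `unzip -l`."""
    with zipfile.ZipFile(archive) as handle:
        return handle.namelist()


def extract(archive: Path, directory: Path) -> None:
    """Extract every member into `directory`, akin to `unzip -q archive -d directory`."""
    with zipfile.ZipFile(archive) as handle:
        handle.extractall(path=directory)


# A throwaway archive, built only so that the sketch is self-contained
workspace = Path(tempfile.mkdtemp())
archive = workspace / 'example.zip'
with zipfile.ZipFile(archive, mode='w') as handle:
    handle.writestr('src/settings.py', '# placeholder\n')

members = preview(archive)
extract(archive, workspace / 'target')
extracted = (workspace / 'target' / 'src' / 'settings.py').exists()
```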

<br>

**Counting**

* `!ls ... | wc -l`
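
A Python counterpart of the count above, as a minimal sketch. The function name `count_files`, the default `*.csv` pattern, and the temporary files are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def count_files(directory: Path, pattern: str = '*.csv') -> int:
    """Count the files in `directory` that match the glob `pattern`."""
    return sum(1 for item in directory.glob(pattern) if item.is_file())


# Throwaway files, created only so that the sketch is self-contained
workspace = Path(tempfile.mkdtemp())
for name in ('a.csv', 'b.csv', 'notes.txt'):
    (workspace / name).write_text('')

total = count_files(workspace)
```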

<br>

**Ubuntu**

```bash
%%bash
cat /etc/issue &> ubuntuLOG.txt
cat /proc/cpuinfo &> cpuLOG.txt
cat /proc/meminfo &> memoryLOG.txt
```

<br>


In [None]:
%%bash
rm -f *LOG.txt && rm -f *.pdf
rm -rf states && rm -rf counties && rm -rf tracts
rm -rf spots && rm -rf naics && rm -rf designs

In [None]:
%%bash
rm -rf src
unzip -u -q src.zip
rm -r src.zip

rm -rf scripts
unzip -u -q scripts.zip
rm -r scripts.zip

<br>
<br>

### Packages

libspatialindex

In [None]:
%%bash
chmod +x scripts/libspatialindex.sh
./scripts/libspatialindex.sh &> libspatialindexLOG.txt

<br>

rtree

In [None]:
%%bash
chmod +x scripts/rtree.sh
./scripts/rtree.sh &> rtreeLOG.txt

<br>

geopandas

In [None]:
!pip install geopandas &> geopandasLOG.txt

<br>

dotmap

In [None]:
!pip install dotmap &> dotmapLOG.txt

<br>

quantities

In [None]:
!pip install quantities &> quantitiesLOG.txt

<br>
<br>

### Libraries

In [None]:
import pandas as pd
import dask.dataframe as dd
import dask
import numpy as np
import requests
import os

In [None]:
import logging

<br>

### Logging

In [None]:
logging.basicConfig(level=logging.ERROR, format='%(asctime)s \n\r %(levelname)s %(message)s', datefmt='%H:%M:%S')
logger = logging.getLogger(__name__)

<br>
<br>

### Classes

In [None]:
import src.boundaries.boundaries
import src.settings
import src.references.chemicals
import src.releases.request
import src.releases.helpers

<br>

Instantiate

In [None]:
settings = src.settings.Settings()
boundaries = src.boundaries.boundaries.Boundaries(crs=settings.crs)

<br>
<br>

## Front Matter

This module's objectives

* acquire the lists of hazardous sites from the old TRI repository, and ensure that each site has its coordinate details: longitude, latitude, state FP/GEOID, county FP/GEOID, tract CE/GEOID
* acquire the lists of hazardous sites from the latest TRI Services repository
* map the details of these repositories
* acquire their industry classifications
* create an efficient storage set-up for the outcomes

<br>
<br>

### TRI

```bash
%%bash
python src/tri/main.py
ls spots/mapable/*csv | wc -l
```

<br>
<br>

### NAICS

```bash
%%bash
python src/naics/main.py
ls naics/*csv | wc -l
```

<br>
<br>

## Releases

States

In [None]:
states = boundaries.states(settings.latest)
states.info()

<br>

Counties

In [None]:
counties = boundaries.counties(settings.latest)
counties = counties.merge(states[['STATEFP', 'STUSPS']], on='STATEFP', how='left')
counties.rename(columns={'GEOID': 'COUNTYGEOID'}, inplace=True)
counties.info()

<br>

Chemicals

In [None]:
chemicals = src.references.chemicals.Chemicals().exc()
chemicals.info()

<br>

Directories

In [None]:
path = os.path.join(os.getcwd(), 'designs')
os.makedirs(path, exist_ok=True)

<br>
<br>

### Setting-up

Initially, focus on LA. Remember, these are **on-site releases & disposals**.

In brief:

* Drop: `FACILITY_NAME`, `EPA_REGISTRY_ID`, `CAS_CHEM_NAME`, `RELEASE_BASIS_EST_CODE`
* Select only the cases where `TRADE_SECRET_IND == 0`
* Ascertain data consistency w.r.t. `ENVIRONMENTAL_MEDIUM`
* Drop duplicates

The last step, dropping duplicates, merges the distributed data lines of the dask DataFrame. Beware of

* The `DOC_CTRL_NUM`, `WATER_SEQUENCE_NUM`, `ENVIRONMENTAL_MEDIUM` fields; these probably aid distinct record identification via `drop_duplicates`.

* Differences in units of measure; a mix of pounds & grams.
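
The four steps in brief can be sketched with pandas on toy data. This is an illustration under stated assumptions, not the code of `src.releases.request`: the records & the medium labels are hypothetical, and "consistency" is assumed to mean whitespace & case normalisation.

```python
import pandas as pd

# Hypothetical records; only the field names come from the notes above
records = pd.DataFrame({
    'DOC_CTRL_NUM': ['1', '1', '2', '3'],
    'WATER_SEQUENCE_NUM': [1, 1, 1, 1],
    'ENVIRONMENTAL_MEDIUM': ['AIR STACK', 'air stack ', 'WATER', 'AIR FUG'],
    'FACILITY_NAME': ['P', 'P', 'Q', 'R'],
    'EPA_REGISTRY_ID': ['x', 'x', 'y', 'z'],
    'CAS_CHEM_NAME': ['c1', 'c1', 'c2', 'c3'],
    'RELEASE_BASIS_EST_CODE': ['M', 'M', 'E', 'M'],
    'TRADE_SECRET_IND': [0, 0, 0, 1],
    'TOTAL_RELEASE': [10.0, 10.0, 5.0, 7.0]})

excluded = ['FACILITY_NAME', 'EPA_REGISTRY_ID', 'CAS_CHEM_NAME', 'RELEASE_BASIS_EST_CODE']

cleaned = (
    records.drop(columns=excluded)
    # Keep the non trade secret cases only
    .loc[lambda frame: frame['TRADE_SECRET_IND'] == 0]
    # Assumed consistency rule: strip whitespace & use upper case
    .assign(ENVIRONMENTAL_MEDIUM=lambda frame: frame['ENVIRONMENTAL_MEDIUM'].str.strip().str.upper())
    # Rows that agree on every remaining field collapse into one
    .drop_duplicates()
    .reset_index(drop=True)
)
```

Here the first two records become identical after normalisation; hence `drop_duplicates` keeps one of them.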



<br>

<br>
<br>

### Steps

In [None]:
request = src.releases.request.Request()
helpers = src.releases.helpers.Helpers(counties=counties, chemicals=chemicals)

In [None]:
for index in states.index:

    logger.info('\n...{}'.format(states.STUSPS[index]))

    # A state's data streams
    streams = request.exc(state=states.STUSPS[index])

    # The streams: county, year, and chemical level
    distributions = streams.compute(scheduler='processes')

    # The chemical amount released; the by-year option might be removed in future
    base = distributions.groupby(by=['COUNTYGEOID', 'TRI_CHEM_ID', 'REPORTING_YEAR'])['TOTAL_RELEASE'].sum()
    base = base.reset_index(drop=False)

    # Get the unit of measure per chemical
    details = helpers.units(data=base.copy())

    # Ensure consistent release measures
    transformed = helpers.weights(data=details.copy())

    # Preliminary Focus: Analysis w.r.t. total release over time, per county & chemical
    points = transformed.groupby(by=['COUNTYGEOID', 'TRI_CHEM_ID'])['RELEASE_KG'].sum()
    points = points.reset_index(drop=False)

    # Design matrix
    matrix = helpers.regressors(data=points, state=states.STUSPS[index])
    # DataFrame.info() prints its summary and returns None; hence, call it directly
    matrix.info()
    matrix.to_csv(path_or_buf=os.path.join(path, states.STUSPS[index] + '.csv'), index=True, encoding='UTF-8', header=True)
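
The aggregation & unit harmonisation steps of the loop can be sketched on toy data. This is not the implementation of `helpers.units` or `helpers.weights`; the `UNIT_OF_MEASURE` field, its labels `Pounds` & `Grams`, and all values below are assumptions for illustration.

```python
import pandas as pd

# Conversion factors to kilograms; a pound is exactly 0.45359237 kg
TO_KILOGRAMS = {'Pounds': 0.45359237, 'Grams': 0.001}

# Hypothetical releases
releases = pd.DataFrame({
    'COUNTYGEOID': ['22001', '22001', '22001', '22003'],
    'TRI_CHEM_ID': ['A', 'A', 'B', 'A'],
    'REPORTING_YEAR': [2017, 2018, 2017, 2017],
    'TOTAL_RELEASE': [100.0, 50.0, 2000.0, 10.0]})

# Hypothetical unit of measure per chemical
units = pd.DataFrame({'TRI_CHEM_ID': ['A', 'B'],
                      'UNIT_OF_MEASURE': ['Pounds', 'Grams']})

# Total release per county, chemical, & year
base = releases.groupby(by=['COUNTYGEOID', 'TRI_CHEM_ID', 'REPORTING_YEAR'])['TOTAL_RELEASE'].sum()
base = base.reset_index(drop=False)

# Attach each chemical's unit, then express every release in kilograms
details = base.merge(units, on='TRI_CHEM_ID', how='left')
details['RELEASE_KG'] = details['TOTAL_RELEASE'] * details['UNIT_OF_MEASURE'].map(TO_KILOGRAMS)

# Total release over time, per county & chemical
points = details.groupby(by=['COUNTYGEOID', 'TRI_CHEM_ID'])['RELEASE_KG'].sum()
points = points.reset_index(drop=False)
```

Converting before the final sum matters; adding pounds & grams directly would yield meaningless totals.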


<br>

A Computation Graph

In [None]:
streams.visualize(filename='streams', format='pdf')

<br>
<br>

### Later

* Selections w.r.t. the states of `CAAC_IND`, `CARC_IND`, & `R3350_IND` in chemicals, per `TRI_CHEM_ID`.