# Mapping Groundwater Contaminants of California

Potential goals of this notebook:
1. Clean the dataset into a workable dataframe
2. Spatially plot the data using geopandas or cartopy
3. ...

## 1. Reorganizing the data

This first section cleans and reorganizes the data. The `%matplotlib inline` magic command renders each figure directly below the cell that creates it.

In [27]:
%matplotlib inline

## Imports
import pandas as pd
import numpy as np
import os
pd.set_option('display.max_columns', 500)

## Set the data's directory path
script_dir = os.path.abspath('')
data_dir   = os.path.join( os.path.split(os.path.split(script_dir)[0])[0], # shared path
                           'whw2019_India GW Code & Data', 'whw2019_NWQData' )
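The same directory lookup can also be written with `pathlib`, which avoids hard-coded path separators. This is only a sketch: it assumes, as the cell above does, that the data folder sits two levels above the notebook's working directory, and the helper names `build_data_dir` and `state_file` are hypothetical.

```python
from pathlib import Path


def build_data_dir(script_dir):
    """Go up two levels from script_dir, then into the shared data folder."""
    shared = Path(script_dir).parent.parent  # assumed layout, mirrors the os.path.split calls
    return shared / 'whw2019_India GW Code & Data' / 'whw2019_NWQData'


def state_file(data_dir, state, kind):
    """Return the path to '<state>_<kind>.csv' inside data_dir."""
    return Path(data_dir) / '{}_{}.csv'.format(state, kind)


data_dir = build_data_dir(Path.cwd())
print(state_file(data_dir, 'CA', 'result'))
```

Because `Path` objects join with `/`, the same code should work on Windows, macOS, and Linux without editing the separators.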

The data we are using for this analysis come from a collaboration between the United States Geological Survey ([USGS](https://www.usgs.gov/)), the Environmental Protection Agency ([EPA](https://www.epa.gov/)), the United States Department of Agriculture Agricultural Research Service ([USDA ARS](https://www.ars.usda.gov/)), and the National Water Quality Monitoring Council ([NWQMC](https://acwi.gov/monitoring/)). The groundwater quality data were aggregated and downloaded from the [Water Quality Portal](https://www.waterqualitydata.us/coverage/).

The reported data sources are:
* National Water Information System ([NWIS](https://waterdata.usgs.gov/nwis)) - USGS
* STOrage and RETrieval ([STORET](https://www.epa.gov/waterdata/water-quality-data-wqx)) Data Warehouse - EPA
* Sustaining The Earth's Watersheds - Agricultural Research Database System (STEWARDS) - USDA ARS

For now the state/region of interest is California (CA). However, we hope to be able to apply similar analyses to other states around the US, or to other countries (e.g., India) should adequate spatial (X,Y,Z) and temporal data resolution be available.

To read in the data files, we point `pd.read_csv` at their storage location (on Hydroshare). The following `pd.read_csv` commands may raise some warnings when run. In this instance the warnings are safe to ignore, but it is good practice to check what any warning is reporting before dismissing it.
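If the warnings turn out to be pandas `DtypeWarning` messages about columns with mixed types (an assumption; the exact warning text is not shown here), two common ways to handle them are sketched below. The CSV text and the column names `MonitoringLocationIdentifier` and `ResultMeasureValue` are made-up stand-ins for illustration, not the real files.

```python
import io

import pandas as pd

# Toy stand-in for a '<state>_result.csv' file; the values are invented.
csv_text = (
    "MonitoringLocationIdentifier,ResultMeasureValue\n"
    "SITE-1,0.5\n"
    "SITE-2,<0.1\n"
)

# Option 1: parse the file in a single pass so pandas infers one dtype per column.
df_a = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: declare the dtype of the mixed column explicitly.
df_b = pd.read_csv(io.StringIO(csv_text), dtype={'ResultMeasureValue': str})

# Convert to numbers afterwards; entries that cannot be parsed become NaN.
df_b['value_numeric'] = pd.to_numeric(df_b['ResultMeasureValue'], errors='coerce')
print(df_b)
```

Reading the column as text first keeps entries such as censored values visible, so they can be handled deliberately instead of being coerced silently.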


In [30]:
# Enter a state code in the 'state' variable to read in that state's results.
state = 'CA'
results = pd.read_csv(os.path.join(data_dir, '{}_result.csv'.format(state)))
stations = pd.read_csv(os.path.join(data_dir, '{}_station.csv'.format(state)))


In [31]:
results.drop(columns=['OrganizationIdentifier', 
                      'OrganizationFormalName',
                      'ActivityIdentifier', 
                      'ActivityTypeCode', 
                      'ActivityMediaName',
                      'ActivityMediaSubdivisionName',
                      'ResultStatusIdentifier', 
                      'StatisticalBaseCode', 
                      'ResultValueTypeName',
                      'ResultWeightBasisText', 
                      'ResultTimeBasisText',
                      'ResultTemperatureBasisText',
                      'ResultParticleSizeBasisText',
                      'PrecisionValue', 
                      'ResultCommentText',
                      'USGSPCode',
                      'ResultDepthHeightMeasure/MeasureValue',
                      'ResultDepthHeightMeasure/MeasureUnitCode',
                      'ResultDepthAltitudeReferencePointText', 
                      'SubjectTaxonomicName',
                      'SampleTissueAnatomyName',
                      'ResultAnalyticalMethod/MethodIdentifier',
                      'ResultAnalyticalMethod/MethodIdentifierContext',
                      'ResultAnalyticalMethod/MethodName', 
                      'MethodDescriptionText',
                      'LaboratoryName',
                      'AnalysisStartDate', 
                      'ResultLaboratoryCommentText',
                      'DetectionQuantitationLimitTypeName',
                      'DetectionQuantitationLimitMeasure/MeasureValue',
                      'DetectionQuantitationLimitMeasure/MeasureUnitCode',
                      'PreparationStartDate', 
                      'ProviderName',
                      'ProjectIdentifier',
                      'ActivityConductingOrganizationText',
                      'ActivityCommentText',
                      'MeasureQualifierCode', 
                      'SampleCollectionMethod/MethodIdentifier',
                      'SampleCollectionMethod/MethodIdentifierContext',
                      'SampleCollectionMethod/MethodName',
                      'SampleCollectionEquipmentName',
                      'ResultDetectionConditionText'
                     ], inplace=True)
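The drop above raises a `KeyError` if any listed column is missing, which could happen if another state's file has a different set of columns. A minimal sketch of a more tolerant variant follows, using a toy dataframe; `CharacteristicName` is an assumed name for a column we want to keep, and the values are invented.

```python
import pandas as pd

# Toy dataframe standing in for `results`.
toy = pd.DataFrame({
    'OrganizationIdentifier': ['ORG-1'],
    'USGSPCode': ['00000'],
    'CharacteristicName': ['Nitrate'],  # assumed column, retained
})

# 'LaboratoryName' is deliberately absent from the toy dataframe.
unwanted = ['OrganizationIdentifier', 'USGSPCode', 'LaboratoryName']

# errors='ignore' skips labels that are not present; without inplace=True
# the original dataframe is left untouched and a trimmed copy is returned.
trimmed = toy.drop(columns=unwanted, errors='ignore')
print(list(trimmed.columns))
```

Returning a new dataframe instead of modifying `results` in place also means the cell can be re-run without failing on columns that were already dropped.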