Automates retrieval and submission of ocean acidification data for the Center for Biological Diversity.

It currently contains scripts to:

- Retrieve and aggregate data from NERRS, King County, the Ocean Observatories Initiative, and CeNCOOS
- Reformat that data for submission to California, Washington, Hawaii, and Oregon
## Setup

These steps must be taken before using the program; they should only be required once per machine.

1. Install Docker. The project is designed to run in a Docker container, so Docker is the only prerequisite: Get Docker

2. Clone the repository, if you haven't already:

   ```
   git clone https://github.com/11th-Hour-Data-Science/cbd-ocean-acidification.git
   ```

3. Change to the root project directory:

   ```
   cd cbd-ocean-acidification
   ```

4. Build the Docker image:

   ```
   docker build --tag cbd .
   ```
## Station metadata

Some steps in the 303(d) data submission process must be taken manually once before completion. The file `metadata/stations.csv` contains metadata about the stations from which data can be retrieved. You can open it in Excel or any CSV editor of your choice; just be sure to save any changes as a CSV. Please refer to "Stations Table Schema" below for information on how to fill out the columns.
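As a sketch of what checking the station metadata can look like in practice, the snippet below flags rows with missing required fields. It uses only the Python standard library; the list of required columns is an assumption drawn from the "Stations Table Schema" below, and the sample rows are hypothetical.

```python
import csv
import io

# Columns a station row needs before submission; this list is an assumption
# based on the "Stations Table Schema" -- adjust it to your workflow.
REQUIRED = ["station_id", "name", "approved", "source", "provider",
            "state", "latitude", "longitude"]

def missing_fields(csv_text):
    """Return {station_id: [missing column names]} for incomplete rows."""
    problems = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        missing = [col for col in REQUIRED if not (row.get(col) or "").strip()]
        if missing:
            problems[row.get("station_id", "?")] = missing
    return problems

# Hypothetical rows: the second station is missing its coordinates.
sample = (
    "station_id,name,approved,source,provider,state,latitude,longitude\n"
    "KC_A,Point Williams,TRUE,King County,King County,Washington,47.53,-122.39\n"
    "KC_B,Dolphin Point,TRUE,King County,King County,Washington,,\n"
)
print(missing_fields(sample))  # {'KC_B': ['latitude', 'longitude']}
```

To check the real file, replace `sample` with the contents of `metadata/stations.csv`.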
## Washington

Washington uses EIM to handle environmental data. It has three distinct types of data:

- Results, which are the values of a given parameter at a given time and place, are formatted into a results table automatically.
- Locations, which are descriptions of the locations from which results originate, are formatted into a locations table automatically.
- Studies are data about how the results were collected and who administers the locations. These must be submitted manually once.

To set up submitting data to Washington:

- Follow the instructions here through creating the relevant studies
- Update `stations.csv` to add the `eim_study_id` to the relevant stations
- Ensure all stations you wish to submit have the required information in `stations.csv` and `station_parameter_metadata.csv`

For more information, visit the EIM page.
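A quick way to see which Washington stations still need an `eim_study_id` is to filter `stations.csv` on that column. The sketch below assumes the column names from the "Stations Table Schema"; the sample rows are hypothetical.

```python
import csv
import io

def stations_missing_eim_study(csv_text):
    """List Washington station_ids whose eim_study_id column is still blank."""
    return [
        row["station_id"]
        for row in csv.DictReader(io.StringIO(csv_text))
        if row.get("state") == "Washington"
        and not (row.get("eim_study_id") or "").strip()
    ]

# Hypothetical rows; the real stations.csv has many more columns.
sample = (
    "station_id,name,state,eim_study_id\n"
    "WA_1,Padilla Bay,Washington,OA_STUDY_01\n"
    "WA_2,Hood Canal,Washington,\n"
    "CA_1,Monterey,California,\n"
)
print(stations_missing_eim_study(sample))  # ['WA_2']
```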
## California

CEDEN is California's primary portal for environmental data upload, but it does not accept time series data as of this writing. All time series data must instead be submitted to the Integrated Report Document Upload Portal. To submit to California:

- Create an IR Portal account: https://public2.waterboards.ca.gov/IRPORTAL/Account/Register
- Ensure all stations you wish to submit have the required information in `stations.csv` and `station_parameter_metadata.csv`
## Adding new stations

For new stations and locations:

- Contact the data source to receive approval, to ensure we are following their terms of service, and to confirm they are not already submitting data.
- If the station's data is available through one of the existing collectors (King County, IPACOA, ERDDAP), find its ID in that service and use it as our `station_id`.
- If it is not available through one of the existing collectors, a new scraper will need to be created. If you are able, you can write one yourself (following the existing patterns) and open a pull request; otherwise, open an issue describing the new station you would like and where you retrieved its data from.
- Add an entry to `stations.csv` with all relevant data. You may need to contact the source.
- Add entries to `station_parameter_metadata.csv` with all relevant data. You may need to contact the source.
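Adding the `stations.csv` entry can also be done programmatically, which guarantees the new row has a value (or an explicit blank) for every existing column. This is a stdlib-only sketch; the cut-down header and the `ERDDAP_X` station are hypothetical.

```python
import csv
import io

def append_station(csv_text, new_row):
    """Append a station row, filling any column not supplied with an empty string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    rows = list(reader)
    rows.append({col: new_row.get(col, "") for col in fieldnames})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Hypothetical, cut-down header; the real stations.csv has more columns.
sample = "station_id,name,approved,provider\nKC_A,Point Williams,TRUE,King County\n"
updated = append_station(sample, {
    "station_id": "ERDDAP_X",  # hypothetical ID taken from the collector
    "name": "Example Buoy",
    "approved": "FALSE",       # FALSE until the source approves submission
    "provider": "ERDDAP",
})
print(updated)
```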
## NERRS

If you are submitting NERRS data:

1. Get your IPv4 address (visiting a site like https://whatismyipaddress.com/ should do it).

2. Request a web services account from NERRS: http://cdmo.baruch.sc.edu/web-services-request/

3. Wait for your confirmation email. Since most IP addresses change over time, you may have to do this before each time you acquire NERRS data, or get a static IP.
## Running the tool

With Docker:

1. In the project root directory, run:

   ```
   bash run_tool.sh <STATE> <start_date> <end_date>
   ```

   where `<STATE>` should be the state name (California, Hawaii, or Washington), and `<start_date>` and `<end_date>` should be dates in YYYY/MM/DD format (exclude the `<>`). This will prompt for your password, as sudo privileges are used to change the owner of the output files from root (Docker created them as root) to the current user.

2. Results will be saved in `results/STATE/YYYY-MM-DDTHH-MM`, with a `README.txt` file explaining further instructions.
Without Docker:

1. In the project root directory, run:

   ```
   python main.py <STATE> --start <start_date> --end <end_date>
   ```

   where `<STATE>` should be the state name (California, Hawaii, or Washington), and `<start_date>` and `<end_date>` should be dates in YYYY/MM/DD format (exclude the `<>`).

2. Results will be saved in `results/STATE/YYYY-MM-DDTHH-MM`, with a `README.txt` file explaining further instructions.
## Metadata files

- `ipacoa_measurement_lookup.csv`: Lookup table that shows IDs, names, and measurement units for all measurements provided by stations listed in the asset list.
- `ipacoa_platform_measurements.csv`: Contains all combinations of stations and measurements accessible through IPACOA. The `process` column indicates whether that information will be downloaded by the scraper `ipacoa.py`. If you want to add additional stations or measurements, update the `process` column of the relevant row to `True`; otherwise that information will not be included in the dataset. Currently only ocean-acidification-related measurements of West Coast stations are set to `process=True`.
- `king-county-keys.json`: Contains the keys necessary to make a `GET` request to King County's API. Used by the `kingcounty.py` script.
- `kingcounty_measurement_lookup.csv`: Contains information on measurements, units, and devices used by the stations listed in King County's data portal.
- `stations.csv`: Table containing information on all stations that can be accessed through both the IPACOA and King County data sources.
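Flipping the `process` flag for a station/measurement pair can be scripted rather than done by hand in a spreadsheet. This is a stdlib-only sketch; the `station_id` and `measurement_id` column names and the sample rows are assumptions, so check them against the real `ipacoa_platform_measurements.csv` before using it.

```python
import csv
import io

def enable_processing(csv_text, station_id, measurement_id):
    """Set process=True for one station/measurement pair so the ipacoa.py
    scraper will include it in future downloads. Column names here are
    assumptions about ipacoa_platform_measurements.csv."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    for row in rows:
        if row["station_id"] == station_id and row["measurement_id"] == measurement_id:
            row["process"] = "True"
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Hypothetical rows.
sample = (
    "station_id,measurement_id,process\n"
    "ST_1,pH,False\n"
    "ST_1,salinity,False\n"
)
updated = enable_processing(sample, "ST_1", "pH")
print(updated)
```

Only the matching row changes; all other rows are written back unmodified.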
## Stations Table Schema

- `station_id`: unique identifier for the station, usually taken from the data provider
- `name`: descriptive name of the station
- `approved`: TRUE if we have permission to submit station data, FALSE if not
- `source`: organization that operates the station
- `provider`: service from which we retrieve data for this station
- `QAPP`: link to the QAPP, if available
- `state`: state or province the station is located in
- `latitude`: latitude expressed in decimal form
- `longitude`: longitude expressed in decimal form
- `description`: qualitative information about the station and its location
- `setting`: where the station is located. Options are: "Canal Transport", "Estuary", "Lake", "Ocean", "Other-Surface Water", "Reservoir", "River/Stream", "Seep", "Spring", "Storm Sewer". This list is from Oregon's DEQ and is converted for other states.
- `collector`: organization type of the operator: "University", "GovFed", "GovLocal", "NOAA"
- `horizontal_datum`: datum that was used for the horizontal coordinates. Acceptable values are: "WGS84", "NAD27", "NAD83".
- `horizontal_coordinate_accuracy`:
- `horizontal_coordinate_collection`: (DEQ, ?). For DEQ: "GPS-Unspecified", "Interpolation-Map", or blank.
- `study_specific_id`:
- `reference_point`:
- `ceden_id`: (CEDEN only)
- `ceden_project_code`: (CEDEN only)
- `tribal_land`: (DEQ only) Yes or No
- `eim_location_study`: (EIM only)
- `eim_study_id`: (EIM only)
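Since several columns only accept values from a fixed list, a small validator can catch typos before submission. This sketch hard-codes the allowed values quoted above; the sample row is hypothetical.

```python
# Allowed values for the constrained columns, as listed in the schema above.
ALLOWED = {
    "setting": {"Canal Transport", "Estuary", "Lake", "Ocean",
                "Other-Surface Water", "Reservoir", "River/Stream",
                "Seep", "Spring", "Storm Sewer"},
    "collector": {"University", "GovFed", "GovLocal", "NOAA"},
    "horizontal_datum": {"WGS84", "NAD27", "NAD83"},
}

def invalid_columns(row):
    """Return the names of constrained columns whose value, if set,
    falls outside the allowed set."""
    return [col for col, allowed in ALLOWED.items()
            if row.get(col) and row[col] not in allowed]

# Hypothetical row: the datum is misspelled ("WGS-84" instead of "WGS84").
row = {"setting": "Ocean", "collector": "NOAA", "horizontal_datum": "WGS-84"}
print(invalid_columns(row))  # ['horizontal_datum']
```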
## Contact

You can contact us by opening an issue or emailing tspread at uchicago edu.

Project Link: https://github.com/chicago-cdac/cbd-ocean-acidification