---
title: "API Requests"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{API Requests}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
## Introduction
Before you can fetch data of any kind from Cal-Adapt, you have to construct an API request object. Creating an API request object is like filling out an order form. Once completed, you feed the API request to a function that actually retrieves data (either tabular or raster).
An API request object contains all the information needed to fetch data, including the location(s) of interest, dataset(s), and time period. In practice, you construct an API request object by stringing together a series of functions that specify the different pieces of the request. Examples:
```{r sac_temperature_cap}
library(caladaptr)
## Request modeled climate data (from Scripps)
sac_minmaxtemp_cap <- ca_loc_pt(coords = c(-121.4687, 38.5938)) %>%   ## point location(s)
  ca_gcm(c("HadGEM2-ES", "CNRM-CM5", "CanESM2", "MIROC5")) %>%        ## GCM(s)
  ca_scenario(c("rcp45", "rcp85")) %>%                                ## emission scenario(s)
  ca_cvar(c("tasmin", "tasmax")) %>%                                  ## climate variables
  ca_period("year") %>%                                               ## temporal aggregation period
  ca_years(start = 1971, end = 2070)                                  ## start and end years

## Specify Livneh data (observed historical) for a HUC10 watershed
huc_pr_cap <- ca_loc_aoipreset(type = "hydrounits",                   ## preset AOI
                               idfld = "huc10",
                               idval = "1809020409") %>%
  ca_livneh(TRUE) %>%                                                 ## Livneh data
  ca_cvar("pr") %>%                                                   ## precipitation
  ca_period("day") %>%                                                ## daily
  ca_dates(start = "1960-10-01", end = "2010-09-30") %>%              ## start and end dates
  ca_options(spatial_ag = "mean")                                     ## spatially aggregate w/ mean

## Specify desired data by slug for Integrated Regional Water Management (IRWM) regions
irwm_ht_cap <- ca_loc_aoipreset(type = "irwm", idfld = "name") %>%    ## preset AOI
  ca_slug(c("exheat_year_ens32avg_historical",                        ## slug(s)
            "exheat_year_ens32avg_rcp45",
            "exheat_year_ens32avg_rcp85")) %>%
  ca_options(spatial_ag = "max")                                      ## spatially aggregate w/ max
```
\
## Specifying Location
All API requests must specify the location(s) of interest. Locations can be specified by points, preset areas-of-interest, or user-provided geoms. All of these options allow you to specify a primary key field which will be included in the results so the results can be joined to the input features.
\
*1) Points*
Point locations can be included using `ca_loc_pt()` if the points are saved in a vector, matrix or data frame, and `ca_loc_sf()` if the points are in a simple feature data frame.
If points are passed in a matrix, vector or data frame, the first column should be longitude (x), and the second column should be latitude (y). Geographic coordinates are not required if the points are in an sf data frame (but it's a good idea). Example:
```{r ca_loc_pt_coords}
(ca_loc_pt(coords = c(-122.1, 38.1)))
```
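When there are several points, the column order described above matters. The following is a minimal sketch, assuming a hypothetical data frame of study sites and assuming `ca_loc_pt()` accepts an `id` argument for the unique values that identify each point (check `?ca_loc_pt` to confirm):

```r
library(caladaptr)

## Hypothetical study sites (made-up ids and coordinates)
sites_df <- data.frame(
  site_id = c(101, 102, 103),
  lon = c(-121.4687, -122.1, -119.7871),
  lat = c(38.5938, 38.1, 36.7378)
)

## Pass only the coordinate columns, longitude (x) first and latitude (y) second.
## The `id` argument is an assumption; it is not shown elsewhere in this vignette.
sites_cap <- ca_loc_pt(coords = sites_df[, c("lon", "lat")], id = sites_df$site_id)
sites_cap
```

Selecting the columns by name guards against a data frame whose first two columns happen to be something other than the coordinates.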
\
*2) Preset areas-of-interest*
The Cal-Adapt API is integrated with a number of preset polygon areas-of-interest, also referred to as *boundary layers*. These are things like county boundaries, census tracts, watersheds, etc. If your study area(s) coincide with one of these layers, you can specify them by name using `ca_loc_aoipreset()`.
To see a list of the available preset layers, run:
```{r}
aoipreset_types
```
When you use a preset AOI layer in an API request, you can specify which feature(s) you're interested in (if you don't want them all). For example, if you want data summarized by county, you can specify which of California's 58 counties you're interested in. Two arguments must be passed to specify features of interest: i) `idfld` (a field in the attribute table that has unique values), and ii) `idval` (the id numbers or values of the features). Example:
```{r}
(ca_loc_aoipreset(type = "counties", idfld = "fips", idval = "06013"))
```
To figure out which fields are available to specify locations for a given preset layer, you can use the built-in constant `aoipreset_idflds`. For example if you're interested in counties, you can use the following fields:
```{r}
aoipreset_idflds$counties
```
To see the valid values to specify features, use the built-in constant `aoipreset_idval`. For example, the 'Counties' preset layer has a column named 'fips' you can use to specify which county(s) you're interested in. The values look like:
```{r}
aoipreset_idval$counties$fips %>% head()
```
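The two constants can also be used to check id values before building a request. This is a sketch only: it assumes the `fips` values are stored as character strings with leading zeros (as the `idval = "06013"` example suggests), and the second FIPS code is an illustrative addition.

```r
library(caladaptr)

## FIPS codes we would like to query (Contra Costa and Alameda counties)
my_fips <- c("06013", "06001")

## Confirm the id field exists for this preset layer
stopifnot("fips" %in% aoipreset_idflds$counties)

## Flag any values that are not in the layer's list of valid ids
(bad_fips <- setdiff(my_fips, aoipreset_idval$counties$fips))

## Use only the recognized values in the request
(ca_loc_aoipreset(type = "counties", idfld = "fips",
                  idval = setdiff(my_fips, bad_fips)))
```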
\
All the preset areas of interest can be imported as sf objects using `ca_aoipreset_geom()`. For example the climate regions for the 5th climate change assessment look like:
```{r ca_climregions_sf}
ca_climregions_sf <- ca_aoipreset_geom("climregions")
library(tmap)
tm_shape(ca_climregions_sf) + tm_fill(col = "MAP_COLORS", palette = "Pastel1")
```
\
*3) User-provided sf features*
If Cal-Adapt's preset areas-of-interest don't align with your study area(s), you can provide your own locations as a simple feature data frame. Your sf object should have a column with unique values (like OBJECTID) to join the input features to the Cal-Adapt data.
Add locations from an sf object to an API request with `ca_loc_sf()`. POINT, POLYGON, and MULTIPOLYGON geometries are supported. Lines are not supported, and multipoint features must be converted to simple point features (see `st_cast`). The `idfld` argument specifies a column in the sf object that contains unique values. If the sf object lacks a column with unique values, you can add one using `mutate()`, or provide a vector of unique id values with `idval`.
The Cal-Adapt API has limits as to how large a spatial area can be queried. If your area-of-interest is larger than a county, consider blocking the area with `ca_biggeom_blocks()`. If your locations include multiple points per 6 km LOCA grid cell (Cal-Adapt's smallest spatial unit), you can group them and just make one call per grid cell (see the [Large Queries](large-queries-sqlite.html) vignette for sample code).
In the next example, we get an sf data frame of the Congressional Districts, and use it to start an API request object.
```{r}
## Get Congressional Districts as a sf object
(cdistricts_sf <- ca_aoipreset_geom("cdistricts", quiet = TRUE))
## Start an API request object
(ca_loc_sf(loc = cdistricts_sf, idfld = "geoid"))
```
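The multipoint conversion and the id column mentioned earlier in this section can be sketched as follows. The multipoint feature and the `site_id` column are hypothetical; `st_cast()` and `mutate()` are the standard sf and dplyr functions.

```r
library(sf)
library(dplyr)
library(caladaptr)

## A hypothetical MULTIPOINT feature holding three locations (lon, lat)
mp_sfc <- st_sfc(st_multipoint(rbind(c(-121.47, 38.59),
                                     c(-122.10, 38.10),
                                     c(-119.77, 36.75))),
                 crs = 4326)
mp_sf <- st_sf(name = "my_sites", geometry = mp_sfc)

## Split the multipoint into simple POINT features, then add a unique id column
pts_sf <- mp_sf %>%
  st_cast("POINT") %>%
  mutate(site_id = row_number())

## Start an API request object from the converted points
(ca_loc_sf(loc = pts_sf, idfld = "site_id"))
```

`st_cast()` warns that attributes are repeated for each new point, which is expected here because the `name` value applies to all three sites.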
## Datasets
A Cal-Adapt API request must also specify the data layer(s) to retrieve. A description of the datasets on Cal-Adapt is beyond the scope of this vignette, but some general guidelines include:
i) Review the [data documentation](https://cal-adapt.org/data/) on Cal-Adapt, including the [Introduction to Climate Data](https://cal-adapt.org/blog/2020/webinar-jan-2020/) webinar.
ii) A local copy of the Cal-Adapt data catalog can be retrieved from `ca_catalog_rs()`, which returns a tibble. One way to find a dataset for your project is to view the catalog in an RStudio viewer pane and use the filter tool to find specific datasets.
```
View(ca_catalog_rs())
```
iii) To search for data layers and view their properties, you can use `ca_catalog_search()`, passing it keywords or a slug. Partial matches are returned as well.
```{r ca_catalog_search}
ca_catalog_search("pr_day_gridmet")
```
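Because the catalog is a tibble, it can also be filtered in code rather than in the viewer pane. A minimal sketch, assuming the catalog has a column named `slug` (inspect `names(ca_catalog_rs())` to confirm):

```r
library(caladaptr)
library(dplyr)

## Keep only catalog rows whose slug starts with "exheat_year"
## (the `slug` column name is an assumption)
exheat_tbl <- ca_catalog_rs() %>%
  filter(grepl("^exheat_year", slug))

## List the matching slugs
exheat_tbl$slug
```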
### Specifying Datasets
Once you've identified which dataset you need, add the corresponding functions to the Cal-Adapt API request. The functions you use to specify datasets fall into three groups: 1) modeled climate data from one of the main climate models, 2) observed climate data, and 3) slugs.
\
**Modeled Climate Data**
Many of Cal-Adapt's datasets are based on the direct output of global climate models (e.g., temperature, precipitation) or their derivatives (e.g., snow water equivalent, evapotranspiration). These models have been used to predict both future and historic conditions based on specific emissions scenarios. API requests for modeled climate data should include:

- climate variable(s) - e.g., tasmin, tasmax, pr, et, etc.
- GCM(s)
- emissions scenario(s)
- temporal period - e.g., year, day, month

To help you construct an API request for modeled climate data, the following built-in constants provide valid values. Note that not all combinations have an existing dataset.
```{r constants}
cvars
gcms
scenarios
periods
```
Example:
```{r}
(ca_cvar(cvar = "tasmin") %>%
   ca_gcm(gcm = gcms[1:4]) %>%
   ca_scenario("rcp85") %>%
   ca_period("year"))
```
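Because each element of a request can be a vector, one request can refer to many datasets. The sketch below uses base R only and assumes a request covers every combination of the variables, GCMs, and scenarios it lists. The values are taken from the first example in this vignette.

```r
## Values from the first example in this vignette
req_cvars <- c("tasmin", "tasmax")
req_gcms <- c("HadGEM2-ES", "CNRM-CM5", "CanESM2", "MIROC5")
req_scenarios <- c("rcp45", "rcp85")

## Every combination of climate variable, GCM, and emissions scenario
combos_df <- expand.grid(cvar = req_cvars, gcm = req_gcms,
                         scenario = req_scenarios,
                         stringsAsFactors = FALSE)

## Number of datasets the request would refer to under this assumption
nrow(combos_df)
head(combos_df)
```

Counting the combinations up front gives a rough sense of how large a request is before any data are fetched.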
\
**Observed Data**
Livneh data are based on historic records that have been spatially interpolated. To specify a Livneh dataset, include `ca_livneh()` in the request and omit GCMs and emission scenarios:
```{r}
(ex2_cap <- ca_cvar(cvar = "pr") %>%
   ca_livneh(TRUE) %>%
   ca_period("year"))
```
\
**Slugs**
Each of the nearly 950 raster series available through the Cal-Adapt API has a unique `slug`, or short name. You can look up slugs in the Cal-Adapt catalog (`ca_catalog_rs()`).
If the dataset you're interested in can't be specified using the functions above, you can use `ca_slug()` to specify it by its slug. With `ca_slug()`, you don't have to include the climate variable, period, GCM, or scenario - those are all implied by the slug. But you'll still need to specify a location and optionally dates for the API request to be complete.
```{r}
(ca_slug(slug = "exheat_year_ens32avg_historical"))
```
## Dates
Date ranges are optional in API requests. If a date range is not provided, data for the entire time range of the dataset will be returned. You can include a date range with `ca_dates()` or `ca_years()`. Enter years as integers, and dates as character values `"yyyy-mm-dd"`. Examples:
```{r}
(ca_years(start = 2035, end = 2065))
(ca_dates(start = "2050-03-15", end = "2070-10-31"))
```
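Date strings can also be generated rather than typed. The helper below is hypothetical (it is not part of caladaptr) and builds the start and end dates for a span of water years, where water year N runs from October 1 of year N-1 through September 30 of year N:

```r
## Hypothetical helper: date range covering water years first_wy through last_wy
water_year_range <- function(first_wy, last_wy) {
  stopifnot(first_wy <= last_wy)
  c(start = sprintf("%d-10-01", as.integer(first_wy) - 1L),
    end = sprintf("%d-09-30", as.integer(last_wy)))
}

wy <- water_year_range(1961, 2010)
wy

## The two strings can then be passed to ca_dates(), for example:
## ca_dates(start = wy[["start"]], end = wy[["end"]])
```

For water years 1961 through 2010 this gives the same date range used in the Livneh precipitation example in the Introduction.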
## Options
`ca_options()` is how you specify any other options for the request. Currently the only option in use is `spatial_ag`. `spatial_ag` is the spatial aggregation function that should be used when querying polygon areas. If a feature intersects more than one LOCA grid cell, this function will be used to collapse the grid cells into one value.
```{r}
(ca_options(spatial_ag = "mean"))
```
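To illustrate what the aggregation function does, here is a toy calculation in base R. The grid cell values are made up, and the sketch ignores any weighting the API may apply to partially covered cells.

```r
## Hypothetical annual values for three LOCA grid cells that one polygon overlaps
cell_vals <- c(cell_a = 18.2, cell_b = 19.6, cell_c = 21.1)

## The polygon gets a single value, which depends on the aggregation function
c(mean = mean(cell_vals), max = max(cell_vals))
```

Which function is appropriate depends on the variable. The examples in the Introduction use the mean for precipitation over a watershed and the maximum for an extreme heat indicator.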
## Checking the integrity of an API request
Before you use an API request to fetch data, there are a couple of things you can do to check its validity.
To verify the API request covers the correct location, you can plot it. Below we plot a sample API request returned by `ca_example_apireq()`. Adding `locagrid = TRUE` overlays the actual LOCA grid cells.
```{r samp_cap_plot}
(samp_cap <- ca_example_apireq(3))
plot(samp_cap, locagrid = TRUE)
```
\
`ca_preflight()` does a more complete check of an API request, double-checking that it refers to an existing dataset, data are available for the requested date range, the request doesn't have conflicting elements, it isn't too big, etc. Some warnings are specific to fetching values vs fetching rasters, and are reported accordingly:
```{r ca_preflight}
ca_example_apireq(4) %>% ca_preflight()
```
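Once a request passes these checks, it can be handed to a function that fetches data. The sketch below assumes `ca_getvals_tbl()`, the caladaptr function for retrieving values as a tibble, which is not otherwise covered in this vignette. It needs an internet connection and a reachable Cal-Adapt API, so it is shown without being run.

```r
library(caladaptr)

## Start from one of the sample API requests used above
samp_cap <- ca_example_apireq(3)

## Check the request first
samp_cap %>% ca_preflight()

## Fetch the values as a tibble (requires network access)
samp_tbl <- samp_cap %>% ca_getvals_tbl(quiet = TRUE)
head(samp_tbl)
```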