diff --git a/_bookdown.yml b/_bookdown.yml index 9bfa094..97ee7ec 100644 --- a/_bookdown.yml +++ b/_bookdown.yml @@ -3,7 +3,7 @@ output_dir: "docs" language: ui: chapter_name: "Chapter " -rmd_files: ["index.Rmd", "traits/00-BETYdb-getting-started.Rmd", "traits/01-web-access.Rmd", -"traits/02-betydb-api-access.Rmd", "traits/03-access-r-traits.Rmd", "traits/04-danforth-indoor-phenotyping-facility.Rmd", -"traits/06-agronomic-metadata.Rmd", "traits/07-betydb-sql-access.Rmd", "traits/10-simulated-sorghum.Rmd"]#, "10-simulated-sorghum.Rmd" +rmd_files: ["index.Rmd", "vignettes/00-introduction.Rmd", "vignettes/01-get-trait-data-R.Rmd", "vignettes/02-get-weather-data-R.Rmd", +"vignettes/03-get-images-python.Rmd", "vignettes/04-synthesis-data.Rmd", "traits/03-access-r-traits.Rmd","sensors/01-meteorological-data.Rmd", +"sensors/06-list-datasets-by-plot.Rmd"] diff --git a/index.Rmd b/index.Rmd index 8da0886..bb8043b 100644 --- a/index.Rmd +++ b/index.Rmd @@ -11,15 +11,38 @@ output: # Overview -This book is intended to introduce users to TERRA REF data as quickly as possible. +This book is intended to quickly introduce users to TERRA REF data through a series of tutorials. TERRA REF has many types of data, and most can be accessed in multiple ways. Although this makes it more complicated to learn (and teach!), the objective is to provide users with the flexibility to access data in the most useful way. -It introduces to the wide range of phenomics datasets generated by the TERRA Reference program. Not only does TERRA REF have a large number of data sets, but many of the databases can be accessed in a number of different ways. While this makes it more complicated to learn, the goal is to provide users with the flexibility to access data in the most useful way. + +## Contents + +The first section walks the user through the steps of downloading and combining three different types of data: plot level phenotypes, meteorological data, and images. 
Subsequent sections provide more detailed examples that show how to access a larger variety of data and meta-data. + +## Pre-requisites + +While we assume that readers will have some familiarity with the nature of the problem - remote sensing of crop plants - for the most part, these tutorials assume that the user will bring their own scientific questions, a sense of curiosity, and an eagerness to learn. + +These tutorials are aimed at users who are familiar with or willing to learn programming languages including R (particularly for accessing plot level trait data) and Python (primarily for accessing environmental data and sensor data). In addition, there are examples of using SQL for more sophisticated database queries as well as the bash terminal. + +Some of the lessons only require a web browser; others will assume familiarity with programming at the command line in (typically only one of) Python, R, and / or SQL. You should be willing to find help (see finding help, below). + +## Technical Requirements + +At a minimum, you should have: + +* An internet connection +* Web Browser +* Access to the data that you are using + + The tutorials will state which databases you will need access to +* Software: + + Software requirements vary with the tutorials, and may be complex ## User Accounts and permission to access TERRA REF data -TODO: link to relevant parts of docs.terraref.org +We have tried to write these tutorials using open access sample data sets. However, access to much of the data will require you to 1) fill out the TERRA REF Beta user questionnaire ([terraref.org/beta](terraref.org/beta)) and 2) request access to specific databases. 
-* Info on how to [request access to data](https://docs.terraref.org/user-manual/how-to-access-data/using-betydb-trait-data-experimental-metadata) + ## Ways of Accessing Data @@ -40,41 +63,21 @@ The TERRA REF website: [terraref.org](http://terraref.org/) The TERRA REF Technical Documentation: [docs.terraref.org](docs.terraref.org) -## Contents - -Scope ... - -Audience ... - - -## Pre-requisites - -While we assume that readers will have some familiarity with the nature of the problem - remote sensing of crop plants - for the most part, these tutorials assume that the user will bring their own scientific questions and a sense of curiosity and are eager to learn. - -Some of the lessons only require a web browser; others will assume familarity with programming at the command line in (typically only one of) Python, R, and / or SQL. You should be willing to find help (see finding help, below). - -## Technical Requirements - -At a minimum, you should have: - -* An internet connection -* Web Browser -* A TERRA REF Beta User account - + If you have not done so, please sign up at [terraref.org/beta](terraref.org/beta) -* Access to the data that you are using - + The tutorials will state which databases you will need access to -* Software: - + Software requirements vary with the tutorials, and may be complex - ## Finding help -- [Slack](terra-ref.slack.com) -- [GitHub](https://github.com/terraref/tutorials) -- [Google](https://www.google.com/) +- Slack at terra-ref.slack.com ([signup](https://terraref-slack-invite.herokuapp.com/)) +- Browse issues and repositories in GitHub: + - search the organization at github.com/terraref + - questions about the tutorials in the [tutorials repository](https://github.com/terraref/tutorials/issues) + - about the data in the [reference-data repository](https://github.com/terraref/reference-data/issues) ```{r, include = FALSE} -knitr::opts_chunk$set(echo = FALSE, cache = TRUE) +knitr::opts_chunk$set(echo = FALSE, + engine.path = list( + python 
= 'python3' + )) + options(warn = -1) ``` diff --git a/traits/00-BETYdb-getting-started.Rmd b/traits/00-BETYdb-getting-started.Rmd index 5457247..e4dc960 100644 --- a/traits/00-BETYdb-getting-started.Rmd +++ b/traits/00-BETYdb-getting-started.Rmd @@ -1,4 +1,4 @@ -# (PART\*) Secton 1: Traits {-} +# (PART\*) Section 2: Traits {-} # Getting Started with BETYdb diff --git a/traits/02-betydb-api-access.Rmd b/traits/02-betydb-api-access.Rmd index eafb8e0..351802f 100644 --- a/traits/02-betydb-api-access.Rmd +++ b/traits/02-betydb-api-access.Rmd @@ -26,10 +26,10 @@ The first step toward reproducible pipelines is to automate the process of searc ### Using Your API key to Connect An API key is like a password. It allows you to access data, and should be kept private. -Therefore, we are not going to put it in code that we share. The one exception is the key 9999999999999999999999999999999999999999 that will allow you to access metadata tables (all tables except _traits_ and _yields_). It will also allow you to access all of the simulated data in the https://terraref.ncsa.illinois.edu/bety-test database. -A common way of handling private API keys is to place it in a text file in your current directory. -Don't put it in a project directory where it might be inadvertently shared. +Therefore, we are not going to put it in code that we share. + +A common way of handling private API keys is to place it in a text file in your current directory. Don't put it in a project directory where it might be inadvertently shared. Here is how to find and save your API key: @@ -37,7 +37,8 @@ Here is how to find and save your API key: * copy the api key that was sent when you registered into the file * file --> save as '.betykey' -For the public key, you can call this file `.betykey_public`. +An API key is not needed to access public data. This includes metadata tables and simulated data in the https://terraref.ncsa.illinois.edu/bety-test database. 
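For readers working outside R, the same convention can be sketched in Python; the key value below is a placeholder (not a real key), and `.betykey` is the filename suggested above:

```python
from pathlib import Path

# Write the key once, outside of any shared code; the value is a placeholder.
Path(".betykey").write_text("abcdefg_rest_of_key_sent_in_email\n")

# Any later script reads the key back rather than hard-coding it.
api_key = Path(".betykey").read_text().strip()
```

Keeping `.betykey` out of version control (for example via `.gitignore`) avoids sharing it inadvertently.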
+ ## Accessing data using a URL query @@ -49,7 +50,9 @@ For the public key, you can call this file `.betykey_public`. * path to the api: `/api/v1` * api endpoint: `/search` or `traits` or `sites`. For BETYdb, these are the names of database tables. * Query parameters: `genus=Sorghum` -* Authentication: `key=9999999999999999999999999999999999999999` is the public key for the TERRA REF traits database. + +* Authentication: `key=api_key` is your assigned API key. This will only be needed when querying trait data. No key is needed to access the public metadata tables. + ### Constructing a URL query @@ -62,17 +65,17 @@ First, lets construct a query by putting together a URL. 3. Add the name of the table you want to query. Lets start with `variables` * terraref.ncsa.illinois.edu/bety/api/v1/variables 4. add query terms by appending a `?` and combining with `&`, for example: - * `key=9999999999999999999999999999999999999999` * `type=trait` where the variable type is 'trait' * `name=~height` where the variable name contains 'height' 5. This is your complete query: - * `terraref.ncsa.illinois.edu/bety/api/v1/variables?type=trait&name=~height&key=9999999999999999999999999999999999999999` + * `terraref.ncsa.illinois.edu/bety/api/v1/variables?type=trait&name=~height` * it will query all variables that are type trait and have 'height' in the name * Does it return the expected values? ## Your Turn -> What will the URL https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=9999999999999999999999999999999999999999 return? +> What will the URL https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum return? + > Write a URL that will query the database for sites with "Field Scanner" in the name field. 
Hint: combine two terms with a `+` as in `Field+Scanner` @@ -84,23 +87,30 @@ Type the following command into a bash shell (the `-o` option names the output f ```sh curl -o sorghum.json \ - "https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=9999999999999999999999999999999999999999" + "https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum" ``` If you want to write the query without exposing the key in plain text, you can construct it like this: ```sh curl -o sorghum.json \ - "https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=`cat .betykey_public`" + "https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=`cat .betykey`" ``` ## Using the R jsonlite package to access the API with a URL query + +```{r 02-jsonlite-load, include = FALSE} + +library(jsonlite) + +``` + ```{r text-api, warning = FALSE} sorghum.json <- readLines( paste0("https://terraref.ncsa.illinois.edu/bety/api/v1/species?genus=Sorghum&key=", - readLines('traits/.betykey'))) - + readLines('.betykey'))) + ## print(sorghum.json) ## not a particularly useful format ## lets convert to a data frame diff --git a/traits/03-access-r-traits.Rmd b/traits/03-access-r-traits.Rmd index bb42101..6bbe9fc 100644 --- a/traits/03-access-r-traits.Rmd +++ b/traits/03-access-r-traits.Rmd @@ -8,50 +8,54 @@ The rOpenSci traits package makes it easier to query the TERRA REF trait databas Install the traits package -The traits package is on CRAN, and can therefore be installed using the following command: +The traits package can be installed from GitHub using the following command: ```{r install_traits, echo = TRUE, message = FALSE} -install.packages('traits', repos = 'http://cran.rstudio.com/') + +if(packageVersion("traits") == '0.2.0'){ + devtools::install_github('ropensci/traits') +} + ``` Load other packages that we will need to get started. 
-```{r 00-setup, message = FALSE, echo = TRUE} +```{r 00-setup, message = FALSE, echo = TRUE, warning = FALSE} library(traits) library(ggplot2) library(ggthemes) theme_set(theme_bw()) library(dplyr) ``` +Create a file that contains your API key. If you have signed up for access to the TERRA REF database, your API key will have been sent to you in an email. You will need this personal key _and_ permissions to access the trait data. If you receive empty (NULL) datasets, it is likely that you do not have permissions. -Create a file that contains your API key. If you have signed up for access to the TERRA REF database, your API key will have been sent to you in an email. The public key will provide access to all metadata; you will need a personal key _and_ permissions to access the trait data. If you receive empty (NULL) datasets, it is likely that you do not have permissions. ```{r writing-key, echo = TRUE} # This should be done once with the key sent to you in your email -# writeLines('abcdefg_rest_of_key_sent_in_email', + +# Example: +#writeLines('abcdefg_rest_of_key_sent_in_email', # con = '.betykey') -# Example with the public key: -writeLines('9999999999999999999999999999999999999999', - con = '.betykey_public') ``` + #### R - using the traits package The R traits package is an API 'client'. It does two important things: 1. It makes it easier to specify the query parameters without having to construct a URL 2. 
It returns the results as a data frame, which is easier to use within R -Lets start with the query of information about Sorghum from species table from above +Lets start with the query of information about Sorghum from the species table -```{r query-species, echo = TRUE} +```{r query-species, results = 'hide', echo = TRUE} sorghum_info <- betydb_query(table = 'species', - genus = "Sorghum", - api_version = 'v1', - limit = 'none', - betyurl = "https://terraref.ncsa.illinois.edu/bety/", - key = readLines('.betykey', warn = FALSE)) + genus = "Sorghum", + api_version = 'v1', + limit = 'none', + betyurl = "https://terraref.ncsa.illinois.edu/bety/", + key = readLines('.betykey', warn = FALSE)) ``` @@ -59,8 +63,6 @@ sorghum_info <- betydb_query(table = 'species', Notice all of the arguments that the `betydb_query` function requires? We can change this by setting the default connection options thus: - - ```{r 03-set-up, echo = TRUE} options(betydb_key = readLines('.betykey', warn = FALSE), betydb_url = "https://terraref.ncsa.illinois.edu/bety/", @@ -69,7 +71,8 @@ options(betydb_key = readLines('.betykey', warn = FALSE), Now the same query can be reduced to: -```{r query-species-reduce, echo = TRUE, results = FALSE} +```{r query-species-reduce, message = FALSE, echo = TRUE} + sorghum_info <- betydb_query(table = 'species', genus = "Sorghum", limit = 'none') @@ -78,20 +81,23 @@ sorghum_info <- betydb_query(table = 'species', ### Time series of height Now let's query some trait data. 
-```{r canopy_height, echo = TRUE, results = FALSE} -sorghum_height <- betydb_query(table = 'search', + +```{r canopy_height, echo = TRUE, message = FALSE} +canopy_height <- betydb_query(table = 'search', trait = "canopy_height", - sitename = "~Season 6", + sitename = "~Season 2", limit = 'none') ``` ```{r plot_height} -ggplot(data = sorghum_height, + +ggplot(data = canopy_height, aes(x = lubridate::yday(lubridate::ymd_hms(raw_date)), y = mean)) + geom_point(size = 0.5, position = position_jitter(width = 0.1)) + # scale_x_datetime(date_breaks = '6 months') + xlab("Day of Year") + ylab("Plant Height") + guides(color = guide_legend(title = 'Genotype')) + theme_bw() + ``` diff --git a/traits/04-danforth-indoor-phenotyping-facility.Rmd b/traits/04-danforth-indoor-phenotyping-facility.Rmd index b53f7ec..a049a81 100644 --- a/traits/04-danforth-indoor-phenotyping-facility.Rmd +++ b/traits/04-danforth-indoor-phenotyping-facility.Rmd @@ -1,6 +1,7 @@ # Danforth Indoor Phenotype Analysis ```{r 02-setup, include=FALSE} + knitr::opts_chunk$set(echo = TRUE, cache = TRUE) library(jsonlite) library(dplyr) @@ -21,7 +22,8 @@ library(traits) Unlike the first two tutorials, now we will be querying real data from the public TERRA REF database. So we will use a new URL, https://terraref.ncsa.illinois.edu/bety/, and we will need to use our own private key. 
```{r terraref-connect-options} -options(betydb_key = readLines('traits/.betykey', warn = FALSE), + +options(betydb_key = readLines('.betykey', warn = FALSE), betydb_url = "https://terraref.ncsa.illinois.edu/bety/", betydb_api_version = 'v1') ``` @@ -92,7 +94,7 @@ ggplot(data = danforth_sorghum) + ### Growth rate over time -```{r danforth-phenotypes, fig.width=8, fig.height=4} +```{r danforth-phenotypes, fig.width=8, fig.height=4, message = FALSE} ggplot(data = danforth_sorghum, aes(x = date, y = mean, color = cultivar)) + # geom_line(aes(group = entity), size = 0.1) + diff --git a/traits/05-maricopa-field-scanner.Rmd b/traits/05-maricopa-field-scanner.Rmd index 910228d..37d6da9 100644 --- a/traits/05-maricopa-field-scanner.Rmd +++ b/traits/05-maricopa-field-scanner.Rmd @@ -1,7 +1,7 @@ # Plot level data from the field scanner in Maricopa, AZ ```{r traits-05-mac-traits-setup, include=FALSE} -knitr::opts_chunk$set(echo = FALSE, cache = TRUE) +knitr::opts_chunk$set(echo = FALSE, cache = FALSE) library(dplyr) library(tidyr) library(ggplot2) @@ -23,20 +23,23 @@ options(betydb_key = readLines('.betykey', warn = FALSE), First, query the plots for Season 2. The simple way to use this is based on the fact that the plot names at Maricopa contain the season. -```{r traits-05-query-mac-sites, echo = TRUE} +```{r traits-05-query-mac-sites, echo = TRUE, message = FALSE} sites <- betydb_query( table = "sites", city = "Maricopa", sitename = "~Season 2 range", limit = "none") ``` -A more robust (but complicated way) would be to query the experiments and experiments_sites tables. But we will leave that for later. +A more robust (but more complicated) way would be to query the experiments and experiments_sites tables. But we will leave that as an exercise for the ambitious user. 
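As a hedged sketch of that more robust route, the two queries can be composed as URLs following the table-name convention from the API chapter. The `experiments_sites` endpoint and the example experiment id are assumptions to verify against the live API:

```python
# Base URL of the BETYdb API used throughout these tutorials.
base = "https://terraref.ncsa.illinois.edu/bety/api/v1"

# 1) find experiments whose name contains 'Season 2'
experiments_url = base + "/experiments?name=~Season+2"

# 2) hypothetical follow-up: list the site links for one experiment id
experiment_id = 6000000001  # illustrative id, not a real record
experiments_sites_url = f"{base}/experiments_sites?experiment_id={experiment_id}"
```

The site ids returned by the second query could then be matched against the `sites` table, rather than relying on season names embedded in plot names.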
+ ### Plot Season 2 plots ```{r traits-05-map-mac-polygons, echo = TRUE} -site_bounds <- (sites - %>% rowwise() - %>% do(boundaries = readWKT(text = .$geometry, id = .$id))) + +site_bounds <- sites %>% + rowwise() %>% + do(boundaries = readWKT(text = .$geometry, id = .$id)) + site_bounds <- do.call('rbind', site_bounds$boundaries) #names(site_bounds) <- sites$sitename @@ -54,6 +57,7 @@ leaflet() %>% ``` ```{r} + ## Cultivars ``` @@ -80,9 +84,9 @@ leaflet() %>% ``` ```{r traits-05-height-cover-ndvi} #variables <- betydb_query( -# table = "variables", name = "~^(NDVI|canopy_height|canopy_cover|)$") +# table = "variables", name = "~^(NDVI|canopy_height|canopy_cover|)$") # a tilde ~ can be used to partially match a string #the tilde is used in this query to get variable names that contain either 'NDVI', 'canopy_height', or 'canopy_cover' #variables %>% # select(id, name, units, n_records = `number of associated traits`) diff --git a/traits/06-agronomic-metadata.Rmd b/traits/06-agronomic-metadata.Rmd index 86a03b7..6af2f6a 100644 --- a/traits/06-agronomic-metadata.Rmd +++ b/traits/06-agronomic-metadata.Rmd @@ -71,7 +71,7 @@ Here are some key tables and fields that we will look at: | management_id | managements.id | -```{r 06-setup, include = FALSE} +```{r tutorial-06-set-up, include = FALSE} library(dplyr) library(tidyr) @@ -89,7 +89,8 @@ options(betydb_key = readLines('.betykey', warn = FALSE), -```{r} +```{r 06_tibble, echo = TRUE, warning = FALSE} + ## query and join tables species <- betydb_query(table = "species") %>% select(specie_id = id, scientificname, genus) @@ -132,7 +133,8 @@ Let's do the manual equivalent of a cross-table join. BETY actually does contain The key idea here is that each treatment is associated with some (possibly many) managements, but the treatments table only reports the number of associated managements. To see the management IDs themselves, we need to query an individual treatment ID. 
So, we retrieve one table, then iterate over each row extracting the foreign keys for the other table. This requires an API call for every treatment, so beware that it is likely to be slow! -```{r} +```{r 06_cross_join, echo = TRUE, results = 'hide'} + treatments <- betydb_query(table = 'treatments') %>% select(treatment_id = id , name, definition, control) diff --git a/traits/07-betydb-sql-access.Rmd b/traits/07-betydb-sql-access.Rmd index 4683f5a..1bce7bd 100644 --- a/traits/07-betydb-sql-access.Rmd +++ b/traits/07-betydb-sql-access.Rmd @@ -21,30 +21,3 @@ User: viewer Password: DelchevskoOro DB: bety ``` - -## Installing the database locally - - -You can run the entire database locally, with daily imports: - -```sh -docker run --name betydb -p 5432:5432 terraref/bety-postgis -``` - -Now it will appear that you have the entire trait database running at localhost on port 5432 just like if it were installed on your system! - - -```{r eval=FALSE} -library(RPostgreSQL) -dbcon <- dbConnect(RPostgreSQL::PostgreSQL(), - dbname = "bety", - password = 'bety', - host = 'localhost', - user = 'bety', - port = 5432) -``` - - - -> #```{sql connection = dbcon, eval=FALSE, } -> #``` diff --git a/traits/10-simulated-sorghum.Rmd b/traits/10-simulated-sorghum.Rmd index 71abeb4..013c82e 100644 --- a/traits/10-simulated-sorghum.Rmd +++ b/traits/10-simulated-sorghum.Rmd @@ -1,8 +1,8 @@ # A Simulated Phenotype Dataset -```{r warnings=FALSE, echo=FALSE} +```{r include = FALSE} library(traits) -knitr::opts_chunk$set(echo = FALSE, cache = TRUE) +knitr::opts_chunk$set(echo = FALSE, cache =FALSE) library(ggplot2) library(ggthemes) library(GGally) @@ -145,7 +145,8 @@ ggplot(sorghum_sla) + ## Your turn: query the list of available traits from the variables table -```{r query-traits} +```{r query-traits, message = FALSE} + trait_list <- c("Vcmax", "c2n_leaf", "cuticular_cond", "SLA", "quantum_efficiency", "leaf_respiration_rate_m2", "stomatal_slope.BB", "Jmax", "chi_leaf", 
"extinction_coefficient_diffuse") @@ -164,7 +165,7 @@ knitr::kable(variables %>% These traits are not time series, each of the ~500 genotypes is associated with a single value for each trait. This is different from the time series of LAI that we saw in the previous exercise or the biomass data that we will look at below. -```{r} +```{r traits-sel, message = FALSE} traits_list <- list() for(trait in trait_list){ @@ -202,7 +203,7 @@ knitr::kable(variables %>% select(name, description, units)) ``` -```{r all_sorghum, cache=TRUE} +```{r all_sorghum, cache = TRUE, results = 'hide'} site_id <- betydb_query(table = 'sites', sitename = "Central IL Plot D")$id @@ -224,24 +225,24 @@ for(t in c('canopy_height', 'stem_biomass', 'LAI', 'NDVI')){ ``` -``` This is how you can query a time series of sorghum height data for the Northern IL site. -```{r query-sorghum-height} -sorghum_height <- betydb_query(table = 'search', - trait = 'canopy_height', - year = 1022, - site = "~Northern IL", - limit = 'none') -#save(sorghum_height, file = 'data/sorghum_height.RData') +```{r query-sorghum-height, echo = TRUE} +#sorghum_height <- betydb_query(table = 'search', +# trait = 'canopy_height', +# year = 1022, +# site = "~Northern IL", +# limit = 'none') + +#save(sorghum_height, file = 'traits/sorghum_height.RData') ``` However, with almost 200k rows it currently takes 40 minutes to query (this is a limitation of the API). For the purposes of this tutorial, we will use a cached copy of the dataset. -```{r} -#load('data/sorghum_height.RData') +```{r 10-sim-sorg-plot, message = FALSE} +load('traits/sorghum_height.RData') s <- sorghum_height %>% mutate(day = lubridate::yday(raw_date), @@ -267,7 +268,7 @@ Now lets look at a 'pairs' plot to see if there is any covariance among the trai First, lets rearrange the data from 'long' to 'wide' format. We will also take this chance to rename the 'cultivar' field to 'genotype'. 
-```{r} +```{r 10_traits_wide, echo = TRUE} traits_wide <- traits %>% select(genotype = cultivar, trait, mean) %>% @@ -277,7 +278,7 @@ traits_wide <- traits %>% Now, lets create a variable called `max_height` -```{r max_height} +```{r max_height, echo = TRUE} # create the variable max height max_height <- s %>% group_by(genotype) %>% @@ -287,13 +288,14 @@ max_height <- s %>% Now, join the traits data frame with the new max_height data frame; we will merge the two data frames on the `genotype` field. -```{r join_traits_height} +```{r join_traits_height, echo = TRUE, warning = FALSE} + traits_height <- traits_wide %>% left_join(max_height, by = 'genotype') ``` Which traits are related to height? We can discover this in a few ways, for example, a pairs plot that shows correlations: -```{r trait_pairs, fig.height = 8, fig.width = 8} +```{r trait_pairs, fig.height = 8, fig.width = 8, warning = FALSE} ggpairs(traits_height %>% select(-genotype), lower = list(continuous = 'density'), upper = list(continuous = 'cor'), diff --git a/vignettes/00-introduction.Rmd b/vignettes/00-introduction.Rmd new file mode 100644 index 0000000..2dd17c6 --- /dev/null +++ b/vignettes/00-introduction.Rmd @@ -0,0 +1,3 @@ +# (PART\*) Section 1: Vignettes {-} + +# Vignettes Introduction \ No newline at end of file diff --git a/vignettes/01-get-trait-data-R.Rmd b/vignettes/01-get-trait-data-R.Rmd new file mode 100644 index 0000000..aedd75e --- /dev/null +++ b/vignettes/01-get-trait-data-R.Rmd @@ -0,0 +1,171 @@ +# Accessing trait data in R + +```{r chunk-options-setup, echo = FALSE} + +options(width = 100) + +``` + +# Introduction + +The objective of this vignette is to demonstrate to users how to query TERRA REF trait data using the traits package. The traits package allows users to easily pass query parameters into an R function, and returns the data in a tabular format that can be analyzed. 
+ +Through this vignette, users will learn how to query and visualize season 6 canopy height data for May 2018. In addition, users will also be shown how to find more information on a season, such as available traits and dates, when performing their own queries. + +\newline +\newline + +# Getting Started + +First, you will need to install and load the traits package from github. + +```{r traits-setup, message = FALSE, results = FALSE} + +devtools::install_github('terraref/traits', force = TRUE) +library(traits) + + +``` + +\newline +\newline + +# How to query trait data + +## Setting options + +The function that you will be using to perform your queries is `betydb_query`. Options can be set to reduce the number of arguments that need to be passed into the function. + +Note: the `betydb_key` option only needs to be set when accessing non-public data. We will be using public data, so this option does not need to be set. However, when needed, pass in the API key that you were assigned when you first registered for access to the TERRA REF database. The key should be kept private and saved to a file named `.betykey` in your current directory. If you are having trouble locating your API key, you can go to [https://terraref.ncsa.illinois.edu/bety/users](https://terraref.ncsa.illinois.edu/bety/users). + + +```{r options-setup} + +options(# betydb_key = 'Your API Key', # to access non-public data + betydb_url = "https://terraref.ncsa.illinois.edu/bety/", + betydb_api_version = 'v1') + +``` + +## An example: Season 6 canopy height data + +The following is an example of how to query season 6, canopy height data for May 2018. + +```{r canopy_height_query, message = FALSE} + +canopy_height <- betydb_query(table = "search", + trait = "canopy_height", + sitename = "~Season 6", + date = "~2018 May", + limit = "none") + + +``` + +A breakdown of the above query: + +* `table = "search"` + + Specify a table to query with the `table` parameter. 
Trait data may be queried using the `search` table. + +* `trait = "canopy_height"` + + Specify the trait of interest with the `trait` parameter. + + Trait names must be expressed exactly as they are in the TERRA REF database. So passing in `Canopy height` instead of `canopy_height` would give NULL results. + + More information on how to determine available traits for a season can be found below under `How to query other seasons, traits, and dates`. + +* `sitename = "~Season 6"` + + Indicate the sites that you would like to query using the `sitename` parameter. + + A tilde `~` is used in this query to get all sitenames that contain `Season 6` + +* `date = "~2018 May"` + + Indicate the date of data collection using the `date` parameter. + + A tilde `~` is used in this query to get all records that have a collection date that contains `2018 May` + +* `limit = "none"` + + Indicate the maximum number of records you would like returned with the `limit` parameter. We want all records for this query, so we set limit to `none`. + +## Time series of canopy height + +Here is an example of how to visualize the data that we just queried. + +```{r canopy_height_plot, warning = FALSE, message = FALSE, results = FALSE} + +#load in necessary packages +library(ggplot2) +library(lubridate) + +#plot a time series of canopy height +ggplot(data = canopy_height, + aes(x = lubridate::yday(lubridate::ymd_hms(raw_date)), y = mean)) + + geom_point(size = 0.5, position = position_jitter(width = 0.1)) + + xlab("Day of Year") + ylab("Plant Height") + + guides(color = guide_legend(title = 'Genotype')) + + theme_bw() + +``` + +\newline +\newline + +# May 2018 Season 6 Summary + +The TERRA REF database contains other trait data for May 2018 of season 6. Each trait was measured using a specific method. Here is a summary of available traits and their corresponding methods of measurement. 
+ +```{r season_6_query, message = FALSE, results = FALSE, echo = FALSE} + +#load in dplyr package +library(dplyr) + +#get all season 6 data for May 2018 +season_6 <- betydb_query(table = "search", + sitename = "~Season 6", + date = "~2018 May", + limit = "none") +#get summary +season_6_summary <- season_6 %>% group_by(trait, method_name) %>% summarise(number_of_observations = n()) + +``` + +```{r season_6_summary, echo = FALSE, comment = ""} + +print.data.frame(season_6_summary) + +``` + +\newline +\newline + +# How to query other seasons, traits, and dates + +You can query other seasons, traits, and dates by changing the season number, trait name, and date in the example query. If you are unsure of what traits or dates are available for a season, you can use the following R code to get a subset of a season and figure out what specific dates and traits are available. + +To broaden your queries, remove specific parameters. For example, in order to get all of season 2's data for October 2016, remove the `trait` parameter. 
```{r season_2_query, results = FALSE, message = FALSE} + +#get all of season 2 data for October 2016 +season_2_sub <- betydb_query(table = "search", + sitename = "~Season 2", + date = "~2016 Oct", + limit = "none") + +``` + +```{r season_2_traits, comment = ""} + +#get traits available for the subset of season 2 data +traits <- unique(season_2_sub$trait) + +print(traits) + +``` + +```{r season_2_dates, comment = ""} + +#filter for NDVI trait records +ndvi <- dplyr::filter(season_2_sub, trait == 'NDVI') + +#get unique dates for NDVI records +ndvi_dates <- unique(ndvi$date) + +print(ndvi_dates) +``` diff --git a/vignettes/03-get-images-python.Rmd b/vignettes/03-get-images-python.Rmd new file mode 100644 index 0000000..2fac28e --- /dev/null +++ b/vignettes/03-get-images-python.Rmd @@ -0,0 +1,116 @@ +--- +title: "Get Source Image Files" +output: html_document +--- + +# Objective: Demonstrate how to locate and retrieve RGB image files + +This vignette shows how to locate and retrieve image files associated with growing Season 6 +from the University of Arizona's [Maricopa Agricultural Center](http://cals-mac.arizona.edu/) +using Python. The files are stored online on the data management system Clowder, +which is accessed using an API. We will be working with the image files generated during the +month of May by limiting the requests to that time period. + +After completing this vignette it should be possible to search for and retrieve other +files through the use of the API. + +As an added bonus we've also included an example of how to retrieve the list of available +sensor names through the API. By using the sensor names returned, it's possible to retrieve +other files containing the data the sensors have collected. 
+ +**requirements** +* Python 3 +* the terrautils library + * this can be installed from pypi by running `pip install terrautils` in the terminal +* an API key to access these data + +The API key is a string that gets generated upon request through your Clowder account. Existing +API keys will work with this vignette. To get a new API key it is necessary to first register +with Clowder at "https://terraref.ncsa.illinois.edu/clowder/". First click the `Login` button and +wait for the login screen to appear. Then select the `Sign up` button and enter an email +address you have access to. An email is sent to the entered address with instructions for +completing the registration process. Once registration is complete, log +into Clowder and select the `View profile` menu option from the drop-down that is near the search +control. By clicking the `+ Add` button under the "User API Keys" heading in the profile page, a new +key is generated. + +## Locating the images + +To begin looking for files, a sensor name and site name are needed. We will be using +'RGB GeoTIFFs Datasets' as the sensor name and '' as the site name. Later in this +vignette we show how to retrieve the list of available sensors. + +As mentioned in the overview, the url string will point to the API to use. In this case +we'll be using "https://terraref.ncsa.illinois.edu/clowder/api" and the key will be the +one you created for your Clowder account. + +```{python eval=FALSE} +from terrautils.products import get_file_listing + +url = 'https://terraref.ncsa.illinois.edu/clowder/api' +key = 'YOUR_KEY_GOES_HERE' +sensor = 'RGB GeoTIFFs Datasets' +sitename = '' +files = get_file_listing(None, url, key, sensor, sitename, + since='2018-05-01', until='2018-05-31') +``` + +The `files` variable now contains an array of all the files in the datasets that match the +sensor in the plot for the month of May. When performing your own queries it's possible that there +are no matches found and the `files` array would be empty. 
+
+# Retrieving the images
+
+Now that we have a list of files we can retrieve them one by one. We do this by creating a URL
+that identifies the file to retrieve, making the API call to fetch the file contents, and writing
+the contents to disk.
+
+To create the correct URL, we start with the one defined before and append the keyword '/files/'
+followed by the ID of each file. Assuming we have a file ID of '111', the final URL for retrieving
+the file would be:
+
+```{sh eval=FALSE}
+https://terraref.ncsa.illinois.edu/clowder/api/files/111
+```
+
+By looping through each of the files returned in the previous example, and using their ID and
+filename, we can retrieve the files from the server and store them locally.
+
+We stream the data returned by the server request (`stream=True` in the code below) because of
+the high probability of large file sizes. If the `stream=True` parameter were omitted, the file's
+entire contents would be held in the `r` variable, which could then be written to the local file.
+
+```{python eval=FALSE}
+# We are using the same `url`, `key`, and `files` variables declared in the previous example.
+import requests
+
+files_url = url + '/files/'
+params = {'key': key}
+
+for f in files:
+    r = requests.get(files_url + f.id, params=params, stream=True)
+    with open(f.filename, 'wb') as o:
+        for chunk in r.iter_content(chunk_size=1024):
+            if chunk:
+                o.write(chunk)
+```
+
+The images are now stored on the local file system.
+
+# Retrieving sensor names
+
+In this section we retrieve the names of the different sensor types that are available. This will
+allow you to retrieve files other than those containing RGB image data.
+
+```{python eval=FALSE}
+# We are using the same `url` and `key` variables declared in the previous example above.
+from terrautils.products import get_sensor_list, unique_sensor_names
+
+sensors = get_sensor_list(None, url, key)
+names = unique_sensor_names(sensors)
+```
+
+The variable `names` now contains the list of all available sensors. 
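To pick a new value for the `sensor` variable, a simple case-insensitive filter over the returned names is often enough. This is a sketch assuming `names` is a plain list of strings; `find_sensors` is a hypothetical helper, not part of terrautils:

```python
# Hypothetical helper: filter sensor names by keyword, ignoring case,
# to choose a new value for the `sensor` variable used earlier.
def find_sensors(names, keyword):
    """Return the sensor names that contain `keyword` (case-insensitive)."""
    kw = keyword.lower()
    return [n for n in names if kw in n.lower()]
```

For instance, `find_sensors(names, 'geotiffs')` would narrow the list to the GeoTIFF-producing sensors.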
Using these sensor
+names you can rerun the search above to locate and then retrieve additional data files:
+substitute the new sensor name for 'RGB GeoTIFFs Datasets' where the variable `sensor` is
+assigned above.
+
diff --git a/vignettes/04-synthesis-data.Rmd b/vignettes/04-synthesis-data.Rmd
new file mode 100644
index 0000000..8e8313f
--- /dev/null
+++ b/vignettes/04-synthesis-data.Rmd
@@ -0,0 +1 @@
+# Synthesis Vignette
\ No newline at end of file