Formats for traits, phenotypes, and agronomic managements #18

Closed
dlebauer opened this Issue Oct 16, 2015 · 15 comments

dlebauer commented Oct 16, 2015

  • Define traits that will be used in GWAS study, including variable names, units, and methods of measurement / validation.

for example:

| name | units | description | method of measurement |
|------|-------|-------------|-----------------------|
| SLA | m2 kg-1 | Specific leaf area (leaf area per unit mass) | hole punch |

Ultimately, these should be entered into the trait database which will look like this: https://www.betydb.org/variables/15.

  • what is the core set of traits that teams are interested in measuring for calibrating algorithms to derive traits from sensors, predict yields and crop fitness, and use in genomic analysis? (@JeffWhiteAZ)
    • a few easy ones include specific leaf area, height, leaf area index, leaf number
    • what other simple sensor-derived values should be included, such as mean greenness by plot or plant, NDVI, and other indices?
  • Is there a set of trait names, canonical units, and definitions (similar to CF Standard Names) that we can use for these? Specifically, what does ICASA provide? (@chporter)
  • How can we distinguish among sensor-derived and field (e.g. calibration / validation) measurements?
    • ans. BETYdb allows each trait record to be linked to a method defined in a separate table.
      The method of measurement (e.g. a hole punch vs. a whole-leaf measurement of leaf mass / area) could affect the magnitude of the measurement, but at the plant scale both are interpreted as the same trait. When calibrating precise sub-leaf measurements, we can subset data by method of measurement without using distinct variable names.
    • In addition, when different models are used to estimate a parameter, and the parameter is only relevant for a specific model formulation (e.g. Ball-Berry, Leuning, and Medlyn variants of stomatal slope used to estimate leaf-level gas exchange), we use different records in the variables table, because the values are clearly defined as distinct.
    • This is consistent with PEcAn (pecanproject.org) RTM inversion.
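
A minimal sketch of the "same variable, different method" idea described above. It does not use the BETYdb schema: the `TraitRecord` class, its field names, and the SLA values are all hypothetical and only illustrate subsetting by method while keeping a single variable name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitRecord:
    """One measurement; `method` names how it was taken (hypothetical layout)."""
    variable: str
    value: float
    method: str


# Illustrative values only, not real measurements.
records = [
    TraitRecord("SLA", 21.5, "hole punch"),
    TraitRecord("SLA", 19.8, "whole leaf"),
    TraitRecord("SLA", 22.1, "hole punch"),
]


def subset_by_method(records, variable, method):
    """Keep records of one variable taken with one method of measurement."""
    return [r for r in records if r.variable == variable and r.method == method]


hole_punch_sla = subset_by_method(records, "SLA", "hole punch")
```

Both methods share the variable name `SLA`, so a plant-scale analysis can use all three records, while a calibration study can restrict itself to `hole_punch_sla`.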

Managing Synonyms:

Tool for managing synonyms integrated w/ Clowder:

Other vocabularies:

@dlebauer dlebauer added this to the 5.1 Define validation & calibration data sets milestone Oct 16, 2015

dlebauer commented Oct 16, 2015

Traits From the FOA (many need to be defined)

  • Seedling emergence (how is this quantified? Growing degree days with reference temp?)
  • Seedling vigor (how quantified?)
  • Canopy closure (? days? GDD?)
  • Leaf number
  • Leaf angle (parameters for Leaf angle distribution assuming chi-square? other? an empirical distribution)
  • Stem number
  • Stem volume
  • Stem height
  • Lodging (how is this defined?)
  • Above ground yield (all biomass dry; moisture content can be stored separately)
  • Tissue partitioning (percent stem, leaf, and panicle)
  • Days to anthesis
  • Maturity pattern
  • Biotic stress resistance (leaf necrosis) (definition)
  • Abiotic stress resistance (leaf necrosis)
  • Carbon flux (per leaf area, per ground area)
  • Water flux (per leaf area, per ground area)
  • Plant temperature
  • Plant color (surely many variants on this)

For many plot-level mass / energy pools and fluxes, CF provides a well-defined set of variables that are proposed for use with hyperspectral and other imaging data products (#11).

| standard_name | canonical units | definition |
|---------------|-----------------|------------|
| net_primary_productivity_of_biomass_expressed_as_carbon | kg m-2 s-1 | |
| net_primary_productivity_of_biomass_expressed_as_carbon_accumulated_in_leaves | kg m-2 s-1 | |
| surface_downward_mole_flux_of_carbon_dioxide | mol m-2 s-1 | |
| vegetation_area_fraction | 1 | "X_area_fraction" means the fraction of horizontal area occupied by X. "X_area" means the horizontal area occupied by X within the grid cell. "Vegetation" means any plants e.g. trees, shrubs, grass. |
| normalized_difference_vegetation_index | 1 | "Normalized_difference_vegetation_index", usually abbreviated to NDVI, is an index calculated from reflectances measured in the visible and near infrared channels. It is calculated as NDVI = (NIR - R) / (NIR + R) where NIR is the reflectance in the near-infrared band and R is the reflectance in the red visible band. Reflectance is the ratio of the reflected over the incoming radiation in each spectral band. The calculated value of NDVI depends on the precise definitions of the spectral bands and these definitions may vary between different models and remote sensing instruments. |
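
As a worked example of the NDVI definition quoted above, here is a small sketch that applies NDVI = (NIR - R) / (NIR + R) per pixel and averages over a plot. The function names and the reflectance values are made up for illustration; band definitions depend on the instrument, as the CF definition notes.

```python
def ndvi(nir: float, red: float) -> float:
    """Return NDVI = (NIR - R) / (NIR + R) for two band reflectances."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + R is zero; NDVI is undefined")
    return (nir - red) / total


def mean_ndvi(pixels):
    """Average NDVI over a list of (nir, red) reflectance pairs, e.g. one plot."""
    values = [ndvi(nir, red) for nir, red in pixels]
    return sum(values) / len(values)


# Made-up reflectances for three pixels of a single plot.
plot_pixels = [(0.50, 0.10), (0.45, 0.15), (0.60, 0.20)]
plot_mean = mean_ndvi(plot_pixels)
```

Averaging the per-pixel index is only one possible plot-level summary; computing NDVI from plot-mean reflectances would generally give a slightly different number.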

chporter commented Oct 17, 2015

David - we put together a set of core variables that are commonly collected in field crop experiments (attached), which is a subset of the ICASA Data Dictionary. These will be used to harmonize data from USDA experiments as part of a National Agricultural Research Data Network. This is still in draft form, but it may help.
Core Harmonized Crop Experiment Data_JWW_chp.docx

dlebauer commented Oct 20, 2015

Thanks @chporter, that looks great; thank you for sending it. I think it is clear this is a standard that we should support. Do you have suggestions for how to handle overlapping vocabularies? For example, it seems both CF and ICASA have more than one variant of each variable name (ICASA has 'variable name' and 'short name', while CF has a 'variable name', a 'standard_name', and a 'long name').

It seems the CF 'name' field (which is not part of the standard naming conventions, but is included in netCDF metadata), the ICASA 'short name', and the BETYdb.org 'name' fields are project-specific, often acronyms or other concatenations of convenience, whereas the ICASA 'variable name' and CF 'standard_name' fields are more canonical and intended for unambiguous meaning. In some cases, these fields overlap. So, one option would be to have a variables table and a synonyms lookup table:

variables

  • id (integer, primary key)
  • name
  • standard_name (CF-Style redundant key)
  • definition
  • units

synonyms

  • id
  • variable_id (foreign key to look up variables)
  • source (e.g. ICASA, CF, etc)
  • name (variable name, e.g. SLA)
  • standard_name (more like specific_leaf_area)
  • definition
  • units

e.g.

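variables:

| id | name | definition | units |
|----|------|------------|-------|
| 5 | LAI | Leaf Area Index | m2 m-2 |

synonyms:

| id | variable_id | source | name | standard_name | definition | units |
|----|-------------|--------|------|---------------|------------|-------|
| 10 | 5 | ICASA | LAID | leaf_area_index | Leaf Area index | m2/m2 |
| 11 | 5 | CF | lai | leaf_area_index | Leaf Area index | m2/m2 |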

Perhaps not the best example since the synonyms, definitions, and names are so similar. But that is a rare case.
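
To make the two-table proposal concrete, here is a minimal sketch using Python's built-in `sqlite3` module and the example rows above. The column types, the `NOT NULL` constraints, and the `translate` helper are assumptions for illustration; this is not the actual BETYdb schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variables (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    standard_name TEXT,
    definition    TEXT,
    units         TEXT
);
CREATE TABLE synonyms (
    id            INTEGER PRIMARY KEY,
    variable_id   INTEGER NOT NULL REFERENCES variables(id),
    source        TEXT NOT NULL,   -- e.g. ICASA, CF
    name          TEXT NOT NULL,
    standard_name TEXT,
    definition    TEXT,
    units         TEXT
);
""")

# Example rows taken from the tables in this comment.
conn.execute(
    "INSERT INTO variables (id, name, definition, units) VALUES (?, ?, ?, ?)",
    (5, "LAI", "Leaf Area Index", "m2 m-2"),
)
conn.executemany(
    "INSERT INTO synonyms VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (10, 5, "ICASA", "LAID", "leaf_area_index", "Leaf Area index", "m2/m2"),
        (11, 5, "CF", "lai", "leaf_area_index", "Leaf Area index", "m2/m2"),
    ],
)


def translate(conn, name, source):
    """Return the name that `source` uses for the variable `name`, or None."""
    row = conn.execute(
        "SELECT s.name FROM synonyms s "
        "JOIN variables v ON v.id = s.variable_id "
        "WHERE v.name = ? AND s.source = ?",
        (name, source),
    ).fetchone()
    return row[0] if row else None
```

With these rows, `translate(conn, "LAI", "ICASA")` looks up the ICASA name for the local `LAI` variable, and a source with no registered synonym returns `None`.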

This would allow, e.g., any data product to be provided in different 'languages'. Using, e.g., the udunits API (there is an R wrapper), we can set a database-level constraint that the values of synonym units are convertible (there is a UDUNITS function that returns true/false on this).
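
The kind of check such a constraint would perform can be sketched in plain Python. This is a deliberately simplified stand-in and does not call UDUNITS: it only compares the exponents of unit symbols written like `m2 kg-1` or `m2/m2`, treats every symbol as opaque (so it would not know that `g` and `kg` are interconvertible), and all function names are hypothetical.

```python
import re
from collections import Counter

# A unit term is a symbol optionally followed by an integer exponent, e.g. "m-2".
_TERM = re.compile(r"^([A-Za-z]+)(-?\d+)?$")


def dimensions(units: str) -> dict:
    """Parse e.g. 'm2 kg-1' or 'm2/m2' into {symbol: exponent}, dropping zeros."""
    exponents = Counter()
    numerator, *denominators = units.split("/")
    for sign, part in [(1, numerator)] + [(-1, d) for d in denominators]:
        for term in part.split():
            if term == "1":  # CF writes dimensionless quantities as "1"
                continue
            match = _TERM.match(term)
            if match is None:
                raise ValueError(f"cannot parse unit term: {term!r}")
            symbol, power = match.group(1), int(match.group(2) or 1)
            exponents[symbol] += sign * power
    return {s: p for s, p in exponents.items() if p != 0}


def same_dimensions(a: str, b: str) -> bool:
    """True when both unit strings reduce to the same symbol exponents."""
    return dimensions(a) == dimensions(b)
```

For the LAI example, `same_dimensions("m2 m-2", "m2/m2")` holds because both reduce to a dimensionless quantity, while `m2 kg-1` (SLA) would be rejected as a synonym unit for LAI. A real constraint would delegate to UDUNITS, which also handles prefixes and scale factors.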

I know you, Jeff, and others have been working on interfaces among databases. Have you already developed or designed something like this? This is just a first draft, but seems like this could be an important link in making databases interoperable.

@dlebauer dlebauer changed the title from "Define set of traits / phenotypes that will be used in GWAS and yield prediction" to "Formats for traits, phenotypes, and agronomic managements" Oct 26, 2015

dlebauer commented Oct 28, 2015

@chporter @jeffwhite_AZ is Core Harmonized Crop Experiment Data a subset of the master variable list?

Does the master variable list on the AgMIP wiki differ from the google spreadsheet?

JeffWhiteAZ commented Oct 29, 2015

David, I believe the AgMIP wiki version actually is a link to the main Google spreadsheet.
Cheryl can confirm, but the core variables are meant to be a subset of the AgMIP/ICASA set.

tedhabermann commented Nov 21, 2015

On names for parameters - the inclusion of a Thesaurus Table in the database design is a good idea. I suspect you will end up needing a many-to-many relationship between parameters and names for those parameters. Obviously, a single parameter can have standard names from different vocabularies and you want to be able to support that for interoperability with other communities. Standard names from some vocabularies might have different granularity, so a single name from some vocabulary might cover more than one parameter. I suspect that the generality would pay off.
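
A minimal sketch of the many-to-many relationship described above, using plain dictionaries instead of a database. The link rows are illustrative: `LAID` and `leaf_area_index` come from the example earlier in this thread, while `coarse_vocab` and the two stomatal-slope parameter names are hypothetical and stand in for a vocabulary whose single term covers more than one parameter.

```python
from collections import defaultdict

# Each row links one (vocabulary, term) pair to one parameter.
links = [
    ("ICASA", "LAID", "leaf_area_index"),
    ("CF", "leaf_area_index", "leaf_area_index"),
    ("coarse_vocab", "stomatal_slope", "stomatal_slope_ball_berry"),
    ("coarse_vocab", "stomatal_slope", "stomatal_slope_medlyn"),
]

params_by_term = defaultdict(set)  # (vocabulary, term) -> parameters it covers
terms_by_param = defaultdict(set)  # parameter -> (vocabulary, term) names for it

for vocab, term, param in links:
    params_by_term[(vocab, term)].add(param)
    terms_by_param[param].add((vocab, term))
```

Here one parameter (`leaf_area_index`) has names in two vocabularies, and one coarse term maps to two parameters, which is the case a one-to-many synonyms table cannot represent.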

craig-willis commented Jul 14, 2016

Per discussion with @dlebauer, the remaining work on this task is to ensure that the identified traits are available in the system and that, using the service in #31, we have a method to map synonyms between vocabularies. Tentatively, this might mean defining an initial representation (e.g., vocabulary/ontology); ensuring the new traits are present in BETYdb; and that the resulting vocabulary/ontology is in the vocabulary server (#31) with synonyms mapped; along with any relevant documentation.

craig-willis commented Aug 18, 2016

@JeffWhiteAZ @chporter
I'm looking at the ICASA JSON Data Objects page (http://research.agmip.org/display/dev/JSON+Data+Objects) to understand how we can use this as a common interchange format. Is there a specification or schema for the JSON data objects? It looks like some of the object names in the examples differ from the data dictionary dataset/subset/group (initial_condition v initial_conditions; weathers v weather_station; soils v soil_profiles; soilLayer v soil_profile_layer; dailyWeather, etc). If there is a better forum for me to ask questions about ICASA (e.g., mailing list, group), just let me know.

dlebauer commented Aug 18, 2016

@chporter Before we put effort into using this as a common format I just want to confirm a few of my assumptions:

  • you think this is a good idea (on behalf of the AgMIP IT group)
  • this is the format used within AgMIP / FACE-IT ... and therefore if we export data in this format it could be easily integrated into the FACE-IT / AgMIP workflows ...
  • this format would be supported by the proposed NARDN-HD (e.g. if NARDN-HD differs, its developers (you) would plan to create necessary translators).
  • AgMIP / FACE-IT has been developing translators around this format, and has code that we can reuse.

chporter commented Aug 18, 2016

@dlebauer

  • I think it is a good idea, with caveats. For example, ICASA (and therefore AgMIP) does not yet handle sub-daily data.
  • This is the format that all of our translation libraries are based on, including model-specific translators. These libraries are implemented in FACE-IT, some desktop applications, and some HPC simulations (such as Joshua Elliott's PSIMS).
  • This is the format we are proposing to be implemented as a data transfer mechanism for NARDN-HD. In some cases, data would be stored in this format, in other cases, it would be translated as an API function.
  • There are converters in FACE-IT for translating from netCDF to AgMIP format. U. Chicago maintains these converters as part of the data-type sniffers that they developed for FACE-IT. I don't think they are on a public repo at this time, but I'm sure they will share them.

chporter commented Aug 18, 2016

@craig-willis
There is minimal structure in the JSON, but there is some.
There are 3 main sections: experiments, weathers and soils. I see that this is not adequately documented on the research site. I'll fix that soon.
Other subsections are correctly described. 'Management events', 'initial conditions', and 'observed' are sub-sections of experiments.
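
The layout described above could be sketched roughly as follows. Only the three top-level section names come from this comment; the nested key spellings (`management`, `events`, `initial_conditions`, `observed`) are placeholders, since this thread notes that names in the published examples and in the data dictionary differ, and the `missing_sections` helper is hypothetical.

```python
import json

# Skeleton only: three main sections, with the experiment sub-sections
# named in this thread. Nested key spellings are placeholders.
dataset = {
    "experiments": [
        {
            "management": {"events": []},
            "initial_conditions": {},
            "observed": {},
        }
    ],
    "weathers": [],
    "soils": [],
}

MAIN_SECTIONS = ("experiments", "weathers", "soils")


def missing_sections(doc: dict) -> list:
    """List the main sections that are absent from a parsed JSON document."""
    return [name for name in MAIN_SECTIONS if name not in doc]


serialized = json.dumps(dataset, indent=2)
```

A check like `missing_sections(json.loads(serialized))` is a cheap first validation step before attempting any field-level mapping.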

craig-willis commented Aug 24, 2016

Thank you, @chporter

@dlebauer @nfahlgren I've added a Google Doc to the computing-pipeline folder with a very rough rendering of the PlantCV metadata from terraref/computing-pipeline#36 using the AgMIP JSON Data Objects approach. This is just for discussion.

https://docs.google.com/document/d/1pkt_OBytwx4HioeTCOEQeJKXP4xGGXDmGiwFHWhBqUQ/edit#

Any feedback welcome.

rachelshekar commented Nov 30, 2016

@craig-willis can this be closed?

craig-willis commented Nov 30, 2016

@rachelshekar
Unfortunately this is such a large open-ended issue, it's hard to know when it's done. We don't yet have a final set of formats for traits, phenotypes, and agronomic managements, but we're making progress. Personally, I'd like to see us close these large open-ended issues and open new ones with specific completion criteria going forward.

craig-willis commented Dec 15, 2016

@dlebauer @rachelshekar
I'd like to close this issue and open any related issues for work that remains to be done. I see a few different threads:

  1. Defining an exchange format for traits, phenotypes, and agronomic managements, which I think is covered by the existing ICASA ontology issue (#55)
  2. Making sure specific variables are in BETYdb - it's not clear to me whether this is done or not.
  3. Defining a general model for variables that supports mappings between BETYdb, ICASA, and CO -- this is also part of the ICASA ontology work.
  4. Implementing a method/system for mapping variables. This is covered by #31

@dlebauer dlebauer closed this Dec 15, 2016

@rachelshekar rachelshekar added help wanted and removed question labels Jan 3, 2017
