Refining seed traits (#561)
* Changes to seed-related trait definitions, trait values, and data mapping, informed by the AusTraits workshop on Refining Seed Trait Vocabularies, held in October/November 2021.

* Workshop participants included Elizabeth Wenk, Hervé Sauquet, Lydia Guja, Mark Ooi, Greg Jordan, Russell Barrett, Carl Gosper, Karen Sommerville and Lily Dun. All members helped decide upon the current trait definitions, allowable trait values (and their definitions), and how to map other terminology onto the chosen trait definitions/trait values.

* The new definitions file serves as the primary output of the workshop. Beyond the changes in this commit, further metadata fields will be added to the trait definitions once AusTraits supports them.

Among the changes made: 

* for all fruit_type and fruit_type_functional (dry/fleshy) terms, align all values to whichever of 3 traits they apply to: fruit_type (botanical fruit types), fruit_fleshiness, and fruit_dehiscence. This captures the collection of terms used and splits them into three tidy traits.

* for seed_texture, align to a much shorter list of terms

* all datasets referencing dispersal_appendages, dispersal_syndromes, or dispersers have now been updated to map the trait data onto a reduced collection of terms.

* add helper functions to move values to different traits
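
As a rough illustration of what "moving values to a different trait" means for a wide dataset, here is a minimal sketch on a toy data frame. The function `move_value_sketch`, the toy data and the trait values are hypothetical stand-ins written in base R for illustration; they are not the helper added in `R/pre_process.R`.

```r
# Hypothetical, simplified stand-in for the wide-format helper;
# handles a single value and uses base R only.
move_value_sketch <- function(data, original_trait, new_trait,
                              original_value, value_for_new_trait,
                              value_to_keep = NA) {
  # Create the destination column if the dataset does not have it yet
  if (is.null(data[[new_trait]])) {
    data[[new_trait]] <- rep(NA_character_, nrow(data))
  }
  # %in% treats NA in the original column as "no match"
  hit <- data[[original_trait]] %in% original_value
  data[[new_trait]][hit] <- value_for_new_trait
  data[[original_trait]][hit] <- value_to_keep
  data
}

# Toy data; taxon names and trait values are made up for illustration
toy <- data.frame(
  taxon = c("sp_a", "sp_b", "sp_c"),
  fruit_type = c("capsule", "fleshy", NA),
  stringsAsFactors = FALSE
)

# "fleshy" describes fleshiness rather than a botanical fruit type,
# so move it to fruit_fleshiness and leave NA behind in fruit_type
toy <- move_value_sketch(toy, "fruit_type", "fruit_fleshiness",
                         original_value = "fleshy",
                         value_for_new_trait = "fleshy")
print(toy)
```

Rows that do not match, including rows where the original trait is missing, are left untouched.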
ehwenk committed Feb 17, 2022
1 parent 22df3c0 commit 0b04d61
Showing 106 changed files with 5,023 additions and 1,895 deletions.
95 changes: 95 additions & 0 deletions R/pre_process.R
@@ -214,3 +214,98 @@ separate_range <- function(data, x, y1, y2, sep="-", remove=TRUE) {
replace_duplicates_with_NA <- function(x) {
  base::replace(x, duplicated(x), NA)
}


#' Move selected trait values from an existing column (original_trait) to a new column (new_trait)
#'
#' @param data data frame, representing a specific dataset_id
#' @param original_trait name of the variable in the original data file, representing a trait in a wide dataset
#' @param new_trait name of the new variable being created, representing an additional trait in a wide dataset
#' @param original_values values of the original trait that need to be remapped to a different (new) trait
#' @param values_for_new_trait the appropriate values of the new trait, one per element of original_values; these may be identical to the original values, or may be a slightly different word/syntax
#' @param values_to_keep the appropriate values to retain for the old trait, one per element of original_values; these may be identical to the original values or may be NA
#'
#' @return the data frame, with the selected values moved to the new trait column
#' @export
#'
#' @examples
#' data <- read_csv("data/Hughes_1992/data.csv")
#' data <- data %>% move_values_to_new_trait("growth form", "root_structure", "Saprophyte", "saprophyte", NA)
move_values_to_new_trait <- function(data, original_trait, new_trait, original_values, values_for_new_trait, values_to_keep) {

  # create the new trait column if it does not exist yet, so ifelse() has a fallback value
  if (is.null(data[[new_trait]])) {
    data[[new_trait]] <- rep(NA_character_, nrow(data))
  }

  for (j in seq_along(original_values)) {

    # rows holding the j-th value to move; %in% treats NA as a non-match
    i <- data[[original_trait]] %in% original_values[[j]]

    data[[new_trait]] <- ifelse(i, values_for_new_trait[[j]], data[[new_trait]])
    data[[original_trait]] <- ifelse(i, values_to_keep[[j]], data[[original_trait]])
  }

  return(data)
}


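# Long-format helper: duplicate the rows whose value is in original_values,
# assigning the copies to new_trait with new_values, and append them to data
add_values_to_additional_trait_long <-
  function(data, new_trait, traits_column, values_column, original_values, new_values) {
    i <- dplyr::filter(data, data[[values_column]] %in% original_values)
    i[[traits_column]] <- new_trait
    i[[values_column]] <- new_values
    dplyr::bind_rows(data, i)
  }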


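# Long-format helper: reassign rows whose value is in original_values to new_trait.
# Note: original_trait is not used in the match, so rows are selected by value only.
move_values_to_new_trait_long <-
  function(data, original_trait, new_trait, traits_column, values_column, original_values) {

    i <- data[[values_column]] %in% original_values

    data[[traits_column]] <- ifelse(i, new_trait, data[[traits_column]])

    data
  }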



#' Substitutions from csv
#' @description Function that simultaneously adds many trait value replacements, potentially across many trait_names and dataset_ids, to the respective metadata.yml files.
#' This function will be used to quickly re-align/re-assign trait values across all AusTraits studies.
#'
#' @param dataframe_of_substitutions dataframe with columns indicating dataset_id, trait_name, original trait values (find), and AusTraits aligned trait value (replace)
#' @param dataset_id study's dataset_id in AusTraits
#' @param trait_name trait name for which a trait value replacement needs to be made
#' @param find trait value submitted by the contributor for a data observation
#' @param replace AusTraits aligned trait value
#'
#' @return called for its side effect of updating each dataset's metadata file; no useful value is returned
#' @export
#'
#' @examples
#' dataframe_of_substitutions <- read_csv("export/dispersal_syndrome_substitutions.csv") %>%
#'   select(-extra) %>%
#'   filter(dataset_id == "Angevin_2011")
#' substitutions_from_csv(dataframe_of_substitutions, dataset_id, trait_name, find, replace)

substitutions_from_csv <- function(dataframe_of_substitutions, dataset_id, trait_name, find, replace) {

  # split dataframe of substitutions by row
  dataframe_of_substitutions <- dataframe_of_substitutions %>%
    dplyr::mutate(rows = dplyr::row_number()) %>%
    dplyr::group_split(rows)

  set_name <- "substitutions"

  # add substitutions to metadata files, one row at a time
  for (i in seq_along(dataframe_of_substitutions)) {
    substitution <- dataframe_of_substitutions[[i]]
    metadata <- metadata_read_dataset_id(substitution$dataset_id)

    to_add <- list(trait_name = substitution$trait_name, find = substitution$find, replace = substitution$replace)

    # start an empty list if the dataset has no substitutions yet
    if (is.null(metadata[[set_name]]) || all(is.na(metadata[[set_name]]))) {
      metadata[[set_name]] <- list()
    }

    metadata[[set_name]] <- append_to_list(metadata[[set_name]], to_add)

    metadata_write_dataset_id(metadata, substitution$dataset_id)
  }
}
646 changes: 235 additions & 411 deletions config/definitions.yml

Large diffs are not rendered by default.
