Refinements to automated substitutions #613

ehwenk · 2022-08-21T23:58:01Z

In certain circumstances, the automated substitutions code (process.R, line 971) currently requires long lists of substitutions; perhaps it could be refined.

Because it only matches entire strings, when a value contains several categorical terms and only one of them needs to change, every distinct combination containing that term has to be listed as its own substitution. For instance, to change procumbent to prostrate, you would only need to replace the term in 6 datasets with some variant of str_replace, but you would have to add 97 separate substitutions.
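To make the limitation concrete, here is a minimal R sketch of exact-match substitution. It assumes substitutions are stored as a table of `find`/`replace` pairs; the `subs` table and the `apply_exact_substitutions()` helper are hypothetical illustrations, not the actual code in process.R.

```r
# Hypothetical substitution table: one find/replace pair (column names assumed)
subs <- data.frame(
  find = "procumbent",
  replace = "prostrate",
  stringsAsFactors = FALSE
)

# Hypothetical helper: a value is changed only when the WHOLE string equals `find`
apply_exact_substitutions <- function(values, subs) {
  for (i in seq_len(nrow(subs))) {
    values[values == subs$find[i]] <- subs$replace[i]
  }
  values
}

values <- c("procumbent", "procumbent spreading", "erect procumbent")
apply_exact_substitutions(values, subs)
# → "prostrate" "procumbent spreading" "erect procumbent"
```

Only the first value changes; each multi-term value would need its own row in `subs`.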

From the growth_form branch:

```r
> austraits$traits %>%
+   filter(trait_name == "stem_growth_habit") %>% filter(value == "procumbent") %>% distinct(dataset_id,value)
# A tibble: 6 × 2
  dataset_id         value     
  <chr>              <chr>     
1 Flora_Florabase    procumbent
2 Flora_NT           procumbent
3 Flora_of_Australia procumbent
4 Flora_PlantNet     procumbent
5 Flora_SA           procumbent
6 Flora_VicFlora     procumbent
> austraits$traits %>%
+   filter(trait_name == "stem_growth_habit") %>% filter(str_detect(value, "procumbent")) %>% distinct(dataset_id,value)
# A tibble: 97 × 2
   dataset_id      value                             
   <chr>           <chr>                             
 1 Flora_Florabase procumbent scrambling             
 2 Flora_Florabase procumbent spreading              
 3 Flora_Florabase compact erect procumbent sprawling
 4 Flora_Florabase bushy erect procumbent            
 5 Flora_Florabase bushy procumbent spreading        
 6 Flora_Florabase erect procumbent spreading        
 7 Flora_Florabase erect procumbent                  
 8 Flora_Florabase procumbent prostrate              
 9 Flora_Florabase procumbent                        
10 Flora_Florabase decumbent procumbent prostrate    
# … with 87 more rows
# ℹ Use `print(n = ...)` to see more rows
```

This is even harder to fix when the terms are entered in the data.csv file in non-alphabetical order. The output lists them alphabetically, so it is tedious to look up each value in data.csv to work out why a substitution isn't "working".
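One possible way to make matching independent of word order, assuming values are space-delimited lists of terms, is to sort the terms in both the raw value and the `find` string before comparing them. The `normalise_terms()` helper below is a hypothetical sketch, not existing traits.build code.

```r
# Hypothetical helper: split each value on whitespace, drop duplicate terms,
# sort the terms alphabetically and re-join them with single spaces
normalise_terms <- function(values) {
  vapply(
    strsplit(trimws(values), "\\s+"),
    function(terms) paste(sort(unique(terms)), collapse = " "),
    character(1)
  )
}

normalise_terms(c("procumbent erect", "erect procumbent", "spreading  bushy procumbent"))
# → "erect procumbent" "erect procumbent" "bushy procumbent spreading"

# Comparing normalised keys lets a `find` string written in either order match
normalise_terms("erect procumbent") == normalise_terms("procumbent erect")
# → TRUE
```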

Could the code be rewritten to replace every instance of a term, rather than requiring an exact match on the whole string?
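A term-level version might look like the following minimal sketch. It assumes values are space-delimited terms and that substitutions are `find`/`replace` pairs of single words; `apply_term_substitutions()` and the `subs` table are hypothetical, not the current process.R implementation.

```r
# Hypothetical substitution table (column names assumed)
subs <- data.frame(find = "procumbent", replace = "prostrate", stringsAsFactors = FALSE)

# Hypothetical helper: replace matching terms wherever they occur in a value,
# then drop duplicates created by the replacement
apply_term_substitutions <- function(values, subs) {
  lookup <- setNames(subs$replace, subs$find)
  vapply(strsplit(values, " ", fixed = TRUE), function(terms) {
    hit <- terms %in% names(lookup)
    terms[hit] <- lookup[terms[hit]]
    paste(unique(terms), collapse = " ")
  }, character(1))
}

values <- c("procumbent", "procumbent spreading", "erect procumbent",
            "procumbent prostrate", "decumbent procumbent prostrate")
apply_term_substitutions(values, subs)
# → "prostrate" "prostrate spreading" "erect prostrate" "prostrate" "decumbent prostrate"
```

With this approach, the single `procumbent` → `prostrate` row covers every combination. The `unique()` call stops "procumbent prostrate" from becoming "prostrate prostrate". Splitting on spaces means a multi-word `find` term would not match; a regex with word boundaries would be needed for that case.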

(I also occasionally have substitutions fail because of capital letters in the input, but this shouldn't be a problem, should it?)
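If allowed categorical values are always stored in lower case (an assumption here), a cleaning step before matching would stop capitalisation or stray whitespace from breaking substitutions. `clean_categorical()` is a hypothetical helper. It would be applied to both the raw values and the `find` strings.

```r
# Hypothetical pre-processing: trim, lower-case and collapse repeated spaces
# so "Procumbent" and " Erect  PROCUMBENT" match lower-case substitution entries
clean_categorical <- function(values) {
  values <- tolower(trimws(values))
  gsub("\\s+", " ", values)
}

clean_categorical(c("Procumbent", " Erect  PROCUMBENT"))
# → "procumbent" "erect procumbent"
```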


dfalster · 2023-06-20T09:43:59Z

@ehwenk - is this still relevant?

ehwenk · 2023-06-20T09:48:50Z

Yes, and it would be good to fix. We tend to resort to using str_replace in custom_R_code to avoid the endless substitutions. This isn't ideal, because it effectively hides the substitutions.

yangsophieee · 2023-07-13T04:16:26Z

Moved to traits.build

yangsophieee mentioned this issue Jul 13, 2023

[traits.build adding studies functions] Refinements to automated substitutions traitecoevo/traits.build#21 (Open)

ehwenk closed this as completed Jul 31, 2024
