Refinements to automated substitutions #613

Closed
ehwenk opened this issue Aug 21, 2022 · 3 comments

Comments

@ehwenk
Collaborator

ehwenk commented Aug 21, 2022

In some circumstances the automated substitutions code (process.R, line 971) currently requires long lists of substitutions, and it could perhaps be refined.

Because the code only matches entire strings, when a value contains several categorical terms and only one of them needs to be changed, every distinct combination that includes that term needs its own substitution. For instance, to change procumbent to prostrate, you would only have to replace the term 6 times through some variant of str_replace, but with exact matching you would have to add 97 different substitutions.

From growth_form branch:

> austraits$traits %>%
+   filter(trait_name == "stem_growth_habit") %>% filter(value == "procumbent") %>% distinct(dataset_id,value)
# A tibble: 6 × 2
  dataset_id         value     
  <chr>              <chr>     
1 Flora_Florabase    procumbent
2 Flora_NT           procumbent
3 Flora_of_Australia procumbent
4 Flora_PlantNet     procumbent
5 Flora_SA           procumbent
6 Flora_VicFlora     procumbent
> austraits$traits %>%
+   filter(trait_name == "stem_growth_habit") %>% filter(str_detect(value, "procumbent")) %>% distinct(dataset_id,value)
# A tibble: 97 × 2
   dataset_id      value                             
   <chr>           <chr>                             
 1 Flora_Florabase procumbent scrambling             
 2 Flora_Florabase procumbent spreading              
 3 Flora_Florabase compact erect procumbent sprawling
 4 Flora_Florabase bushy erect procumbent            
 5 Flora_Florabase bushy procumbent spreading        
 6 Flora_Florabase erect procumbent spreading        
 7 Flora_Florabase erect procumbent                  
 8 Flora_Florabase procumbent prostrate              
 9 Flora_Florabase procumbent                        
10 Flora_Florabase decumbent procumbent prostrate    
# … with 87 more rows
# ℹ Use `print(n = ...)` to see more rows

This is even harder to fix when the terms are entered into the data.csv file in non-alphabetical order: the output is sorted alphabetically, so the value shown there is not the string the substitution has to match, and it is tedious to look up each value in data.csv to work out why a substitution isn't "working".

Could the code be rewritten to replace all instances of a term, rather than an exact string match?

(I also occasionally struggle with capital letters in the input causing substitutions to fail, but this shouldn't be a problem, should it?)
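
To make the request concrete, here is a minimal sketch of term-level replacement in base R. It is not the existing code in process.R: `replace_term` and its arguments are hypothetical names, and it assumes values are space-delimited terms that should end up de-duplicated and alphabetically sorted, as in the output above.

```r
# Hypothetical helper (not part of process.R): replace one term inside a
# space-delimited value string, instead of matching the entire string.
replace_term <- function(value, find, replace, ignore_case = FALSE) {
  vapply(value, function(v) {
    if (is.na(v)) return(NA_character_)
    # Split the value into its individual terms, dropping empty pieces
    terms <- strsplit(v, " +")[[1]]
    terms <- terms[nzchar(terms)]
    # Compare whole terms only, optionally ignoring capitalisation
    hit <- if (ignore_case) tolower(terms) == tolower(find) else terms == find
    terms[hit] <- replace
    # De-duplicate (e.g. "procumbent prostrate" would otherwise become
    # "prostrate prostrate") and re-sort alphabetically
    paste(sort(unique(terms)), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

values <- c(
  "procumbent",
  "erect procumbent spreading",
  "decumbent procumbent prostrate",
  "spreading procumbent"
)

replace_term(values, "procumbent", "prostrate")
# "prostrate"
# "erect prostrate spreading"
# "decumbent prostrate"
# "prostrate spreading"

# Capital letters in the input no longer cause the substitution to fail
replace_term("Procumbent erect", "procumbent", "prostrate", ignore_case = TRUE)
# "erect prostrate"
```

With something along these lines, a single substitution entry would cover every combination regardless of the order in which the terms were entered, and because whole terms are compared, a longer word that merely contains the search term is left alone.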

@dfalster
Member

@ehwenk - is this still relevant?

@ehwenk
Collaborator Author

ehwenk commented Jun 20, 2023

Yes, and this would be good to fix. We tend to resort to using str_replace in custom_R_code to avoid the endless substitutions, which isn't ideal because it effectively hides the substitutions.

@yangsophieee
Collaborator

Moved to traits.build

@ehwenk ehwenk closed this as completed Jul 31, 2024