Scientific names
Scientific names follow nomenclatural rules, mainly governed by the ICZN and ICN.
Rather than manually verifying if names in a dataset are well-formed, you can use the GBIF name parser to do that automatically. rgbif provides a function parsenames() to interact with the GBIF name parser:
parsed_names <- input_data %>%
  distinct(scientific_name) %>% # Remove duplicate names: you only need to parse each name once
  pull() %>% # Transform dataframe to vector: parsenames() needs a vector of names
  rgbif::parsenames() # Parse names
The name parser will dissect the name into its components and return the following values for a well-formed name:
type = SCIENTIFIC
parsed = TRUE
parsedpartially = FALSE
Information deviating from these criteria could imply that the scientific name is incorrect.
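To single out the names that deviate from these criteria, you can filter the parser output. The sketch below is only an illustration: it uses a small hand-built data frame instead of a live call to the name parser, with the first two rows taken from the examples on this page and a third, hypothetical well-formed row. The object name names_to_review is an assumption, not part of rgbif.

```r
library(dplyr)

# Hand-built stand-in for the output of rgbif::parsenames().
# Rows 1-2 mirror the examples on this page; row 3 is a hypothetical
# well-formed name added for illustration only.
parsed_names <- tibble(
  scientificname  = c("Acmella agg.", "AseroÙ rubra", "Asero rubra"),
  type            = c("INFORMAL", "SCIENTIFIC", "SCIENTIFIC"),
  parsed          = c(TRUE, TRUE, TRUE),
  parsedpartially = c(FALSE, TRUE, FALSE)
)

# Keep every name that fails at least one of the three criteria
names_to_review <- parsed_names %>%
  filter(type != "SCIENTIFIC" | !parsed | parsedpartially)

names_to_review
```

With this input, the two example names are kept for review and the well-formed row is dropped. Note that filter() also drops rows where the condition evaluates to NA, so missing values in these fields would need separate handling.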
Note that the name parser does not check the existence of a scientific name against an existing registry. That is done by the GBIF species lookup, which verifies the existence of a name in the GBIF backbone taxonomy. Since checklists are sometimes the source of new names, checking them against the backbone is of less importance here.
The type field indicates whether the name is truly scientific (type = SCIENTIFIC) or not a scientific name of any kind (type = NO_NAME). It is important to understand that scientific names deviating from the above criteria are not necessarily incorrect: the name parser just gives you a (very) good idea about which names could be wrong.
The parsed and parsedpartially fields indicate whether the name parser has parsed the name fully, which is not always the case. Parsing can fail because of spelling errors or because taxonomic, nomenclatural or identification notes were added to the end of the name. In these cases the name is parsed only partially (parsedpartially = TRUE) or not at all (parsed = FALSE).
Some examples:
For Acmella agg. the name parser returns:
scientificname | type | genusorabove | parsed | parsedpartially | canonicalname | canonicalnamewithmarker | canonicalnamecomplete | rankmarker |
---|---|---|---|---|---|---|---|---|
Acmella agg. | INFORMAL | Acmella | TRUE | FALSE | Acmella | Acmella | Acmella | agg. |
Here, the output indicates that Acmella agg. is a scientific name with some informal addition (type = INFORMAL). The decision whether or not to change the name is up to the author of the checklist.
For AseroÙ rubra the name parser returns:
scientificname | type | genusorabove | parsed | parsedpartially | canonicalname | canonicalnamewithmarker | canonicalnamecomplete | rankmarker |
---|---|---|---|---|---|---|---|---|
AseroÙ rubra | SCIENTIFIC | Asero | Ù | TRUE | TRUE | Asero | Asero | Asero Ù |
The output indicates that the name was parsed only partially (parsedpartially = TRUE). This is due to a typo, i.e. the species name should be Asero rubra. There are two options to correct the scientific name in this case:
- In the raw data file (= permanently, recommended in this case)
- In the R code, using recode, which takes pairs of the form old value = new value:
input_data %<>% mutate(variable = recode(scientific_name_column,
  "AseroÙ rubra" = "Asero rubra" # old value = new value
))
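As a minimal, self-contained sketch of the second option: dplyr's recode() expects each replacement as "old value" = "new value" and leaves unmatched values untouched. The data frame below is a hypothetical stand-in for the raw checklist data, and cleaned_data is an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for the raw checklist data
input_data <- tibble(
  scientific_name = c("AseroÙ rubra", "Acmella agg.")
)

# recode() takes pairs of the form "old value" = "new value";
# values without a matching pair are left unchanged
cleaned_data <- input_data %>%
  mutate(scientific_name = recode(
    scientific_name,
    "AseroÙ rubra" = "Asero rubra"
  ))

cleaned_data
```

Writing the result to a new object keeps the raw data intact, which makes it easy to compare names before and after the correction.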