Scientific names

Peter Desmet edited this page Jan 30, 2019 · 8 revisions

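Scientific names follow nomenclatural rules, mainly governed by the International Code of Zoological Nomenclature (ICZN) and the International Code of Nomenclature for algae, fungi, and plants (ICN).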

GBIF name parser

Rather than manually verifying whether the names in a dataset are well-formed, you can let the GBIF name parser do that automatically. The rgbif package provides the function parsenames() to query the name parser from R:

library(dplyr)

parsed_names <- input_data %>%
  distinct(scientific_name) %>%  # Remove duplicate names: each name only needs to be parsed once
  pull(scientific_name) %>%      # Convert the column to a vector: parsenames() needs a vector of names
  rgbif::parsenames()            # Parse the names (this calls the GBIF API, so it needs an internet connection)

The name parser will dissect the name into its components and return the following values for a well-formed name:

  • type = SCIENTIFIC
  • parsed = TRUE
  • parsedpartially = FALSE

Values that deviate from these criteria may indicate that the scientific name is incorrect.
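The three criteria can be turned into a filter that lists the names worth a closer look. This is a minimal sketch, not part of the original workflow: the small data frame is a hand-made stand-in for parser output (so the example runs without calling the GBIF API), and it assumes that parsed and parsedpartially are logical columns.

```r
library(dplyr)

# Hand-made stand-in for the output of rgbif::parsenames().
# The rows are illustrative assumptions, not real parser output.
parsed_names <- tibble(
  scientificname  = c("Acmella oleracea", "Acmella agg.", "AseroÙ rubra"),
  type            = c("SCIENTIFIC", "INFORMAL", "SCIENTIFIC"),
  parsed          = c(TRUE, TRUE, TRUE),
  parsedpartially = c(FALSE, FALSE, TRUE)
)

# Keep every name that deviates from at least one of the three criteria
names_to_review <- parsed_names %>%
  filter(type != "SCIENTIFIC" | !parsed | parsedpartially)

names_to_review$scientificname
```

Note that filter() drops rows for which the condition evaluates to NA, so missing values in these columns would need a separate check.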

Note that the name parser does not check the existence of a scientific name against an existing registry. That is done by the GBIF species lookup, which verifies the existence of a name in the GBIF backbone taxonomy. Since checklists are sometimes the source of new names, checking them against the backbone is of less importance here.

Potentially incorrect names

The type field indicates whether the name is a well-formed scientific name (type = SCIENTIFIC), not a scientific name of any kind (type = NO_NAME), or something in between, such as the INFORMAL type in the first example below. It is important to understand that scientific names deviating from the above criteria are not necessarily incorrect: the name parser just gives you a (very) good idea of which names could be wrong.

The parsed and parsedpartially fields indicate whether the name parser was able to parse the name fully, which is not always the case. Parsing can fail because of spelling errors or because taxonomic, nomenclatural or identification notes were appended to the name. In these cases the name is parsed only partially (parsedpartially = TRUE) or not at all (parsed = FALSE).
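To get an overview before inspecting individual names, each parsed name can be labelled with the reason it stands out. The sketch below is one possible way to do this: the issue column and its labels are hypothetical, the data frame is a hand-made stand-in for parser output, and the "unidentified" row is an invented free-text entry used only to illustrate type = NO_NAME.

```r
library(dplyr)

# Hand-made stand-in for the output of rgbif::parsenames().
# The rows are illustrative assumptions, not real parser output.
parsed_names <- tibble(
  scientificname  = c("Acmella oleracea", "Acmella agg.", "AseroÙ rubra", "unidentified"),
  type            = c("SCIENTIFIC", "INFORMAL", "SCIENTIFIC", "NO_NAME"),
  parsed          = c(TRUE, TRUE, TRUE, FALSE),
  parsedpartially = c(FALSE, FALSE, TRUE, FALSE)
)

# Label each name with the first condition it matches (hypothetical labels)
labelled_names <- parsed_names %>%
  mutate(issue = case_when(
    !parsed              ~ "not parsed",
    parsedpartially      ~ "partially parsed",
    type != "SCIENTIFIC" ~ "check type",
    TRUE                 ~ "ok"
  ))

# Number of names per kind of issue
issue_counts <- labelled_names %>% count(issue)
```

The order of the case_when() conditions is a design choice: a name that was not parsed at all is reported as such, regardless of its type.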

Some examples:

For Acmella agg. the name parser returns:

| scientificname | type | genusorabove | parsed | parsedpartially | canonicalname | canonicalnamewithmarker | canonicalnamecomplete | rankmarker |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Acmella agg. | INFORMAL | Acmella | TRUE | FALSE | Acmella | Acmella | Acmella | agg. |

Here, the output indicates that Acmella agg. is a scientific name with an informal addition (type = INFORMAL). Whether or not to change the name is up to the author of the checklist.

For AseroÙ rubra the name parser returns:

scientificname type genusorabove parsed parsedpartially canonicalname canonicalnamewithmarker canonicalnamecomplete rankmarker
AseroÙ rubra SCIENTIFIC Asero Ù TRUE TRUE Asero Asero Asero Ù

The output indicates that the name was parsed only partially (parsedpartially = TRUE). This is due to a typo, i.e. the species name should be Asero rubra. There are two options to correct the scientific name in this case:

  1. In the raw data file (= permanently, recommended in this case)
  2. In the R code, using recode:
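# %<>% is the magrittr assignment pipe: load it with library(magrittr)
input_data %<>% mutate(scientific_name = recode(scientific_name,
  "AseroÙ rubra" = "Asero rubra" # recode() takes old = new pairs: the incorrect name goes on the left
))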
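When several names need correcting in the R code, the corrections can be kept in one named vector and spliced into recode() with !!!. This is a sketch under assumptions: the corrections vector and the example input_data are hypothetical, and the column is assumed to be called scientific_name. recode() is superseded in recent dplyr versions but still available.

```r
library(dplyr)

# Hypothetical input data containing the incorrect name twice
input_data <- tibble(
  scientific_name = c("AseroÙ rubra", "Acmella agg.", "AseroÙ rubra")
)

# Names are the values as they occur in the data, values are the corrections
corrections <- c("AseroÙ rubra" = "Asero rubra")

# Splice the named vector into recode(); names without a correction are kept as they are
input_data <- input_data %>%
  mutate(scientific_name = recode(scientific_name, !!!corrections))

unique(input_data$scientific_name)
```

Keeping the corrections in one place makes them easy to review, and a new correction only requires adding an entry to the vector.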