So you want to use the Checklist recipe to kickstart your own checklist Darwin Core mapping? Awesome! This page has all the information to get you started. To learn more about the concepts of the recipe, browse the other sections of the wiki.
The basic idea behind the Checklist recipe is:
source data → Darwin Core mapping script → generated Darwin Core files
By changing the source data and/or the mapping script, you can alter the generated Darwin Core files. The main advantage is repeatability: once you have done the mapping, you don't have to start from scratch if your source data has been updated. You can just run the mapping script again (with a little tweak here and there) and upload the generated files to a GBIF Integrated Publishing Toolkit for publication. And by having a mapping script, your mapping is also documented.
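The idea can be sketched in a few lines of R. This is only a minimal illustration, not the recipe's actual mapping script: the source columns (`species`, `rank`), the identifier scheme and the output location are hypothetical, and base R is used so the snippet runs without extra packages. `taxonID`, `scientificName`, `kingdom` and `taxonRank` are Darwin Core terms, but the values are made up.

```r
# Hypothetical source data; in the recipe this would come from data/raw
raw_data <- data.frame(
  species = c("Elodea canadensis", "Fallopia japonica"),
  rank = c("Species", "Species"),
  stringsAsFactors = FALSE
)

# Mapping step: derive Darwin Core terms from the source columns
map_to_dwc <- function(input) {
  data.frame(
    taxonID = paste0("example:taxon:", seq_len(nrow(input))), # made-up identifiers
    scientificName = input$species,
    kingdom = "Plantae",
    taxonRank = tolower(input$rank),
    stringsAsFactors = FALSE
  )
}

taxon <- map_to_dwc(raw_data)

# Output step: write the generated file (a temporary location is used here)
output_file <- file.path(tempdir(), "taxon.csv")
write.csv(taxon, output_file, row.names = FALSE, na = "")
```

Changing `raw_data` and running the same code again regenerates the file, which is the repeatability described above.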
To know which files to adapt, you need to understand the structure of the recipe.
The structure of the recipe is based on Cookiecutter Data Science. Files and directories indicated with GENERATED should not be edited manually.
├── README.md              : Description of this repository
├── LICENSE                : Repository license
├── checklist-recipe.Rproj : RStudio project file
├── .gitignore             : Files and directories to be ignored by git
│
├── data
│   ├── raw                : Source data, input for mapping script
│   └── processed          : Darwin Core output of mapping script GENERATED
│
├── docs                   : Repository website GENERATED
│
└── src
    ├── dwc_mapping.Rmd    : Darwin Core mapping script, core functionality of this repository
    ├── _site.yml          : Settings to build website in /docs
    └── index.Rmd          : Template for website homepage
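To confirm that a local copy has this layout, something like the following could be run from the R console. It is a rough sketch that assumes the working directory is the repository root; `check_layout()` is a hypothetical helper, not part of the recipe.

```r
# Paths taken from the structure described above
expected_paths <- c(
  "README.md", "LICENSE", "checklist-recipe.Rproj", ".gitignore",
  "data/raw", "data/processed", "docs",
  "src/dwc_mapping.Rmd", "src/_site.yml", "src/index.Rmd"
)

# Hypothetical helper: reports for each expected path whether it exists
check_layout <- function(root = ".") {
  present <- file.exists(file.path(root, expected_paths))
  setNames(present, expected_paths)
}

# Named logical vector; FALSE marks a missing file or directory
print(check_layout())
```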
Files for the workflow
The recipe has a functional workflow out of the box:
- checklist.xlsx contains some dummy source data to show the functionality of the workflow. It will only be useful if you update it with your own data or replace it with another file (Excel or other). Note that updating or replacing the source data (file) will have consequences for the dwc_mapping.Rmd script: both are closely interlinked. See source data.
- dwc_mapping.Rmd contains functional mapping code, but the output files, such as data/processed/distribution.csv, are mostly nonsense. Only by adapting the mapping script will you get proper Darwin Core files. See R Markdown.
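Because the source data and the mapping script depend on each other, a mapping script may benefit from an early check that the columns it relies on are present. The sketch below shows one way to do that in base R. The column names and the `check_columns()` helper are hypothetical and would need to match your own source data.

```r
# Hypothetical columns that a mapping script might rely on
expected_columns <- c("scientific_name", "rank")

# Hypothetical helper: stop with a clear message if a column is missing
check_columns <- function(input, expected) {
  missing_columns <- setdiff(expected, names(input))
  if (length(missing_columns) > 0) {
    stop("Source data lacks columns: ", paste(missing_columns, collapse = ", "))
  }
  invisible(TRUE)
}

# Made-up source data that satisfies the check
source_data <- data.frame(
  scientific_name = "Elodea canadensis",
  rank = "species",
  stringsAsFactors = FALSE
)
check_columns(source_data, expected_columns)
```

If a column is renamed or dropped in the source file, the check fails right away instead of producing incomplete Darwin Core output further down the script.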
Files for GitHub
Files such as .gitignore are used for versioning and to have a proper repository on GitHub. Open the Markdown file README.md in RStudio or another text editor to see the hidden instructions on how to adapt it for your checklist.
If you are not planning to make your checklist mapping available on GitHub (which would be a pity), you can delete these files.
Files for the website
src/index.Rmd and the directory docs are used to transform your README.md and mapping script into an R Markdown website that can be hosted on GitHub, like this one. This is a bit more advanced, but you can get started with how to generate the website and how to host it on GitHub.
If you are not planning to create a website, you can delete these files.
Convinced? Here's how to set up the recipe on your computer.
What you need first
- GitHub account: create one if you don't have one yet
- GitHub Desktop: download and install it on your computer
- RStudio: download and install it on your computer
If you are familiar with git, you can also use it directly in RStudio, rather than installing GitHub Desktop.
Create your checklist repository
- Go to https://github.com/trias-project/checklist-recipe.
- In the top right, click Fork. This will create a copy of the checklist-recipe repository under your GitHub account.
Download and open your repository in RStudio
- In your newly created repository at https://github.com/your_username/checklist-recipe, click the green Clone or download button and select Open in desktop. This will open GitHub Desktop and download the repository files to your computer.
- In GitHub Desktop select Show in Explorer (PC) / Show in Finder (Mac) from the top menu.
- In your file browser, open the checklist-recipe directory and double-click checklist-recipe.Rproj to open the checklist recipe in RStudio.
You will use a number of R packages, which you will need to install first. Copy/paste the following in your RStudio Console:
install.packages(c("tidyverse", "magrittr", "here", "janitor", "readxl", "digest", "rgbif"))
Note: if you get an Updating Loaded Packages warning, click No to avoid restarting the R session.
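If some of these packages are already installed, it may be quicker to find out first which ones are missing. The snippet below is an optional sketch using base R only; the variable names are illustrative.

```r
# Packages listed in the install command above
required <- c("tidyverse", "magrittr", "here", "janitor", "readxl", "digest", "rgbif")

# Names of packages already installed in the current library paths
installed <- rownames(installed.packages())

# Packages that still need to be installed (empty if everything is present)
not_installed <- setdiff(required, installed)
print(not_installed)

# Uncomment to install only what is missing
# if (length(not_installed) > 0) install.packages(not_installed)
```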
In the Files pane, go to the src directory and open the mapping script.
In the header menu of the open file, click Run > Run All to run the mapping code.
If the script was able to run without problems, you are all set! 🎉 Nothing has changed in the output data though (you just overwrote it with the same data), because neither the script nor the source data were adapted. To understand the basics and the different sections of the mapping script, take a look at the other sections of this wiki.
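Since the content of the output files stays the same, one way to confirm that the script really wrote them is to look at their modification times. The sketch below assumes the working directory is the repository root and that the output lives in data/processed; `list_generated()` is a hypothetical helper, not part of the recipe.

```r
# Hypothetical helper: list generated files with their last modification time
list_generated <- function(dir = "data/processed") {
  files <- list.files(dir, full.names = TRUE)
  data.frame(
    file = basename(files),
    modified = file.mtime(files),
    stringsAsFactors = FALSE
  )
}

# Returns an empty data frame if the directory is empty or does not exist
print(list_generated())
```

Timestamps close to the moment you clicked Run All indicate that the files were regenerated.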