
Getting started

Damiano Oldoni edited this page Oct 19, 2019 · 9 revisions

So you want to use the Checklist recipe to kickstart your own checklist Darwin Core mapping? Awesome! This page has all the information to get you started. To learn more about the concepts of the recipe, browse the other sections of the wiki.

The workflow

The basic idea behind the Checklist recipe is:

source data → Darwin Core mapping script → generated Darwin Core files

By changing the source data and/or the mapping script, you can alter the generated Darwin Core files. The main advantage is repeatability: once you have done the mapping, you don't have to start from scratch if your source data has been updated. You can just run the mapping script again (with a little tweak here and there) and upload the generated files to a GBIF Integrated Publishing Toolkit for publication. And by having a mapping script, your mapping is also documented.
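
The idea can be sketched in a few lines of base R. This is an illustration only, not the recipe's actual mapping script: the source columns (`species`, `rank`), the helper `map_to_taxon()`, the row-number identifiers and the temporary output location are all assumptions made for the example. `taxonID`, `scientificName` and `taxonRank` are Darwin Core terms.

```r
# Hypothetical source data; in the recipe this would be read from data/raw.
raw_data <- data.frame(
  species = c("Fallopia japonica", "Impatiens glandulifera"),
  rank = c("Species", "Species"),
  stringsAsFactors = FALSE
)

# Hypothetical mapping step: turn source columns into Darwin Core terms.
map_to_taxon <- function(source) {
  data.frame(
    taxonID = paste0("taxon:", seq_len(nrow(source))),  # toy identifier
    scientificName = trimws(source$species),
    taxonRank = tolower(source$rank),
    stringsAsFactors = FALSE
  )
}

# Generated output; a temporary directory stands in for data/processed.
taxon <- map_to_taxon(raw_data)
output_file <- file.path(tempdir(), "taxon.csv")
write.csv(taxon, output_file, row.names = FALSE, na = "")

# When the source data is updated, the same mapping is simply run again.
raw_data <- rbind(
  raw_data,
  data.frame(species = "Heracleum mantegazzianum", rank = "Species",
             stringsAsFactors = FALSE)
)
taxon <- map_to_taxon(raw_data)
write.csv(taxon, output_file, row.names = FALSE, na = "")
```

Keeping the mapping in one function is what makes the second run cheap: only the input changes, not the logic.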

To know which files to adapt, you need to understand the structure of the recipe.

Structure

The structure of the recipe is based on Cookiecutter Data Science. Files and directories indicated with GENERATED should not be edited manually.

├── README.md              : Description of this repository
├── LICENSE                : Repository license
├── checklist-recipe.Rproj : RStudio project file
├── .gitignore             : Files and directories to be ignored by git
│
├── data
│   ├── raw                : Source data, input for mapping script
│   └── processed          : Darwin Core output of mapping script GENERATED
│
├── docs                   : Repository website GENERATED
│
└── src
    ├── dwc_mapping.Rmd    : Darwin Core mapping script, core functionality of this repository
    ├── _site.yml          : Settings to build website in /docs
    └── index.Rmd          : Template for website homepage
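
To confirm that a copy of the recipe still has this layout, the tree above can be checked from R. A small sketch in base R; the helper name `missing_recipe_paths()` is hypothetical, and the call at the end assumes the working directory is the repository root.

```r
# Paths the recipe is expected to contain, taken from the tree above.
expected_paths <- c(
  "README.md", "LICENSE", "checklist-recipe.Rproj", ".gitignore",
  "data/raw", "data/processed", "docs",
  "src/dwc_mapping.Rmd", "src/_site.yml", "src/index.Rmd"
)

# Hypothetical helper: return the expected paths that do not exist under root.
missing_recipe_paths <- function(root = ".") {
  expected_paths[!file.exists(file.path(root, expected_paths))]
}

# An empty result means every expected file and directory was found.
missing_recipe_paths()
```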

Files for the workflow

The recipe has a functional workflow out of the box:

data/raw/checklist.xlsx → src/dwc_mapping.Rmd → data/processed/taxon.csv & data/processed/distribution.csv

  • checklist.xlsx contains some dummy source data to show the functionality of the workflow. It will only be useful if you update it with your own data or replace it with another file (Excel or other). Note that updating or replacing the source data file will have consequences for the dwc_mapping.Rmd script, as the two are closely interlinked. See source data.
  • dwc_mapping.Rmd contains functional mapping code, but the output files data/processed/taxon.csv & data/processed/distribution.csv are mostly nonsense. Only by adapting the mapping script will you get proper Darwin Core files. See R Markdown.
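
Because the source file and the mapping script are so closely linked, a small guard near the top of the script can make a changed source file fail early instead of halfway through the mapping. A minimal sketch in base R; the column names and the helper `missing_source_columns()` are hypothetical and not taken from the recipe.

```r
# Columns the mapping script relies on (hypothetical names for this sketch).
expected_columns <- c("scientific_name", "kingdom", "country_code")

# Hypothetical helper: list the expected columns that the source data lacks.
missing_source_columns <- function(source, expected = expected_columns) {
  setdiff(expected, names(source))
}

# Toy stand-in for the data read from data/raw; "kingdom" was renamed here.
raw_data <- data.frame(
  scientific_name = "Fallopia japonica",
  regnum = "Plantae",
  country_code = "BE",
  stringsAsFactors = FALSE
)

missing <- missing_source_columns(raw_data)
if (length(missing) > 0) {
  message("Source data lacks expected columns: ", paste(missing, collapse = ", "))
}
```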

Files for GitHub

The files README.md, LICENSE and .gitignore are used for versioning and to have a proper repository on GitHub. Open the Markdown file README.md in RStudio or another text editor to see the hidden instructions on how to adapt it for your checklist.

If you are not planning to make your checklist mapping available on GitHub (which would be a pity), you can delete these files.

Files for the website

The files src/_site.yml, src/index.Rmd and the directory docs are used to transform your README.md and mapping script into an R Markdown website that can be hosted on GitHub, like this one. This is a bit more advanced, but you can get started with how to generate the website and how to host it on GitHub.

If you are not planning to create a website, you can delete these files.
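
If you do keep them, one way to build the site from the R console is `rmarkdown::render_site()`, which reads `_site.yml` in the directory it is given. The sketch below is an assumption about how the pieces fit together, not the recipe's documented procedure: the helper `build_site()` is hypothetical, and where the output ends up depends entirely on the settings in your `_site.yml`.

```r
# Hypothetical helper: build the website from the R Markdown files in src_dir.
build_site <- function(src_dir = "src") {
  has_config <- file.exists(file.path(src_dir, "_site.yml"))
  has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)
  if (!has_config || !has_rmarkdown) {
    message("Nothing built: _site.yml or the rmarkdown package is missing.")
    return(invisible(FALSE))
  }
  # render_site() reads _site.yml, which decides where the output is written.
  rmarkdown::render_site(input = src_dir)
  invisible(TRUE)
}

# Assumes the working directory is the repository root.
built <- build_site("src")
```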

Setup

Convinced? Here's how to set up the recipe on your computer.

What you need first

  • GitHub account: create one if you don't have one yet
  • GitHub Desktop: download and install it on your computer
  • RStudio: download and install it on your computer

If you are familiar with git, you can also use it directly in RStudio, rather than installing GitHub Desktop.

Create your checklist repository

  1. Go to https://github.com/trias-project/checklist-recipe.
  2. In the top right, click Use this template. This will create a copy of the checklist-recipe repository under your GitHub account.

Download and open your repository in RStudio

  1. In your newly created repository at https://github.com/your_username/checklist-recipe, click the green Clone or download button and select Open in desktop. This will open GitHub Desktop and download the repository files to your computer.
  2. In GitHub Desktop select Repository > Show in Explorer (PC) / Show in Finder (Mac) from the top menu.
  3. In your file browser, open the checklist-recipe directory and double-click checklist-recipe.Rproj to open the checklist recipe in RStudio.

Run code

  1. You will use a number of R packages, which you will need to install first. Copy/paste the following in your RStudio Console:

    install.packages(c("tidyverse", "tidylog", "magrittr", "here", "janitor", "readxl", "digest", "rgbif"))

    Note: if you get an Updating Loaded Packages warning, click No so that the R session is not restarted.

  2. Then, in the Files pane, go to the src directory and open dwc_mapping.Rmd.

  3. In the header menu of the open file, click Run > Run All to run the mapping code.

    [Screenshot: Run > Run All in RStudio]
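
If Run All stops with an error such as `there is no package called ...`, one of the packages from step 1 is probably still missing. A small sketch in base R lists which ones and installs only those; the helper `missing_packages()` is hypothetical, and the install step is deliberately limited to interactive sessions such as the RStudio Console.

```r
# Packages listed in step 1.
required_packages <- c(
  "tidyverse", "tidylog", "magrittr", "here",
  "janitor", "readxl", "digest", "rgbif"
)

# Hypothetical helper: return the packages that are not installed yet.
missing_packages <- function(packages) {
  installed <- rownames(installed.packages())
  setdiff(packages, installed)
}

to_install <- missing_packages(required_packages)

# Only install from an interactive session such as the RStudio Console.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```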

If the script was able to run without problems, you are all set! 🎉 Nothing has changed in the output data though (you just overwrote it with the same data), because neither the script nor the source data were adapted. To understand the basics and the different sections of the mapping script, take a look at the other sections of this wiki.
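
Since the content of the output files is identical after this first run, their modification time is the easiest sign that they were actually rewritten. A sketch in base R, assuming the working directory is the project root (as it is when the project was opened via checklist-recipe.Rproj); the helper `output_status()` is hypothetical.

```r
# Output files the mapping script is expected to write.
output_files <- file.path("data", "processed", c("taxon.csv", "distribution.csv"))

# Hypothetical helper: report existence and last modification time per file.
output_status <- function(files) {
  data.frame(
    file = files,
    exists = file.exists(files),
    modified = file.mtime(files),  # NA when the file does not exist
    stringsAsFactors = FALSE
  )
}

output_status(output_files)
```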

Happy cooking!