# Process occurrences

This tutorial shows a typical pipeline for downloading occurrence data from [GBIF](https://www.gbif.org "GBIF") and keeping the download history up to date.

## Install and load packages

If you have not done so yet, install the `rgbif` and `trias` packages first. The `trias` package is installed from GitHub, which requires `devtools`.

In [4]:
install.packages("rgbif")
devtools::install_github("trias-project/trias")

Installing package into 'C:/R/Library'
(as 'lib' is unspecified)


package 'rgbif' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\damiano_oldoni\AppData\Local\Temp\RtmpuYO90D\downloaded_packages


Skipping install of 'trias' from a github remote, the SHA1 (d4fb15d7) has not changed since last install.
  Use `force = TRUE` to force installation


If they are already installed, just load them:

In [5]:
library(rgbif)
library(trias)

## Download GBIF occurrences

### For species of EU concern

#### Input

In this section we focus on downloading occurrence data for a checklist of species saved in the file `eu_concern_species.tsv`. After importing the checklist, we select the GBIF taxon keys of all species (`taxonKeys`).


In [8]:
input_checklist <- "./data/input/eu_concern_species.tsv"
eu_concern_species <- read.table(file = input_checklist, header = TRUE, sep = "\t")
taxonKeys <- eu_concern_species["gbif_taxonKey"]
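
As a self-contained illustration of the import step, the sketch below reads a tiny made-up checklist from a string instead of a file. Only the column name `gbif_taxonKey` comes from this tutorial; the second column (`scientific_name`), the keys, and the names are invented for the example.

```r
# Hypothetical checklist content; the real data live in eu_concern_species.tsv.
checklist_text <- paste(
  "gbif_taxonKey\tscientific_name",
  "1111111\tSpecies one",
  "2222222\tSpecies two",
  "2222222\tSpecies two",
  sep = "\n"
)

# Same call as in the tutorial, with `text =` replacing `file =`.
demo_checklist <- read.table(text = checklist_text, header = TRUE, sep = "\t",
                             stringsAsFactors = FALSE)

# Single brackets keep a one-column data frame, as in the tutorial.
demo_keys_df <- demo_checklist["gbif_taxonKey"]

# unlist() turns it into a plain vector; unique() drops repeated keys.
demo_keys <- unique(unlist(demo_keys_df, use.names = FALSE))
```

Removing duplicates is optional here; it is shown only because a checklist may list the same key more than once.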

The file contains 38 species, as the number of keys confirms:

In [11]:
length(taxonKeys$gbif_taxonKey)

Specify the countries of interest using [ISO 3166-1 alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 "country codes") country codes. For example:

In [14]:
countries <- list(country = c("BE", "PL"))

It's time to send a download request for GBIF occurrence data:

In [15]:
gbif_download_key <- rgbif::occ_download(
  paste("taxonKey = ", paste(unlist(taxonKeys), collapse = ",")), 
  paste("country = ", paste(unlist(countries), collapse = ",")), 
  curlopts = list(verbose = TRUE))
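
To see what the two `paste` expressions in the cell above produce, it can help to build the query strings on their own and print them. The sketch below uses made-up taxon keys and only runs `paste`; it does not contact GBIF.

```r
# Made-up keys and the example countries, shaped like the tutorial's objects.
demo_taxon_keys <- data.frame(gbif_taxonKey = c(1111111L, 2222222L))
demo_countries <- list(country = c("BE", "PL"))

# Inner paste() joins the values with commas; outer paste() prepends the field name.
taxon_query <- paste("taxonKey = ", paste(unlist(demo_taxon_keys), collapse = ","))
country_query <- paste("country = ", paste(unlist(demo_countries), collapse = ","))

print(taxon_query)
print(country_query)
```

Newer `rgbif` releases build download queries with predicate helpers such as `pred_in` rather than strings, so check the documentation of your installed version before reusing this form.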

#### Check status of download

Downloads can take a while to complete. You can check the status of a download request at any time through its metadata, using the `rgbif` function `occ_download_meta`:

In [19]:
metadata <- rgbif::occ_download_meta(key = gbif_download_key)
metadata$status
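
If you prefer to wait for one download instead of checking by hand, a simple polling loop is one option. The sketch below is not part of `rgbif` or `trias`: `wait_for_download` and its arguments are hypothetical names, and the status source is passed in as a function so the example runs offline with a fake status sequence.

```r
# Poll a status function until the download leaves the pending states.
# `get_status` is a hypothetical zero-argument function returning the status string.
wait_for_download <- function(get_status, interval_sec = 60, max_tries = 30) {
  for (i in seq_len(max_tries)) {
    status <- get_status()
    if (!status %in% c("PREPARING", "RUNNING")) {
      return(status)
    }
    Sys.sleep(interval_sec)
  }
  status  # still pending after max_tries checks
}

# Offline demonstration with a fake status sequence instead of a GBIF call.
fake_statuses <- c("PREPARING", "RUNNING", "SUCCEEDED")
counter <- 0
fake_get_status <- function() {
  counter <<- counter + 1
  fake_statuses[min(counter, length(fake_statuses))]
}
final_status <- wait_for_download(fake_get_status, interval_sec = 0)
```

With a real request you could pass `function() rgbif::occ_download_meta(key = gbif_download_key)$status` as `get_status`, keeping a generous interval between checks.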

Status PREPARING or RUNNING means the download request has not succeeded yet and you will have to wait a little longer. You don't have to remember which downloads are still running or being prepared, though: the `trias` function `update_download_list` keeps track of them automatically.

#### Write download to list of downloads and check pending downloads

All download requests are saved in a TSV file called [gbif_downloads.tsv](./data/output/gbif_downloads.tsv "output file").
It contains the following columns:
* *gbif_download_key*: key associated with a download request, generated automatically by GBIF,
* *input_checklist*: input checklist containing the taxon keys,
* *input_country*: countries of interest,
* *gbif_download_created*: datetime of the download request,
* *gbif_download_status*: download status, one of PREPARING, RUNNING, SUCCEEDED, FAILED,
* *gbif_download_doi*: automatically assigned DOI.
 
The function `update_download_list` does the following:
* adds a new line to the file if the download key, *gbif_download_key*, is not present yet,
* checks all downloads in the file with status PREPARING or RUNNING and, where needed, updates them to SUCCEEDED or FAILED.
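
The two steps above can be sketched on an in-memory data frame. This is not the `trias` implementation: `update_downloads_sketch`, `get_status`, and the keys are hypothetical, and only two of the columns are kept to stay short.

```r
# Hypothetical re-implementation of the two steps on an in-memory data frame.
# `get_status` stands in for a lookup such as occ_download_meta(key)$status.
update_downloads_sketch <- function(downloads, new_key, get_status) {
  # Step 1: append the key if it is not listed yet.
  if (!new_key %in% downloads$gbif_download_key) {
    downloads <- rbind(
      downloads,
      data.frame(gbif_download_key = new_key,
                 gbif_download_status = get_status(new_key),
                 stringsAsFactors = FALSE)
    )
  }
  # Step 2: refresh every download that is still pending.
  pending <- downloads$gbif_download_status %in% c("PREPARING", "RUNNING")
  downloads$gbif_download_status[pending] <-
    vapply(downloads$gbif_download_key[pending], get_status, character(1),
           USE.NAMES = FALSE)
  downloads
}

# Offline demo with made-up keys and a fake status lookup.
demo_downloads <- data.frame(
  gbif_download_key = c("key-A", "key-B"),
  gbif_download_status = c("SUCCEEDED", "RUNNING"),
  stringsAsFactors = FALSE
)
fake_lookup <- function(key) {
  switch(key, "key-B" = "SUCCEEDED", "key-C" = "PREPARING", "FAILED")
}
updated <- update_downloads_sketch(demo_downloads, "key-C", fake_lookup)
```

In the demo, `key-C` is appended as a new row, `key-B` moves from RUNNING to SUCCEEDED, and `key-A` is left untouched because it was no longer pending.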

In [25]:
trias::update_download_list(file = "./data/output/gbif_downloads.tsv", 
                            download_to_add = gbif_download_key, 
                            input_checklist = input_checklist, countries = countries)

ERROR: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 2 elements
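
The error above is raised by `scan`, the low-level reader that `read.table` relies on, and typically means that a line holds fewer fields than expected. One way to investigate, sketched here on a temporary file with made-up content rather than on the real `gbif_downloads.tsv`, is to count the tab-separated fields on each line with `count.fields`.

```r
# Write a small, deliberately malformed TSV to a temporary file (made-up content).
demo_file <- tempfile(fileext = ".tsv")
writeLines(c("gbif_download_key\tgbif_download_status",
             "key-A",                 # this line lacks the second field
             "key-B\tRUNNING"),
           demo_file)

# Number of tab-separated fields on each line; they should all be equal.
fields_per_line <- count.fields(demo_file, sep = "\t")

# Lines whose field count differs from the widest line are suspicious.
bad_lines <- which(fields_per_line != max(fields_per_line))
print(bad_lines)
```

Running the same check on the actual file path would show whether the file is empty, lacks a header, or has rows with missing columns.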
