Include distribution regions #44

Merged on Oct 12, 2019 (43 commits)


@damianooldoni damianooldoni commented Jul 2, 2019

This PR solves #43 and includes regional distributions in the unified checklist.

Most of the changes are in 5_unify_information.Rmd. Minor changes in 6_dwc_mapping.Rmd.

Below is the new workflow, with added steps in bold and modified steps in italic:

  1. Parse temporal (eventDate) information.
  2. Filter distributions: this was already done in @ref(filter-on-distribution)
  3. Map locality and locationId to regional or national level. (see table in Include distribution for regions #43)
  4. Add a Belgian distribution from regional distributions within a checklist if not present.
  5. Choose a single distribution within a checklist for each location (partly changed by adding locality and locationId to group_by()).
  6. Choose a single distribution across checklists (partly changed by adding locality and locationId to group_by()).
  7. Save to CSV.
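
Steps 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not the code in 5_unify_information.Rmd: the input table, the checklist and taxon columns, the regional locationId values and the rule for deriving the Belgian record (earliest startYear, latest endYear of the regions) are all assumptions made for the example.

```r
library(dplyr)

# Hypothetical input: checklist A only has regional records, B has a national one
distribution <- tibble(
  checklist  = c("A", "A", "B"),
  taxon      = c("sp1", "sp1", "sp1"),
  locationId = c("ISO_3166-2:BE-VLG", "ISO_3166-2:BE-WAL", "ISO_3166-2:BE"),
  locality   = c("Flemish Region", "Walloon Region", "Belgium"),
  startYear  = c(1990L, 2001L, 1985L),
  endYear    = c(2018L, 2015L, 2019L)
)

# Step 4 (assumed rule): derive a Belgian record where a checklist has none
belgian_from_regions <- distribution %>%
  group_by(checklist, taxon) %>%
  filter(!any(locationId == "ISO_3166-2:BE")) %>%
  summarize(
    locationId = "ISO_3166-2:BE",
    locality   = "Belgium",
    startYear  = min(startYear),
    endYear    = max(endYear),
    .groups    = "drop"
  )

distribution_with_be <- bind_rows(distribution, belgian_from_regions)

# Step 5: one distribution per location within a checklist,
# with locality and locationId added to group_by()
single_distribution <- distribution_with_be %>%
  group_by(checklist, taxon, locality, locationId) %>%
  summarize(
    startYear = min(startYear),
    endYear   = max(endYear),
    .groups   = "drop"
  )
```

Grouping on locality and locationId keeps regional and national records apart, so a regional record is never collapsed into the Belgian one.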

In the DWC mapping, only minor changes were applied:

  1. distribution %<>% mutate(dwc_locationID = locationId) instead of distribution %<>% mutate(dwc_locationID = "ISO_3166-2:BE")
  2. distribution %<>% mutate(dwc_locality = locality) instead of distribution %<>% mutate(dwc_locality = "Belgium")
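
A small sketch of the effect of these two changes, using a hypothetical two-row distribution table (the regional locationId value is an assumption for illustration):

```r
library(dplyr)
library(magrittr)

# Hypothetical unified distributions, now with regional rows
distribution <- tibble(
  locationId = c("ISO_3166-2:BE", "ISO_3166-2:BE-VLG"),
  locality   = c("Belgium", "Flemish Region")
)

# Take the Darwin Core terms from the data instead of hardcoding Belgium
distribution %<>% mutate(dwc_locationID = locationId)
distribution %<>% mutate(dwc_locality = locality)
```

With the hardcoded values, the second row would have been mapped to Belgium as well.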

To avoid a massive number of warnings when converting Inf/-Inf to integer, I split the mutate() call that calculates startYear and endYear within checklists into two steps. The change has no influence on the results, but it improves the code and its speed, as no warnings have to be returned.
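
One possible way to do such a two-step split is sketched below. The exact code in the Rmd may differ: the records table, the year, checklist and taxon columns, the year_range() helper and the use of Inf/-Inf as extra arguments to min()/max() are assumptions for this example.

```r
library(dplyr)

# Hypothetical records: checklist B has no year information at all
records <- tibble(
  checklist = c("A", "A", "B", "B"),
  taxon     = c("sp1", "sp1", "sp2", "sp2"),
  year      = c(1990L, 2005L, NA, NA)
)

year_range <- function(df) {
  df %>%
    group_by(checklist, taxon) %>%
    # Step 1: compute the range as double; groups without years get Inf/-Inf
    mutate(
      startYear = min(year, Inf, na.rm = TRUE),
      endYear   = max(year, -Inf, na.rm = TRUE)
    ) %>%
    ungroup() %>%
    # Step 2: turn Inf/-Inf into NA first, then cast to integer
    mutate(
      startYear = as.integer(replace(startYear, is.infinite(startYear), NA)),
      endYear   = as.integer(replace(endYear, is.infinite(endYear), NA))
    )
}

years <- year_range(records)
```

Calling as.integer() directly on Inf or -Inf returns NA with a coercion warning for every affected value, which is what the second step avoids.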

As the last commit, I applied styler::style_file() to the two Rmd files. I advise using it on all other mapping steps as well.

Some columns were not loaded correctly due to NAs in the very first part of the file, so R guessed these columns were logical instead of character. For brevity, use the .default parameter.
Some verificationKey values are linked to multiple values in taxonKey. This field should be read as character because it can contain a pipe.
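
A minimal sketch of both points with readr, assuming a hypothetical file in which the remarks column starts with empty values and taxonKey holds pipe-separated keys (file content and column layout are made up for illustration):

```r
library(readr)

# Hypothetical input file written to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "verificationKey,taxonKey,remarks",
    "v1,101,",
    "v2,102|103,",
    "v3,104,verified"
  ),
  csv_path
)

# Read every column as character instead of letting readr guess the types
checklist <- read_csv(
  csv_path,
  col_types = cols(.default = col_character())
)

# A pipe-separated taxonKey can be split later when needed
taxon_keys <- strsplit(checklist$taxonKey[2], "|", fixed = TRUE)[[1]]
```

Setting .default once is shorter than listing every column, and it keeps values such as 102|103 intact.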
With this approach there are some changes in the distributions of the unified checklist, with more Belgian distributions.
Automatic change while mapping descriptions. However, no change in results.
Applied to 5_***.Rmd and 6_***.Rmd

@peterdesmet peterdesmet left a comment


Thanks @damianooldoni. Sorry it took so long. Also included:

I removed the external location mapping file for Spuitkom etc. and just used recode()
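
For illustration, an inline mapping with dplyr::recode() could look like the sketch below. The target value "Region X" is a placeholder; the actual mapping for Spuitkom is not given in this conversation.

```r
library(dplyr)

# Hypothetical raw localities
locality_raw <- c("Spuitkom", "Belgium", "Spuitkom")

# Values without a match are kept as they are
locality_clean <- recode(locality_raw, "Spuitkom" = "Region X")
```

This keeps the mapping next to the code that uses it, at the cost of editing the Rmd whenever a new locality needs to be mapped.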

@peterdesmet peterdesmet merged commit cd21268 into master Oct 12, 2019
@peterdesmet peterdesmet deleted the include-distribution-regions branch October 12, 2019 22:17