Specify type columns occurrence data beforehand #25

Open
damianooldoni opened this issue Dec 7, 2018 · 4 comments
@damianooldoni
Collaborator

damianooldoni commented Dec 7, 2018

Importing (big) occurrence downloads in R means constantly fighting parsing failures. This happens because some fields have NAs in their first rows, sometimes hundreds of thousands of them.
One trick is to increase the number of rows R uses to guess the column type (the guess_max parameter of the read_delim() function). However, if the number of leading rows with NA is very high, parsing failures have to be solved by defining the type you expect to get. Doing that every time for each file is time-consuming. My idea is to write the type specification for every occurrence data field once. There are 237 of them, as far as my experience with occurrence downloads tells me. I already made a list of almost 90 fields a few days ago and put them together in a gist: https://gist.github.com/damianooldoni/01da78e5e55617798804db1804434754. I know, it's boring (very boring!) but it will save time in the future.
@peterdesmet: What do you think about putting it in the trias package?
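
For illustration, a minimal sketch of what such a predefined specification could look like with readr. The field names are common occurrence download columns, but the chosen types, the object name `occurrence_col_types` and the tiny example file are assumptions made for this sketch; they are not taken from the gist.

```r
library(readr)

# Hypothetical type specification for a handful of occurrence fields.
# Any column not listed falls back to character via .default.
occurrence_col_types <- cols(
  gbifID = col_character(),
  scientificName = col_character(),
  year = col_integer(),
  decimalLatitude = col_double(),
  decimalLongitude = col_double(),
  .default = col_character()
)

# Tiny tab-delimited stand-in for an occurrence download.
# The first data row has NA in several fields, as in the real files.
occurrence_file <- tempfile(fileext = ".txt")
writeLines(c(
  "gbifID\tscientificName\tyear\tdecimalLatitude\tdecimalLongitude",
  "1\tVespa velutina\tNA\tNA\tNA",
  "2\tVespa velutina\t2017\t51.05\t3.72"
), occurrence_file)

# With explicit col_types, readr does not guess types from the first rows,
# so leading NAs no longer determine the column type.
occurrences <- read_delim(
  occurrence_file,
  delim = "\t",
  col_types = occurrence_col_types,
  na = c("", "NA")
)
```

Because the specification is a single object, it could be stored once (for example in a package) and reused for every download instead of being retyped per file.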

@peterdesmet
Member

Adding it as such to TrIAS is one option, since you're almost there. Or you could have a look at finch, which is an R package for reading Darwin Core files. Maybe it is already implemented there, and if not, it might be a nice addition.

@damianooldoni
Collaborator Author

Nice! Thanks. I will take a look and let you know.

@damianooldoni
Collaborator Author

Based on the discussion in ropensci-archive/finch/issues/25, I would for now add the parsing types as an R file in the TrIAS package. What do you think? I will do it after PR #21 is done.

@peterdesmet
Member

Ok. Or maybe it is easiest to read all columns as text and only recast when necessary?
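
A minimal sketch of that alternative, assuming readr is used for the import. The example file, the object name `occ_text` and the choice of which columns to recast are made up for illustration.

```r
library(readr)

# Tiny tab-delimited stand-in for an occurrence download.
occurrence_file <- tempfile(fileext = ".txt")
writeLines(c(
  "gbifID\tyear\tdecimalLatitude",
  "1\tNA\tNA",
  "2\t2017\t51.05"
), occurrence_file)

# Read every column as text: no type guessing takes place,
# so leading NAs cannot trigger parsing failures.
occ_text <- read_delim(
  occurrence_file,
  delim = "\t",
  col_types = cols(.default = col_character())
)

# Recast only the columns actually needed for the analysis at hand.
occ_text$year <- as.integer(occ_text$year)
occ_text$decimalLatitude <- as.double(occ_text$decimalLatitude)
```

The trade-off is that no specification has to be maintained, but every analysis has to remember to recast the columns it uses.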

@damianooldoni damianooldoni self-assigned this Dec 21, 2018