Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unusual datetime classing #6

Closed
spgarbet opened this issue Feb 23, 2023 · 12 comments
Closed

Unusual datetime classing #6

spgarbet opened this issue Feb 23, 2023 · 12 comments

Comments

@spgarbet
Copy link
Member

spgarbet commented Feb 23, 2023

Got this email from a user:

Using the R REDCap API on a form I'm looking at the variables hoendat and hoentim. I'm using the R drinkREDCap function from rccola.

For hoendat I'm getting a variable of R classes labelled, POSIXct, and POSIXt. I don't understand the POSIXt part and was expecting a pure date variable with no time to be of class Date.

For hoentim I'm getting class labelled and times with an additional attribute 'format' with value h:m:s. But the variable values are fractions (I assume fraction of a day). 

I'd appreciate getting any further information about why these are set up this way. The most important thing to know is whether the same setup is used for all dates and times in the whole database, and that this is automatic without the user having an opportunity to introduce undesired variability in how dates and times are stored in the project database. I can deal with any format but need to have consistency.

@obregos
Copy link
Collaborator

obregos commented Feb 23, 2023

Reproduced this issue by creating a REDCap field date with type text (date_mdy) and REDCap field time with type text (time)
Pulling the data with drinkREDCap returned the same classes as described in this issue.

> class(df$date)
[1] "labelled" "POSIXct"  "POSIXt"  
> class(df$time)
[1] "labelled" "times" 

@spgarbet
Copy link
Member Author

> x <- as.POSIXct(Sys.Date())
> x
[1] "2023-02-22 18:00:00 CST"
> class(x)
[1] "POSIXct" "POSIXt" 

What is the expected class for a date? This is the default in R.

@nutterb
Copy link
Collaborator

nutterb commented Feb 23, 2023

I think https://github.com/vubiostat/redcapAPI/blob/main/R/fieldToVar.R#L62-L101 is the relevant code (fieldToVar is a bit of a workhorse).

as written, redcapAPI converts all date and date/time formats to POSIXct. For a date format, presumably the POSIXct object would end with "00:00:00". There are two reasons for treating dates this way.

  • First, REDCap exports all date variables as strings in YYYY-MM-DD HH:MM:SS format, which makes it pretty easy to convert to POSIXct without doing any extra work.
  • Second, I prefer POSIXct over Date (I'm not saying it's a good reason, just that it was a motivator).

I'm going to guess hoentim has either a REDCap field type of either time_mm_ss or time. These get converted to R objects using chron::times. (see: https://cran.r-project.org/web/packages/chron/index.html). You are correct that they are fractions of a day. The API writes these as strings in HH:MM or HH:MM:SS format.

If the user would prefer to customize the formats for their dates and/or times, they can set dates = FALSE in the call to have these variables returned as character vectors.

@spgarbet
Copy link
Member Author

The field in question is a field type of time.

@spgarbet
Copy link
Member Author

@nutterb Please review main...BSTATGEN-1057-date-time-classes

Also, why when this runs on CRAN do the test cases not execute? I thought they would reject test cases like this that fail without a file.

@nutterb
Copy link
Collaborator

nutterb commented Feb 27, 2023

These handlers seem like they will work. As I look at it, the only downside I can see is it doesn't permit any flexibility in how to apply the handlers. For instance, if you want to use a different time zone. But in fairness, the existing code doesn't do that either, so it isn't any different by default.

The tests don't run on CRAN because of https://github.com/nutterb/redcapAPI/blob/master/.Rbuildignore#L10 (when the package is built, it omits the file that actually executes the tests)

@spgarbet
Copy link
Member Author

I did consider that one could have a handler for a specific field, not a redcap field type. I couldn't figure out an easy interface that would work for both.

@spgarbet
Copy link
Member Author

spgarbet commented Feb 27, 2023

What if "handlers" applied to field type or field name? It could do very specific overrides. The only issue is the possibility of name collision, which is probably not that huge.

I don't understand on the time zone difference, one can specify any function including user defined functions, thus it could account for any time zone in the function given to handlers.

@nutterb
Copy link
Collaborator

nutterb commented Feb 27, 2023

I think I would fall back on the idea of "do one thing and do it well" here.

I'm not sure we provide an meaningful utility by giving users the ability to control how exportRecords processes each field when they can just as easily transform variables after exportRecords is finished executing. Given my biases (see below), I think adding this functionality is just punishing your future self with trying to debug it when users do something you didn't expect.

field type-level control, however, feels meaningful. Especially since I've locked you into dates = TRUE transforms date, datetime, and time variables into specific R classes. With the handlers have you written it, I can do something like handlers = list(date = identity, time = identity) and then process those fields the way that I want them.

Disclaimer of bias: I have strong opinions about analyzing date and date/time variables. Specifically, I have very rarely encountered cases where I would prefer a Date object over a POSIXct variable. I'd say 99% of the time, POSIXct is a better choice, and I think it's a solid default for exportRecords. I can't be quite so opinionated about the time variables, but the chron classes are well established and understood in a lot of R. I'm not sure what class you would want instead, unless you just wanted the character value of "hh:mm." In which case, handlers(time = identity) is a pretty good solution.

@spgarbet
Copy link
Member Author

I doubt a lot of folks would use this feature. Let's keep it simple and do it well.

@spgarbet
Copy link
Member Author

spgarbet commented Mar 1, 2023

I think the branch is ready to merge.

@spgarbet
Copy link
Member Author

spgarbet commented Mar 1, 2023

Last pull request solved this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants