read_spss() doesn’t interpret dates properly #72

Closed
huftis opened this Issue May 27, 2015 · 6 comments

huftis (Contributor) commented May 27, 2015

The read_spss() function in haven (latest GitHub version) doesn’t interpret dates in SPSS files properly. It looks like they’re not converted to dates at all, but remain as numeric variables. Here’s an example file: http://huftis.org/nedlasting/spss/spss-dates.sav

Running the commands

library(haven)
# Assumes spss-dates.sav has been downloaded to the working directory
d <- read_spss("spss-dates.sav")
d

the result is

         date            datetime   timemm timemmss timemmssss
1 13638758400 2014-12-24 00:00:00 17:28:00 17:28:00   17:28:00
2 13509676800 2010-11-21 01:02:03  1:02:00  1:02:03    1:02:03

The first column should contain the dates 2014-12-24 and 2010-11-21. But it actually contains the number of seconds since 14th October 1582:

> as.Date(d$date/86400, origin="1582-10-14")
[1] "2014-12-24" "2010-11-21"
> as.POSIXct(d$date, origin="1582-10-14", tz="UTC")
[1] "2014-12-24 UTC" "2010-11-21 UTC"
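
As a workaround until this is fixed, the conversion above can be wrapped in small helpers. This is only a sketch: `spss_seconds_to_date()` and `spss_seconds_to_datetime()` are hypothetical names, not part of haven, and they assume the column really holds seconds since 1582-10-14, as in the example file.

```r
# Hypothetical helpers (not part of haven): convert raw SPSS date values,
# stored as seconds since 1582-10-14, into R date classes.
spss_origin <- "1582-10-14"

spss_seconds_to_date <- function(x) {
  # 86400 seconds per day; as.Date() expects days since the origin
  as.Date(x / 86400, origin = spss_origin)
}

spss_seconds_to_datetime <- function(x) {
  # as.POSIXct() expects seconds since the origin
  as.POSIXct(x, origin = spss_origin, tz = "UTC")
}

# The two raw values from the date column printed above
raw <- c(13638758400, 13509676800)
spss_seconds_to_date(raw)
# [1] "2014-12-24" "2010-11-21"
```

With the data frame from the example, `d$date <- spss_seconds_to_date(d$date)` would then replace the numeric column by a Date column.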

System information:

Package: haven
Version: 0.2.0.9000
Maintainer: Hadley Wickham <hadley@rstudio.com>
Built: R 3.2.0; x86_64-suse-linux-gnu; 2015-05-27 17:10:06 UTC; unix

R Version:
platform = x86_64-suse-linux-gnu
arch = x86_64
os = linux-gnu
system = x86_64, linux-gnu
status = 
major = 3
minor = 2.0
year = 2015
month = 04
day = 16
svn rev = 68180
language = R
version.string = R version 3.2.0 (2015-04-16)
nickname = Full of Ingredients

Locale:
LC_CTYPE=nn_NO.UTF-8;LC_NUMERIC=C;LC_TIME=nn_NO.UTF-8;LC_COLLATE=nn_NO.UTF-8;LC_MONETARY=nn_NO.UTF-8;LC_MESSAGES=nn_NO.UTF-8;LC_PAPER=nn_NO.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=nn_NO.UTF-8;LC_IDENTIFICATION=C

Search Path:
.GlobalEnv, package:haven, package:stats,
package:graphics, package:grDevices, package:utils,
package:datasets, package:methods, Autoloads,
package:base

larmarange (Contributor) commented May 29, 2015

A similar problem seems to happen with read_dta() and date-only variables.

hadley (Member) commented Jun 22, 2015

@huftis would you mind creating a version with dates 2010-01-01 & 1970-01-01 and times 0300 and 1500?

hadley (Member) commented Jun 22, 2015

@larmarange could you please do the same thing with a dta file?

hadley closed this in 850fc08 on Jun 22, 2015

hadley (Member) commented Jun 22, 2015

@larmarange and open a new issue?

huftis (Contributor) commented Jun 23, 2015

I have uploaded a file with the dates and times you requested (and the original example dates/date-times/times) to http://huftis.org/nedlasting/spss/spss-dates-expanded.sav

And here’s a screenshot of how it looks in SPSS: http://huftis.org/nedlasting/spss/spss-dates-expanded.png

larmarange (Contributor) commented Jun 23, 2015

I just updated haven to the latest version and have prepared a Stata file, available at http://joseph.larmarange.net/stata_dates.dta

  • date has %td format and is imported correctly.
  • datetime has %tc format and is imported correctly.
  • date2 has %tdCCYY-NN-DD format, i.e. a date with a custom display format. It is not imported as a date. In fact, any format starting with %td, regardless of the characters that follow, should be imported as a date (and the same applies to the other Stata formats).
  • datetime2 has %tC format and seems to be imported correctly.
  • year has %ty format (year format). It is imported correctly as an atomic vector containing the year.
  • months, week, quarter and halfyear use Stata-specific formats, respectively %tm, %tw, %tq and %th. I'm not sure there are corresponding classes available in R. So far they are imported as integers.
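
To illustrate the prefix rule from the list above, here is a rough sketch in R. The function names are hypothetical and this is not how haven does it internally. It assumes Stata's documented encoding (%td counts days since 1960-01-01, %tm counts months since January 1960), and mapping a %tm value to the first day of its month is just one possible representation.

```r
# Hypothetical helpers (not haven's implementation).

# Classify a Stata display format by its first three characters, so that
# custom formats such as "%tdCCYY-NN-DD" are still recognised as dates.
stata_format_class <- function(fmt) {
  prefix <- substr(fmt, 1, 3)
  ifelse(prefix == "%td", "date",
    ifelse(prefix %in% c("%tc", "%tC"), "datetime", "other"))
}

# %td values are days since 1960-01-01
stata_days_to_date <- function(x) {
  as.Date(x, origin = "1960-01-01")
}

# %tm values are months since January 1960; here each value is mapped
# to the first day of that month (an assumption, other choices are possible)
stata_months_to_date <- function(x) {
  year <- 1960 + x %/% 12
  month <- x %% 12 + 1
  as.Date(sprintf("%04d-%02d-01", year, month))
}

stata_format_class(c("%td", "%tdCCYY-NN-DD", "%tc", "%tm"))
# [1] "date"     "date"     "datetime" "other"
stata_days_to_date(18263)
# [1] "2010-01-01"
stata_months_to_date(600)
# [1] "2010-01-01"
```

Matching on the prefix rather than the full format string is what lets date2 be treated the same way as date.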

As requested, I have opened a new issue: #80

lock bot locked and limited the conversation to collaborators on Jun 27, 2018