---
title: "haven 1.0.0"
date: "2016-09-30"
---
```{r, include = FALSE}
library(haven)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
I'm pleased to announce the release of haven 1.0.0. Haven is designed to facilitate the transfer of data between R and SAS, SPSS, and Stata. It makes it easy to read SAS, SPSS, and Stata file formats into R data frames, and to save your R data frames in SAS, SPSS, and Stata formats if you need to collaborate with others using closed-source statistical software. Install haven by running:
```R
install.packages("haven")
```
haven 1.0.0 is a major release, and indicates that haven is now largely feature-complete and has been tested on many real-world datasets. There are four major changes in this version of haven:
1. Improvements to the underlying ReadStat library
1. Better handling of "special" missing values
1. Improved date/time support
1. Support for other file metadata.
There were also a whole bunch of other minor improvements and bug fixes: you can see the complete list in the [release notes](http://haven.tidyverse.org/news/index.html#haven-1.0.0).
## ReadStat
Haven builds on top of the [ReadStat](http://github.com/WizardMac/ReadStat/issues) C library by [Evan Miller](http://www.evanmiller.org). This version of haven includes many improvements thanks to Evan's hard work on ReadStat:
* Can read binary/Ross compressed SAS files.
* Support for reading and writing Stata 14 data files.
* New `write_sas()` allows you to write data frames out to `sas7bdat`
files. This is still somewhat experimental.
* `read_por()` now actually works.
* Many other bug fixes and minor improvements.
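As a rough illustration of the Stata 14 support, the sketch below writes a small data frame to a temporary `.dta` file and reads it back. The data frame and its column names are invented for the example, and it assumes your installed haven accepts `version = 14` in `write_dta()`; check `?write_dta` if in doubt.

```R
library(haven)

# Hypothetical example data; any data frame would do.
df <- data.frame(id = c(1, 2, 3), score = c(1.5, NA, 3.25))

# Write a Stata 14 file to a temporary path, then read it back.
path <- tempfile(fileext = ".dta")
write_dta(df, path, version = 14)
roundtrip <- read_dta(path)

# The missing score comes back as a regular NA.
roundtrip$score
```

The same pattern should apply to `write_sav()`/`read_sav()`; I've left `write_sas()` out of the sketch because it is still experimental.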
## Missing values
haven 1.0.0 includes comprehensive support for the "special" types of missing values found in SAS, SPSS, and Stata. All three tools provide a global "system missing value", displayed as `.`. This is roughly equivalent to R's `NA`, although neither Stata nor SAS propagates missingness in numeric comparisons (SAS treats the missing value as the smallest possible number and Stata treats it as the largest possible number).
Each tool also provides a mechanism for recording multiple types of missingness:
* Stata has "extended" missing values, `.A` through `.Z`.
* SAS has "special" missing values, `.A` through `.Z` plus `._`.
* SPSS has per-column "user" missing values. Each column can declare
up to three distinct values or a range of values (plus one distinct
value) that should be treated as missing.
Stata and SAS only support tagged missing values for numeric columns. SPSS supports up to three distinct values for character columns. Generally, operations involving a user-missing type return a system missing value.
Haven models these missing values in two different ways:
* For SAS and Stata, haven provides `tagged_na()`, which extends R's
regular `NA` to add a single character label.
* For SPSS, haven provides `labelled_spss()`, which also models
user-defined missing values and ranges.
Use `zap_missing()` if you just want to convert to R's regular `NA`s.
You can get more details in the [semantics vignette](http://haven.tidyverse.org/articles/semantics.html).
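To make the two models a little more concrete, here is a minimal sketch. The values, labels, and missing codes are made up for illustration; the helper functions (`is_tagged_na()`, `na_tag()`, `print_tagged_na()`) are the ones I'd reach for, but see the vignette for the authoritative description.

```R
library(haven)

# Tagged missing values, as used for SAS and Stata data
x <- c(1, 2, tagged_na("a"), tagged_na("z"), NA)

is.na(x)            # tagged NAs are still missing as far as base R is concerned
is_tagged_na(x)     # TRUE only for the values that carry a tag
na_tag(x)           # the tag itself, or NA where there is none
print_tagged_na(x)  # prints the vector with the tags visible

# SPSS-style user-defined missing values (labels and codes are invented)
s <- labelled_spss(
  c(1, 2, 9, 10),
  labels = c(Good = 1, Bad = 2, Refused = 9),
  na_values = c(9, 10)
)

is.na(s)                   # 9 and 10 are reported as missing
s_plain <- zap_missing(s)  # user-defined missing values become regular NA
is.na(s_plain)
```

Note that the underlying values 9 and 10 are kept in `s`; they are only replaced by `NA` once you call `zap_missing()`.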
## Date/times
Support for date/times has substantially improved:
* `read_dta()` now recognises "%d" and custom date types.
* `read_sav()` now correctly recognises EDATE and JDATE formats as dates.
Variables with format DATE, ADATE, EDATE, JDATE or SDATE are imported as
`Date` variables instead of `POSIXct`.
* `write_dta()` and `write_sav()` support writing date/times.
* Support for `hms()` has been moved into the
[hms](https://github.com/rstats-db/hms) package. Time variables now have
class `c("hms", "difftime")` and a `units` attribute with value "secs".
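A small sketch of what this looks like in practice, assuming the hms package is installed alongside haven. The data frame, column names, and dates are hypothetical.

```R
library(haven)

# Hypothetical data with a Date column
visits <- data.frame(
  id = c(1, 2),
  visit = as.Date(c("2016-09-30", "2016-10-01"))
)

# Round-trip through a Stata file; the column should come back as a Date
path <- tempfile(fileext = ".dta")
write_dta(visits, path)
back <- read_dta(path)
class(back$visit)

# Times of day are represented with the hms class from the hms package
start <- hms::hms(seconds = 0, minutes = 30, hours = 9)
class(start)
attr(start, "units")
```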
## Other metadata
Haven is slowly adding support for other types of metadata:
* Variable formats can be read and written. Similarly to
variable labels, formats are stored as an attribute on the vector.
Use `zap_formats()` if you want to remove these attributes.
* Added support for reading file "label" and "notes". These are not currently
printed, but are stored in the attributes if you need to access them.
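Here is a hedged sketch of how format attributes behave. It assumes the Stata format lives in an attribute named `format.stata` (with analogous `format.spss` and `format.sas` attributes for the other tools); the data and the `%8.2f` format are made up for the example.

```R
library(haven)

# Hypothetical data with a Stata display format attached to one column
df <- data.frame(price = c(1.5, 20.25, 300))
attr(df$price, "format.stata") <- "%8.2f"

# Write the data out and read it back; the format travels with the column
path <- tempfile(fileext = ".dta")
write_dta(df, path)
back <- read_dta(path)
attr(back$price, "format.stata")

# zap_formats() strips the format attributes but leaves the values alone
cleaned <- zap_formats(back)
attr(cleaned$price, "format.stata")
```

If you only care about the values, calling `zap_formats()` right after reading keeps the extra attributes out of your data frame.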