haven::write_xpt() produces malformed LABELV8 header for long variable labels (incompatible with SAS) #784

@ankonyeni

Description

haven::write_xpt() and the SAS autocall macro loc2xpt produce different output when writing XPT v8 datasets that contain variable/column labels longer than 40 characters. SAS cannot read the file produced by write_xpt() in this case and reports an error when attempting to import it.

The XPT V8/V9 specification states that long labels are placed in a label records section whose header looks like:

"HEADER RECORD*******LABELV8 HEADER RECORD!!!!!!!nnnnn "

where nnnnn is the number of variables for which long labels will be defined.

When I write a dataset with a single column whose label exceeds 40 characters using haven::write_xpt(), the LABELV8 header in the produced file looks like this:

"HEADER RECORD*******LABELV8 HEADER RECORD!!!!!!! 1 0 "

When the same dataset is written using the SAS autocall macro loc2xpt, the LABELV8 header looks like this:

"HEADER RECORD*******LABELV8 HEADER RECORD!!!!!!!1 "

The LABELV8 header generated by write_xpt() is not consistent with the XPT V8/V9 specification and differs from the header produced by loc2xpt. This inconsistency appears to be the cause of the SAS read error.

# Minimal example (replace path/filenames as needed)
library(haven)

df <- data.frame(x = 1)

# label longer than 40 characters
attr(df$x, "label") <- "This is a very long variable label that exceeds forty characters"

# write a v8 XPT file (to current working directory)
haven::write_xpt(df, "test_longlabel.xpt", version = 8)

Steps to reproduce

  1. Create a dataset with at least one variable whose label is > 40 characters (example above).
  2. Call haven::write_xpt() to write a v8 XPT file.
  3. Inspect the LABELV8 header in the file (open in hex/text viewer).
  4. Attempt to read the XPT file in SAS using the xpt2loc macro. SAS fails to read it and reports an error.
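
For step 3, the header can also be pulled out of the file from R instead of a hex viewer. The following is a minimal sketch, not part of haven: `find_labelv8_record()` is a hypothetical helper, and it assumes the marker text is exactly as quoted above and that header records are 80 bytes long.

```r
# Hypothetical helper, not part of haven: returns the 80-character record that
# starts at the LABELV8 marker, or NA_character_ if the marker is absent.
find_labelv8_record <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))

  # Replace non-printable bytes with "." so the data converts to text safely
  # (rawToChar() fails on embedded NUL bytes).
  codes <- as.integer(bytes)
  bytes[codes < 32L | codes > 126L] <- as.raw(46L)
  txt <- rawToChar(bytes)

  # Marker text as quoted in this issue (assumption).
  marker <- "HEADER RECORD*******LABELV8 HEADER RECORD"
  pos <- regexpr(marker, txt, fixed = TRUE)
  if (pos < 0) {
    return(NA_character_)
  }

  # Take 80 characters starting at the marker (assumed record length).
  substr(txt, pos, pos + 79L)
}
```

Calling `find_labelv8_record("test_longlabel.xpt")` on the file from the example should return the record to compare against the one written by loc2xpt. Trailing blanks are significant, so wrapping the result in `dQuote()` or checking `nchar()` may help when comparing.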

This issue prevents interoperability with SAS when writing XPT v8 files that include variable labels longer than 40 characters.
