Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The exported file cannot be opened by SAS #224

Closed
Jiang-Li-backup opened this issue Oct 8, 2016 · 41 comments · Fixed by #708
Closed

The exported file cannot be opened by SAS #224

Jiang-Li-backup opened this issue Oct 8, 2016 · 41 comments · Fixed by #708
Labels
bug an unexpected problem or unintended behavior readstat

Comments

@Jiang-Li-backup
Copy link

I exported data to a SAS file. This file can be read by haven, but not the SAS. I am using SAS EG 7.1.

@hadley

This comment has been minimized.

@Jiang-Li-backup

This comment has been minimized.

@hadley

This comment has been minimized.

@Jiang-Li-backup

This comment has been minimized.

@rogerjdeangelis

This comment has been minimized.

@rogerjdeangelis
Copy link

Observation

Byte 33 has value 33 which indicates that the R SAS datasets use the layout of unix 64 bit.

I will try to read the R SAS dataset on Unix 64.

This should not matter, Win SAS should be able to read the dataset. I will look at the rest of the layout.
Unfortunately, I don't have WPS on Unix so I will create a unix SAS SAS dataset(not WPS) so more of
the R layout can be checked. SAS datasets seemed to have more filler which complicates the comparison.

At least the magic string matches

0-32 bytes Magic string
000000000000000000000000C2EA8160B31411CFBD92080009C7318C181F1011 WPS
000000000000000000000000C2EA8160B31411CFBD92080009C7318C181F1011 R

33-64 bytes
2222003333010232040000000000000000000301181F10112222003333010232 WPS
3300002200010031000000000000000000000000000000000000000000000000 R

@evanmiller
Copy link
Collaborator

A quick note about the "filler". SAS files are organized into a "header" followed by 1 or more "pages". By default haven will write a header of size 1024 bytes and pages of size 4096 bytes. This layout was common with older versions of SAS, but newer versions seem to use much larger header and page sizes. The sizes are defined as 32-bit integers at (or near) byte offsets 200 and 204. In any event, extra-large page sizes are the source of filler in the SAS files. If you skip to the beginning of each page it should be easier to analyze the file.

For binary debugging I use hecate.

@rogerjdeangelis
Copy link

Thanks for the info and the excellent work.

None of Matt's test datasets have byte 32=33 and byte 35=22. I have checked many other datasets created with different hardware and none of them have the 33/22 combination that write_sas creates

Test datasets for Matt SAS7BDAT

                      32 33 34 35 36 37

compress_no.sas7bdat 22 22 00 32 22 01 02 32 9.0101M3NET_ASRV........
compress_yes.sas7bdat 22 22 00 32 22 01 02 32 9.0101M3NET_ASRV........
lowbwt_i386.sas7bdat 22 22 00 32 22 01 02 32 9.0202M0W32_VSPRO.......
missing_values.sas7bdat 22 22 00 32 22 01 02 32 9.0202M0W32_VSPRO.......
obs_all_perf_1.sas7bdat 22 22 00 32 22 01 02 32 9.0101M3XP_PRO..........
adsl.sas7bdat 22 22 00 33 33 01 02 32 ....9.0202M3X64_ESRV....
eyecarex.sas7bdat 22 22 00 33 22 00 02 31 ....9.0000M0WIN.........
lowbwt_x64.sas7bdat 22 22 00 33 33 01 02 32 ....9.0202M2X64_VSPRO...
natlterr1994.sas7bdat 33 22 00 33 33 00 02 31 ........9.0101M3SunOS...
natlterr2006.sas7bdat 33 22 00 33 33 00 02 31 ........9.0101M3SunOS...
txzips.sas7bdat 33 22 33 22 00 33 33 01 02 31 ........9.0201M0Linux...

@evanmiller

This comment has been minimized.

@hadley hadley added bug an unexpected problem or unintended behavior readstat labels Jan 25, 2017
@realitix

This comment has been minimized.

@mnarasimhan02

This comment has been minimized.

@Kenkleinman

This comment has been minimized.

@rambler

This comment has been minimized.

@normark

This comment has been minimized.

@adam-garcia
Copy link

adam-garcia commented Feb 9, 2018

Though not a direct solution, if an R > SAS workflow is necessary for your work it looks like haven::write_sav() could be a temporary workaround. SAS can then read in the .sav file and variable labels, formats, etc persist (whereas they wouldn't with readr::write_csv(), etc).

So, in R:

library(tidyverse)
library(haven)
# View first 10 obs
diamonds(head, 10)

#> # A tibble: 10 x 10
#>    carat       cut color clarity depth table price     x     y     z
#>    <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#>  1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
#>  2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
#>  3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
#>  4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
#>  5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
#>  6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48
#>  7  0.24 Very Good     I    VVS1  62.3    57   336  3.95  3.98  2.47
#>  8  0.26 Very Good     H     SI1  61.9    55   337  4.07  4.11  2.53
#>  9  0.22      Fair     E     VS2  65.1    61   337  3.87  3.78  2.49
#> 10  0.23 Very Good     H     VS1  59.4    61   338  4.00  4.05  2.39

#  Export to sav
diamonds %>%
  head(10) %>%
  write_sav( "<path-to-file>")

Then, in SAS:

/* Import from sav */
proc import out = diamonds
            datafile = "<path-to-file>"
            dbms = SAV replace;
run;
***
NOTE: The import data set has 10 observations and 10 variables.
NOTE: WORK.DIAMONDS data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
**;


/* View the data */
ods listing;
proc print data = diamonds;
run;

***
                                       The SAS System

 Obs    carat cut       color clarity    depth    table    price        x        y        z

   1     0.23 Ideal       E    SI2       61.50    55.00      326     3.95     3.98     2.43
   2     0.21 Premium     E    SI1       59.80    61.00      326     3.89     3.84     2.31
   3     0.23 Good        E    VS1       56.90    65.00      327     4.05     4.07     2.31
   4     0.29 Premium     I    VS2       62.40    58.00      334     4.20     4.23     2.63
   5     0.31 Good        J    SI2       63.30    58.00      335     4.34     4.35     2.75
   6     0.24 Very Good   J    VVS2      62.80    57.00      336     3.94     3.96     2.48
   7     0.24 Very Good   I    VVS1      62.30    57.00      336     3.95     3.98     2.47
   8     0.26 Very Good   H    SI1       61.90    55.00      337     4.07     4.11     2.53
   9     0.22 Fair        E    VS2       65.10    61.00      337     3.87     3.78     2.49
  10     0.23 Very Good   H    VS1       59.40    61.00      338     4.00     4.05     2.39

**;

@hadley

This comment has been minimized.

@rogerjdeangelis

This comment has been minimized.

@rogerjdeangelis

This comment has been minimized.

@Jiang-Li-backup

This comment has been minimized.

@bellafeng

This comment has been minimized.

@xiaodaigh
Copy link

same issue here

@hadley

This comment has been minimized.

@ningjingzhiyuan507

This comment has been minimized.

@al-obrien

This comment has been minimized.

@JoviaNierenberg

This comment has been minimized.

@caimiao0714

This comment has been minimized.

@carloswpla

This comment has been minimized.

@oljees

This comment has been minimized.

@mlaviolet

This comment has been minimized.

@anuj2054

This comment has been minimized.

hadley added a commit that referenced this issue Feb 4, 2019
@reikoch
Copy link

reikoch commented May 20, 2019

For all what it is worth - in https://github.com/reikoch/testfiles there is now a small dummy SAS v9 dataset with genuine v5 and v8 transport file formats next to it.

@datafj
Copy link

datafj commented Sep 22, 2019

Here is my way to export r data to SAS. It uses PROC IML in SAS to run the R code. So yes, it requires SAS and IML. But it should be able to import any R data.frame without problem.

Here is the SAS code:

/*
Import R data to SAS
rfile: r file name
dsn: SAS dataset name
*/
libname test "sas_data_folder";
%LET rfile = my_folder/my_filename.fst;
%LET dsn = test.sas_data_name;
PROC IML;
	rfile = "&rfile";
	SUBMIT rfile / R;
		df = fst::read.fst("&rfile") 
	ENDSUBMIT;
	RUN ImportDataSetFromR("&dsn","df");
QUIT;

I save r data into .fst, but you can use any format you want, just change the R code df = fst::read.fst("&rfile") accordingly. Keep the double quotes.

You don't have to leave R to run the above code. You can generate the SAS code file in R, and then use system to invoke SAS command line to run the SAS program.

You will also need to make sure your SAS can run R code by adding the following to "C:\Program Files\SASHome2\SASFoundation\9.4\nls\en\sasv9.cfg"

-RLANG
-SET R_HOME "C:\Program Files\R\R-3.6.1"

Here is my R code to implement the whole process :

# fst_file: full path of the fst file name
# sas_folder: path of the sas data folder
# sas_dsn: sas data set name
# sas_code_file: full path of the sas code file
ExportToSAS = function(fst_file, sas_folder, sas_dsn, sas_code_file)
{
    # create sas code file
    sas_code = 'LIBNAME mylib "%SAS_FOLDER%";
                PROC IML;
                    SUBMIT fst_file / R;
                        df = fst::read.fst("%FST_FILE%") 
                    ENDSUBMIT;
                    RUN ImportDataSetFromR("mylib.%SAS_DSN%","df");
                QUIT;'
    sas_code = gsub("%FST_FILE%", fst_file, sas_code, fixed = TRUE)
    sas_code = gsub("%SAS_FOLDER%", sas_folder, sas_code, fixed = TRUE)
    sas_code = gsub("%SAS_DSN%", sas_dsn, sas_code, fixed = TRUE)
    writeLines(sas_code, sas_code_file)
    
    # run the sas code file
    sas_log_file = gsub(".sas", ".log", sas_code_file, fixed = TRUE)
    sas_lst_file = gsub(".sas", ".lst", sas_code_file, fixed = TRUE)
    sas_command = '"c:/Program Files/SASHome2/SASFoundation/9.4/sas.exe" -CONFIG "c:/Program Files/SASHome2/SASFoundation/9.4/sasv9.cfg" -NOSPLASH -SYSIN "%SAS_CODE_FILE%" -LOG "%SAS_LOG_FILE%" -PRINT "%SAS_LST_FILE%"'
    sas_command = gsub("%SAS_CODE_FILE%", sas_code_file, sas_command, fixed = TRUE)
    sas_command = gsub("%SAS_LOG_FILE%", sas_log_file, sas_command, fixed = TRUE)
    sas_command = gsub("%SAS_LST_FILE%", sas_lst_file, sas_command, fixed = TRUE)
    system(sas_command, invisible = FALSE)
}

@mlaviolet
Copy link

As Hadley pointed out earlier, SAS is quite litigious and I don't see much hope for a solution. Third party applications like Stat/Transfer can handle the SAS proprietary format, but they've probably paid a good deal of money for the privilege. The best solution I see is to save in SAS transport format with haven::write_xpt() and then import into SAS. Keep in mind variable names are limited to 32 characters, Ability to import SAS datasets is much more important anyway.

@rogerjdeangelis
Copy link

rogerjdeangelis commented Nov 13, 2019 via email

@al-obrien
Copy link

As stated in several places in this thread there are some alternatives to haven::write_sas that work quite well. I have outlined a method I tend to use on StackOverflow, as I believe it may be a bit off topic to share here: https://stackoverflow.com/a/58845669/4481570

@szimmer
Copy link

szimmer commented Nov 30, 2021

Should this function just be removed from the package since it doesn't work?

@Kenkleinman
Copy link

Kenkleinman commented Nov 30, 2021 via email

@hadley
Copy link
Member

hadley commented Nov 30, 2021

I had hoped that we'd be able to resolve this problem in haven, but several years later we haven't made any progress, so I agree that we should remove this function.

@Kenkleinman
Copy link

Kenkleinman commented Nov 30, 2021 via email

@BioStatMatt
Copy link
Contributor

BioStatMatt commented Dec 4, 2021

Reading is easier because we don't need to understand everthing about the binary format to do it. Some of the parts we don't understand may be used directly or as a side-effect by SAS to authenticate the file as genuinely written by SAS software. And, SAS could change how this is done at any time without impacting backward compatibility. Getting data OUT of sas7bdat has always been the primary motivation; it solves the bidirectional compatibility problem. For these reasons, I've always felt that attempting to write sas7bdat is not worth the effort.

Edit: Just wanted to add that Evan and team have made huge advances in this effort over the years. Awesome work!

@aminadibi
Copy link

ahem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior readstat
Projects
None yet
Development

Successfully merging a pull request may close this issue.