haven is limited in how many columns it will write to a SAS7BDAT file #335

Closed
MichaelTuchman opened this Issue Jan 9, 2018 · 7 comments

MichaelTuchman commented Jan 9, 2018

Before launching into my code, I do want to say that I really love the tidyverse and I work with it every day. You've made my life easier and I appreciate that.

I tried to use haven to write a really wide dataset, and it failed. Here's a reproducible example.

## create a wide data frame with N rows
NR=1000 # number of rows
NC=20000 # number of columns
wide_example=data.frame(matrix(1:(NR*NC),nrow=NR))
library(haven)
write_sas(wide_example,"some_file.sas7bdat")

It fails with this error message:

Error in write_sas_(data, normalizePath(path, mustWork = FALSE)) : 
  Writing failure: A row of data will not fit into the file format.

The write fails somewhere between 250 and 400 rows. I would imagine the threshold also varies with the type of data in each row, but I'm less interested in the particulars than in the fact that there is a limit. SAS itself does not impose such a limit. Is there a PAGESIZE parameter that can be tuned? Granted, I don't expect the package to be able to create every possible SAS file, or it would be SAS! But if there are limits, I would like to see them documented somewhere.
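Until such limits are documented, one way to locate them is to probe empirically. The sketch below is not part of haven: `max_writable_cols()` is a hypothetical helper that bisects on the column count, in line with the issue title. It assumes failure is monotonic (if a frame with n columns cannot be written, no wider frame can), and its default bounds and `n_rows` are arbitrary choices.

```r
# Hypothetical helper (not a haven function): find the largest column count in
# [lo, hi] for which `writer(df, path)` succeeds.
# Assumption: if n columns fail, every larger column count fails too.
max_writable_cols <- function(writer, lo = 1L, hi = 20000L, n_rows = 10L) {
  succeeds <- function(n_cols) {
    df <- as.data.frame(matrix(0, nrow = n_rows, ncol = n_cols))
    path <- tempfile(fileext = ".sas7bdat")
    on.exit(unlink(path), add = TRUE)  # clean up whether or not the write works
    tryCatch({
      writer(df, path)
      TRUE
    }, error = function(e) FALSE)
  }

  if (!succeeds(lo)) return(NA_integer_)     # even the narrowest frame fails
  if (succeeds(hi)) return(as.integer(hi))   # no limit found within the range

  # Invariant: `lo` columns succeed, `hi` columns fail.
  while (hi - lo > 1L) {
    mid <- lo + (hi - lo) %/% 2L
    if (succeeds(mid)) lo <- mid else hi <- mid
  }
  as.integer(lo)
}

# Usage with haven, only if the package is installed:
if (requireNamespace("haven", quietly = TRUE)) {
  print(max_writable_cols(haven::write_sas))
}
```

Passing the writer as an argument keeps the probe independent of any one package; raising `n_rows` would show whether the threshold also depends on the number of rows.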

Note

The following SAS code creates the same data set with variables X1 through X20000 (more or less) and the output shows the page size.

%let nr=1000;
%let nc=20000;
data wide_test;
  drop row col xval;
  xval=1;
  array x{&nc.};
  do row=1 to &nr.;
    do col=1 to &nc.;
      x{col}=xval;
      xval=xval+1;
    end;
    output;
  end;
run;

proc contents data=wide_test;
run;

Wide_test.pdf

@MichaelTuchman MichaelTuchman changed the title haven is limited in how many columns it will write to SAS haven is limited in how many columns it will write to a SAS7BDAT file Jan 9, 2018

normark commented Jan 10, 2018

Same thing mentioned in #272

MichaelTuchman commented Jan 10, 2018

evanmiller commented Jan 11, 2018

As mentioned in #272, this is a distinct issue that is triggered by the number of columns rather than the overall row length.

evanmiller commented Jan 11, 2018

Should be fixed in ReadStat via WizardMac/ReadStat@4635136

normark commented Jan 11, 2018

@evanmiller This is excellent, thank you for the swift action on this! I presume @hadley will pull the upstream changes if tests pass? 😊

@hadley hadley closed this in afd5cc0 Jan 11, 2018

hadley commented Jan 11, 2018

Fix confirmed with latest haven:

NR <- 1000
NC <- 20000
wide_example <- as.data.frame(matrix(1:(NR * NC), nrow = NR))
write_sas(wide_example, tempfile())

(@MichaelTuchman: for future reference, writing to a tempfile() is easier for me, since I then don't need to remember to delete the file.)
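The tempfile() suggestion can be taken one step further when many large files are written in one session. R removes its per-session temporary directory on normal exit, but an explicit unlink() frees the space straight away. The following is a minimal sketch: `write_to_temp()` is a hypothetical helper, not a haven function, and the base R CSV writer is used only so the example runs without extra packages.

```r
# Hypothetical helper: write `df` to a temporary file using `writer`, return
# the file size in bytes, and delete the file when the function exits.
write_to_temp <- function(df, writer, fileext = ".sas7bdat") {
  path <- tempfile(fileext = fileext)
  on.exit(unlink(path), add = TRUE)  # runs after the size has been computed
  writer(df, path)
  file.size(path)
}

small <- data.frame(x = 1:3, y = c(0.5, 1.5, 2.5))

# Base R writer, so the sketch works without haven.
csv_bytes <- write_to_temp(
  small,
  function(d, p) write.csv(d, p, row.names = FALSE),
  fileext = ".csv"
)

# The same pattern with haven, only if the package is installed:
if (requireNamespace("haven", quietly = TRUE)) {
  sas_bytes <- write_to_temp(small, haven::write_sas)
}
```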

lock bot commented Jul 10, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jul 10, 2018
