
haven is limited in how many columns it will write to a SAS7BDAT file #335

Closed
MichaelTuchman opened this issue Jan 9, 2018 · 7 comments

MichaelTuchman commented Jan 9, 2018

Before launching into my code, I do want to say that I really love the tidyverse and I work with it every day. You've made my life easier and I appreciate that.

I tried to use haven to write a really wide dataset, and it failed. Here's a reproducible example:

library(haven)

## create a wide data frame with NR rows and NC columns
NR <- 1000   # number of rows
NC <- 20000  # number of columns
wide_example <- data.frame(matrix(1:(NR * NC), nrow = NR))
write_sas(wide_example, "some_file.sas7bdat")

It fails with this error message:

Error in write_sas_(data, normalizePath(path, mustWork = FALSE)) : 
  Writing failure: A row of data will not fit into the file format.

The system craps out somewhere between 250 and 400 columns. I would imagine the threshold also varies with the type of data in each column, but I'm less interested in the particulars than in the fact that there is a limit at all. SAS itself does not impose such a limit. Is there a PAGESIZE parameter that can be tuned? Granted, I don't expect the package to be able to create every possible SAS file, or it would be SAS! But if there are limits, I would like to see them documented somewhere.
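One way to pin down where the limit sits, instead of narrowing it to "between 250 and 400" by hand, is to bisect over the column count. The following is a minimal sketch, assuming the failure is monotone in the number of columns and that a single-row data frame of numeric columns is representative (the original example used 1000 rows). `largest_ok()` and `can_write_cols()` are hypothetical helpers, not part of haven; only `haven::write_sas()` comes from the report above.

```r
# Hypothetical helper (not part of haven): binary search for the largest n
# in [lo, hi] for which ok(n) is TRUE. Assumes ok() is monotone, i.e. once
# n columns fail to write, every larger n fails too.
largest_ok <- function(ok, lo, hi) {
  if (!ok(lo)) stop("ok(lo) must be TRUE")
  if (ok(hi)) return(hi)
  # Invariant from here on: ok(lo) is TRUE and ok(hi) is FALSE.
  while (hi - lo > 1) {
    mid <- (lo + hi) %/% 2
    if (ok(mid)) lo <- mid else hi <- mid
  }
  lo
}

# Hypothetical predicate: can haven write a one-row data frame with n
# numeric columns? Any error, such as "A row of data will not fit into the
# file format", counts as a failure. The temporary file is always deleted.
can_write_cols <- function(n) {
  df <- as.data.frame(matrix(1, nrow = 1, ncol = n))
  path <- tempfile(fileext = ".sas7bdat")
  on.exit(unlink(path))
  tryCatch({
    haven::write_sas(df, path)
    TRUE
  }, error = function(e) FALSE)
}
```

For example, `largest_ok(can_write_cols, 1, 20000)` would report the largest column count the installed haven accepts, using about 15 trial writes instead of thousands. On a version that includes the ReadStat fix referenced later in this thread, it should simply return 20000.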

Note

The following SAS code creates (more or less) the same data set, with variables X1 through X20000, and the PROC CONTENTS output shows the page size.

%let nr=1000;
%let nc=20000;
data wide_test;
  drop row col xval;
  xval=1;
  array x{&nc.};
  do row=1 to &nr.;
    do col=1 to &nc.;
      x{col}=xval;
      xval=xval+1;
    end;
    output;
  end;
run;

proc contents data=wide_test;
run;

Wide_test.pdf

MichaelTuchman changed the title from "haven is limited in how many columns it will write to SAS" to "haven is limited in how many columns it will write to a SAS7BDAT file" on Jan 9, 2018

normark commented Jan 10, 2018

The same thing was mentioned in #272.

MichaelTuchman (Author) commented Jan 10, 2018

evanmiller (Collaborator) commented Jan 11, 2018

As mentioned in #272, this is a distinct issue that is triggered by the number of columns rather than the overall row length.

evanmiller (Collaborator) commented Jan 11, 2018

Should be fixed in ReadStat via WizardMac/ReadStat@4635136

normark commented Jan 11, 2018

@evanmiller This is excellent, thank you for the swift action on this! I presume @hadley will pull the upstream changes if tests pass? 😊

hadley closed this in afd5cc0 on Jan 11, 2018

hadley (Member) commented Jan 11, 2018

Fix confirmed with latest haven:

library(haven)

NR <- 1000
NC <- 20000
wide_example <- as.data.frame(matrix(1:(NR * NC), nrow = NR))
write_sas(wide_example, tempfile())

(@MichaelTuchman, for future reference: writing to a tempfile() is easier for me, since I then don't need to remember to delete the file.)

lock bot commented Jul 10, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

lock bot locked and limited conversation to collaborators on Jul 10, 2018
4 participants