
Error when importing .por datafile (read.spss works) #35

Closed
jlegewie opened this Issue Mar 4, 2015 · 14 comments

3 participants
jlegewie commented Mar 4, 2015

I am getting this error when I try to open an SPSS .por file:

Error in df_parse_por(clean_path(path)) : 
attempt to set index 0/0 in SET_STRING_ELT

The code below downloads the file and reproduces the error. The same file reads fine with foreign::read.spss().

library("haven")
library("foreign")
# download and unzip file to temporary folder
url <- "http://www.nyc.gov/html/nypd/downloads/zip/analysis_and_planning/2006_sqf.zip"
p1 <- file.path(tempdir(), basename(url))
download.file(url, p1, quiet = TRUE)
filename <- unzip(p1, list = TRUE)$Name[1]
unzip(p1, files = filename, exdir = tempdir())
# open file with haven (fails)
p2 <- file.path(tempdir(), filename)
DF <- read_por(p2)
# Error in df_parse_por(clean_path(path)) :
#   attempt to set index 0/0 in SET_STRING_ELT
# open the same file with foreign (works)
DF <- foreign::read.spss(p2, use.value.labels = FALSE, to.data.frame = TRUE)

sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] foreign_0.8-62   haven_0.1.1.9000 Defaults_1.1-1  

loaded via a namespace (and not attached):
[1] Rcpp_0.11.4
Contributor

evanmiller commented Mar 5, 2015

@hadley this file imports OK using plain ReadStat so I assume the problem is in haven.

jlegewie commented Mar 5, 2015

Here is a link to another file with the same error http://www.nyc.gov/html/nypd/downloads/zip/analysis_and_planning/2007_sqf.zip.
For this file, read.spss also produces an error: error reading portable-file dictionary
But memisc::spss.portable.file can read the file.

library("memisc")    # spss.portable.file(), as.data.set()
library("magrittr")  # %>%
spss.portable.file(p2) %>%
    as.data.set(stringsAsFactors = FALSE) %>%
    as.data.frame(stringsAsFactors = FALSE)

It's probably the same problem, but I thought I'd post it given the different results from read.spss and spss.portable.file.

Contributor

evanmiller commented Mar 6, 2015

@hadley -- a possible culprit is that the POR callbacks are out of order. The info_callback happens at the end of the file, because AFAIK it's not possible to determine the number of records in advance. Forgot about that until just now.

Member

hadley commented Mar 6, 2015

Oh I bet that's it. Will take a look in the next couple of weeks.

Contributor

evanmiller commented Mar 6, 2015

Thinking about this some more, it might be possible to deduce the record count from the variable manifest and the total file size.

Member

hadley commented Mar 6, 2015

Is the info at a fixed position from the end of the file? Maybe you could process it out of order.

Contributor

evanmiller commented Mar 6, 2015

The record count does not appear explicitly anywhere in the file. The data just "stops". But if each record is a fixed width (will need to check this) then it should be possible to calculate the record count in advance.

Contributor

evanmiller commented Mar 6, 2015

No such luck -- individual values and hence records are variable width. So I don't think it's possible to get the record count without a full parse.

Member

hadley commented Jun 22, 2015

For now, I'm going to throw an error message for read_por() and come back to this when I have more time.

hadley added a commit that referenced this issue May 30, 2016

Member

hadley commented May 30, 2016

I've removed read_por() and I'm closing this issue, since there doesn't seem to be much interest in reading .por files.

hadley closed this May 30, 2016

Contributor

evanmiller commented Jun 27, 2016

Any chance of revisiting POR support? I've recently changed the callback order to match the others. The only problem now is that obs_count is set to -1 -- however, you'll need to handle this case anyway in order to parse SAV files from Java SPSS Writer.

I've also added a complete POR writer to ReadStat, though its primary purpose is to provide test coverage for the reader.

Member

hadley commented Jul 1, 2016

I can take a look next week, but I think it's going to be quite a lot of work, since I'll need a strategy for dynamically reallocating the column vectors.
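For readers unfamiliar with the problem: when obs_count is -1, the reader cannot allocate each column at its final length up front. One common strategy is geometric growth: start with a guessed capacity, double it whenever it fills, and trim the unused tail at the end, which keeps the total copying cost linear in the number of rows. The R sketch below illustrates only that idea; the names `read_unknown_length` and `make_source` are hypothetical and are not part of haven or ReadStat, whose internals are written in C/C++ and may take a different approach.

```r
# Hypothetical sketch (not haven's implementation): collect one column's
# values when the total row count is unknown, doubling capacity as needed.
read_unknown_length <- function(next_value, initial_capacity = 4L) {
  buf <- numeric(initial_capacity)  # pre-allocated storage
  n <- 0L                           # number of values actually stored
  repeat {
    value <- next_value()
    if (is.null(value)) break       # NULL signals end of data
    if (n == length(buf)) {
      # Buffer full: double the capacity (new slots are NA until filled)
      length(buf) <- max(1L, 2L * length(buf))
    }
    n <- n + 1L
    buf[n] <- value
  }
  buf[seq_len(n)]                   # trim unused capacity
}

# Hypothetical stand-in for a streaming parser: yields one value per call,
# then NULL once the input is exhausted.
make_source <- function(x) {
  i <- 0L
  function() {
    i <<- i + 1L
    if (i > length(x)) NULL else x[[i]]
  }
}

res <- read_unknown_length(make_source(c(1.5, 2, 3, 4, 5, 6)))
print(res)
```

The alternative discussed later in this thread, a full first parse to count rows followed by a second pass into exact-length vectors, avoids reallocation at the cost of reading the file twice.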

hadley reopened this Jul 1, 2016

Member

hadley commented Jul 3, 2016

@evanmiller would you consider adding a function that did a full parse to find the number of rows? I can add a helper myself, but it's much more likely that I'll have time to implement this if I don't need to rewrite the internals to automatically grow the vectors.

Contributor

evanmiller commented Jul 4, 2016

All right, I'll see about adding an option (or maybe a special return value from the info handler -- READSTAT_RETRY or similar).

hadley closed this in 94f20ed Aug 9, 2016

lock bot locked and limited conversation to collaborators Jun 26, 2018
