Error when importing .por datafile (read.spss works) #35

Closed
jlegewie opened this issue Mar 4, 2015 · 14 comments

Comments

@jlegewie jlegewie commented Mar 4, 2015

I get this error when I try to open an SPSS .por file:

Error in df_parse_por(clean_path(path)) : 
attempt to set index 0/0 in SET_STRING_ELT

Below is code that downloads the file. It works with read.spss.

library("haven")
library("foreign")
# download and unzip file to temporary folder
url <- "http://www.nyc.gov/html/nypd/downloads/zip/analysis_and_planning/2006_sqf.zip"
p1 <- file.path(tempdir(), basename(url))
download.file(url, p1, quiet = TRUE)
filename <- unzip(p1, list = TRUE)$Name[1]
unzip(p1, files = filename, exdir = tempdir())
# open file
p2 <- file.path(tempdir(), filename)
DF <- read_por(p2)
# Error in df_parse_por(clean_path(path)) : 
#  attempt to set index 0/0 in SET_STRING_ELT
DF <- foreign::read.spss(p2, use.value.labels = FALSE, to.data.frame = TRUE)

sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] foreign_0.8-62   haven_0.1.1.9000 Defaults_1.1-1  

loaded via a namespace (and not attached):
[1] Rcpp_0.11.4
Collaborator

@evanmiller evanmiller commented Mar 5, 2015

@hadley this file imports OK using plain ReadStat so I assume the problem is in haven.

Author

@jlegewie jlegewie commented Mar 5, 2015

Here is a link to another file that triggers the same error: http://www.nyc.gov/html/nypd/downloads/zip/analysis_and_planning/2007_sqf.zip
For this file, read.spss also fails, with the message: error reading portable-file dictionary
However, memisc::spss.portable.file can read the file.

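library("memisc")    # spss.portable.file(), as.data.set()
library("magrittr")  # provides the %>% pipe
spss.portable.file(p2) %>%
    as.data.set(stringsAsFactors = FALSE) %>%
    as.data.frame(stringsAsFactors = FALSE)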

This is probably the same problem, but I thought I'd post it given the different results from read.spss and spss.portable.file.

Collaborator

@evanmiller evanmiller commented Mar 6, 2015

@hadley -- a possible culprit is that the POR callbacks are out of order. The info_callback happens at the end of the file, because AFAIK it's not possible to determine the number of records in advance. Forgot about that until just now.

Member

@hadley hadley commented Mar 6, 2015

Oh I bet that's it. Will take a look in the next couple of weeks.

Collaborator

@evanmiller evanmiller commented Mar 6, 2015

Thinking about this some more, it might be possible to deduce the record count from the variable manifest and the total file size.

Member

@hadley hadley commented Mar 6, 2015

Is the info at a fixed position from the end of the file? If so, maybe you could process it out of order.

Collaborator

@evanmiller evanmiller commented Mar 6, 2015

The record count does not appear explicitly anywhere in the file. The data just "stops". But if each record is a fixed width (will need to check this) then it should be possible to calculate the record count in advance.

Collaborator

@evanmiller evanmiller commented Mar 6, 2015

No such luck -- individual values and hence records are variable width. So I don't think it's possible to get the record count without a full parse.
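
To illustrate why variable-width values rule out the file-size shortcut, here is a toy sketch in R. It does not model the real POR layout: the `/`-terminated values and the helper `count_records()` are assumptions made purely for illustration. With fixed-width records the count follows from the size alone, whereas with variable-width values every value has to be visited.

```r
# Toy format (NOT the real POR layout): each value ends with "/",
# and every record holds `n_vars` values of arbitrary width.
count_records <- function(raw_text, n_vars) {
  # Splitting on the terminator touches every value, i.e. a full scan.
  values <- strsplit(raw_text, "/", fixed = TRUE)[[1]]
  length(values) %/% n_vars
}

fixed    <- "001/002/003/004/"   # every value is 3 characters plus "/"
variable <- "1/20000/3/4/"       # widths differ from value to value

# Fixed width: 2 values per record, 4 characters per value.
nchar(fixed) / (2 * 4)

# Variable width: the size alone is ambiguous, so we have to scan.
count_records(variable, n_vars = 2)
```

Both calls report two records, but only the first could be answered without reading the data.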

Member

@hadley hadley commented Jun 22, 2015

For now, I'm going to throw an error message for read_por() and come back to this when I have more time.

hadley added a commit that referenced this issue May 30, 2016
#35
Member

@hadley hadley commented May 30, 2016

I've removed read_por(), and I'm closing this issue since there doesn't seem to be much interest in reading por files.

@hadley hadley closed this May 30, 2016
Collaborator

@evanmiller evanmiller commented Jun 27, 2016

Any chance of revisiting POR support? I've recently changed the callback order to match the others. The only problem now is that obs_count is set to -1 -- however, you'll need to handle this case anyway in order to parse SAV files from Java SPSS Writer.

I've also added a complete POR writer to ReadStat, though its primary purpose is to provide test coverage for the reader.

Member

@hadley hadley commented Jul 1, 2016

I can take a look next week. I think it's going to be quite a lot of work though, since I'll need a strategy for dynamically reallocating the column vectors.
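
One common strategy for growing a column when the row count is unknown is capacity doubling. The R sketch below shows the idea only; it is not haven's code (whose internals are not R), and `new_column()`, `push_value()`, and `finish_column()` are hypothetical names introduced for illustration.

```r
# Hypothetical growable column: capacity doubles whenever it fills up.
new_column <- function(capacity = 4L) {
  list(data = rep(NA_real_, capacity), n = 0L)
}

push_value <- function(col, value) {
  if (col$n == length(col$data)) {
    # Out of room: double the capacity (at least 1), padding with NA.
    length(col$data) <- max(1L, 2L * length(col$data))
  }
  col$n <- col$n + 1L
  col$data[col$n] <- value
  col
}

finish_column <- function(col) {
  # Trim unused capacity once the parser signals the end of the data.
  col$data[seq_len(col$n)]
}

col <- new_column(capacity = 2L)
for (v in c(1.5, 2.5, 3.5, 4.5, 5.5)) col <- push_value(col, v)
finish_column(col)
```

Doubling keeps the number of reallocations logarithmic in the final row count, at the cost of a trim step at the end.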

@hadley hadley reopened this Jul 1, 2016
Member

@hadley hadley commented Jul 3, 2016

@evanmiller would you consider adding a function that did a full parse to find the number of rows? I can add a helper myself, but it's much more likely that I'll have time to implement this if I don't need to rewrite the internals to automatically grow the vectors.

Collaborator

@evanmiller evanmiller commented Jul 4, 2016

All right, I'll see about adding an option (or maybe a special return value from the info handler -- READSTAT_RETRY or similar).
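
The two-pass idea discussed here (one full parse to count rows, then a second parse into vectors allocated at their final size) can be sketched in R as follows. This is a toy model, not the ReadStat API: `toy_parse()` and `read_two_pass()` are hypothetical, and READSTAT_RETRY is only a name proposed in this thread.

```r
# Hypothetical stand-in for a streaming parser: it calls `on_value`
# once per value and never reports the row count up front.
toy_parse <- function(rows, on_value) {
  for (i in seq_along(rows)) on_value(i, rows[[i]])
  invisible(NULL)
}

read_two_pass <- function(rows) {
  # Pass 1: count rows only, storing nothing.
  n <- 0L
  toy_parse(rows, function(i, value) n <<- n + 1L)

  # Pass 2: allocate once at the known size, then fill in place.
  out <- numeric(n)
  toy_parse(rows, function(i, value) out[i] <<- value)
  out
}

read_two_pass(list(10, 20, 30))
```

The trade-off is reading the file twice in exchange for leaving the allocation code untouched.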

@hadley hadley closed this in 94f20ed Aug 9, 2016
@lock lock bot locked and limited conversation to collaborators Jun 26, 2018