write_dta needs to check for valid Stata variable names #132
Comments
Figured out the issue here. Nothing in the underlying C library appears to check for valid Stata names, so the file is written with variable (column) names like "Behavior - Drag", which is illegal in Stata. To follow the usual Stata convention, each run of delimiters should be replaced by a single underscore and names should be converted to lowercase. It is fine to have "Behavior - Drag" as a variable label, but not as a variable name.
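To make the renaming convention above concrete, here is a minimal sketch in Python. The helper name `to_stata_name` is hypothetical and is not part of haven or ReadStat. It assumes the commonly documented Stata rules: at most 32 characters, only ASCII letters, digits, and underscores, and no leading digit.

```python
import re


def to_stata_name(label: str, max_len: int = 32) -> str:
    """Hypothetical helper: derive a Stata-style variable name from a label."""
    # Replace each run of characters outside [A-Za-z0-9_] with one underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", label)
    # Collapse repeated underscores, drop leading/trailing ones, and lowercase.
    name = re.sub(r"_+", "_", name).strip("_").lower()
    # Assumed rule: a name may not be empty or start with a digit.
    if not name or name[0].isdigit():
        name = "_" + name
    # Assumed rule: names are limited to 32 characters.
    return name[:max_len]


print(to_stata_name("Behavior - Drag"))
```

Truncating to 32 characters can make two different labels collide, so a real implementation would also need to check the results for duplicates.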
Could you please point me to the rules for determining valid Stata variable names?
See also WizardMac/ReadStat#46
I've been burnt too many times by R's helpful auto-renaming rules, so I've opted to be strict here and throw an error.
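For illustration only, the strict approach could look something like the Python sketch below. This is not haven's actual code. The function name `check_stata_names` is hypothetical, the pattern assumes the 32-character ASCII naming rule, and the reserved-word set is a partial list that should be checked against the Stata manual.

```python
import re

# Assumed rule: a letter or underscore, then up to 31 letters, digits, or underscores.
_STATA_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")

# Partial, illustrative list of Stata reserved words (not exhaustive).
_RESERVED = {"_all", "_n", "_N", "byte", "int", "long", "float", "double",
             "if", "in", "using", "with"}


def check_stata_names(names):
    """Raise ValueError naming every column that is not a valid Stata name."""
    bad = [n for n in names
           if not _STATA_NAME.fullmatch(n) or n in _RESERVED]
    if bad:
        raise ValueError("invalid Stata variable names: " + ", ".join(map(repr, bad)))


check_stata_names(["behavior_drag", "x1"])      # passes silently
try:
    check_stata_names(["Behavior - Drag"])
except ValueError as err:
    print(err)
```

Failing loudly leaves the choice of new names to the user, which avoids the silent renaming surprises mentioned above.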
Then I load the file in Stata 14.1MP8 using:
The problem occurs when using the Stata command 'compress', which optimizes on-disk storage of the file (e.g., it downcasts each variable to the smallest type possible without losing precision, so a value like 1.00000000000000000000000 would be stored as a 1-byte integer rather than a float/double). In this case, I think there is a problem with the writing functions and how they insert binary zeros around the strings in the data frame (Stata uses binary zeros to pad a column so that each record for a string column reserves the same number of bits for storage).
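As a rough sketch of the padding scheme described above, the Python below pads every value in a string column to the width of the longest value using zero bytes. It only illustrates the idea; it is not how haven or ReadStat actually write the file, and `pad_fixed_width` is a hypothetical name.

```python
def pad_fixed_width(values, encoding="utf-8"):
    """Pad each encoded string with zero bytes to a common column width."""
    encoded = [v.encode(encoding) for v in values]
    # Column width is the longest encoded value; use at least 1 byte (assumption).
    width = max([len(b) for b in encoded] + [1])
    return width, [b.ljust(width, b"\x00") for b in encoded]


width, cells = pad_fixed_width(["ab", "hello"])
print(width, cells)

# A reader recovers the original text by stripping the trailing zero bytes.
restored = [c.rstrip(b"\x00").decode("utf-8") for c in cells]
print(restored)
```

If a writer emits too many or too few zero bytes for a cell, every later record in the column is misaligned, which is the kind of fault suspected here.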
If I write the same data out to a csv:
Then load the same data in Stata:
The issue goes away. I couldn't capture the other error since it crashed Stata each time. I can post the .dta files in version 13 and 14 if you'd like to compare it to the output from Haven.