write_dta needs to check for valid Stata variable names #132
Then I load the file in Stata 14.1MP8 using:
The problem occurs when using the Stata command 'compress', which optimizes the file's storage on disk (e.g., it downcasts each variable to the smallest type possible without losing precision, so a value like 1.00000000000000000000000 would be stored as a 1-byte integer rather than a float/double). In this case, I think there is a problem with the writing functions and how they insert binary zeros around the strings in the data frame (Stata uses binary zeros to pad a string column so that each record reserves the same number of bytes for storage).
If I write the same data out to a csv:
Then load the same data in Stata:
The issue goes away. I couldn't capture the other error since it crashed Stata each time. I can post the .dta files in version 13 and 14 if you'd like to compare it to the output from Haven.
Figured out the issue here. It seems that nothing in the underlying C library checks whether variable names are valid in Stata. So the file is being written with variable (column) names like "Behavior - Drag", which is illegal in Stata. To follow Stata convention, any delimiters should be replaced by a single underscore and names converted to lowercase. It is fine to have "Behavior - Drag" as a variable label, but not as a variable name.
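For illustration, here is a rough sketch of the kind of name check and cleanup described above. It is written in Python only to show the logic and is not haven's or the C library's code. The function names `is_valid_stata_name` and `to_stata_name` are hypothetical. It assumes the usual Stata naming rules (1 to 32 characters; letters, digits, and underscores; not starting with a digit) and restricts itself to ASCII. It does not handle reserved words or collisions between cleaned names.

```python
import re

# Assumed Stata limit on variable name length.
STATA_MAX_NAME_LEN = 32

# Letters, digits, underscores; first character must not be a digit.
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_stata_name(name: str) -> bool:
    """Return True if `name` fits the assumed Stata naming rules."""
    return _VALID_NAME.fullmatch(name) is not None


def to_stata_name(label: str) -> str:
    """Turn an arbitrary column label into a Stata-style variable name.

    Runs of non-alphanumeric characters collapse to one underscore and
    the result is lowercased, following the convention described above.
    """
    name = re.sub(r"[^0-9A-Za-z]+", "_", label).strip("_").lower()
    if not name:
        name = "v"  # arbitrary fallback for labels with no usable characters
    elif name[0].isdigit():
        name = "_" + name  # names may not start with a digit
    return name[:STATA_MAX_NAME_LEN]


if __name__ == "__main__":
    label = "Behavior - Drag"
    print(is_valid_stata_name(label))  # False
    print(to_stata_name(label))        # behavior_drag
```

A writer could use something like this to rename the column and keep the original text ("Behavior - Drag") as the variable label, or it could simply raise an error when `is_valid_stata_name` fails.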