write_dta fails for data frames whose column names contain Unicode characters #383
Starting with version 14, Stata allows Unicode letters in column names. Legal column names can therefore contain _, 0-9, and Unicode letters (not only Latin characters).
However, this is the code in haven.R that checks whether the names are valid:
This is not correct. The check should take an additional parameter, version, and for version >= 14 it could use the following code:
However, validate_dta is not the only function that validates the column names.
The function 'dta_validate_name' in readstat_dta_write.c also checks the column names. I tried commenting out these lines:
It seems to work, but I am not sure, given my limited experience with C.
Newer DTA versions allow Unicode characters of the Letter character class to appear in column names. Proper validation will require some kind of Unicode library, so in the meantime just skip the check for multi-byte characters (i.e. ASCII characters will continue to be validated). See tidyverse/haven#383
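The approach described in that commit message can be illustrated with a minimal sketch. This is not ReadStat's actual dta_validate_name: the function name sketch_name_is_valid and the allow_unicode flag are hypothetical, and the sketch assumes names are UTF-8 encoded, so that every byte of a multi-byte character has its high bit set. ASCII bytes are validated as before, while non-ASCII bytes are passed through unchecked when the target format permits Unicode. Length limits and reserved words are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of ReadStat. */
static bool is_ascii_letter(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static bool is_ascii_digit(unsigned char c) {
    return c >= '0' && c <= '9';
}

/*
 * Returns true if `name` passes the relaxed check.
 * allow_unicode is an assumed flag that the caller would set when the
 * target DTA version permits Unicode letters in names.
 */
bool sketch_name_is_valid(const char *name, bool allow_unicode) {
    assert(name != NULL);

    if (name[0] == '\0') {
        return false; /* empty names are rejected */
    }

    for (size_t i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];

        if (c >= 0x80) {
            /* Byte belongs to a multi-byte UTF-8 sequence. Without a
             * Unicode library we cannot tell whether it is a Letter,
             * so we skip it when Unicode is allowed. */
            if (!allow_unicode) {
                return false;
            }
            continue;
        }

        if (c == '_' || is_ascii_letter(c)) {
            continue; /* valid in any position */
        }
        if (i > 0 && is_ascii_digit(c)) {
            continue; /* digits are valid except in first position */
        }
        return false; /* any other ASCII character is rejected */
    }
    return true;
}
```

The trade-off is that a name containing a non-letter Unicode character (for example, punctuation outside ASCII) would be accepted by this check, which is the limitation the commit message acknowledges.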