Thanks for the reply. The encoding is GB2312. I also tested a version 14 .dta file and got the same result as above.
With those three lines commented out, I could use Encoding(x) <- in R to set the correct encoding for variable labels.
@hadley I've extended the C API to allow manual specification of the file encoding. The trouble is that pre-14 Stata uses the system encoding (usually Win 1252) but does not indicate what that encoding is anywhere in the file. For kicks I also allow specifying the output encoding, which defaults to UTF-8. Here's the API diff from WizardMac/ReadStat@c4e0d48:
// Usually inferred from the file, but sometimes a manual override is desirable.
// In particular, pre-14 Stata uses the system encoding, which is usually Win 1252
// but could be anything. `encoding' should be an iconv-compatible name.
readstat_error_t readstat_set_input_character_encoding(readstat_parser_t *parser, const char *encoding);

// Defaults to UTF-8. Pass in NULL to disable transliteration.
readstat_error_t readstat_set_output_character_encoding(readstat_parser_t *parser, const char *encoding);
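
To make "iconv-compatible name" concrete, here is a minimal sketch of the kind of conversion those two settings describe, written directly against POSIX iconv rather than ReadStat. The helper `convert_label` is hypothetical and is not part of the ReadStat API; it only assumes that the platform's iconv recognizes the names passed in (for the file discussed above, "GB2312" as input and "UTF-8" as output).

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ReadStat: converts the NUL-terminated
 * string `in` (encoded as `from`) into encoding `to`, writing a
 * NUL-terminated result into `out`.
 * Returns 0 on success, -1 if iconv rejects a name or the conversion fails. */
int convert_label(const char *from, const char *to,
                  const char *in, char *out, size_t out_size)
{
    if (out_size == 0)
        return -1;

    /* Note the argument order: target encoding first, then source. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;                  /* name not recognized by this iconv */

    char *src = (char *)in;         /* iconv's interface takes non-const pointers */
    size_t src_left = strlen(in);
    char *dst = out;
    size_t dst_left = out_size - 1; /* reserve one byte for the terminator */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                  /* invalid input or output buffer too small */

    *dst = '\0';
    return 0;
}
```

If `iconv_open` accepts a name here, the same string should be a reasonable candidate for the `encoding` argument of `readstat_set_input_character_encoding`, since the comment in the diff asks for an iconv-compatible name. Names that one iconv implementation accepts may still be rejected by another, so it is worth checking on the target platform.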