chinese is mojibake in the generated file #20

Closed
zhengyuhe123 opened this issue Mar 1, 2019 · 6 comments

@zhengyuhe123

When I set value labels, variable labels and data in Chinese, the field values in the generated .sav file are mojibake.

@zhengyuhe123
Author

$writer = new \SPSS\Sav\Writer([
    'header' => [
        'prodName' => '@(#) IBM SPSS STATISTICS 64-bit Macintosh 23.0.0.0',
        'creationDate' => '05 Oct 18',
        'creationTime' => '01:36:53',
        'weightIndex' => 0,
    ],
    'variables' => [
        [
            'name' => 'test1',
            'format' => Variable::FORMAT_TYPE_F,
            'width' => 4,
            'decimals' => 2,
            'label' => 'test',
            'values' => [
                1 => '1测试中文标签1',
                2 => '2测试中文标签2',
            ],
            // 'missing' => [],
            'columns' => 5,
            'alignment' => Variable::ALIGN_RIGHT,
            'measure' => Variable::MEASURE_SCALE,
            'attributes' => [
                '$@Role' => Variable::ROLE_PARTITION,
            ],
            'data' => [1, 1, 1],
        ],
        [
            'name' => 'test2',
            'format' => Variable::FORMAT_TYPE_A,
            'width' => 100,
            'label' => 'test',
            'columns' => 100,
            'alignment' => Variable::ALIGN_LEFT,
            'measure' => Variable::MEASURE_NOMINAL,
            'attributes' => [
                '$@Role' => Variable::ROLE_SPLIT,
            ],
            'data' => ['测试中文数据1', '测试中文数据2', '测试中文数据3'],
        ],
    ],
]);

@tiamo
Owner

tiamo commented Mar 1, 2019

I'll check it as soon as possible.

@zhengyuhe123
Author

I think the problem is on line 249.

for ($i = $segWidth; $i > 0; $i -= 8, $offset += 8) {
    // $chunkSize = min($i, 8);
    $val = mb_substr($value, $offset, 8);
    if ($val == "") {
        $this->writeOpcode($buffer, $dataBuffer, self::OPCODE_WHITESPACES);
    } else {
        $this->writeOpcode($buffer, $dataBuffer, self::OPCODE_RAW_DATA);
        $dataBuffer->writeString($val, 8);
    }
}

@mennodekker
Contributor

Line 249:
$val = substr($value, $offset, 8);

That seems to do the trick for me; at least in SPSS the data looks the same. This probably applies to other parts of the code as well. The catch is that we are not chopping characters, which can be 1, 2 or 3 bytes long, but really segments of 8 bytes. I also tried mb_strcut, but ended up with spaces between the segments.
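To illustrate the difference between slicing by bytes and slicing by characters, here is a minimal standalone sketch. It is not the library's code: `splitIntoByteSegments` is a hypothetical helper, and padding the last segment with spaces is an assumption about how a fixed-width segment would be filled. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded.

```php
<?php
// Hypothetical helper (not part of the library): split a string into
// fixed-size byte segments, padding the last one with spaces.
function splitIntoByteSegments(string $value, int $segmentSize = 8): array
{
    $segments = [];
    $length = strlen($value); // length in bytes, not characters
    for ($offset = 0; $offset < $length; $offset += $segmentSize) {
        // substr() counts bytes, so a multibyte character may straddle two
        // segments; that is harmless as long as the bytes stay in order.
        $chunk = substr($value, $offset, $segmentSize);
        $segments[] = str_pad($chunk, $segmentSize, ' ');
    }
    return $segments;
}

// 6 CJK characters (3 bytes each in UTF-8) plus 1 ASCII digit = 19 bytes.
$value = '测试中文数据1';

// Byte-based slicing: three 8-byte segments (8 + 8 + 3 bytes, last one padded).
$byteSegments = splitIntoByteSegments($value);

// Character-based slicing: up to 8 *characters* from character offset 0,
// which here is the whole 19-byte string, far more than one 8-byte segment.
$charSlice = mb_substr($value, 0, 8, 'UTF-8');

// mb_strcut() works on bytes but refuses to split a character, so the
// first "8-byte" cut only holds 6 bytes (two whole characters).
$safeCut = mb_strcut($value, 0, 8, 'UTF-8');

// Joining the byte segments and trimming the padding restores the input.
$roundTrip = rtrim(implode('', $byteSegments), ' ');
```

In this sketch the mb_strcut result is two bytes short of a full segment, which may be why padding showed up between segments in the experiment described above.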

@mennodekker
Contributor

[Screenshot: chinese]

@mennodekker
Contributor

I will push a fix with unit test later

mennodekker added a commit to mennodekker/spss that referenced this issue Mar 5, 2019
@tiamo tiamo closed this as completed in f5b3e55 Mar 5, 2019
tiamo added a commit that referenced this issue Mar 5, 2019
Fix for #20 read/write multibyte chars in free text