Fix very long string #45

Closed
wants to merge 2 commits into from


mennodekker
Contributor

I hope this fixes the very long string implementation. I had some trouble because SPSS seems picky about the names of the extra variables: the names can clash with each other. I did not address that (yet).

@@ -31,7 +31,7 @@ public function write(Buffer $buffer)
        if ($this->data) {
            $data = [];
            foreach ($this->data as $key => $value) {
                $data[] = sprintf('%s=%05d%c', $key, $value, 0);


According to the docs, the old implementation is correct (from https://www.gnu.org/software/pspp/pspp-dev/html_node/Very-Long-String-Record.html#Very-Long-String-Record):

char string_lengths[];

A list of key–value tuples, where key is the name of a variable, and value is its length. The key field is at most 8 bytes long and must match the name of a variable which appears in the variable record (see Variable Record). The value field is exactly 5 bytes long. It is a zero-padded, ASCII-encoded string that is the length of the variable. The key and value fields are separated by a ‘=’ byte. Tuples are delimited by a two-byte sequence {00, 09}. After the last tuple, there may be a single byte 00, or {00, 09}. The total length is count bytes.
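To make the quoted layout concrete, here is a minimal sketch of how the tuple list could be serialized. The function name `encodeVeryLongStringLengths` and the input shape (variable name => length) are assumptions for illustration, not this library's actual code. It always ends with {00, 09}, which is one of the two endings the docs allow.

```php
<?php
// Hypothetical helper illustrating the documented layout; not the library's code.
// $lengths maps a variable name (at most 8 bytes) to its string length.
function encodeVeryLongStringLengths(array $lengths): string
{
    if (!$lengths) {
        return '';
    }
    $tuples = [];
    foreach ($lengths as $name => $length) {
        // key, '=', then the length as exactly five zero-padded ASCII digits
        $tuples[] = sprintf('%s=%05d', $name, $length);
    }
    // Tuples are delimited by the two-byte sequence {00, 09};
    // the same sequence is appended after the last tuple.
    return implode("\x00\x09", $tuples) . "\x00\x09";
}

// Print the encoded bytes as hex so the NUL and TAB bytes are visible.
echo bin2hex(encodeVeryLongStringLengths(['LONG' => 300, 'NAME2' => 1000])), "\n";
```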

@SamMousa

SPSS requires the name of each segment variable to share its first 5 characters with the name of the primary variable.
For example, a variable named LONG can have a segment named LONG1, but a variable named LONGER cannot; its segments must start with LONGE.
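The rule above can be written as a small check. This is only a sketch of my reading of it: `segmentNameMatches` is a hypothetical name, and it assumes both names are already uppercase.

```php
<?php
// Hypothetical check: a segment name must start with the first 5 characters
// of the primary variable name (or the whole name, if it is shorter than 5).
function segmentNameMatches(string $primary, string $segment): bool
{
    $prefix = substr($primary, 0, 5);
    return strncmp($segment, $prefix, strlen($prefix)) === 0;
}

var_dump(segmentNameMatches('LONG', 'LONG1'));    // LONG with segment LONG1
var_dump(segmentNameMatches('LONGER', 'LONG1'));  // LONGER with segment LONG1
var_dump(segmentNameMatches('LONGER', 'LONGE1')); // LONGER with segment LONGE1
```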

I have discovered that collisions between segment names are not a problem: question names like LONGER1 and LONGER2, which lead to colliding segment names, work just fine.

Choosing the primary variable name like this ensures that primary variable names never collide:

$variable->name = strtoupper(substr($var->name, 0, 5) . strtr(base64_encode($idx), ['=' => '']));

This lets us encode indexes with the base64 alphabet, so we can uniquely encode 64^3 variables with a common 5-letter prefix.
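For reference, here is the expression above wrapped in a function so its output can be inspected. `primaryVariableName` is a hypothetical name, and the explicit `(string)` cast is an assumption about how `$idx` reaches `base64_encode` (PHP would otherwise coerce an integer to its decimal string in non-strict mode).

```php
<?php
// Hypothetical wrapper around the one-liner from the comment above.
function primaryVariableName(string $name, int $idx): string
{
    // base64 of the decimal string form of the index, with '=' padding removed
    $suffix = strtr(base64_encode((string) $idx), ['=' => '']);
    // first 5 characters of the original name plus the suffix, uppercased
    return strtoupper(substr($name, 0, 5) . $suffix);
}

foreach ([0, 1, 10, 100] as $idx) {
    $generated = primaryVariableName('longer', $idx);
    echo $idx, ' => ', $generated, ' (', strlen($generated), " bytes)\n";
}
```

Under this reading, a one-digit index yields two suffix characters, a two-digit index yields three, and an index of 100 or more yields four, which makes the name longer than 8 bytes. Whether that matters depends on how `$idx` is bounded, which the comment does not state.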

@mennodekker
Contributor Author

Closing this one, since there is now a better pull request that fixes this.
