Change String magic number to be case insensitive #95

Open
tzaeschke opened this issue Feb 27, 2017 · 1 comment

Comments

@tzaeschke
Owner

The String magic numbers (used for indexes) currently use 8 bits per character and are therefore case-sensitive.

Cutting off the first 3 bits (leaving 5 bits per character) would allow 9.6 characters instead of 6 and would make the numbers case-insensitive. Cutting off another bit (leaving 4 bits per character) would allow 12 characters to be stored.
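
The capacity figures above follow from simple division. Here is a minimal sketch of that arithmetic, assuming the magic number has 48 usable bits (inferred from 6 characters at 8 bits each; the class and method names are hypothetical and not taken from the code base):

```java
public class MagicCapacity {
    // Assumption: 48 usable bits, inferred from "6 characters at 8 bits each".
    static final int AVAILABLE_BITS = 6 * 8;

    // Number of characters that fit when each one takes bitsPerChar bits.
    static double capacity(int bitsPerChar) {
        return (double) AVAILABLE_BITS / bitsPerChar;
    }

    public static void main(String[] args) {
        // Print the capacity for 8, 7, 6, 5 and 4 bits per character.
        for (int bits = 8; bits >= 4; bits--) {
            System.out.println(bits + " bits/char -> " + capacity(bits) + " characters");
        }
    }
}
```

A fractional result such as 9.6 means that only 9 whole characters fit, with a few bits left over.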

Unfortunately, this would also affect ordering of results...

@tzaeschke
Owner Author

Problem: We need to combine SORTING with CASE insensitivity

Approaches:
a) Two separate indexes
b) Double entries in the same index (?)
c) Define behavior during index creation, i.e. provide an option for index behavior
d) Combine!

Solutions:
C) Not bad at all...
D) Sorting means ALPHABETICALLY sorted, so only 0..9 / A..Z need to be sorted; the rest can be sorted arbitrarily. We need to preserve ordering only for data in the ranges (char code) 48..57, 65..90 (and 97..122). Everything else can be mapped to, for example, 32..47.

Algorithm:
a) if (x < 32 || x >= 128) {
     map to 0..15 // x &= 0x0F
     add 32
   } else if (x >= 96) {
     sub 32
   }
b) sub 32 //move everything into 0..63
c) x <<= 2 //compress to 6 bit

This is case-insensitive, preserves the ordering and compresses the data to 6 bits, allowing 8 characters in the magic number. Obviously this approach assumes that Strings consist mostly of letters and numbers.
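
The algorithm could be sketched in Java roughly as follows. This is only an illustration, not the project's implementation: the class and method names are hypothetical, the mask 0x0F is used so that the folded characters land in 0..15 as described, and step c) is interpreted here as packing the 6-bit codes into a `long`, most significant character first. Padding short strings with code 0 is also an assumption.

```java
public class MagicNumberSketch {
    static final int BITS_PER_CHAR = 6;
    static final int MAX_CHARS = 8; // 8 * 6 = 48 bits

    // Steps a) and b): map one character to a 6-bit code in 0..63.
    static int encodeChar(char c) {
        int x = c;
        if (x < 32 || x >= 128) {
            // Outside printable ASCII: fold into 32..47, ordering is arbitrary.
            x = (x & 0x0F) + 32;
        } else if (x >= 96) {
            // 96..127 (including a..z) is moved onto 64..95 (including A..Z).
            x -= 32;
        }
        return x - 32; // move everything into 0..63
    }

    // Assumed packing: first character ends up in the highest bits,
    // strings shorter than MAX_CHARS are padded with code 0.
    static long magic(String s) {
        long m = 0;
        for (int i = 0; i < MAX_CHARS; i++) {
            int code = i < s.length() ? encodeChar(s.charAt(i)) : 0;
            m = (m << BITS_PER_CHAR) | code;
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(magic("Hello") == magic("hELLO")); // same magic number
        System.out.println(magic("apple") < magic("Banana")); // alphabetical order kept
    }
}
```

With this packing, comparing two magic numbers as plain `long` values compares the first 8 characters case-insensitively, with digits sorting before letters. Characters outside 0..9 and A..Z still get a code, but their relative order carries no meaning.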
