Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect order with diacritics #34

Closed
cslee opened this issue Jan 18, 2019 · 2 comments · Fixed by #35
Closed

Incorrect order with diacritics #34

cslee opened this issue Jan 18, 2019 · 2 comments · Fixed by #35
Labels
bug Something isn't working discussion

Comments

@cslee
Copy link

cslee commented Jan 18, 2019

The order is incorrect when the string contains diacritics

> orderBy(['b', 'd', 'f', 'A', 'C', 'E'])
[ 'A', 'C', 'E', 'b', 'd', 'f' ]
// correct

> orderBy(['b', 'd', 'f', 'A', 'Cé', 'E'])
[ 'A', 'b', 'Cé', 'E', 'd', 'f' ]
// incorrect
@yobacca yobacca added bug Something isn't working discussion and removed bug Something isn't working labels Jan 18, 2019
@yobacca
Copy link
Owner

yobacca commented Jan 19, 2019

Thanks for posting this issue!

The reason is that this library compares strings without unicode characters differently than strings with unicode characters:

  • If two strings do not contain unicode characters, they will be compared based on their corresponding ascii decimal value. Therefore upper case letters are sorted before lower case letters.
['b', 'C', 'A', 'c', 'a', 'B'].sort((a, b) => {
  if (a < b) {
    return -1;
  }
  if (a > b) {
    return 1;
  }
  return 0;
})
// ['A', 'B', 'C', 'a', 'b', 'c']
  • If two strings do contain unicode characters, they will be compared using localeCompare. This method is being aware of the relation of small and capital letters.
['b', 'C', 'A', 'c', 'a', 'B'].sort((a, b) => a.localeCompare(b))
// ['a', 'A', 'b', 'B', 'c', 'C']

If an array contains both strings with unicode characters and strings without unicode characters, the sorting is done partly based on the ascii decimal values and partly using localeCompare. That´s why the output of your second example looks weird.

This issue made me think about natural sorting of strings. For humans, the strings "name", "NAME", "nAMe" and "Name" differ only in upper and lower case, otherwise they are the same. This interpretation is based on a cultural convention, namely the knowledge of the relationship between capital and small letters.

So the natural order of your examples should be as follows:

['b', 'd', 'f', 'A', 'C', 'E']
=> [ 'A', 'b', 'C', 'd', 'E', 'f' ]

['b', 'd', 'f', 'A', 'Cé', 'E']
=> [ 'A', 'b', 'Cé', 'd', 'E', 'f' ]

From this point of view, strings should generally be compared in a case insensitive manner like localeCompare does and not based on their ascii decimal value. I would therefore suggest to change this library accordingly and remove the case sensitive option as case sensitive sorting based on ascii decimal values is not related to natural sorting.

@cslee
Copy link
Author

cslee commented Jan 19, 2019

Thanks for your detailed explanation.

It makes sense that the letter case is not important in natural sorting. That's fine with me if the case sensitive option is removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working discussion
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants