Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix caps, keyboard group logic #39

Merged
merged 10 commits into from
Jan 17, 2021

Conversation

tvquizphd
Copy link
Contributor

@tvquizphd tvquizphd commented Jan 3, 2021

Update: For the latest status, see my latest comment.

This closes issue #37, closes issue #41, and ignores issue #46.

This addresses a retext-spell issue I've discovered that is fundamentally an nspell issue.

Addressing a bigger problem

In main, the notion of "Keyboard groups" doesn't really make sense as it replaces at a position (of a character in group) that is unrelated to the original position (of a character in word). This solution attempts to address that "root problem."

For another solution that ignores issue #41 but solves issue #37, see my alternative PR.

This solution splits the word at the original index of the character in the input word instead of the unrelated position of the character in the keyboard group. See my review comments.

It's ready to merge!

Copy link
Contributor Author

@tvquizphd tvquizphd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has the effect that only upper-case replacements are suggested to replace upper-case characters, and only lower-case replacements are suggested to replace lower-case characters.

@@ -77,6 +77,8 @@ function suggest(value) {

while (++index < length) {
character = value.charAt(index)
before = value.slice(0, index)
after = value.slice(index + 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we slice the word at the index of the character in the input word. From my perspective, this more accurately simulates keyboard errors than the current approach.

@@ -90,8 +92,6 @@ function suggest(value) {
continue
}

before = value.slice(0, position)
after = value.slice(position + 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the current approach slice the word at the position of the character in an unrelated group string such as the keyboard group "qwertzuop"?

@tvquizphd

This comment has been minimized.

@tvquizphd tvquizphd changed the title Split word character index rather than group position Split word at character index rather than group position Jan 3, 2021
@tvquizphd tvquizphd marked this pull request as draft January 3, 2021 17:20
@tvquizphd tvquizphd marked this pull request as ready for review January 6, 2021 06:17
@tvquizphd tvquizphd marked this pull request as draft January 6, 2021 06:21
@tvquizphd

This comment has been minimized.

@wooorm
Copy link
Owner

wooorm commented Jan 8, 2021

Nice! I think the new suggestions ([ 'Brandi', 'Shandy', 'handy', 'Ghana', 'Grand', 'Grandee', 'Grands', 'Shanxi', 'hand', 'hands' ] for Ghandi) might be better than what was suggested before btw, I’m fine with changing that one

@tvquizphd tvquizphd force-pushed the fix_extra_uppercase_root_problem branch 2 times, most recently from 3a26271 to bb64a97 Compare January 9, 2021 05:10
@tvquizphd
Copy link
Contributor Author

tvquizphd commented Jan 9, 2021

Update: The PR on my own fork has since been merged.

I've reworked this PR to be more efficient, differ less from main, and keep the exact behavior of the tests @wooorm looked at yesterday. Those changes have been committed to this PR, but there are a couple changes I'm hesitant to commit.

To frame the decision that remains: I have opened a PR against this PR on my own fork. It should be very quick to look at the 4 lines that PR changes in order to get a sense of why I believe that PR makes more logical sense, but will cause more breaking changes for anyone who relies on the exact ordering of the results.

Practical results

I've compared this PR against the PR on my own fork using a small test suite of single-letter errors in all 5-letter English words starting with T and ending with e. You can see the comparison here with this PR on the left and the PR on my own fork on the right.

In general, the PR on my own fork tends to favor shorter suggestions (made by removals) unless there are more specific suggestions that are actually based on keyboard groups instead of more or less at random. Here are some examples of how the top suggestions differ:

  • Given "Tepex" my own fork suggests "Telex" instead of this PR's "Tepee"
    • because "x" and "e" are not in the same keyboard group.
  • Given "Thanj" my own fork suggests "Thanh" instead of this PR's "Thane"
    • because "j" and "h" are both in the keyboard group "jhn".
  • Given "Threa" my own fork suggests "Threw" instead of this PR's "Three"
    • because "a" and "w" are both in the keyboard group "qaw".
  • Given "Threk" my own fork suggests "Shrek" instead of this PR's "Three"
    • because "k" and "e" are not in same keyboard group.
  • Given "Tildy" my own fork suggests "Tidy" instead of this PR's "Tilde"
    • because "y" and "e" are not in the same keyboard group.

Odd observation

For all the words I've tested, this PR's top suggestion always ends in "e". That's explained by the fact that the fifth letter in each word will take replacements based on the "t" in "qwertzuop", which includes an "e". It's notable that "e" is the most common letter, but this benefit only befalls the fifth letter in words containing a t!

In essence, without the two changes on my own fork, the status quo in words with a "t" is to always prefer an "e" for the fifth letter... because "T" is the fifth key on the keyboard. It's madness, but it happens to work okay.

@wooorm
Copy link
Owner

wooorm commented Jan 9, 2021

Nice, thanks for all the research!

Your extra changes on the PR to the PR also look good. I do wonder how much they’d change the tests: from your explanation in the comment above I assume they’d make more sense, in which case, I’m 👍 on it

@tvquizphd tvquizphd force-pushed the fix_extra_uppercase_root_problem branch from 160efa2 to f16a3ee Compare January 9, 2021 19:15
@tvquizphd tvquizphd force-pushed the fix_extra_uppercase_root_problem branch 3 times, most recently from f4469c2 to 13ba741 Compare January 10, 2021 02:39
test/index.js Outdated Show resolved Hide resolved
test/index.js Outdated Show resolved Hide resolved
test/index.js Outdated Show resolved Hide resolved
@tvquizphd tvquizphd force-pushed the fix_extra_uppercase_root_problem branch 2 times, most recently from 81c0f1b to 7d6a8de Compare January 10, 2021 07:28
@tvquizphd tvquizphd force-pushed the fix_extra_uppercase_root_problem branch from 7d6a8de to 830d803 Compare January 10, 2021 07:42
test/index.js Outdated Show resolved Hide resolved
lib/suggest.js Outdated Show resolved Hide resolved
lib/suggest.js Outdated Show resolved Hide resolved
@tvquizphd

This comment has been minimized.

test/index.js Outdated Show resolved Hide resolved
@tvquizphd

This comment has been minimized.

@tvquizphd

This comment has been minimized.

@tvquizphd
Copy link
Contributor Author

tvquizphd commented Jan 10, 2021

I do wonder how much they’d change the tests: from your explanation in the comment above I assume they’d make more sense, in which case, I’m 👍 on it

There are essentially two new options for how the "Ghandi" case should be handled:

  • Look at this PR:
    • ['Brandi', 'Ghana', 'Grandee', 'Shanxi', 'hand', 'hands', 'handy']
    • Three results are based on a lowercase truncation of "Ghandi" to "handi".
  • Look at PR Fix keyboard edits, fix capitalization, and update tests #45:
    • ['Brandi', 'Ghana', 'Grandee', 'Hand', 'Hands', 'Handy', 'Hanoi', 'Hindi', 'Randi', 'Shanxi']
    • Four results depend on the uppercase truncation of "Ghandi" to "Handi".

For the difference between the two options, see issue #46.

@tvquizphd
Copy link
Contributor Author

tvquizphd commented Jan 10, 2021

I'd just like to clarify that this PR is ready to merge.

@tvquizphd

This comment has been minimized.

@tvquizphd
Copy link
Contributor Author

tvquizphd commented Jan 15, 2021

I realize the new issue #46 is off-topic for this PR, so I've reverted changes that address that issue. To make this PR easier to follow, I've collapsed any off-topic/outdated comments I've made. This PR now only addresses issues #37 and #41, as originally intended.

Now it's easiest to understand this PR by looking at the files changed. Despite all the commits I've added to this PR, it only alters lib/suggest by +11/-2 lines and test/index by +13/-7 lines.

@wooorm wooorm merged commit 9c002d2 into wooorm:main Jan 17, 2021
@wooorm wooorm changed the title Split word at character index rather than group position Fix caps, keyboard group logic Jan 17, 2021
@wooorm
Copy link
Owner

wooorm commented Jan 17, 2021

Awesome, thanks!

@wooorm
Copy link
Owner

wooorm commented Jan 17, 2021

released in 2.1.5!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants