Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No longer return all-caps for contractions with missing apostrophes #40

Merged
merged 3 commits into from
Jan 9, 2021

Conversation

tvquizphd
Copy link
Contributor

@tvquizphd tvquizphd commented Jan 6, 2021

This closes issue #36

Problem addressed

In main, the suggestions for "dont" included "DON'T" instead of "don't"

The generate function gives extra weight to any string added to result multiple times.
When generate evaluates the uppercased variant of the input "dont", ie "DONT":

  • On evaluating the character N and adding the character ':
    • generate injects both ' and an uppercase '.

Since ' and uppercase ' are identical, in main the same string "DON'T" is added twice to result.

Solution proposed

In this PR, we check that the injected character is not ' (or for that matter, any punctuation character that has no uppercase) before injecting an uppercase variant.

Possible alternatives

This PR was originally simplified by writing

upper && inject !== "'"

instead of

upper && inject !== inject.toUpperCase()

In English, ' is the only "lowercase" character in the affix file's TRY key that has no uppercase, but that is not the case in other languages. In all cases, the toUpperCase() check should actually make sense, as we should never want to simply double the attempted substitutions for characters that have no uppercase. This PR should in general have no effect on caseless languages such as Persian, Hebrew, Korean, and Nepali, as upper will never be true in those languages (excepting the case where uppercase latin characters have been mixed with non-latin text).

@wooorm
Copy link
Owner

wooorm commented Jan 8, 2021

This looks great!

One thing though: could you add a test that shows that this fixes #36?

@tvquizphd
Copy link
Contributor Author

This looks great!

One thing though: could you add a test that shows that this fixes #36?

Thanks, I'll add the contraction test now!

@tvquizphd
Copy link
Contributor Author

tvquizphd commented Jan 9, 2021

Thanks, I'll add the contraction test now!

I've added a test that checks for an exact list in the results.

Perhaps the test should only check for the existence of "don't" in the results instead of doing a deep check as seems to be the convention. This is out of scope for consideration in this PR, but just as an example the upcoming PR #39 actually passes the new contraction test exactly in its current state. As a counterexample, PR #39 would fail this new test if we incorporated the four lines currently on my own fork. Of course we could always change the test to produce a different list of exact results.

@wooorm wooorm merged commit bd92048 into wooorm:main Jan 9, 2021
@wooorm
Copy link
Owner

wooorm commented Jan 9, 2021

Awesome!

@wooorm
Copy link
Owner

wooorm commented Jan 9, 2021

Of course we could always change the test to produce a different list of exact results.

Agreed!

@wooorm
Copy link
Owner

wooorm commented Jan 17, 2021

Thanks, released in 2.1.5! ✨

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants