No longer return all-caps for contractions with missing apostrophes #40
This closes issue #36
Problem addressed
In `main`, the suggestions for "dont" included "DON'T" instead of "don't".

The `generate` function gives extra weight to any string added to `result` multiple times.

When `generate` evaluates the uppercased variant of the input "dont", i.e. "DONT", and it is at `N` and adding the character `'`, it injects both `'` and an uppercase `'`.

Since `'` and uppercase `'` are identical, in `main` the same string "DON'T" is added twice to `result`.

Solution proposed
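A minimal sketch of the duplication and of the kind of guard this PR describes. It is not the project's actual `generate` code: `countCandidates`, its parameters, and the way `upper` is computed here are hypothetical stand-ins chosen for illustration.

```javascript
// Hypothetical stand-in for the candidate-weighting step: counts how many
// times each candidate string is produced by single-character insertion.
function countCandidates(word, tryChars, guardCaseless) {
  const counts = new Map();
  const add = (s) => counts.set(s, (counts.get(s) || 0) + 1);

  // Assumption: "upper" means the word is written entirely in uppercase
  // letters of a cased script.
  const upper = word === word.toUpperCase() && word !== word.toLowerCase();

  for (let i = 0; i <= word.length; i++) {
    for (const ch of tryChars) {
      add(word.slice(0, i) + ch + word.slice(i));

      const up = ch.toUpperCase();
      // Without the guard, a caseless character such as ' is injected twice.
      if (upper && (!guardCaseless || up !== ch)) {
        add(word.slice(0, i) + up + word.slice(i));
      }
    }
  }
  return counts;
}

const before = countCandidates("DONT", ["'"], false);
const after = countCandidates("DONT", ["'"], true);

console.log(before.get("DON'T")); // 2: the same string was added twice
console.log(after.get("DON'T"));  // 1: the uppercase variant was skipped
```

With the guard enabled, cased characters still get both variants, so only the redundant insertions disappear.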
In this PR, we check that the injected character is not `'` (or, for that matter, any punctuation character that has no uppercase) before injecting an uppercase variant.

Possible alternatives
This PR was originally simplified by writing
instead of
In English, `'` is the only "lowercase" character in the affix file's `TRY` key that has no uppercase, but that is not the case in other languages. In all cases, the `toUpperCase()` check should make sense, as we should never want to simply double the attempted substitutions for characters that have no uppercase. This PR should in general have no effect on caseless languages such as Persian, Hebrew, Korean, and Nepali, as `upper` will never be true in those languages (except where uppercase Latin characters have been mixed with non-Latin text).

Sets of characters with no uppercase, one set per entry:

- `'`, `-`, and `·`.
- `.`.
- `-`.
- `'` and `-`.
- `'` and `.`.
- `'`, `-`, `·`, `0`, `1`, `2`, `3`, `4`, `5`, `6`, and `8`.
- `-` and `·`.
- `-`, `.`, `&`, and `;`.
- `'`, `-`, and `/`.
- `”`, `¾`, `«`, 0x8D, 0x85, `³`, and 0x99.
- `-`, `’`, `!`, and `.`.
- `-` and `.`.
- `'`, `-`, `’`, `.`, `0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, and `9`.
- `:`, `-`, and `.`.
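As a rough illustration of why a `toUpperCase()` comparison generalises beyond `'`, the sketch below tests a few of the characters listed above, plus one cased letter and two letters from caseless scripts. The helper name `hasUppercase` and the sample list are assumptions made for this example, not part of the PR.

```javascript
// Hypothetical helper: true only when the character has a distinct
// uppercase form, i.e. when injecting an uppercase variant adds a new string.
const hasUppercase = (ch) => ch.toUpperCase() !== ch;

// One cased letter, some punctuation and digits from the sets above,
// and letters from two caseless scripts (Hebrew, Korean).
const samples = ["a", "'", "-", "·", "8", "א", "한"];

for (const ch of samples) {
  console.log(JSON.stringify(ch), hasUppercase(ch));
}
```

Only `"a"` reports `true` here; every other sample is unchanged by `toUpperCase()`, so an uppercase variant would be a duplicate.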