var value_ = String(value).toLowerCase()
var alt = String(alternative).toLowerCase()
I’m quite sure this is making your later code never run, though?
It does two things: casts to a string and lowercases.
The cast could be done only when the value isn’t a string.
The lowercasing could maybe be done for each bigram?
Thanks for the quick review! Ah yeah, there is a bug, but for a different reason -- the later code is run because left and right are assigned to the original input params (value and alternative), rather than value_ and alt. However, the bigram inputs wouldn't be case-insensitive. Made another commit that fixes the case sensitivity, but still probably not optimal from a readability perspective. Will try to come up with a more pleasant to read refactoring.
Okay I think I've got a more readable solution committed.
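For readers following the thread, here is a minimal sketch of how the two suggestions above (cast only when needed, lowercase each bigram) might fit together. It is not the code committed in this PR; `toBigrams` is a hypothetical helper name, and the pair-splitting loop stands in for whatever bigram function the library actually uses.

```javascript
// Hypothetical helper, not the code from this PR.
function toBigrams(value) {
  // Assumption: an array is taken to already hold bigrams,
  // so only the case of each entry is normalized.
  if (Array.isArray(value)) {
    return value.map((pair) => String(pair).toLowerCase())
  }

  // Anything else is cast to a string, lowercased, then split into pairs.
  const text = String(value).toLowerCase()
  const pairs = []
  for (let index = 0; index < text.length - 1; index++) {
    pairs.push(text.slice(index, index + 2))
  }
  return pairs
}

console.log(toBigrams('ABC')) // [ 'ab', 'bc' ]
console.log(toBigrams(['AB', 'bC'])) // [ 'ab', 'bc' ]
```

Keeping the normalization in one place means both kinds of input end up case-insensitive, which is the bug described above.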
// bigrams may also be passed as input arguments for improved efficiency
// when analyzing the same strings repeatedly, for example, when
// comparing the text of each file in a directory with the text of
// each file in another directory.

import {bigram} from 'n-gram'

const bigramifiedString1 = bigram('abc') // ['ab', 'bc']
const bigramifiedString2 = bigram('xyz') // ['xy', 'yz']

diceCoefficient(bigramifiedString1, bigramifiedString2) // => 0
I don’t think this needs to be in the Use section, but the API section should probably say that arrays of strings are allowed, with a note that they should be bigrams?
Moved the explanation to another code section, if that works.
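To illustrate the efficiency argument from the README comment, the sketch below computes bigrams once per text and reuses them across every pairwise comparison. It is self-contained for illustration only: `pairsOf` and `dice` are hypothetical stand-ins for `bigram` and `diceCoefficient`, not the library's implementation, and the sample words are made up.

```javascript
// Hypothetical stand-in for `bigram` from 'n-gram'.
function pairsOf(text) {
  const lower = String(text).toLowerCase()
  const pairs = []
  for (let index = 0; index < lower.length - 1; index++) {
    pairs.push(lower.slice(index, index + 2))
  }
  return pairs
}

// Hypothetical stand-in for `diceCoefficient`, taking two arrays of bigrams:
// twice the number of shared pairs divided by the total number of pairs.
function dice(left, right) {
  // Assumption: two empty inputs are treated as having no similarity.
  if (left.length + right.length === 0) return 0

  const remaining = right.slice() // copy so the caller's array is untouched
  let shared = 0
  for (const pair of left) {
    const at = remaining.indexOf(pair)
    if (at !== -1) {
      shared++
      remaining.splice(at, 1) // each pair may match only once
    }
  }
  return (2 * shared) / (left.length + right.length)
}

// Bigrams are computed once per text, then reused for every comparison.
const directoryA = ['night', 'hello'].map(pairsOf)
const directoryB = ['nacht', 'help'].map(pairsOf)

for (const a of directoryA) {
  for (const b of directoryB) {
    console.log(dice(a, b))
  }
}
```

With n texts on each side, this splits 2n texts into bigrams instead of doing so inside each of the n × n comparisons.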
Co-authored-by: Titus <tituswormer@gmail.com>
Released, thanks!
This PR implements #22: it skips "bigram-ifying" an input that is already an array of bigrams, which it detects by checking whether the input is an array.
Used nested ternaries for the logic -- would understand if you'd prefer not having those, though.