Description
In line2pairs, lines 53 and 56 check whether the input ngram and the output ngram overlap (completely or partially).
For complete overlap, I reckon you have to check whether both ngrams start at the same token and whether they have the same length. However, the length check is done with the input_order and output_order variables, which are not the lengths of those particular ngrams but the maximum ngram length to search for in the line. For example, with input_order = 1, output_order = 2 and overlap = True, the input_order == output_order check can never succeed, so you will eventually get ngrams paired with themselves.
The same thing happens with the partial overlap check.
Shouldn't line 53 be
if i == l and j == k:
instead of
if i == l and input_order == output_order:
And line 56
if len(set(range(i, i + j)) & set(range(l, l + k))) > 0:
instead of
if len(set(range(i, i + input_order)) & set(range(l, l + output_order))) > 0:
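To make the difference concrete, here is a minimal standalone sketch, not the repository's code. It assumes i and l are the start positions of the input and output ngrams, j and k are their actual lengths (with j running from 1 to input_order and k from 1 to output_order), and the helper names are made up for illustration. The order-based partial check is my reading of how the same mistake would look there.

```python
def same_ngram_by_order(i, l, input_order, output_order):
    # Check as quoted for line 53: compares the maximum orders,
    # not the lengths of the two ngrams being paired.
    return i == l and input_order == output_order


def same_ngram_by_length(i, j, l, k):
    # Proposed check: same start position and same actual length.
    return i == l and j == k


def spans_overlap(i, j, l, k):
    # Proposed partial check: do the token positions covered by the
    # j-gram at i and the k-gram at l intersect?
    return len(set(range(i, i + j)) & set(range(l, l + k))) > 0


def spans_overlap_by_order(i, l, input_order, output_order):
    # Assumed order-based variant: uses the maximum orders as span lengths.
    return len(set(range(i, i + input_order)) & set(range(l, l + output_order))) > 0


# Case from the report: input_order = 1, output_order = 2.
# The unigram at position 3 is paired with the unigram at position 3.
print(same_ngram_by_order(3, 3, 1, 2))   # False: the self-pair is not caught
print(same_ngram_by_length(3, 1, 3, 1))  # True: the self-pair is caught

# Partial overlap with input_order = output_order = 2:
# unigram at position 0 versus unigram at position 1 share no tokens.
print(spans_overlap(0, 1, 1, 1))            # False
print(spans_overlap_by_order(0, 1, 2, 2))   # True: flagged although disjoint
```

With the length-based checks, the decision depends only on the two ngrams being compared, so it no longer changes with the maximum orders configured for the run.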