Is there a problem with overlap verification in line2pairs? #18

@Ramskii

Description

In line2pairs, lines 53 and 56 check whether the input ngram and the output ngram overlap (completely or partially).
For complete overlap, I reckon you have to check that both ngrams start at the same position and have the same length. However, the length check is performed on the input_order and output_order variables, which represent not the lengths of those particular ngrams but the maximum ngram length to search for in the line. For example, with input_order = 1, output_order = 2 and overlap = True, the input_order == output_order condition is never true, so ngrams eventually get paired with themselves.
The same thing happens with the partial overlap check, which also uses the maximum orders rather than the actual ngram lengths.
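To make the complete-overlap case concrete, here is a minimal sketch. It is not the code from line2pairs: `self_pairs` and its `use_actual_lengths` flag are hypothetical names, the context window is ignored, and I am assuming that i and l are the start positions and j and k the actual ngram lengths, as in the conditions quoted in this issue.

```python
def self_pairs(tokens, input_order, output_order, use_actual_lengths):
    """Collect ngrams that end up paired with themselves.

    Hypothetical reconstruction of the complete-overlap check only;
    it enumerates every (input ngram, output ngram) pair in the line.
    """
    survivors = []
    n = len(tokens)
    for i in range(n):                          # start of input ngram
        for j in range(1, input_order + 1):     # actual input length
            if i + j > n:
                break
            for l in range(n):                  # start of output ngram
                for k in range(1, output_order + 1):  # actual output length
                    if l + k > n:
                        break
                    if use_actual_lengths:
                        skip = (i == l and j == k)
                    else:
                        # Compares the maximum orders, not the ngram lengths.
                        skip = (i == l and input_order == output_order)
                    if skip:
                        continue
                    # Anything reaching this point is emitted as a pair;
                    # record the ones where both sides are the same ngram.
                    if i == l and j == k:
                        survivors.append(tuple(tokens[i:i + j]))
    return survivors


tokens = ["a", "b", "c"]
buggy = self_pairs(tokens, 1, 2, use_actual_lengths=False)
fixed = self_pairs(tokens, 1, 2, use_actual_lengths=True)
print(buggy)  # every unigram is paired with itself
print(fixed)  # no self-pairs remain
```

With input_order = 1 and output_order = 2, the order comparison can never be true, so the first call keeps every unigram paired with itself, while comparing j and k filters them out.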

Shouldn't line 53 be
if i == l and j == k:
instead of
if i == l and input_order == output_order:

And line 56
if len(set(range(i, i + j)) & set(range(l, l + k))) > 0:
instead of
if len(set(range(i, i + input_order)) & set(range(l, l + output_order))) > 0:
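As a quick check of the partial-overlap condition, the sketch below compares the two ways of computing the spans. `spans_overlap` is a hypothetical helper, not a function from the repository, and it assumes (as described above) that the existing check measures the spans with the maximum orders.

```python
def spans_overlap(start_a, len_a, start_b, len_b):
    """True if [start_a, start_a + len_a) and [start_b, start_b + len_b)
    share at least one token position."""
    return len(set(range(start_a, start_a + len_a))
               & set(range(start_b, start_b + len_b))) > 0


# Two adjacent unigrams: input at position 0, output at position 1.
i, j = 0, 1   # input start, actual input length
l, k = 1, 1   # output start, actual output length
input_order, output_order = 2, 2  # maximum lengths searched for

with_actual_lengths = spans_overlap(i, j, l, k)
with_max_orders = spans_overlap(i, input_order, l, output_order)

print(with_actual_lengths)  # False: the unigrams do not share a token
print(with_max_orders)      # True: the pair would be rejected anyway
```

Under these assumptions, measuring the spans with the maximum orders discards pairs of short ngrams that do not actually overlap. The same test can also be written without sets as `i < l + k and l < i + j`.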
