Fix entity offset double-conversion bug #26

Merged
merged 6 commits into from Feb 15, 2012

Conversation

Projects
None yet
2 participants
@j3h
Contributor

j3h commented Feb 14, 2012

The entity offset conversion routines sometimes could double-convert
an entity, when its offset adjustment caused it to be examined twice.
This commit uses different routines for conversion in either
direction, that each take special care not to double-convert entities.

This bug was revealed when testing against real user data.

j3h and others added some commits Feb 14, 2012

Fix corner-case bugs in entity offset conversion
The entity offset conversion routines sometimes could double-convert
an entity, when its offset adjustment caused it to be examined twice.
This commit uses different routines for conversion in either
direction, that each take special care not to double-convert entities.
@keitaf

This comment has been minimized.

Show comment
Hide comment
@keitaf

keitaf Feb 14, 2012

Contributor

Thank you for finding the bug and submitting a bug fix.

I tried to simplify your logic by removing nested while/for loops and having a common util function for both conversions.
master...fix_utf16_index

Can you please check and see if it looks OK?

Contributor

keitaf commented Feb 14, 2012

Thank you for finding the bug and submitting a bug fix.

I tried to simplify your logic by removing nested while/for loops and having a common util function for both conversions.
master...fix_utf16_index

Can you please check and see if it looks OK?

@j3h

This comment has been minimized.

Show comment
Hide comment
@j3h

j3h Feb 14, 2012

Contributor

Looks good. As long as the list of entities is ordered by start index, this is much more efficient and easier to understand. I documented the precondition that the list of entities must be ordered, and switched back to iterating forward, since its easier to understand and using the iterator fixes the double-conversion bug.

Contributor

j3h commented Feb 14, 2012

Looks good. As long as the list of entities is ordered by start index, this is much more efficient and easier to understand. I documented the precondition that the list of entities must be ordered, and switched back to iterating forward, since its easier to understand and using the iterator fixes the double-conversion bug.

@keitaf

This comment has been minimized.

Show comment
Hide comment
@keitaf

keitaf Feb 15, 2012

Contributor

Great! Yes actually there's no reason to iterate backward now.

Maybe we can sort entities by index in convertUnicodeIndices()?

Collections.<Entity>sort(entities, new Comparator<Entity>() {
  public int compare(Entity e1, Entity e2) {
    return e1.start - e2.start;
  }
});
Contributor

keitaf commented Feb 15, 2012

Great! Yes actually there's no reason to iterate backward now.

Maybe we can sort entities by index in convertUnicodeIndices()?

Collections.<Entity>sort(entities, new Comparator<Entity>() {
  public int compare(Entity e1, Entity e2) {
    return e1.start - e2.start;
  }
});
@keitaf

This comment has been minimized.

Show comment
Hide comment
@keitaf

keitaf Feb 15, 2012

Contributor

Also I ported this onto twitter-text-js. Can you please also check this?
twitter/twitter-text-js#45

Contributor

keitaf commented Feb 15, 2012

Also I ported this onto twitter-text-js. Can you please also check this?
twitter/twitter-text-js#45

Another simplification of index conversion.
Remove the explicit iterator and lift the restriction that the
entities must be in a particular order. Also simplify logic.
@keitaf

This comment has been minimized.

Show comment
Hide comment
@keitaf

keitaf Feb 15, 2012

Contributor

LGTM! Love the Object-oriented style.

Contributor

keitaf commented Feb 15, 2012

LGTM! Love the Object-oriented style.

keitaf pushed a commit that referenced this pull request Feb 15, 2012

Keita Fujii
Merge pull request #26 from j3h/master
Fix entity offset double-conversion bug

@keitaf keitaf merged commit 0b311aa into twitter-archive:master Feb 15, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment