[closure segmenter] Add incremental glyph grouping to the closure segmenter merging loop. #124

Merged: 14 commits, Apr 30, 2025

garretrieger
Contributor

This refactors the closure segmenter implementation to allow incremental updates to the glyph groupings after a segment merge is applied. Prior to this change, all glyph groupings were fully recalculated from scratch after each merge, which results in O(N^2)-like runtime.

With incremental updates, only the things affected by a merge are recomputed on each iteration. This significantly reduces the amount of work performed and results in O(N)-like runtime instead.
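
As a rough illustration of the complexity argument above, here is a toy cost model, not the segmenter's actual accounting. It assumes one glyph per starting segment and a pairwise merge schedule, and counts one unit of work per glyph regrouped. The function names and the schedule are hypothetical.

```python
def full_recompute_work(num_glyphs: int, merges: list[tuple[int, int]]) -> int:
    """Work units when every merge regroups all glyphs from scratch."""
    return num_glyphs * len(merges)


def incremental_work(glyphs_per_segment: dict[int, int],
                     merges: list[tuple[int, int]]) -> int:
    """Work units when a merge only regroups glyphs of the two merged segments."""
    sizes = dict(glyphs_per_segment)
    work = 0
    for keep, drop in merges:
        # Only glyphs belonging to the two merged segments are touched.
        touched = sizes[keep] + sizes.pop(drop)
        sizes[keep] = touched
        work += touched
    return work


# Assumption: 1000 single-glyph segments, merged pairwise (0,1), (2,3), ...
n = 1000
sizes = {i: 1 for i in range(n)}
merges = [(i, i + 1) for i in range(0, n, 2)]

full = full_recompute_work(n, merges)
inc = incremental_work(sizes, merges)
print(full, inc)
```

Under these assumptions the full-recompute cost grows with the product of glyph count and merge count, while the incremental cost grows only with the number of glyphs actually touched. Real merge schedules and glyph closures are less uniform, so this shows only the shape of the difference.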

In an example test, the total time for computing the Noto Serif SC segmentation in the IFT demo (which starts with segments of one codepoint each) went from 2m15s to 45s. In this test case, profiling shows that after this change computation is primarily bottlenecked on glyph closure and brotli compression. Follow-on work is planned that should significantly reduce the number of closure and brotli operations needed.

Additionally, to help track the flow of information through the various parts of SegmentationContext, a number of classes have been introduced that each encapsulate a specific group of information. The following high-level information is stored in the context:

  1. requested segmentation: the input segmentation in terms of codepoints.
  2. glyph closure cache: helper for computing glyph closures that caches the results.
  3. glyph condition set: per glyph what conditions activate that glyph.
  4. glyph groupings: glyphs grouped by activation conditions.

Information flows through these items:

  1. Requested segmentation: generated from the input and later updated by merging.
  3. Glyph condition set: generated based on 1.
  4. Glyph groupings: generated based on 1 and 3.

These pieces all support incremental update. For example, if 1 is updated, the downstream items 3 and 4 can be updated incrementally, recomputing only the parts that change as a result of the changes in 1.
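
The flow described above can be sketched with a small toy model. This is not the project's SegmentationContext: `ToyContext`, `glyph_requires`, and the rule that a glyph's condition is the set of segments supplying its codepoints are all assumptions made for illustration, and the real glyph closure is far more involved. The sketch only shows the shape of an incremental update: a merge changes the segmentation, and only glyphs whose conditions mention a merged segment are recomputed and regrouped.

```python
from collections import defaultdict


class ToyContext:
    """Hypothetical stand-in for the segmentation state described above."""

    def __init__(self, segments, glyph_requires):
        # Item 1, requested segmentation: segment id -> codepoints.
        self.segments = {sid: set(cps) for sid, cps in segments.items()}
        # Toy replacement for the glyph closure: glyph -> codepoints it needs.
        self.glyph_requires = dict(glyph_requires)
        # Item 3, glyph condition set: glyph -> segments that activate it.
        self.conditions = {}
        # Item 4, glyph groupings: condition -> glyphs sharing that condition.
        self.groupings = defaultdict(set)
        for glyph in self.glyph_requires:
            self._set_condition(glyph, self._compute_condition(glyph))

    def _compute_condition(self, glyph):
        needed = self.glyph_requires[glyph]
        return frozenset(sid for sid, cps in self.segments.items() if cps & needed)

    def _set_condition(self, glyph, condition):
        old = self.conditions.get(glyph)
        if old is not None:
            # Move the glyph out of its previous group, dropping empty groups.
            self.groupings[old].discard(glyph)
            if not self.groupings[old]:
                del self.groupings[old]
        self.conditions[glyph] = condition
        self.groupings[condition].add(glyph)

    def merge(self, keep, drop):
        """Merge segment `drop` into `keep`; update only affected glyphs."""
        self.segments[keep] |= self.segments.pop(drop)
        # Linear scan for simplicity; recomputation runs only for affected glyphs.
        affected = [g for g, cond in self.conditions.items()
                    if keep in cond or drop in cond]
        for glyph in affected:
            self._set_condition(glyph, self._compute_condition(glyph))
        return affected


segments = {0: {0x41}, 1: {0x42}, 2: {0x43}}
glyph_requires = {"gA": {0x41}, "gB": {0x42}, "gAB_lig": {0x41, 0x42}, "gC": {0x43}}
ctx = ToyContext(segments, glyph_requires)
affected = ctx.merge(0, 1)

# Rebuilding from scratch on the merged segmentation should give the same groupings.
fresh = ToyContext(ctx.segments, glyph_requires)
print(sorted(affected), dict(ctx.groupings) == dict(fresh.groupings))
```

In this sketch the glyph tied only to segment 2 is never revisited by the merge, and the incrementally updated groupings match a from-scratch rebuild.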

garretrieger merged commit e064b55 into w3c:main on Apr 30, 2025
3 checks passed
garretrieger deleted the optimize_closure_util branch on May 5, 2025 18:45