Fast segmentation #229

Merged
merged 7 commits into from
Dec 1, 2019
camertron
Collaborator

The original regex-based segmentation implementation is horrendously slow. This PR raises throughput from ~200 iterations/sec to ~23k iterations/sec on a short bit of sample text. The new implementation is driven by a state table; both the implementation and the state tables were borrowed from ICU.
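
For readers unfamiliar with the approach, here is a minimal Python sketch of state-table-driven segmentation. It is not the code in this PR and does not use ICU's tables: the character categories, the state names, and the `TABLE` contents are all hypothetical and chosen only to illustrate the technique. Each row is a state, each column is a character category, and finding a boundary is one table lookup per character rather than repeated regex matching.

```python
# Toy example only: categories, states, and TABLE are illustrative assumptions,
# not the ICU break rules that the PR borrows.

# Character categories (table columns).
LETTER, DIGIT, SPACE, OTHER = range(4)

# States (table rows). STOP means "no transition; the segment ends here".
STOP = -1
START, IN_WORD, IN_NUMBER, IN_SPACE, SINGLE = range(5)

TABLE = [
    # LETTER   DIGIT      SPACE     OTHER
    [IN_WORD,  IN_NUMBER, IN_SPACE, SINGLE],  # START
    [IN_WORD,  IN_WORD,   STOP,     STOP],    # IN_WORD: letters/digits extend a word
    [STOP,     IN_NUMBER, STOP,     STOP],    # IN_NUMBER: only digits extend a number
    [STOP,     STOP,      IN_SPACE, STOP],    # IN_SPACE: runs of whitespace stay together
    [STOP,     STOP,      STOP,     STOP],    # SINGLE: any other character stands alone
]

# States in which the text consumed so far forms a valid segment.
ACCEPTING = {IN_WORD, IN_NUMBER, IN_SPACE, SINGLE}


def category(ch):
    """Map a character to its table column."""
    if ch.isalpha():
        return LETTER
    if ch.isdigit():
        return DIGIT
    if ch.isspace():
        return SPACE
    return OTHER


def next_boundary(text, start):
    """Return the index of the first boundary after `start`."""
    state = START
    pos = start
    last_accept = start
    while pos < len(text):
        state = TABLE[state][category(text[pos])]
        if state == STOP:
            break
        pos += 1
        if state in ACCEPTING:
            last_accept = pos
    return last_accept


def segment(text):
    """Split `text` into consecutive segments using the state table."""
    segments = []
    pos = 0
    while pos < len(text):
        end = next_boundary(text, pos)
        segments.append(text[pos:end])
        pos = end
    return segments


print(segment("abc 123, x9"))
```

Because every category has a transition out of `START` into an accepting state, each call to `next_boundary` consumes at least one character, so `segment` always terminates. A real rule set needs many more categories and states, but the lookup loop keeps the same shape.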

@claassistantio

claassistantio commented Dec 1, 2019

CLA assistant check
All committers have signed the CLA.

@coveralls

coveralls commented Dec 1, 2019

Coverage Status

Coverage increased (+0.1%) to 95.647% when pulling 20a49a8 on fast_segmentation into 906aecf on master.

@camertron camertron merged commit ea16c16 into master Dec 1, 2019
@camertron camertron deleted the fast_segmentation branch December 1, 2019 08:20
3 participants