RegularParser block tokens #61

thunderer · 2018-01-30T21:03:07Z

The tokenizer in RegularParser broke the text down into single characters or words (with some exceptions), regardless of the token type. This PR improves the tokenizer regex so that text fragments containing no syntax tokens are reported as single string tokens. This should improve performance and reduce needless string-joining operations during parsing, especially in real-world use cases with long texts and few shortcodes inside.
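To illustrate the idea, here is a minimal Python sketch (not the PHP code from this PR) that contrasts per-character tokenization with block tokenization. The syntax character set, the `PER_CHAR` and `BLOCK` patterns, and the `tokenize` helper are all assumptions made for the example; the actual RegularParser regex is not shown in this description.

```python
import re

# Hypothetical syntax characters for a shortcode-like grammar.
# The real RegularParser token set is not given in this PR description.
SYNTAX = r'\[\]/="'

# Per-character sketch: any non-syntax character becomes its own token.
PER_CHAR = re.compile(rf'(?P<syntax>[{SYNTAX}])|(?P<string>.)', re.DOTALL)

# Block sketch: a whole run of non-syntax characters becomes one token.
BLOCK = re.compile(rf'(?P<syntax>[{SYNTAX}])|(?P<string>[^{SYNTAX}]+)')


def tokenize(pattern, text):
    """Return (token_type, token_text) pairs for every match in text."""
    return [(m.lastgroup, m.group()) for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "Long text [b]x[/b]"
    print(len(tokenize(PER_CHAR, sample)), "tokens per character")
    print(len(tokenize(BLOCK, sample)), "tokens with block strings")
```

Both variants cover the input completely, so joining the token texts reproduces the original string. The block variant simply hands the parser far fewer tokens when most of the text is plain content.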


thunderer added the patch label Jan 30, 2018

thunderer added this to the 0.6 milestone Jan 30, 2018

thunderer self-assigned this Jan 30, 2018

thunderer force-pushed the regular-parser-block-tokens branch from 2fff329 to fdc6ce8 Compare January 30, 2018 21:03

improved RegularParser tokenizer to return non-functional tokens in single string instead of each character separately

6234af9

thunderer force-pushed the regular-parser-block-tokens branch from fdc6ce8 to 6234af9 Compare February 1, 2018 22:27

thunderer merged commit a699886 into master Feb 1, 2018

thunderer deleted the regular-parser-block-tokens branch February 1, 2018 22:35

thunderer mentioned this pull request Feb 1, 2018

Shortcodes ignored with multiple nested levels #47

Closed
