Fallback to RegEx based parser when using custom transformers or extractors #11335

RobinMalfait · 2023-06-01T20:02:49Z

Right now the Rust based parser can't work with custom transformers or extractors.

In a perfect world we can implement as many custom parsers/extractors in Rust such that we don't need this at all. In an almost perfect world we can pass the transformer and extractor to the Rust based parser and call the callback functions to handle all of this. This is probably what we are going to do in the future but this requires more work to make sure that:

1. It works as expected
2. It doesn't result in (major) performance issues

Since it currently doesn't work with the Rust based parser, we can implement a fix for this in the meantime before we reach the "perfect" solution.

One solution to this problem is to check if we do have a custom transformer or a custom extractor and if we do, then we can bail on the Rust parser completely and just use the current regex based parser.

An alternative solution, and the one implemented here, is to group the `changedContent` into 2 buckets: one where we rely on the default transformer and extractor, and one where a custom transformer or extractor is used.

Then, the bucket where we use the default transformer and extractor can still rely on the way faster Rust based parser. For the other bucket we fall back to the regex based parser.

The nice part about this is that we can use both parsers at the same time, and the majority of the use cases should use the faster Rust based parser.
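
As a rough illustration of the two-bucket idea described above, here is a minimal sketch. It is not the code from this PR: the `ChangedContent` and `ContentHooks` types, the `partitionChangedContent` function, and the assumption that custom transformers and extractors are looked up by file extension are all hypothetical and only serve to show the grouping step.

```typescript
// Hypothetical shapes for illustration; not Tailwind's internal types.
type ChangedContent = { content: string; extension: string };
type Extractor = (content: string) => string[];
type Transformer = (content: string) => string;

interface ContentHooks {
  // Assumption: custom hooks are registered per file extension.
  extract: Record<string, Extractor>;
  transform: Record<string, Transformer>;
}

// True when `key` is an own property, so inherited names such as
// "constructor" are never mistaken for a registered hook.
function hasOwn(record: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(record, key);
}

// Split the changed content into two buckets:
// - rustBucket: no custom hook applies, so the fast parser can handle it
// - regexBucket: a custom transformer or extractor applies, so fall back
function partitionChangedContent(
  changedContent: ChangedContent[],
  hooks: ContentHooks
): { rustBucket: ChangedContent[]; regexBucket: ChangedContent[] } {
  const rustBucket: ChangedContent[] = [];
  const regexBucket: ChangedContent[] = [];

  for (const item of changedContent) {
    const hasCustomHook =
      hasOwn(hooks.extract, item.extension) ||
      hasOwn(hooks.transform, item.extension);

    (hasCustomHook ? regexBucket : rustBucket).push(item);
  }

  return { rustBucket, regexBucket };
}
```

With a custom extractor registered only for `md`, an `html` entry would land in `rustBucket` and an `md` entry in `regexBucket`, so both parsers can run over the same batch of changes.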


RobinMalfait force-pushed the feat/handle-transfomers-and-extractors branch from f7863d3 to cac3968 Compare June 1, 2023 20:03

update changelog

4cc5035

RobinMalfait merged commit 5ddd9c4 into master Jun 1, 2023

RobinMalfait deleted the feat/handle-transfomers-and-extractors branch June 1, 2023 20:37
