Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback to RegEx based parser when using custom transformers or extractors #11335

Merged
merged 2 commits into from
Jun 1, 2023

Conversation

RobinMalfait
Copy link
Member

@RobinMalfait RobinMalfait commented Jun 1, 2023

Right now the Rust based parser can't work with custom transfomers or extractors.

In a perfect world we can implement as many custom parsers/extractors in Rust such that we don't need this at all. In an almost perfect world we can pass the transformer and extractor to the Rust based parser and call the callback functions to handle all of this. This is probably what we are going to do in the future but this requires more work to make sure that:

  1. It works as expected
  2. Doesn't result in (major) performance issues

Since it currently doesn't work with the Rust based parser, we can implement a fix for this in the meantime before we reach the "perfect" solution.

One solution to this problem is to check if we do have a custom transformer or a custom extractor and if we do, then we can bail on the Rust parser completely and just use the current regex based parser.

An alternative solution, the solution implemented here, is that we group the changedContent into 2 buckets. The bucket where we rely on the default transformer and extractor and a bucket where a custom transformer or extractor is used.

Then, the bucket where we use the default transformer and extractor can still rely on the way faster Rust based parser. For the other bucket we fallback to the regex based parser.

The nice part about this is that we can use both parsers at the same time, and the majority of the use cases should use the faster Rust based parser.

Right now the Rust based parser can't work with custom `transfomers` or
`extractors`.

In a perfect world we can implement as many custom parsers/extractors in
Rust such that we don't need this at all. In an almost perfect world we
can pass the transformer and extractor to the Rust based parser and call
the callback functions to handle all of this. This is probably what we
are going to do in the future but this requires more work to make sure
that:

1. It works as expected
2. Doesn't result in (major) performance issues

Since it currently doesn't work with the Rust based parser, we can
implement a fix for this in the meantime before we reach the "perfect"
solution.

One solution to this problem is to check if we do have a custom
transformer or a custom extractor and if we do, then we can bail on the
Rust parser completely and just use the current regex based parser.

An alternative solution, the solution implemented here, is that we group
the `changedContent` into 2 buckets. The bucket where we rely on the
default transformer and extractor and a bucket where a custom
transformer or extractor is used.

Then, the bucket where we use the default transformer and extractor can
still rely on the way faster Rust based parser. For the other bucket we
fallback to the regex based parser.

The nice part about this is that we can use both parsers at the same
time, and the majority of the use cases should use the faster Rust based
parser.
@RobinMalfait RobinMalfait force-pushed the feat/handle-transfomers-and-extractors branch from f7863d3 to cac3968 Compare June 1, 2023 20:03
@RobinMalfait RobinMalfait merged commit 5ddd9c4 into master Jun 1, 2023
@RobinMalfait RobinMalfait deleted the feat/handle-transfomers-and-extractors branch June 1, 2023 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant